Rajan Object-Oriented Numerical Methods Via C++


Object-Oriented

Numerical Methods
via C++ - 2nd Edition

S. D. Rajan
O B J E C T - O R I E N T E D N U M E R I C A L M E T H O D S

Object-Oriented Numerical
Methods via C++

S. D. Rajan
School of Sustainable Engineering & the Built Environment
Arizona State University

S. D. Rajan, 2000-24 iii



Object-Oriented Numerical Methods


Via C++

This book is a copyrighted document. It is against the law to copy copyrighted material on any medium except as
specifically allowed in a license agreement. No part of this book including computer programs may be reproduced or
transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or information storage
or retrieval systems, without the express written permission of the author.

©2000-24, S. D. Rajan

School of Sustainable Engineering and the Built Environment


Arizona State University
Tempe, AZ 85287-5306
e-mail [email protected]

Last Printing: August 1, 2024


Second Edition: July 2021
First Edition: July 2017


To my family – Vanitha, Varun and Rohit


and
all the students at ASU who have inspired me to write this text.


Preface to the Second Edition


The first edition of the book was written to make C++ easy to understand and use for numerical analysis, and it relied mostly
on C++11 to explain the language. Since 2011, C++ has undergone enormous changes with C++14, C++17, and
most recently, C++20. Keeping in mind that this book is more about numerical analysis via C++ than C++, changes have
been made to the contents with the following features in mind:
• Use of language constructs that promote safety.
• Use of language constructs that promote ease of writing the code and code maintenance.
Features from C++14 and C++17 have been used where these features can be used effectively. All programs have been suitably
modified to use the most appropriate C++ constructs and have been compiled, linked, and executed using Microsoft Visual
Studio 2019 (see Appendix A). Almost all chapters have been revised and it is hoped that the readers will find the new changes
useful.
S. D. Rajan
Chandler, Arizona
July 2021

Chapter 9 has been expanded to introduce and show the move constructor. Chapter 18 now discusses and shows how Fast Light
Toolkit (FLTK) can be used to draw simple graphical objects. Most projects have been moved to MSVS 2022.
S. D. Rajan
Tempe, Arizona
July 2023


Preface to the First Edition


Numerical methods constitute an important part of all scientific or engineering curricula. However, by themselves, numerical
methods have limited applicability if they are not implemented as computer programs. Engineering educators have realized the
important role that numerical methods play in engineering analysis and design. With increasing sophistication of the analysis
methods and tools, the enormous amount of data to be handled, and the need for robust, fast, easy-to-use computer programs,
one cannot look at numerical analysis and computer programming as two different entities. This text bridges that gap.
The focus of the proposed text is three-fold.
• Understanding and practicing numerical techniques commonly encountered in scientific and engineering applications.
• Understanding the role of objects in describing and managing scientific and engineering data.
• Using C++ to develop engineering applications by linking data management with numerical methods.
The book is aimed at the junior level (3rd year in a 4-year curriculum) course in a typical engineering curriculum, or as an
introductory graduate course. The material in the text is largely introductory, with some advanced topics that can be used at the
discretion of the instructor. Knowledge of computer fundamentals, calculus, differential equations, and matrix algebra is
required. Know-how in using numerical analysis packages such as Excel, Matlab, Mathcad, Maple, Mathematica, etc.
will be a definite plus. Prior experience with writing computer programs is not assumed, but those with programming experience
will find the learning curve with C++ less steep.

Why C++?
There are several choices one could make in selecting a programming language – FORTRAN, Basic, C, Java and C++. There
are several reasons for selecting C++ as the vehicle for implementing the numerical methods. C++ is a mature language with
mature Integrated Development Environments (IDEs) readily available on a variety of computer platforms. Students find
these environments intuitive and useful in writing and debugging programs. The student versions of these IDEs are relatively
inexpensive and sometimes free (https://www.visualstudio.com/vs/community/).
C++ supports the use of objects and all the advantages that objects have. It is easy to write engineering applications – accuracy,
speed and problem size are non-issues. Programs written in ANSI C++ can be readily extended by the addition of (system-
dependent!) graphical-user interfaces (GUIs) including computer graphics. Lastly, there are hundreds of books on programming
with C++ and perhaps, millions of man-hours of experience in programming with C++. This continuously evolving language
has provided and will continue to provide the bulk of engineering computer programs in the future.
Additional resources connected with the book are available here: http://structures.asu.edu/rajan/object-oriented-numerical-analysis/
and they include the following:
(1) The electronic version of the book as an Adobe pdf file.
(2) All the computer programs (with ISO-compliant C++ source code), discussed in the book arranged in separate directories
and combined into a single compressed file (.rar file).
(3) Additional computer programs such as 1DBVP©, weDraw©, ASUTruss©, EDO-GUIWB© and GS-USA Frame©.

S. D. Rajan
Tempe, Arizona
July 2017

Chapters 4, 9 and 13 have been suitably modified to add material on exception handling that is explained via new example
programs.
S. D. Rajan
Tempe, Arizona
December 2018


Contents
Chapter 1 Introduction 1-1
1.1 What is a Computer? 1-3
1.2 What is a Computer Program? 1-3
1.3 Programming in C++ 1-4
1.4 What is Numerical Analysis? 1-6
1.5 What are Objects? 1-8
1.6 Why Object-Oriented Numerical Analysis? 1-8
1.7 Tips and Aids 1-9

Chapter 2 Programming in C++ 2-13


2.1 Introduction 2-14
2.2 Variables and Data Types 2-16
2.3 Expressions 2-17
2.4 Assignment Statements 2-19
2.5 Simple Input and Output 2-19
2.6 Vector Variables 2-23
2.7 Troubleshooting 2-25
2.8 The namespace Concept 2-30

Chapter 3 Control Structures 3-35


3.1 Selection 3-36
3.2 Repetition 3-40
3.3 Other Control Statements 3-44
3.4 Tying Loose Ends 3-56

Chapter 4 Modular Program Development 4-65


4.1 Functions 4-66
4.2 More about Functions 4-73
4.3 Scope of Variables 4-79
4.4 Developing Modular Programs 4-82
4.5 Function Templates 4-91
4.6 Case Study: A 4-Function Calculator 4-93
4.7 Exception Handling 4-98

Chapter 5 Numerical Analysis: Introduction 5-109


5.1 Approximations and Errors 5-110
5.2 Series Expansions 5-121

Chapter 6 Root Finding, Differentiation and Integration 6-129


6.1 Roots of Equations 6-130
6.2 Numerical Differentiation 6-135
6.3 Numerical Integration 6-142
6.4 User-Defined Functions 6-152

Chapter 7 Classes: Objects 101 7-161


7.1 A Detour – Why OOP? 7-162
7.2 Components of a Class 7-166
7.3 Developing and Using Classes 7-174


7.4 Storage with std::vector class 7-180


7.5 String Manipulation with std::string class 7-182
7.6 What is struct? 7-188
7.7 Object-Oriented Solution 7-188

Chapter 8 Pointers 8-205


8.1 Memory Management 8-206
8.2 Pointers 8-208
8.3 Dynamic Memory Allocation 8-210
8.4 Case Study: A Simple Vector Class 8-214

Chapter 9 Classes: Objects 202 9-225


9.1 Operator Overloading 9-226
9.2 More about Classes 9-233
9.3 Template Classes 9-236
9.4 Arrays 9-241
9.5 Command Line Arguments 9-254
9.6 Exception Handling 9-255

Chapter 10 Matrix Algebra 10-265


10.1 Fundamentals of Matrix Algebra 10-266
10.2 Solving Linear Algebraic Equations 10-269
10.3 Case Study: A Matrix Toolbox 10-286

Chapter 11 Regression Analysis 11-293


11.1 Building a Model 11-294
11.2 Least Squares Fit 11-295
11.3 General Least Squares Fit 11-298

Chapter 12 File Handling 12-305


12.1 File Streams 12-306
12.2 File Input and Output 12-306
12.3 Advanced Usage 12-311

Chapter 13 Classes: Objects 303 13-329


13.1 Software Engineering 13-330
13.2 Inheritance 13-339
13.3 Polymorphism 13-344
13.4 Function Pointers and Functors 13-353
13.5 Numerical Toolbox with Exception Handling 13-358

Chapter 14 Ordinary Differential Equations 14-369


14.1 Ordinary Differential Equations 14-370
14.2 Runge-Kutta Methods 14-371

Chapter 15 Partial Differential Equations 15-381


15.1 Background 15-382
15.2 The Element Concept 15-390


15.3 One-Dimensional Boundary Value Problem 15-401


15.4 Solid Mechanics 15-403
15.5 Heat Transfer 15-406
15.6 Higher Order Elements 15-411
15.7 Mesh Refinement and Convergence 15-415

Chapter 16 Eigensystems 16-423


16.1 Eigenproblems 16-424
16.2 Properties of Eigensystems 16-426
16.3 Vector Iteration Methods 16-427
16.4 Transformation Methods 16-430
16.5 One-Dimensional Eigenproblem 16-436

Chapter 17 Numerical Optimization 17-441


17.1 Numerical Optimization 17-442
17.2 Types of Mathematical Programming Problems 17-446
17.3 Non-Linear Programming (NLP) Problem 17-448
17.4 Genetic Algorithm 17-455
17.5 Design Examples 17-462

Chapter 18 Computer Graphics 18-475


18.1 Introduction 18-476
18.2 Graphics Operations 18-477
18.3 Transformations and Projections 18-477
18.4 Three-Dimensional Graphics 18-479
18.5 Case Study: 3D Wireframe Viewer 18-483
18.6 Using FLTK 18-486

References R-493
Appendix A Using Microsoft Visual Studio 2022 A-495
Appendix B Standard Template Library B-513
Appendix C C++ Odds and Ends C-525



Chapter 1

Introduction
“There are only two kinds of languages: the ones people complain about and the ones nobody uses.” Bjarne Stroustrup

“The world hates change, yet it is the only thing that has brought progress.” Charles F. Kettering

“Men take only their needs into consideration, never their abilities.” Napoleon Bonaparte

The steam engine is said to have fueled the Industrial Revolution. In a similar vein, the microprocessor has fueled the
Information Age and affected every single facet of humanity from health to education to work and play. Scientists and engineers
have played a pivotal role in fueling this revolution. Developments in computer languages as well as the development of
numerical analysis techniques have made it possible to create computer systems that can perform amazing tasks – allow two
people thousands of miles away to communicate with each other, fly a spacecraft from the earth to Mars, help in the design and
manufacture of artificial limbs, create virtual environments for the development and testing of aircraft, automobiles, and a host
of other products.
Learning a new human language can be a daunting task. But as linguists would tell you, the key to learning a new language is to
read, write, listen, and speak as much as possible. Learning a language involves knowing the “alphabet”, sentence construction,
the grammar, ability to read, write and speak. What does one mean when one says, “I am fluent in Spanish.”? Does that indicate
a fluency in reading, or writing, or speaking, or all? What does fluency mean? Are there different grades of fluency? There are
no definitive answers to these questions. While the situation with computer languages is similar, there are subtle differences.
Language standards developed by the American National Standards Institute (ANSI) and the International Organization for Standardization
(ISO) have strongly discouraged the proliferation of language dialects. On paper, a program written using the standards should
compile and execute on any hardware-software platform. These standards evolve over time and are agreed on by the Standards
Committee unlike human languages that have their own evolutionary scheme. The most important difference is that one can
make mistakes in “writing and speaking” programs and learn to correct them anonymously without peer comments or stranger
criticism!
Tens (if not hundreds) of computer languages have been developed and used by programmers throughout the world over the
last several decades. Some of these include BASIC, FORTRAN (FORmula TRANslation), COBOL (COmmon Business
Oriented Language), Lisp, Ada, Pascal, C, Smalltalk, Java and so on. The C++ language is an extension of C. Bjarne Stroustrup
developed this language in the early eighties at AT&T Bell Laboratories. It would be incorrect to refer to C as a subset of C++
or C++ as a superset of C. C++ has features that make it a programming language of choice for business, scientific and
engineering applications.
We hope this book will open your minds (and the doors) to making this world a better place to live in.

Objectives of this book


• To understand why C++ is a suitable computer language for the development of software for scientific and engineering
analysis and design.
• To study and understand the different numerical analysis techniques which are useful for engineering analysis and design.
• To study and understand the basic concepts associated with software development.
• To understand the basics of object-oriented (OO) programming.
• To apply OO techniques in the development of software-based numerical solution techniques.


The constructs of the C++ language are slowly introduced throughout the text. The basics of the language are introduced in
Chapter 2. More useful ideas and constructs are discussed in Chapters 3 and 4. This background is enough to start a serious
study of numerical analysis techniques. We start with basic ideas such as approximations and series expansions in Chapter 5
followed by roots of equations and numerical integration and differentiation in Chapter 6. In Chapter 7 we see a gentle
introduction to object-oriented (OO) ideas. The applications of OO ideas, especially with regard to the development of scientific
and engineering software, are shown in Chapters 8 and 9. Having introduced some of the basic building blocks of
numerical analysis – arrays (vectors and matrices), we move on to Chapter 10 where we see a number of matrix operations
including solutions to linear algebraic equations. We follow this with associated numerical analysis ideas – interpolation,
polynomial approximation and curve fitting.
We see more advanced ideas and topics in the second half of the book. In Chapter 12, file handling constructs are discussed.
We follow this in Chapter 13 with more advanced ideas dealing with classes and objects such as inheritance and polymorphism.
In Chapters 14 through 17, we see important numerical analysis ideas dealing with ordinary differential equations, partial
differential equations, eigenproblems and numerical optimization. Finally in Chapter 18, we see an introduction to computer
graphics.


1.1 What is a Computer?


A computer is a sophisticated tool. It is typically made up of hardware, firmware and software. Hardware (see Fig. 1.1) includes
components such as a computer case containing the motherboard, central processing unit (CPU), random access memory
(RAM), CD-ROM drive, floppy drive, magnetic or solid-state disk (hard disk), video card, sound card, network interface card
(NIC) etc., along with output devices such as monitor, printer etc. and input devices such as keyboard, mouse etc. Software is
defined as programs, routines, and symbolic languages that control the functioning of the hardware and direct its operation.
The operating system (OS) is an example of software as are word processors, drawing programs, video-editing programs, CAD
programs, weather forecasting programs, web browsers, machine learning software, etc. Firmware includes coded instructions
that are stored permanently in read-only memory (ROM). For example, when the computer is powered on, instructions from
the firmware are used to load the OS from the disk and pass control to it.
[Figure: computer system with labeled components: motherboard, CPU, hard disk, video card, RAM, NIC, sound card, CD-ROM drive, floppy drive]

Fig. 1.1 Components of a computer system


Hardware advances are taking place at a breathtaking speed whereas advances in software are at a relatively much slower rate.
Despite these advances, increases in human productivity in the workplace seem to be nebulous. Some of the problems seem to
be because of a lack of understanding of what technology is and how humans need to interact with technology. Successful
deployment of computer-based tools is not just a function of computer hardware or software but involves much more as we
hope to see in this text.

1.2 What is a Computer Program?


A computer program contains low-level instructions (or machine instructions) that enable information (instructions and data)
flow between different components of a computer. How do we develop a computer program? The process typically starts with
a need for a computer program. Based on the needs one would develop a set of specifications that detail the capabilities of the
computer program, on what computer platforms it needs to run, how it would interact with the users, the type and nature of
input that would drive the program and the type and nature of the output that the program would generate. The computer
programmer or software engineer would then take the capabilities specifications and translate them into implementation
specifications. A short answer is that the programmer would develop the source code, compile the source code to create the
object code, link the object code to create the executable image (computer program), and then execute the computer program
on a specific hardware-software computer system. Let us now see what is meant by these terms.
Source Code: Source code is made up of statements in a computer programming language. Here are the source statements of
a computer program in FORTRAN that converts weight from pounds to Newtons.
C     PROGRAM CONVERT
C
C     CONVERTS POUNDS TO NEWTONS
C
      REAL POUNDS, NEWTONS
C
C --- GET THE USER INPUT IN POUNDS
      WRITE (*,100)
  100 FORMAT (1X, 'INPUT WEIGHT IN POUNDS: ')
      READ (*,*) POUNDS
C
C --- CONVERT TO NEWTONS
      NEWTONS = 4.448*POUNDS
C
C --- DISPLAY THE CONVERSION
      WRITE (*,101) POUNDS, NEWTONS
  101 FORMAT (1X, 1PE15.8, ' POUNDS IS EQUAL TO ', 1PE15.8,
     1        ' NEWTONS')
C
C --- ALL DONE.
      STOP
      END

Typically, one would create the source code using an editor. An editor is a computer program that allows the creation of and
subsequent editing of the contents of a text file. Microsoft Word© or Notepad are examples of an editor. Programmers use a
custom-made environment called an Integrated Development Environment (IDE) to develop, write, edit, debug and execute
computer programs. The source statements are stored in one or more files, and they have a special file extension. C/C++
source files have file extensions such as .cpp, .c, .h etc.
Compile: Once the source statements are ready, they need to be compiled. A compiler (another computer program!) takes the
source statements (in one or more files) and creates object files. The compiler issues warnings and error messages if the source
statements do not follow the C++ syntax. One can look at object files as intermediate files that by themselves cannot be used
but are needed to create the executable image. Object files have file extensions such as .obj, .o etc.
Link (or Build): Once all the source statements are compiled and the object files are created, the linker (another computer
program!) is used to “tie” the object files to other C++-enabled components (or libraries) so as to produce the executable image.
Executable files have file extensions such as .exe, though on Unix systems, by default, the executable image created by the linker is called
a.out. If the linker is not able to find all the components, it will issue error messages and the executable image is usually not
created.
Execute: Once the executable image or the program is created, it can be executed on the hardware-software platform for which
it was developed. Programs may not execute correctly because of either logical errors or run-time errors. A program that runs from
start to finish without any errors but does not produce the correct output is said to have logical errors. On the other hand, if
the program “crashes” during execution then it has encountered a run-time error. Examples include illegal operations (divide by
zero, overflow, underflow, etc.) and illegal memory access (access violation).
Debug: Programs rarely work correctly the first time. Finding and correcting both logical errors and run-time errors are
challenging. However, there are systematic approaches and debugging tools that can be used to find these errors in programs
small and large.
Writing excellent computer programs is an art whereas the skills necessary for writing good computer programs can be learnt
through good programming habits (much as learning good scientific or engineering practices). There are three distinct
components that we must deal with in our quest to program effectively. First, we have the language itself. Second is the
environment in which the computer programs are developed, written, debugged and executed. A good IDE certainly helps.
Learning to effectively use IDE tools – editor, debugger etc. can be a life saver. Last, we have the programmer with his or her
thought processes, ability to visualize, plan and execute, idiosyncrasies, troubleshooting capabilities, and hopefully, a whole lot
of patience.

1.3 Programming in C++


The CEO of Frame and Truss
Told his employees “Thou shan’t cuss”
If you so hate FORTRAN
Then join the clan
That writes its programs in C++

As we will repeatedly see in this book, skills can be mastered through practice and hard work. Note that there is nothing more
satisfying than a completed, running program however little or much it may do!
An interesting site that can be used as a starting point for answers to questions about C++ is https://isocpp.org/faq.
I will use four (philosophical) questions and answers from that site.
Question: Is C++ a practical language?
Answer: Yes. C++ is a practical tool. It's not perfect, but it's useful.
In the world of industrial software, C++ is viewed as a solid, mature, mainstream tool. It has widespread industry support which
makes it "good" from an overall business perspective.
Question: Is C++ a perfect language?
Answer: Nope. C++ wasn't designed to demonstrate what a perfect OO language looks like. It was designed to be a practical
tool for solving real world problems. It has a few warts, but the only place where it's appropriate to keep fiddling with something
until it's perfect is in a pure academic setting. That wasn't C++'s goal.
Question: What is the big deal with OO?
Answer: Object-oriented techniques using classes and virtual functions are an important way to develop large, complex software
applications and systems. So are generic programming techniques using templates. Both are important ways to express
polymorphism – at run time and at compile time, respectively. And they work great together in C++.
There are lots of definitions of “object oriented”, “object-oriented programming”, and “object-oriented programming
languages”. For a longish explanation of what Stroustrup thinks of as “object oriented”, read Why C++ isn’t just an object-oriented
programming language [1]. That said, object-oriented programming is a style of programming originating with Simula (about 40 years
ago!) relying on encapsulation, inheritance, and polymorphism. In the context of C++ (and of many other languages with their
roots in Simula), it means programming using class hierarchies and virtual functions to allow manipulation of objects of a variety
of types through well-defined interfaces and to allow a program to be extended incrementally through derivation.
Question: Is C++ better than Java? (or C#, C, Objective-C, JavaScript, Ruby, Perl, PHP, Haskell, FORTRAN,
Pascal, Ada, Smalltalk, or any other language?)
Answer: Stop. This question generates much much more heat than light. Please read the following before posting some variant
of this question.
In 99% of the cases, programming language selection is dominated by business considerations, not by technical considerations.
Things that really end up mattering are things like availability of a programming environment for the development machine,
availability of runtime environment(s) for the deployment machine(s), licensing/legal issues of the runtime and/or development
environments, availability of trained developers, availability of consulting services, and corporate culture/politics. These business
considerations generally play a much greater role than compile time performance, runtime performance, static vs. dynamic
typing, static vs. dynamic binding, etc.
Those who ignore the (dominant!) business criteria when evaluating programming language tradeoffs expose themselves to
criticism for having poor judgment. Be technical, but don’t be a techie weenie. Business issues really do dominate technical
issues, and those who don’t realize that are destined to make decisions that have terrible business consequences; they are
dangerous to their employer.
The most widely circulated comparisons tend to be those written by proponents of some language, Z, to prove that Z is better
than other languages. Given its wide use, C++ is often top of the list of languages that the proponents of Z want to prove
inferior. Often, such papers are “published” or distributed by a company that sells Z as part of a marketing campaign.
Surprisingly, many seem to take an unreviewed paper written by people working for a company selling Z “proving” that Z is
best seriously. One problem is that there are always grains of truth in such comparisons. After all, no language is better than
every other in all possible ways. C++ certainly isn’t perfect, but selective truth can be most seductive and occasionally completely
misleading. When looking at a language comparison consider who wrote it, consider carefully if the descriptions are factual and
fair, and also if the comparison criteria are themselves fair for all languages considered. This is not easy.
It should be noted that a programming language is a vehicle; the programmer is the driver and must chart the course.

[1] https://www.stroustrup.com/oopsla.pdf


1.4 What is Numerical Analysis?


There are hundreds of engineering and scientific problems that can be solved analytically. Consider the simple example of
finding the roots of a quadratic polynomial of the form $f(x) = ax^2 + bx + c$. The analytical solution to the real roots is
$$x_{1,2} = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$
provided $b^2 - 4ac \ge 0$ and $a \ne 0$. When this problem is presented with numerical values for the
constants $a$, $b$ and $c$, one can find the roots by using a pencil and paper, or a calculator. When the problems become more
complex, analytical solutions are either too difficult or time consuming to obtain, or simply not possible to obtain. In cases
such as these, it may be possible to obtain an approximate, numerical solution.
There are at least three components to obtaining the numerical solution. First, there must exist a procedure, method, or
technique that establishes the link between the input to the problem and the expected output from the solution methodology.
Second, it is necessary to translate the solution methodology to a finite set of distinct steps or algorithm. Last, we must be able
to take the algorithm and convert it to a computer program. Let us look at the second and third components in the next two
sections. The rest of the book is about the first component!

Algorithm
Solutions to problems usually follow a specific path – examination of the problem statement, identification of the problem
parameters by differentiating between the input and the output parameters, recognizing what theoretical details (methods,
techniques etc.) are applicable, and finally, development of the algorithm that bridges the gap between the input and the output
parameters.
For the problems discussed in this text we will define the solution process in terms of algorithms. An algorithm is a sequence
of detailed steps that is general enough to be applicable for most situations in solving a problem. These steps involve the input
variable(s) to the procedure, and lead to the determination of the output variable(s).
Steps in a typical algorithm involve input, output, assignment, and control structures.
Input and Output
Every algorithm involves either input or output or, more often than not, both. For example, the general procedure for the
analysis of beam deflections using the solution of an ordinary differential equation uses the beam cross-sectional and material
properties, the different span lengths, the loading on the different spans, and the manner in which the beam is supported as
input to generate the output – the rotations, displacements, and the internal forces along the beam.
Assignment
An assignment in the form of an equation or expression is made up of one or more algebraic operators, variables, constants,
and mathematical functions. Examples of algebraic operations include addition, subtraction, multiplication, and division.
Examples of mathematical functions and operators include $\sin$, etc. We will be introduced to specific C++ examples
of assignments in Chapter 2.
Control structures
While an algorithm may have several steps, the computations are executed in a specific order that may involve only a few steps,
or certain steps may be executed repeatedly. Control structures help the programmer in sequencing the execution of the steps
in an algorithm. Research in program development has shown that control structures can be divided into three types – the
sequence structure, the selection structure, and the repetition structure. Sequence structure means that statements are executed in order –
this is the manner in which an algorithm is developed in terms of ordered steps. As the name suggests, the selection structure
involves execution or skipping of specific steps in an algorithm. Finally, the repetition structure involves repeated execution of
a sequence of steps. We will be introduced to specific C++ examples of control structures in Chapter 3.
We will illustrate these ideas through an example.
1.4.1 A Sample Algorithm
Consider the problem of finding a root, $\hat{x}$, of a nonlinear equation given as $f(x) = 0$. A well-known solution technique is the
Newton-Raphson Method that we will see in detail in Chapter 6. The basic idea is to start with an initial guess, $x_0$, for the root.
A better estimate $x_{k+1}$ for the root is generated as


$$x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}, \quad k = 0, 1, 2, \ldots \qquad (1.4.1.1)$$
The iterative process of finding a better estimate continues until an appropriate termination criterion is reached. For example,
one could establish the maximum number of iterations, $k_{max}$, or one could compare the change in the estimate against a
predefined tolerance, $\epsilon_1$, as $|x_{k+1} - x_k| \le \epsilon_1$, or compare the change in the function value against a predefined tolerance, $\epsilon_2$, as
$|f(x_{k+1}) - f(x_k)| \le \epsilon_2$, or even $|f(x_{k+1})| \le \epsilon_2$.
Before a computer program is written, one must translate the theory and process discussed above into an algorithm. A good
algorithm is a detailed set of instructions that a computer programmer can translate into appropriate computer statements.
Algorithm for Newton-Raphson Method
(1) Input: Establish the values for $k_{max}$ and the tolerances $\epsilon_1$, $\epsilon_2$, $\epsilon_3$ (see Step 4), and the initial guess $x_0$.
(2) Set $k = 0$.
(3) Compute $f(x_k)$, $f'(x_k)$.
(4) If $|f'(x_k)| \le \epsilon_3$, note that the solution did not converge and go to Step 8.
(5) Otherwise, use Eqn. (1.4.1.1) to find the next estimate $x_{k+1}$.
(6) Check the termination criteria. If one or more criteria are satisfied, then go to Step 8.
(7) Otherwise, increment $k = k + 1$. Go back to Step 3.
(8) Output: Message as to whether the solution converged or not. If the solution converged, then the values of $\hat{x}$, $f(\hat{x})$ and
$k$ are useful output values.
Steps (3) through (7) involve repetition. Selection is shown in Step (4) and Step (6). Note Step (4). The second term in Eqn.
(1.4.1.1) is singular if $f'(x_k) = 0$. However, the term is numerically singular (leading to an exception - numerical overflow), if
$f'(x_k)$ is numerically small. An exception is a kind of error that may prevent a program from progressing further and from
producing correct output. Assignment takes place in Step (5).
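The steps of the algorithm can be sketched in C++ as shown below. This is a minimal sketch and not the implementation developed in Chapter 6. The names NewtonResult and NewtonRaphson, the parameter names, and the use of std::function are assumptions made for illustration. The three tolerance parameters stand for the change in the estimate, the function value, and the derivative check of Step (4).

```cpp
#include <cmath>
#include <functional>

// Outcome of the iteration: the last estimate, the function value there,
// the number of iterations used, and whether a termination test was met.
struct NewtonResult
{
    double dRoot;
    double dFValue;
    int    nIterations;
    bool   bConverged;
};

// Hypothetical helper that follows Steps (1) through (8).
NewtonResult NewtonRaphson (const std::function<double(double)>& f,
                            const std::function<double(double)>& fPrime,
                            double dX0, int nMaxIterations,
                            double dTolX, double dTolF, double dTolDeriv)
{
    double dX = dX0;                               // Step 1: initial guess
    for (int k = 0; k < nMaxIterations; k++)       // Steps 2 and 7
    {
        const double dF  = f(dX);                  // Step 3
        const double dFP = fPrime(dX);
        if (std::fabs(dFP) <= dTolDeriv)           // Step 4: derivative too small
            return {dX, dF, k, false};

        const double dXNew = dX - dF/dFP;          // Step 5: Eqn. (1.4.1.1)
        const double dFNew = f(dXNew);

        // Step 6: termination criteria
        if (std::fabs(dXNew - dX) <= dTolX ||
            std::fabs(dFNew - dF) <= dTolF ||
            std::fabs(dFNew) <= dTolF)
            return {dXNew, dFNew, k+1, true};

        dX = dXNew;
    }
    // Step 8: maximum number of iterations reached without convergence
    return {dX, f(dX), nMaxIterations, false};
}
```

For instance, calling NewtonRaphson with f(x) = x*x - 2, its derivative 2*x, and an initial guess of 1 should return an estimate close to the square root of 2, whereas a starting point at which the derivative vanishes is reported as not converged.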
1.4.2 Implementing Algorithms
The next step in the overall process is to translate the algorithm into a program or program component. However, there are
still unresolved issues such as program architecture, data structures, program input and output that the programmer must have
answers to before a computer program is actually developed.
In some sense, we are jumping ahead of ourselves. The thought process starts with statements that describe the problem and
your abstract view that describes the model for the problem. For example, suppose the main objective is to develop computer programs
to carry out the structural analysis of systems such as planar trusses. A typical abstraction may look as shown in Fig.
1.4.2.1.
The model is an abstraction. For example, what does the statement “Define a truss” mean? How do we go from input data to
generation of output? How do we define and store the data associated with a truss? How general is our model? What are its
limitations? What steps should be followed to turn this abstract model into a functional program? What is the role of algorithms
in this model?
We will see some of these issues discussed throughout the text.


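[Figure: a problem description (a planar truss with joints A, B, C and D, dimensions of 10', 10' and 15', and loads of 2k, 4k and 2k) is abstracted into a model whose steps include “Define a truss”, “Read the input data”, “Analyze” and “Generate the output”.]

Fig. 1.4.2.1 Creating an abstract model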

1.5 What are Objects?


Objects are routinely identified by human beings. They are given a name, defined as having properties (what is it?) and
capabilities to do something (what does it do?). Computer scientists term these characteristics as attributes and behavior. Let us
look at a few examples.
A person can be thought of as an object. The attributes for a person include the person’s name, gender, height, weight, age etc.
Some of the questions that we should be able to ask about the person include “Is this person eligible to vote?” (determined
based on the age), “Will this person fit inside a NASA capsule?” (determined based on the height), etc.
Time is an object. It is defined in terms of hour, minute and second. These are its attributes. Some of the questions that we
should be able to ask about time include the following - “Is it past breakfast time?”, “How much time has elapsed since 9 a.m.?”,
“How do we print time in different formats?”, etc.
Construction material from the viewpoint of a structural engineer is an object. The material is identified by a name, e.g., A36 Steel,
Aluminum 2014-T6 etc. In addition, it has structural properties of interest to us – modulus of elasticity, mass density, yield stress
etc. These are the possible attributes for a material object. We would like to know “Is the material elastic with a normal strain
value of 0.02?”, “How much does a cubic meter of the material weigh?”, etc.
In a computer program, the objects are defined in terms of classes. An instance of a class is an object in a manner similar to
int x;
where x (object) is an instance of type integer (class). One can think of classes as user-defined data types. We should be able to
set as well as obtain the values for some or all the attributes that define an object. We would also like to have answers to the
behavioral aspects of the class. Such features are available through the class interface. We will see more about classes starting in
Chapter 7.
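As a preview, the construction material object described above might be sketched as a class along the following lines. This is only an illustration under stated assumptions: the class name CMaterial, its member names, the SI units, and the test for elastic behavior (linear elastic response up to the yield stress) are hypothetical and are not the classes developed later in the text.

```cpp
#include <cmath>
#include <string>
#include <utility>

// Hypothetical class: attributes are stored as private data and
// behavior is made available through the public interface.
class CMaterial
{
    public:
        CMaterial (std::string strName, double dModulus,
                   double dMassDensity, double dYieldStress)
            : m_strName(std::move(strName)), m_dModulus(dModulus),
              m_dMassDensity(dMassDensity), m_dYieldStress(dYieldStress) {}

        // attribute access
        const std::string& GetName () const { return m_strName; }

        // behavior: is the material elastic at this normal strain?
        // (assumes linear elastic behavior up to the yield stress)
        bool IsElastic (double dStrain) const
        {
            const double dStress = m_dModulus * std::fabs(dStrain);
            return dStress <= m_dYieldStress;
        }

        // behavior: weight (N) of a given volume (m^3)
        double Weight (double dVolume) const
        {
            const double dG = 9.81;   // gravitational acceleration, m/s^2
            return m_dMassDensity * dVolume * dG;
        }

    private:
        std::string m_strName;        // e.g., "A36 Steel"
        double m_dModulus;            // modulus of elasticity, Pa
        double m_dMassDensity;        // mass density, kg/m^3
        double m_dYieldStress;        // yield stress, Pa
};
```

With nominal, illustrative values for steel (a modulus of 200e9 Pa, a mass density of 7850 kg/m^3 and a yield stress of 250e6 Pa), an object created as CMaterial steel("A36 Steel", 200.0e9, 7850.0, 250.0e6); would answer the two questions posed above through steel.IsElastic(0.02) and steel.Weight(1.0).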

1.6 Why Object-Oriented Numerical Analysis?


Large software developers will tell you that OO technology has revolutionized software engineering. It is changing the way in
which software is developed, reused, and enhanced. Here are some excerpts about the benefits of OO technology [Lee and
Tepfenhart, 2001].


• The proficiency of higher-level OO model should provide the software designer with real-world, programmable
components, thereby reducing software development costs.
• Its capability to share and reuse code with OO techniques will reduce time to develop an application.
• Its capability to localize and minimize the effects of modifications through programming abstraction mechanisms will
allow for faster enhancement development and will provide more reliable and more robust software.
• Its capability to manage complexity allows developers to address more difficult applications.
Object technologies lead to reuse, and reuse of program components leads to faster software development and higher-quality
programs. Object-oriented software is easier to maintain because its structure is inherently decoupled. This leads to fewer side
effects when changes must be made and less frustration for the software engineer and the customer. In addition, object-oriented
systems are easier to adapt and easier to scale – large systems can be created by assembling reusable subsystems [Pressman,
2001].

1.7 Tips and Aids


Finally, a few words about getting the most from this book and from a course on numerical analysis and computer
programming. The presentation style used in this text has two emphases. First, the relevant syntax, theory and/or assumptions
are presented that describe the C++ language, a numerical analysis technique or method. Second, the syntax or theory is
followed by several examples that serve to illustrate different traits of the C++ language syntax, problem-solving abilities of the
method, technique or theorem. Every example has been chosen with care. You are encouraged to look at each and every
example in detail. Ideas are explored and comments made so that we are able to appreciate the different nuances of the theory
and its applications. Finally, every section is followed by several problems that you are encouraged to solve. Some have answers
that can be used to check your solution.
I have accumulated over the years some perspective on effective teaching and learning.
• Come prepared to class. Spend about 15-30 minutes reading the material for that day’s lecture. Read it more than once if
necessary, even if you do not understand most of the material. Familiarity with the “language of the lecture” – terminology,
figures, equations etc. – is a major advantage that you will carry with you to the lecture.
• At the end of the day (not the end of the week), carefully review the lecture material – language syntax, algorithm,
derivations, solved examples, etc. Look at this task as changing the oil in an automobile – either you do it regularly or you
take the car to a mechanic for a major repair job later. Close the lecture notes and re-solve the solved examples from start
to finish. Go back to the lecture notes if you are unsuccessful. Repeat the exercise until you successfully solve the problem.
Too often we are tempted to say, “I understand the material but …”. There is simply no substitute for solving a problem
from start to finish.
• Practice, practice and practice. You will have a better understanding of the material by solving problems.
• Don’t be afraid of asking questions. A correctly posed question can be a huge life and time saver. Your ability to ask the
correctly posed question is directly proportional to the amount of “homework” you do.
• Use every conceivable resource – the instructor’s knowledge and office hours, the TA’s enthusiasm, knowledge and office
hours, the library, material available on the internet, etc. This is perhaps the only time in your life when you will have the
time, energy, motivation, atmosphere and resources available simultaneously to meet your needs.
• Study in small groups, associating with other students who are as motivated, diligent, and capable as you are, if not
more so. Otherwise, this is likely to be a waste of time. Put this study session into your weekly schedule.
• A student should be ethical before he or she becomes a practicing scientist or engineer. When it is not clear as to what is
ethical, consult someone knowledgeable.
• Successful exploitation of numerical analysis and computer programming requires understanding of the material, patience,
practice, and application. One of the most attractive aspects of numerical solutions is that we can develop the entire
solution, apply, and debug the solution in a virtual environment. Do not hesitate to experiment with ideas or be innovative.
Understand the problem, be systematic, and when required, do not ignore the details. The devil is in the details.
It is hoped that the material in this text encourages and helps you in being a better scientist or engineer and an even better
student.


Summary
In this first chapter of the book, we saw the basic ideas associated with computer programming using C++ and its links to
numerical solution techniques. It is important that readers be aware that the basic quest in the book is to develop and use reliable
tools to solve scientific and engineering problems.

Where to go from here?


This would be the right time to install the Integrated Development Environment (IDE) that you wish to use. The Microsoft
Visual Studio environment is discussed in Appendix A. Go ahead and install the IDE and familiarize yourself with the different
components. Write, compile, link and execute the sample “Hello World” program discussed in the Appendix.
There are other (non-Microsoft) C++ compilers available for the Windows operating system. These include compilers
from Intel (integrated into MS Visual Studio), g++ under Cygwin, Bloodshed Dev-C++, Digital Mars and others. In addition,
there are IDEs and compilers available on other operating systems. Notable among them are IDEs such as Microsoft Visual Studio Code (not
to be confused with Microsoft Visual Studio, which is used in developing all the programs in this textbook) and Eclipse, and C++ compilers
for the Linux operating system, such as those from Intel and the Portland Group, g++, etc., that are available for free to students.


Exercises
Appetizers
Problem 1.1
Research the world wide web to find answers to the following questions.
(a) What other computer languages not mentioned in this Chapter were or are being used by programmers? Write a short
paragraph on each of these languages.
(b) What hardware advances have taken place in the last 5 years that are now available in computer systems?
(c) What are the most commonly used operating systems? What hardware platforms do these operating systems require?
Problem 1.2
Write an algorithm for finding the maximum of a set of numbers.
Problem 1.3
Write an algorithm for finding the minimum of a set of numbers.
Problem 1.4
Consider an airplane as an object. Identify its attributes. List its behavior that one may want to incorporate in a computer
program.
Problem 1.5
Are the following hardware, software, firmware or none of the above? (a) Microsoft Excel© (b) CPU (c) Adobe Acrobat© (d)
Cache (e) BIOS (Basic Input/Output System) (f) Personal Digital Assistant (PDA).

Main Course
Problem 1.6
Consider a point in space defined in terms of its $(x, y, z)$ coordinates. Write an algorithm (including error detection) for each
one of the following tasks.
(a) Compute the distance between two points.
(b) Compute a unit vector between two points.
(c) Compute the shortest distance of a point from a straight line. Assume that the line is defined by two points.
Problem 1.7
Consider each of the following as an object. Identify their attributes. List their behavior that one may want to incorporate in a
computer program. (a) Time (b) Bank account (c) Employee (d) Fuel sources.
Problem 1.8
Make a list of frequently asked questions (FAQs) on how to use your favorite IDE. Get the answers to these questions.

C++ Concepts
Problem 1.9
Write an algorithm for the tic-tac-toe game. Make the following assumptions. The game will be played exactly once. Who (user
or computer) will play first will be determined by a coin toss. The person who plays first marks an X and the other person uses
an O. The cells in the grid are identified by numbers 1 through 9 - the user inputs a valid number 1 through 9. You have to
develop strategies for the computer (aim being to win the game if possible).
Problem 1.10
Describe the steps that you would take to convert the ideas from Problem 1.9 into a computer program.


References
Gary Bronson, C++ for Engineers and Scientists, Brooks/Cole Publishing Company, 1999.
Gary Bronson, Program Development and Design Using C++, Brooks/Cole, 2000.
Deitel and Deitel, C++ How to Program, 3rd Edition, Prentice-Hall, 2000.
Cay Horstmann, Computing Concepts with C++ Essentials, Wiley, 1997.
Stanley Lippman, Josée Lajoie and Barbara Moo, C++ Primer, Addison-Wesley, 2005.
Rick Mercer, Computing Fundamentals with C++, 2nd Edition, Franklin, Beedle and Associates, 1998.
Scott Meyers, Effective C++, Addison Wesley, 1999.
Walter Savitch, Absolute C++, Addison Wesley, 2002.
Victor Shtern, Core C++ - A Software Engineering Approach, Prentice-Hall, 2000.
Bjarne Stroustrup, Programming: Principles and Practice Using C++, Addison-Wesley, 2009.



Chapter 2

Programming in C++
“Man invented language to satisfy his deep need to complain.” Lily Tomlin

“A man of great memory without learning hath a rock and a spindle and no staff to spin.” George Herbert

“Real programmers can write assembly code in any language.” Larry Wall

In this chapter we will see the power of the C++ language through simple yet powerful programs. We will start by examining
a short and complete program to show the different components that are found in most C++ programs. Learning to use C++
is in some ways like learning a human language. You will have to learn the syntax of the language. You will have to practice how
to use the language correctly. You will have to seek the help of more experienced people when you encounter a problem.
Sometimes, you will have to put logic and common sense aside and accept the idiosyncrasies of the language. And finally, you
will have to be very patient, organized and determined.
Use a hands-on approach to programming. Try to build and execute the program. There is simply no substitute for practice,
practice, and practice in learning how to develop computer programs. It is highly recommended that you use an Integrated
Development Environment (IDE) to develop, debug and refine your programs. What is an IDE? An IDE is an environment
(a visual computer program) that gives a programmer a variety of tools to make the development and maintenance of programs
easier to do. A typical IDE provides (a) an editor to create and edit the source code, (b) a program “make” capability – a tool
that instructs the compiler and the linker on what and how to compile and link a program, and (c) an interactive debugger that
would help the programmer step through and debug the program etc. Microsoft Visual Studio is an example of an IDE that
provides an environment for building applications or programs in a variety of languages – Basic, C++, FORTRAN etc.
This chapter is just the beginning. However, it will provide a quick jump start and rapidly introduce several useful features. A
full understanding of the features will come as we use a feature repeatedly throughout the text.
C++ is an extremely powerful language. We will learn more about the language features and capabilities throughout this text.
Finally, it is recommended that you go through the entire chapter, perhaps more than once, in order to get a firm grasp of the
basics – creating, editing, compiling, linking, executing and debugging a program.

Objectives
• To understand the basic syntax of C++ programs.
• To understand the concepts associated with data types, variables, arithmetic expressions, assignment statements and simple
input and output.
• To understand and practice writing complete C++ programs.
• To compile, link and execute programs.
• To learn the art of troubleshooting.

SUPPLEMENTAL MATERIAL

All sample programs shown in this text are available on the book web site. The programs have been compiled and executed
using Microsoft Visual Studio 2022 compiler running under Windows 10/11. Unless otherwise noted, the programs should
compile and execute on all ISO-compliant systems.


2.1 Introduction
Every language, whether human or computer based, has its syntax. We must recognize and use the syntax. Unlike human
languages and their usage, the computer language syntax is unforgiving and hence must be followed correctly. If we don’t, it
may be impossible to build computer programs, let alone execute them correctly. Below we present a small but complete C++
program. This program is designed to display the string Welcome to
Object-Oriented Numerical Analysis. on the screen.
The source code to a typical computer program is made up of several lines of input contained in a text file. A text file is a file
that can be viewed in an editor or viewer such as Microsoft Windows Notepad. Often, the file extension associated with a C++
source file is cpp. For example, naming a file main.cpp would imply that the file contains C++ source code.
Example Program 2.1.1 A Simple C++ Program
In the example shown below, the contents of the actual text file (also called source code) are shown with the line numbers on
the left so that references to individual lines can be made later in the text. The color coding is via Microsoft Visual Studio editor.
main.cpp

The example shows some program components of a long list that you will possibly find in a C++ program. The lines of input
in a C++ program are free format input meaning that one can type the input starting at any location in a line. One can also
break the input into several lines. There are exceptions to this rule as we will see later.
How are the statements executed? In C++ as in most other languages, the statements are executed sequentially as they appear
in the program. In other words, the statement in line 1 is executed first, followed by the execution of the statement in line 2 and
so on. Sometimes, some lines are not executed because they are comment lines and sometimes the program logic requires that
these statements be skipped.
Lines 1 through 9 are comment lines. These lines are ignored by the compiler. Good programming habit is to add comment
lines to all programs not only to explain to ourselves what is being done in the program but also to help others who may have
to use our computer programs. A comment section starts with the pair /* and ends with the pair */. Line 10 is a blank line that
has been used to improve the readability of the program as are lines 12 and 16. In line 11 a different style of adding comments
is shown – anything that follows (to the right of) // on a line is treated as a comment and is ignored.
A typical C++ program will not be totally self-contained. It will leverage components created by others. Those new
to C++ may find it confusing as to what is a part of the language, what is not and how the language can be extended by the
programmer to meet the programmer’s requirements. C++ has a set of reserved keywords and symbols (or tokens) that are a
part of the C++ language. In other words, as a programmer, you should use these keywords and symbols in a specific manner –
following the syntax associated with the keyword or symbol. For example, the reserved keywords used in the program are
using, int, and return. The preprocessor directive include and the function name main are not keywords in the strict sense, but
they too must be used in a prescribed manner. Each of these has a special syntax. For example, the #include
statement has the following syntax.
#include <filename>
The reserved symbols used in the program are # ; { } " \ / * : <<. C++ provides additional functionalities through the
use of library functions etc. If we wish to use them, then we must explicitly state in our program what functionalities are being
used. The cardinal rule in C++ is that declarations must precede usage. Otherwise, there is ambiguity in the statements that
follow. We will introduce and discuss these reserved symbols throughout the text.


Going back to the example, we need to find a way to display the string on the computer screen. The correct terminology is to
output the string to the screen. C++ provides several classes for input and output. The class that is used here is called iostream.
The information about the iostream class is contained in the file called iostream (such files are called header files¹). The specific
object associated with this class that is most commonly used for outputting streams of characters is called cout (std::cout is
an ostream class object taken from C++’s standard library). Whenever cout is used, it must be followed by the operator << that
is then followed by the information to be displayed contained in a proper form. We will see more about this in Section 2.5.
Since the cout statement is used later in the program, the include statement is used to tell the compiler that we wish to use the
iostream class (line 11). The # sign used before include must be the first nonblank character in the line.
The declaration in line 11 makes it syntactically correct when cout is used in
line 15.
Every C++ program must have one and only one special function called main. A typical function is a program component that
is usually called from other parts of the program. They are used to simplify the development of computer programs. main is a
function! However, the main function is not called in any program. Instead, the program execution starts in the main function.
We will study functions in greater detail in Chapter 4. The main function starts on line 13. The int keyword signifies that an
integer value will be returned from the main (function) to the program that calls main. The body of main is contained within the
symbols { and }. These symbols are known as curly braces or brackets. Hence, the three lines 15 through 17 form the body of
main. Line 15 carries out the only task that this program is designed for. In this example, the literal contents of the character
string between " " symbols are output to the screen except for \n. \n is a special formatting symbol that signifies a new line
character should be sent to the screen so that the cursor rests on the next line. The generated output is shown in Fig. 2.1.1(a).
Fig. 2.1.1(b) shows the program output if \n is not used. Finally, since we declared main to return an integer value, the
statement in line 17 returns a zero value. Note that the ; (semicolon) symbol is used to terminate a statement.
The syntax associated with the use of a particular feature will indicate if a semicolon is required or not. As we can see from this
example, the output via cout and the return statements require a semicolon to terminate the statements.
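The role of the << operator and of \n can be sketched with a pair of small functions. This is an illustration only and is not the listing of Example Program 2.1.1: the function names WriteWelcome and CaptureWelcome are hypothetical, and the output is sent to a generic output stream so that the characters produced can be examined. Passing std::cout (declared in the iostream header) as the first argument would send the same characters to the screen.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: writes the welcome string to any output stream.
void WriteWelcome (std::ostream& os, bool bNewLine)
{
    os << "Welcome to Object-Oriented Numerical Analysis.";
    if (bNewLine)
        os << "\n";    // \n moves the cursor to the start of the next line
}

// Captures the characters produced by WriteWelcome in a string.
std::string CaptureWelcome (bool bNewLine)
{
    std::ostringstream oss;
    WriteWelcome (oss, bNewLine);
    return oss.str();
}
```

The only difference between the two cases is the single new line character at the end of the output, which is why the cursor rests on the next line in one case and immediately after the period in the other.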
Go ahead and compile, link and execute Example 2.1.1.

Fig. 2.1.1 (a) Output generated by Microsoft Visual Studio

Fig. 2.1.1 (b) Output generated by Microsoft Visual Studio


Tip: C++ specifies a proper format for the usage of keywords and symbols. Keywords are case sensitive - Cout is not the same
as cout. Similarly, blank spaces must be used with care - < < is not the same as <<. If something is declared, you may elect not
to use it. However, it is a wasteful practice to declare variables or functions but not to use them.
Some of the more common components of a C++ program are briefly discussed below.
Header files: Header files are separate files that contain C++ statements and are typically used in other files via the #include
statement. Header files usually have the .h file extension.
Variables: Variables are used to store values. The values are assigned to the variables and possibly changed over the course of
execution of a program.

1 A header file contains function prototypes that give the compiler information on the functions used in a program.


Expressions: Expressions are created using constants, variables, operators, functions etc. Expressions are evaluated and the result
is a value.
Statements: A statement is made up of one or more of the following - variables, expressions, C++ keywords and tokens.
Input and Output: Information is acquired by the program as input from the keyboard, mouse, file, etc. and information is sent
by the program to the monitor, file, printer etc.
Functions: A function is a program component that can be called from other parts of the program. We will see the syntax and
functionalities of functions, and when and where functions should be used later in Chapter 4.

2.2 Variables and Data Types


As was stated in the introductory chapter, most computer programs store and manipulate data or information. C++
provides support for several types of data. The fundamental ones are shown below. We will learn more about bits, bytes and
words in Chapter 5.
Data Type   Values (system dependent)                                                                          C++ Example
short       16 bits long. Integer value from -32 768 to 32 767                                                 -145
int         32 bits long. Integer value from -2 147 483 648 to 2 147 483 647                                   1034
long        64 bits long. Integer value from -9 223 372 036 854 775 808 to 9 223 372 036 854 775 807           80967845L
float       32 bits long. Floating-point value in the range ±3.4(10^±38) (7 digits precision)                  -0.0035f
double      64 bits long. Floating-point value in the range ±1.7(10^±308) (15 digits precision)                1.45
bool        false or true                                                                                      true
char        1 character                                                                                        'g'

An integer value, unlike a floating point value has no fractional component. The values –345 and 12000 are examples of integer
values. The values 1.34 and –0.0045 are examples of floating-point values. By default, all floating-point numbers are of type
double. Hence, 1.34 is a double constant whereas 1.34f is a float constant. Note that an L at the end of an integer value
signifies that the constant should be treated as a long data type. Similarly, an f at the end of a float number signifies that the
number is a float data type. As we will see later, there are other types of data including ones that the user or programmer can
define and store.
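Since the sizes in the table are system dependent, it can be useful to ask the compiler what it actually uses. The following is a minimal sketch: the structure TypeSizes and the functions QueryTypeSizes and LargestInt are hypothetical names, while sizeof, CHAR_BIT and std::numeric_limits are standard C++ facilities.

```cpp
#include <climits>   // CHAR_BIT: number of bits in a byte
#include <limits>    // std::numeric_limits

// Number of bits used for some of the fundamental data types.
struct TypeSizes
{
    int nShortBits;
    int nIntBits;
    int nLongBits;
    int nFloatBits;
    int nDoubleBits;
};

// Queries the sizes on the system where the program is compiled.
TypeSizes QueryTypeSizes ()
{
    TypeSizes sizes;
    sizes.nShortBits  = static_cast<int>(sizeof(short)  * CHAR_BIT);
    sizes.nIntBits    = static_cast<int>(sizeof(int)    * CHAR_BIT);
    sizes.nLongBits   = static_cast<int>(sizeof(long)   * CHAR_BIT);
    sizes.nFloatBits  = static_cast<int>(sizeof(float)  * CHAR_BIT);
    sizes.nDoubleBits = static_cast<int>(sizeof(double) * CHAR_BIT);
    return sizes;
}

// Largest value that an int variable can store on this system.
int LargestInt ()
{
    return std::numeric_limits<int>::max();
}
```

The values returned need not match the table. For example, long is commonly 32 bits long with Microsoft Visual C++ and 64 bits long with g++ on 64-bit Linux.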
To store values that may change over the course of execution in a program, we use variables. In other words, variables can be
manipulated to store values. A variable is identified with a name. A variable name can have many characters (typically between
1 and 31), starts with a letter, and typically is made up of the following characters – a through z, A through Z, 0 through 9,
and _. No blank spaces are allowed. Examples of valid variable names:
a345, z_helper, Alpha
The following variable names are invalid.
1Alpha, a Temperature, B$65
Note that each variable can store only one value. Such a variable is called a scalar variable. When several values need to be stored,
we can use a vector variable. We will look at vector variables later in the chapter.
To understand the usage of the variables in a computer and the type of data that is stored in them, we will use the following
convention. You may find that other books or software firms have different conventions. Some of the data types have not been
discussed as yet and can be ignored for now.

Data Type        Variable Name Prefix   Examples
Integer scalar   n?                     nX, nIterations, nJoints
Float scalar     f?                     fY, fTolerance
Double scalar    d?                     dArea, dP123
Boolean scalar   b?                     bDone, bConverged
Integer vector   nV?                    nVScores, nVSSN
Integer matrix   nM?                    nMVertices, nMShapes
Float vector     fV?                    fVRHS, fVForces
Float matrix     fM?                    fMCoef, fMElementForces
Double vector    dV?                    dVRHS, dVGPA
Character        c?                     cGrade
String           str?                   strNames, strStates

Here are a few examples illustrating how we can declare variables for different data types in a computer program.
int nV; // one integer variable
int nA=1, nB=3, nC=5; // three integer variables with initialization
int nD(1), nE(3), nF(5); // three more integer variables, alternate initialization syntax
double dPrecision; // double precision variable
float fX, fY, fZ; // float variables
bool bStatus=true; // boolean variable with initialization
char cOperation='+'; // character with initialization

Note the different styles in declaring the variables. For example, the statement
int nV;
declares an integer variable whose value is currently unknown. Such variables are called uninitialized variables. Good
programming practice requires that variables be used only after they are initialized. In other words, the declaration for
nV should be followed by a statement later in the program where an integer value is assigned to nV. Now consider the
following declaration.
bool bStatus=true; // boolean variable with initialization
Here the boolean variable bStatus is declared and initialized with the true value. A variable must be declared once and
only once before it is used in any program segment.
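The difference between declaring a variable with and without initialization can be sketched as follows. The function name SumOfThree is hypothetical and functions themselves are discussed in Chapter 4; the point here is only the order of declaration, assignment and use.

```cpp
// Hypothetical function: returns the sum of three integer variables.
int SumOfThree ()
{
    int nA = 1, nB(3);         // declared and initialized in one statement
    int nC;                    // declared but not yet initialized
    nC = 5;                    // assigned a value before it is first used
    int nSum = nA + nB + nC;   // all three variables now hold known values
    return nSum;
}
```

Using nC in the sum before the assignment statement would compile on most systems but the result would be unpredictable, which is why initialization before use is good practice.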

2.3 Expressions
An expression is made up of a sequence of tokens or basic elements that can be evaluated. For example,
nValA + nValB
is an expression that is made up of two variables nValA and nValB and the addition operator, +.

Mathematical Operators
The following table lists the mathematical operators you can use in constructing an expression.
Operator   Meaning
+, -       Unary positive, negative
+          Addition
-          Subtraction
*          Multiplication
/          Division
%          Modulus (or remainder)

Operator Precedence
The operators have default precedence when used in combinations. The default precedence can be overridden by using
parentheses. The order of ascending precedence is the following.
Precedence   Operator   Meaning
0            =          Assignment
1            +          Addition
1            -          Subtraction
2            *          Multiplication
2            /          Division
2            %          Modulus
3            ()         Parentheses
4            +, -       Unary positive, negative


Consider the following examples where integer values and arithmetic are used.
Expression   Order of evaluation        Evaluates to
5+3-2        5+3=8; 8-2=6               6
5+3*2-1      3*2=6; 5+6=11; 11-1=10     10
(5+3)*2-1    5+3=8; 8*2=16; 16-1=15     15
5*6/3        5*6=30; 30/3=10            10
5*(6/4)      6/4=1; 5*1=5               5
5*6/4        5*6=30; 30/4=7             7
5%2*3        5%2=1; 1*3=3               3

In the case of operators with the same precedence, the expressions are evaluated left to right. Note that integer arithmetic results
in the fractional value being lost. 6/4 evaluates to 1 as does 7/4. However, with floating point arithmetic 6.0/4.0 evaluates to
1.5. We will discuss more about this issue in Chapter 3.

Mathematical Functions
The following table lists the commonly used mathematical functions. The header file <cmath> needs to be included before we
can use these functions. The function parameters and computed value are of double data type, unless otherwise noted.
Name       Function                                                        Examples
sin(x)     Computes sine of an angle x expressed in radians. Similarly,   sin(dX)
           cos(x) and tan(x) compute cosine and tangent values            cos(dX*dY/2.0)
           respectively.                                                   tan(2.3+4.5/dZ)
asin(x)    Arc sine. Function returns angle expressed in radians.         asin(dB)
           Similarly, acos and atan are functions for arc cosine and
           arc tangent respectively.
cosh(x)    Hyperbolic cosine. Similarly, sinh and tanh are hyperbolic     cosh(dFill)
           sine and tangent respectively.
sqrt(x)    Square root of x.                                               sqrt(25.0+dAlpha)
pow(x,y)   x raised to power y, $x^y$.                                     pow(fX, 3.5f)
log(x)     Natural logarithm (base e) of x.                                log(fImpulse)
log10(x)   Logarithm to base 10 of x.                                      log10(2.5)
fabs(x)    Absolute value of x.                                            fabs(dQ-dR)
abs(x)     Absolute value of x.                                            abs(nM)
exp(x)     Computes e to the power of x.                                   exp(fabs(dA)+dB)
ceil(x)    Returns the smallest integer that is greater than or equal     ceil(2.9) is 3.0
           to x.
floor(x)   Returns the largest integer that is less than or equal to x.   floor(2.9) is 2.0

Here are more examples of mathematical expressions.


Mathematical expression                                   C++ expression
$\dfrac{bh^3}{12}$                                        fB*pow(fH, 3.0)/12.0
$\dfrac{ML^2}{2EI}$                                       (fM*fL*fL)/(2.0*fE*fI)
$\dfrac{\sin(a)\cos(b)}{\sqrt{1+c^2}} - 5.5$              (sin(fA)*cos(fB))/sqrt(1.0+fC*fC)-5.5
$y\,|x - \log(a)|$                                        fY*fabs(fX-log(fA))
$10\,t\exp(-t)$                                           10.0*fT*exp(-fT)
$\dfrac{1}{2} + \sin^{-1}\left(\dfrac{a}{\pi}\right)$     0.5+asin(fA/3.1415926)


2.4 Assignment Statements


There are several types of C++ statements including an assignment statement. An example of an assignment operator is the
simple assignment (=) operator, which assigns the value of its right operand to its left operand. For example,
nValue = nValA;
is an assignment statement. An assignment statement is usually composed of a destination variable, an assignment operator, and
an expression to be assigned. In the following example
nValue = nValA + nValB;
the values of variables nValA and nValB are added together and the variable nValue is assigned the result of the addition operation.
Do not be surprised if you find that a variable appears on both sides of the assignment operator. For example, the statement
nA = nA + 1;
implies the following – take the current value of nA, add 1 to it, take the result and assign it as the new value of nA. The
destination variable, which appears on the left side of the assignment operator, is called an l-value, and the expression
on the right side of the assignment operator is called an r-value.
There are several arithmetic assignment operators that can be used to abbreviate assignment expressions. These operators can
reduce any statement of the form
variable = variable operator expression;
where operator is one of the mathematical operators (+, ‐, *, /, or %), to the form
variable operator= expression;
The following table shows the arithmetic assignment operators, sample statements, and their equivalent statements.
Assignment operator   Sample C++ statement   Equivalent C++ statement
+=                    c += 9                 c = c + 9
-=                    d -= 5                 d = d - 5
*=                    e *= 2                 e = e * 2
/=                    f /= 3                 f = f / 3
%=                    g %= 7                 g = g % 7

If nA, nB and nC are all declared to be integers and fP, fQ and fR are declared to be floats, then the following assignment
statements have the same data types on both sides of the assignment operator.
nA = 10 + nB*nC;
fP = 10.2f*fQ - fR;
What does the compiler do when it encounters an assignment with different data types? For example, how is
dX = 10 + 2.5*nA;

evaluated where the left-hand side is a double variable (dX) and the expression on the right is made up of an integer constant
(10), a double constant (2.5) and an integer variable nA? We will look at this situation in greater detail in the later chapters.

2.5 Simple Input and Output


The C++ standard libraries are rich in supporting input/output operations. In this section we will introduce the stream concept
and discuss the most common I/O operations.

Streams
A stream is a sequence of bytes. In an input stream, the bytes flow from a device such as a keyboard or disk drive to main
memory. In an output stream, the bytes flow from main memory to a device such as a display screen, printer, or disk drive.
The iostream header file contains basic information required for all stream I/O operations. This header file declares objects
such as cin, cout, cerr, and clog, some of which will be discussed here.
The iostream library contains many classes that handle I/O operations. The istream class supports stream input operations.
The ostream class supports stream output operations. The iostream class supports both the input and output operations. The
iostream class is derived from both the istream and ostream classes.


Stream Input
Stream input may be performed with the right shift operator (>>), also referred to as the stream-extraction operator. This
operator normally skips whitespace characters like blanks, tabs, and newlines and returns zero (false) when end of file is
encountered in the input stream.
cin is an object of the istream class and corresponds to the standard input device. In the following, the cin object used with
the stream-extraction operator causes a value for the integer variable nScore to be input from cin to memory. Assume that
nScore has previously been declared an integer. Then the statement
cin >> nScore;
is used to read in the value of nScore.

Stream Output
The ostream class offers several output capabilities. These include output of standard data types with the stream-insertion
operator, characters with the put member function [2], unformatted output with the write member function, and various
formatted output.
Stream output may be performed with the left shift operator (<<), which is also referred to as the stream-insertion operator.
cout is an object of the ostream class and it corresponds to the standard output device. The cout object is used with the
following syntax.
cout << …;
where the insertion operator << is used to output standard types represented above as …. Consider the following example.
int nA=1, nB=10, nC;
nC = nA + nB;
cout << "The sum of " << nA << " and " << nB << " is " << nC
<< "." << endl;
Note the use of the endl (end line) stream manipulator. The endl stream manipulator produces the same result as the \n escape
sequence and also causes the output buffer to be output immediately even if it is not full. The output generated by the above
statement is shown below.
The sum of 1 and 10 is 11.
For example, to read an integer value we could use the following statements.
#include <iostream> // iostream class
using std::cin; // standard input
using std::cout; // standard output
…..
….
int nScore;
….
cout << "What is the score? "; // ask user for the score
cin >> nScore; // read the user input

Note that in this example, since the using std:: style is used, neither cout nor cin needs the std:: prefix. Alternatively, the ::
(scope resolution) operator can be used directly with the standard namespace std. In other words,
#include <iostream>
using std::cout;

cout …

is equivalent to

#include <iostream>

std::cout …

The choice of which style to use in the program is the programmer's decision.

[2] Member functions will be discussed with objects and classes starting in Chapter 7.

Formatting with Stream Manipulators


As mentioned before, C++ provides many capabilities to format input and output data. Stream manipulators perform the
formatting tasks. The most common stream manipulators, their descriptions, and examples are in the following lists.
Floating-Point Precision
setprecision() manipulator; sets the precision (the number of significant digits, or the number of digits after the decimal
point in fixed or scientific format) with one integer argument
precision() member function; same as above with one integer argument, and returns the current precision setting with
no argument (the default precision is 6)

Field Width
width() member function; sets the field width and returns the previous width with one integer argument, and with
no argument returns the current setting
setw() manipulator; sets the field width (a value wider than the field width will not be truncated, and the width
setting applies only for the next insertion or extraction)
The precision member function and the setprecision manipulator control the output of floating-point values. If the display
format is scientific or fixed, then the precision indicates the number of digits after the decimal point. If the format is
automatic (neither scientific nor fixed), then the precision indicates the total number of significant digits. This setting
remains in effect until the next change. The width member function and the setw manipulator control how many character
spaces are reserved for outputting a value.
Here are a few example statements showing how the precision and field width can be controlled. We will examine these features
in greater detail in the following chapters.
cout.precision (10);
cout << setprecision(10); // same effect as previous line
cout.width (20);
cout << setw(20); // same effect as previous line

Formatted output is shown in the examples that follow.


Example Program 2.5.1 Data Types with Formatted Output
In this example we will look at computations involving different data types. The program is written to compute through the use
of variables of different data types, the following expressions.
integer: 100000 + 200000
long: (-64000)  (-12800)
13
float:
1 16
13
double:
1 16
The listing of the main program is shown below.

main.cpp
As before, we see examples of function prototyping using the header files iostream and iomanip (lines 11-12). The former
include file makes it possible to use the names std::cout and std::endl. The latter include file supports the use of
std::setprecision and std::setw.


The output generated by the program is shown in Fig. 2.5.1. Note the difference in the output between the float and the double
outputs. The float value beyond 6 significant digits is unreliable.

Fig. 2.5.1 Output generated by Microsoft Visual Studio for Example 2.5.1
A few points to note. Multiple statements can be typed on a single line if they are separated by semicolons – see lines 19, 24,
and 31. A complete statement can appear on more than one line – see lines 28 and 29, and 35 and 36.

Example Program 2.5.2 Math Function and Simple I/O


In this example we will look at using the built-in math functions. We will write a program to ask the user for an angle (in
degrees) and compute the sine of that specified angle.


main.cpp

The const qualifier before a variable signifies that the value of the variable cannot change over the course of the program. If an
attempt is made to change the value of the variable, a compiler error results. Consider the following statements.
const int NVALUES = 3; // NVALUES is declared as a const int
NVALUES = NVALUES + 3; // invalid statement. will not compile
This is a defensive programming mechanism, and one should use such a programming style. A sample output generated by the
program is shown in Fig. 2.5.2.

Fig. 2.5.2 Output generated by Microsoft Visual Studio for Example 2.5.2

2.6 Vector Variables


So far we have looked at scalar data types – data types that store one value. What if we wish to store three floating point values
representing the (x, y, z) coordinates of a point in a single variable? C++ provides vector variables for such a purpose. The
syntax for declaring a vector variable is as follows.
datatype variablename[integer constant];
For example, to store the coordinates of a point we have two options shown below.
float fXCoor, fYCoor, fZCoor; // 3 different scalar variables
float fVCoordinates[3]; // 1 vector variable to store 3 values
The three values in the vector can be accessed using the [] operator as follows. fVCoordinates[0] refers to the first value in
the fVCoordinates vector, and fVCoordinates[1] and fVCoordinates[2] refer to the second and the third values, respectively.
In other words, a vector of size n is accessed starting at index 0 through index n-1. This process of allocating storage space for
a vector is called static allocation. We cannot dynamically allocate the storage space using the following statements.
int n; // int variable. not an integer constant
cin >> n; // obtain value of n
float fVC[n]; // not valid in standard C++. the size is not an integer constant
We will later see how to overcome this drawback in Chapter 8.


Example Program 2.6.1 Vector Data Types


We will now write a program to obtain three integer values interactively, compute their sum, and display the three numbers and
their sum.
main.cpp

As a matter of programming style, it is desirable to declare constant integer variables rather than use an integer constant throughout
the program. First, it is easier to understand the significance of MAXINPUT than the number 3. Second, if the value of the constant
needs to be modified, then only one statement needs to be changed in the program as opposed to changing the integer constant
wherever it is used in the program.
A sample output generated by the execution of the program is shown in Fig. 2.6.1.

Fig. 2.6.1 Output generated by Microsoft Visual Studio for Example 2.6.1


Tip: Here is a common programming error. What will happen if we use nVNumbers[3] to access the last value in the above
computer program? The valid indices are 0 through 2, so the result is unpredictable (the behavior is undefined). We will see
how to trap and correct such errors with arrays in Chapter 9.
We can also store a string of characters in a char vector.
char szHeader[] = "Welcome to my 4-function calculator program";

Note that the above statement defines the variable szHeader as a character string and initializes the value of the variable to
the specified string. The compiler automatically computes the length of the vector needed to store the string. Character strings
require an additional storage space to store the string delimiter – a special character that signals the end of the character
vector. This character is '\0'. Hence the other term for such strings – a zero-terminated string. Consider the following example.
char szFirstName[4] = "John"; // incorrect
This is invalid since the string John needs five spaces not four. The fifth space is required to store the string delimiter. String
manipulation can become very cumbersome. In the following statements, while the variable declaration is legal, the assignment
is not.
char szMovieTitle[18];
szMovieTitle = "Lord of the Rings"; // will not compile
However, the following statements are valid.
szMovieTitle[0]='L';
szMovieTitle[1]='o';

We will use the standard string class whenever possible. The std::string class is gradually introduced in the following chapters
and is discussed in detail in Chapter 7.

2.7 Troubleshooting
To err is human, and most IDEs provide tools to aid in finding and correcting the errors. In this section we will look at how
some of these tools can be used.
Example Program 2.7.1 Compile Errors (Example Program 2.1.1 Revisited)
It is quite frustrating as a beginning programmer to find that seemingly small issues can prevent a program from compiling or
linking or executing. In this section, we will see how to react to error messages.
main.cpp

Compiling the program yields the following error messages.


Because of a couple of simple typing errors, we see several error messages. As the first error message states, the error is in line
17 arising from a missing ; at the end of line 15. The correct statement should have been
std::cout << "Welcome to Object-Oriented Numerical Analysis.\n";
The second and the third messages are displayed because the compiler does not understand what rwturn is. Note that the IDE
tags a potential problem with red squiggly characters under the word rwturn. Modern IDEs are extremely powerful and are
designed to help the programmer correct potential problems before formally compiling the program.

Example Program 2.7.2 Link Errors (Example Program 2.1.1 Revisited)


In this example we will see what is meant by link errors.
main.cpp

Compiling the program yields no errors. However, when we try to link the program we get the following error messages.

The error message states that “error LNK2019: unresolved external symbol _main referenced in function "int __cdecl
invoke_main(void)"”. In plain English, the linker was looking for the main program and could not find it. Once again, we have
a typo (typographic error) in the program. Line 13 should read
int main()
Syntactically there was no error in the program. We could have a function called nain in our program! However, as we saw
earlier in the chapter, every program must have one and only one main function. This is the function where the execution of the
program begins. There is ambiguity as to where the program should start its execution if this function does not exist.


Example Program 2.7.3 Logical Error (Example Program 2.5.2 Revisited)


Finally, we will see an example of a logical error. These errors are the most difficult to find and correct.
main.cpp

The above program compiles and links correctly. However, it does not produce the correct answer as shown in Fig. 2.7.1.

Fig. 2.7.1 Invalid output generated by the program


The program has two logical errors. First, the variable to store the angle, Angle, is declared an integer. It should be
declared a floating-point variable. Second, the conversion from degrees to radians is not made before the sin function is used.
Most computer systems provide an interactive debugger to help the programmer control the execution of the program and
hence locate the source of logical errors in the program. Interactive debuggers are useful only if you have a debugging strategy.
You should know what to expect from the program (the correct output) and use the debugger to find out why the program
does not compute the correct results. Creating a “hand solution” is the first step before using the debugger. For example, we
know using a calculator that $\sin(40.5^\circ) = 0.649448$.
Some of the features provided by interactive debuggers include the following.
• Setting breakpoints: A breakpoint is a program location (or statement). The debugger suspends program execution
when (and if) that location is reached. At this stage, the user can carry out a variety of tasks such as examining or even
changing the current values of certain key variables. If necessary, a user can set or remove one or more breakpoints
spread throughout the program.
• Execution flow control: The user can step through the program one statement at a time, step over functions, or can
continue execution till the next breakpoint is reached.
• Examine and change values of variables: The user can examine the current values of variables and if necessary, change
the values of these variables.
• Set watchpoints: The user can use this feature to find where in the program the value of a certain variable is changed.
Let us set up the strategy to debug Example 2.7.3. First, we will set up a breakpoint at line 22-23. The reason is that we want to
ensure that the program reads our input correctly and stores the correct value in the variable Angle. Second, when the program
execution encounters the breakpoint, we will examine the value of Angle.
Below we present the output generated by Microsoft Visual Studio when the breakpoint is encountered. The program output
window is shown in Fig. 2.7.2. The IDE main screen is shown in Fig. 2.7.3.

Fig. 2.7.2 Program output window. Program is in suspended state of execution.


Note that the value of Angle as displayed in one of the windows is 40 not 40.5 as it should be. This should alert us to the fact
that the fractional component of the input value is being truncated. If we look at the source code, we will realize that Angle is
declared as an integer not a floating-point variable.

Fig. 2.7.3 Debugging screen. The program is in suspended state of execution.


We will make a few changes to the program and start the debugging process once again. The new version is as follows.

float Angle; // to store the angle
float SineAngle; // to store sine of the angle

// compute sin of an angle
cout << "Input an angle in degrees: ";
cin >> Angle;
SineAngle = sin(Angle);
cout << "Sine of " << Angle << " degrees is: "
     << SineAngle << '\n';

The second part of the error is a little more difficult to debug. We set a second breakpoint, and the output (at the second
breakpoint) is shown in Fig. 2.7.4. The evaluated value of sin(40.5°) is shown as 0.334151179. What went wrong? At this stage,
we should look at the C++ documentation on the sin(x) function. The help facility for the sin function is shown in Fig. 2.7.5
and we immediately discover that the parameter x should be in radians (Angle should be expressed in radians).

Fig. 2.7.4 Debugging screen. The program is in suspended state of execution.

Fig. 2.7.5 Microsoft Visual Studio on-line help documentation on the use of the sin(x) function
Remember that debugging is an art as much as it is a science.


2.8 The namespace Concept


Some readers may have noticed that while we have used the combination
using std::cout; // defined at the beginning of the file
….
cout << "What is n? ";

we have also used (see Example 2.7.3 where using std::cout is not defined)
std::cout << "What is n? ";
Both these usages are correct. In fact, there is a third usage style.
using namespace std; // defined at the beginning of the file
….
cout << "What is n? ";
cin >> n;

With the using syntax, names such as cout, cin etc. need not be qualified with the std:: qualifier. The prefix std::
indicates that the names cout, endl, setw etc. are defined inside the namespace std. The standard namespace std is defined
in the standard library and is available by inclusion of the appropriate header file (e.g. #include <iostream>). Other namespaces
can be defined by the programmer and used with the :: operator (scope operator). This means that two identical names with
possibly different functionalities can be used in the same program segment as long as they are defined in different
namespaces and they are properly referenced in the program that uses them.
As a matter of style, we will usually not employ the using keyword in the rest of the text and instead use the std:: qualifier
when appropriate.
Interested readers are urged to look at Example 2_8_1 for a user-defined namespace example.


Summary
In this chapter we saw how to write, compile, link, execute and debug simple C++ programs. In the following chapters we will
learn more about C++. We will then look at how to get organized and develop the program.
Below we summarize the key facts learnt in this chapter.
• C++ is divided into two parts – the core language and the standard library. Elements of the standard library are used
with the std:: prefix or through a using std:: declaration at the top of the file.
• Definitions and declarations must precede usage. Think of the C++ compiler as having access to a program dictionary
that has three components – (a) C++ keywords and tokens, (b) functions whose prototypes are available in header files,
and (c) user-defined variables and functions. Imagine that this is a dynamic dictionary with respect to (c). In other
words, items may be added to the dictionary while the compiler is interpreting the program. If something is used in
the program that cannot be found in the dictionary, then the compiler issues an error message.
• Every program has one and only one main function. This is the location from where the execution of the program
starts.
• Comments in programs start with the pair /* and end with the pair */. Comments on a single line occur to the right
of the pair //.
• A block of statements occurs within the braces { and }. As an example, the statements in the main function can be
found within the braces. A semicolon ; is used to terminate most commonly used C++ statements.
• Usually there are two types of files found in C++ programs – .cpp and .h files. Usually, the .cpp files contain the
statements that are executed, and the .h files contain the definitions and declarations. Note that the file extensions
may be different with different compilers and IDEs.
• The more commonly used standard data types are short, int, long, float, double, bool and char.
• Variables are used to store values of the standard data types. Variables are identified by variable names.
• Scalar variables store one value. Vector variables store several values.
• Functions are independent program segments that can be called from different locations in a program. Functions are
discussed later starting in Chapter 4.
• C++ provides several functions including mathematical functions that the programmer can use in his or her program.
• Expressions can be created using constants, variables, functions and operators. Expressions with mathematical
operators follow certain evaluation rules.
• There are several types of C++ statements such as assignment, input/output, etc.
• C++ provides streams (sequence of bytes) for obtaining input from devices such as a keyboard or disk, and for
outputting data to a display device, printer, or disk.

Where to go from here?


Even though by programming language comparisons C++ is an older and more mature language, the language is constantly
evolving. The International Organization for Standardization (ISO) sets the standards for the language. The initial version was
ratified in 1998 and was called C++98. Since then, there have been major revisions resulting in C++03, C++11, C++14,
C++17, and C++20 standards. C++23 is the latest version of ISO/IEC 14882 standard. The more curious reader is encouraged
to look at one of these standards to see what a specifications document looks like.
The exercise problems that follow are designed to slowly introduce the reader to generating the solution to numerical problems
that would be meaningful to engineers and scientists. Most of the programs are less than 10-20 statements and can be completed
in a few minutes.


Exercises
One of the common errors in writing the computer programs suitable for the topics discussed in this chapter is
deciding what data type should be used for the variables. Spend some time to think over the problem before deciding
the data type. For example, should the sides of a rectangle be represented by an integer or a floating-point variable?

Appetizers
Problem 2.1
Write a program to output the following pattern on the screen.
***
****
***

Problem 2.2
Write a program to interactively obtain an integer value and display (a) the negative of that number, (b) its absolute value, and
(c) its square.
Problem 2.3
Write a program to interactively obtain a floating-point value, x for each of the three following cases and display (a) the square
root of that number, (b) its absolute value, and (c) 4.5x .
Problem 2.4
For the following values: a  1.2 , b  35.6 , c  1056.78 , d  22.5 , e  153.4  , write a program to compute and display
a b a b c  b2  c 
the results of the following expressions: (a) , (b) , (c) a  0.4 , (d) sin( a )  cos   , (e)
c c  33.3 a   a  b 2 
 
tan( d )
.
tan( d )  sin( e )

Main Course
Problem 2.5
Write a program to obtain the length and width of a rectangle, and compute and display the perimeter and the area of the
rectangle.
Problem 2.6
Write a program to carry out the following conversions: (a) obtain the length in in and convert it to m, (b) obtain the
temperature in °F and convert it to °C, and (c) obtain the mass in slg and convert it to kg.
Problem 2.7
Write a program to obtain the (x, y, z) coordinates of two points. Now compute and display (a) the distance between the two
points, and (b) a unit vector from point 1 to point 2 (display the unit vector as $a\hat{i} + b\hat{j} + c\hat{k}$).
Problem 2.8
Write a program to obtain the following material values expressed in (in, °F, slg, lb) and convert them to (m, °C, kg, N): (a)
modulus of elasticity, (b) coefficient of thermal expansion, and (c) mass density.


C++ Concepts
Problem 2.9
Write a program to display the values of the function $y(x) = ax^3 + bx^2 + cx + d$ for the range $-10 \le x \le 15$ using an
increment of 5. Obtain the values of the coefficients of the cubic polynomial from the user. Display the values as follows (one
per line).
(x, y(x)) = (x value, y value)
For example,
(x, y(x)) = (-5.0, -120.3)

Problem 2.10
Summarize all the facts about C++ that (a) you have learnt from this chapter, and (b) from reference material outside this book.



Chapter 3

Control Structures
“When you get to the fork in the road, take it.” Yogi Berra

“If you do not know where you are going, you’ll wind up somewhere else.” Yogi Berra

“Here are three great questions which in life we have over and over again to answer: Is it right or wrong? Is it true or false?
Is it beautiful or ugly? Our education ought to help us to answer these questions.” John Lubbock

“When I turned two I was really anxious, because I'd doubled my age in a year. I thought, if this keeps up, by the time I'm
six I'll be ninety.” Steven Wright

“To climb steep hills requires slow pace at first.” William Shakespeare

Now that we know how to write simple C++ programs, it is but natural to ask, “How can I write useful programs?” As we saw
in Chapter 1, computer programs manipulate information. The algorithms that are used in manipulating the information (a) test
conditions that are to be met for certain actions to take place, and (b) repeatedly execute several steps for a specified number of
times or until certain conditions are met.
Consider a simple example of a vector that contains integer values, e.g., -20, 15, 20, 55, -130. We need to develop an algorithm
that will search this vector looking for a known target value and report whether the vector contains the target or not, and if the
target is located, where in the vector the target is located. We could use the following algorithm to solve this problem.
Input: Target value, $t$, and the vector of integer numbers, $V$.
Output: The location, $l$, where the target $t$ exists in $V$.
Step 1: Find how many elements, $n$, are in the vector $V$.
Step 2: Loop through all the elements starting at the first location, i.e., $i = 1, 2, \ldots, n$.
Step 3: Compare the number $V(i)$ against the target $t$. If they are equal, set $l = i$. Exit.
Step 4: Increment $i = i + 1$.
Step 5: If $i \le n$, go to Step 2. Else set $l = 0$ indicating that the target was not found. Exit.
An examination of the algorithm shows that repetition takes place in Steps 2 through 5 and that conditional tests occur in Steps
3 and 5. Conditional tests are necessary not only to find the target but also to terminate the repeated execution of the Steps 2
through 5. In the rest of the chapter, we will see how to use C++ control structures.
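The search algorithm above can be sketched in C++ as follows. This is only an illustrative sketch that uses constructs introduced later in this chapter; the function name findTarget and its parameters are our own assumptions and do not appear in the original text. Since C++ vectors are indexed from 0, the element V(i) of the algorithm is stored at nV[i-1].

```cpp
// Hypothetical helper (not from the book): returns the 1-based location of
// nTarget in nV, or 0 if the target is not found (the l = 0 convention).
int findTarget(const int nV[], int nSize, int nTarget)
{
    int nLocation = 0;                  // 0 means "not found"
    for (int i = 1; i <= nSize; i++)    // Step 2: i = 1, 2, ..., n
    {
        if (nV[i-1] == nTarget)         // Step 3: C++ indexing starts at 0
        {
            nLocation = i;
            break;                      // exit at the first match
        }
    }
    return nLocation;
}
```

With the vector -20, 15, 20, 55, -130 and a target of 20, the function returns 3. With a target that is not present in the vector, it returns 0.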

Objectives
• To understand the concept of control structures.
• To understand and practice selection concept.
• To understand and practice repetition concept.


3.1 Selection
Quite often, decisions must be made in any algorithm. Some steps in an algorithm may be needed only if certain conditions are
met. The selection concept can be used to handle such a situation. C++ provides two constructs where selection is possible –
through the use of if .. else and the use of switch statements. We will look at the if .. else usage first.

if .. else syntax
The general syntax can take on several forms as shown below.
Form 1
if (expression)
    statement(s) if expression is true;
Form 2
if (expression)
    statement(s) if expression is true;
else
    statement(s) if expression is false;
Form 3
if (expression1)
    statement(s) if expression1 is true;
else if (expression2)
    statement(s) if expression2 is true;
else if (expression3)
    statement(s) if expression3 is true;

else
    statement(s);

Form 4
if (expression1)
    statement(s) if expression1 is true;
else if (expression2)
    statement(s) if expression2 is true;
else if (expression3)
    statement(s) if expression3 is true;

Each expression captures a condition that needs to be met and the statements that follow specify the action that needs to be
carried out if the condition is true. The expressions are tested in order from the top. The statements associated with the first
expression that evaluates as true are executed and the remaining branches are skipped, so at most one branch is executed.
As we can see by the different forms, the else part of the statement is optional.
When more than one statement is associated with any part of the construct, these statements must be enclosed within the {}
braces. The expression used in a selection statement must evaluate as true or false and can be made up of logical or relational
operators. C++ considers 0 to be false and a non-zero value to be true.
Relational Operators: The most commonly used relational operators are < (less than), <= (less than or equal to), > (greater
than), >= (greater than or equal to), != (not equal to), and == (equal to). Let us look at some examples.
The statements “If the score in the exam is greater than or equal to 60 then the student has passed the exam. Otherwise, the
student has failed the exam.” can be implemented as
if (nScore >= 60)
cout << "Passed the exam.";
else
cout << "Failed the exam.";

The statement “If the number of entries is not equal to zero, then the average of all the entries is the sum over the number of
entries” can be implemented as
if (nEntries != 0)
{
    fAvg = fSum/nEntries;
    cout << "Average of the " << nEntries <<
            " numbers is " << fAvg;
}
else
    cout << "Average cannot be computed.";

The relational operators are used to compare two quantities that must be of the same data type or must be such that a suitable
conversion is available to facilitate the comparison of the two quantities. The previous example can also be written as
if (nEntries == 0)
    cout << "Average cannot be computed.";
else
{
    fAvg = fSum/nEntries;
    cout << "Average of the " << nEntries <<
            " numbers is " << fAvg;
}

Tip: It is a common programming error to use the assignment operator = instead of using equality operator == in a selection
statement.
For example,
if (nA == 5)
is not the same as
if (nA = 5)
However, both statements will compile. The danger is that the second form assigns 5 to nA and then tests the assigned value.
Since 5 is non-zero, the condition always evaluates as true, an unintended consequence, and the value of the variable is changed to 5.
Logical Operators: More complex selections can be made by using logical operators. The most commonly used operators are
|| (OR) and && (AND). In a typical usage we have a compound expression in the following forms
if (expression1 || expression2) …
if (expression1 && expression2) …
where the selection is made up of two expressions and involves either OR or AND operators. The final evaluation of such an
expression is shown below.

Expression 1    Operator    Expression 2    Evaluates to
true            and         true            true
false           and         true            false
true            and         false           false
false           and         false           false
true            or          true            true
true            or          false           true
false           or          true            true
false           or          false           false

The statement “The product of two numbers is positive if both numbers are positive or both the numbers are negative” can be
implemented as
if ((nA > 0 && nB > 0) || (nA < 0 && nB < 0))
cout << "Product is positive.";
else
cout << "Product is zero or negative.";

The statement “A legal value of student GPA is between 0.0 and 4.0 both inclusive” can be implemented as
if (fGPA >= 0.0 && fGPA <= 4.0)
cout << "Valid GPA value.";
else
cout << "Invalid GPA value.";


The statement “If the score in the math portion of the exam is greater than 90 or if the score in the language portion of the
exam is greater than 90, then the student is a gifted student” can be implemented as
if (nScoreMath > 90 || nScoreLanguage > 90)
cout << "Student is a gifted student.";

Example Program 3.1.1 Selection


We will now look at a more detailed example involving selection.
Problem Statement: Obtain the age of a person. Based on the age, classify the status of the person as follows and print the
appropriate message.
Age                            What to print
Between 0 and 10               Person is a child.
Between 11 and 18              Person is a juvenile.
Between 19 and 59              Person is an adult.
Greater than or equal to 60    Person is a senior citizen.

main.cpp

In the example, all possible paths arising from the if … else construct are enclosed within the braces {}. We encourage this
usage since it makes the program easier to read and makes adding new statements less error-prone. After the age variable nAge
is initialized in statement 17, the program executes sequentially. Assume that the value of nAge is 53. Execution starts at line 19,
and the expression is evaluated as false. The next statement that is executed is 23 and the expression is evaluated as false. The
same situation arises with statement 27. Finally, statement 31 evaluates as true and the associated action in line 33 is executed.
The last statement that is executed in the program is line 41.
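The classification logic described above can be sketched as follows. This is a minimal sketch under our own assumptions, not the listing from the text: the function name classifyAge and the wording of the invalid-age message are hypothetical, and the statement numbers quoted in the discussion do not apply to it. The tests are arranged in the same order as in the walk-through (invalid, child, juvenile, adult, senior citizen) and every path is enclosed in braces.

```cpp
#include <string>

// Hypothetical helper: maps an age to the message text in the table.
std::string classifyAge(int nAge)
{
    std::string strStatus;
    if (nAge < 0)
    {
        strStatus = "Invalid age.";
    }
    else if (nAge <= 10)
    {
        strStatus = "Person is a child.";
    }
    else if (nAge <= 18)
    {
        strStatus = "Person is a juvenile.";
    }
    else if (nAge <= 59)
    {
        strStatus = "Person is an adult.";
    }
    else
    {
        strStatus = "Person is a senior citizen.";
    }
    return strStatus;
}
```

Because each else if is reached only when the earlier tests have failed, the lower bound of each range does not need to be tested again.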
As we will see throughout the text, there is no unique way of structuring a program. We can rewrite the previous example in
the following two forms.
Alternate Form 1
if (nAge >= 60)
cout << "Person " << nAge << " years old is a senior citizen.\n";
else if (nAge >= 19 && nAge <= 59)
cout << "Person " << nAge << " years old is an adult.\n";
else if (nAge >= 11 && nAge <= 18)
cout << "Person " << nAge << " years old is a juvenile.\n";
else if (nAge >= 0 && nAge <= 10)
cout << "Person " << nAge << " years old is a child.\n";
else if (nAge < 0)
cout << "Invalid age " << nAge << ".\n";

Alternate Form 2
if (nAge < 0)
cout << "Invalid age " << nAge << ".\n";
if (nAge >= 0 && nAge <= 10)
cout << "Person " << nAge << " years old is a child.\n";
if (nAge >= 11 && nAge <= 18)
cout << "Person " << nAge << " years old is a juvenile.\n";
if (nAge >= 19 && nAge <= 59)
cout << "Person " << nAge << " years old is an adult.\n";
if (nAge >= 60)
cout << "Person " << nAge << " years old is a senior citizen.\n";

Alternate Form 1 is easy to read (however it does not use the {} rule!). Alternate Form 2 may be easy to read but is inefficient
since the conditions in all the if statements are tested even after a match has been found. It is also error-prone. While a usage such as
if (nA) …
is valid, we will avoid such usage and explicitly state our objective, e.g.
if (nA != 0) …

Nested if statements
Finally, a word about nested if statements. Consider the following statements that do not produce the desired result.
float fX = 4.15f, fY = 2.14f;
….
// not written correctly
if (fX <= fY)
    if (fX < sqrt(20.0))
        cout << "X is less than square root of 20.";
else
    cout << "X is greater than Y.";

Note that every else is associated with the nearest preceding if that does not already have an else. The way the statements are
indented has no effect on this association. The correct way to write the statements is as follows.
float fX = 4.15f, fY = 2.14f;
….
// corrected version
if (fX <= fY)
{
    if (fX < sqrt(20.0))
    {
        cout << "X is less than square root of 20.";
    }
}
else
{
    cout << "X is greater than Y.";
}

We strongly recommend that you use the {} braces to identify the block of statement(s) associated with the if and the else
parts of the statement.
Conditional operator ?:
C++ provides a succinct way to take care of a specialized form of selection. For example, consider the following statements.
if (fX > fY)
fA = 2.1*fB;
else
fA = 0.5*fA;

The C++ conditional operator ?: can be used instead of the above statements as follows.
fA = (fX > fY? 2.1*fB : 0.5*fA);
Note that three operands are involved. The first operand captures the selection condition. If this condition is true, then the
second operand is evaluated. If the condition is false, the third operand is evaluated. Here are a couple more examples.
fX > fY? fA=2.1*fB : fA=0.5*fA;
std::cout << (fX > fY? fA : fB);
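As a further illustration, the conditional operator is convenient when one of two values is to be selected and returned. The two functions below are a minimal sketch; their names (largerOf and passOrFail) are our own assumptions and are not part of the text.

```cpp
#include <string>

// Hypothetical helper: returns the larger of the two values.
float largerOf(float fX, float fY)
{
    return (fX > fY ? fX : fY);
}

// Hypothetical helper: selects one of two messages based on the score.
std::string passOrFail(int nScore)
{
    return (nScore >= 60 ? "Passed" : "Failed");
}
```

The parentheses around the conditional expression are not required here, but they make the extent of the expression easier to see.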

3.2 Repetition
As we saw in the introductory section of this chapter, sometimes one or more statements need to be executed repeatedly until
a certain condition is satisfied. The block of statements is in a loop. C++ provides three commonly used loop constructs –
while, do ..while and for statements.

while statement
The general syntax of the while statement is as follows.
while (test condition)
{
    statement(s)
}
The statements within the {} are executed repeatedly as long as the test condition expression evaluates to true. The test
condition (or expression) is tested at the beginning of each pass through the block. Hence it is possible that the statements
in the block do not execute even once because the expression is false at the outset.
Example Program 3.2.1 Repetition using the while Statement
We will illustrate the usage of the while statement with an example.
Problem Statement: Write a program to compute the sum of the first N integers, i.e. $\sum_{i=1}^{N} i$.

main.cpp


Let’s look at the strategy used to drive the while loop. A loop counter or index, i, is used to keep track of how many times the
statements in the block need to be executed. This loop counter is initialized to 1 in line 21. If the loop counter is not initialized,
the test condition cannot be evaluated correctly. The variable to store the sum, nSum, is initialized to zero in line 20. The test
condition compares the value of i to nN. The two statements in the block, lines 24 and 25, execute as long as i is less than or
equal to nN. Line 25 is used to increase the value of the loop counter. If this statement is omitted, the statements in the block
will continue to execute forever – an infinite loop. The test condition can be complex – we saw how complex expressions can
be constructed using relational and logical operators in the previous section.
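The strategy described above can be sketched as a small function. This is not the listing discussed in the text (so the line numbers quoted above do not apply to it); the function name sumWithWhile is our own assumption, while the variable names nSum, i and nN follow the discussion.

```cpp
// Hypothetical function: sum of the first nN integers using a while loop.
int sumWithWhile(int nN)
{
    int nSum = 0;   // variable to store the sum
    int i = 1;      // loop counter
    while (i <= nN)
    {
        nSum += i;
        i++;        // without this update the loop would never end
    }
    return nSum;
}
```

If nN is zero or negative, the test condition is false the first time it is evaluated and the function returns 0 without executing the block.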

do .. while statement
The general syntax of the do..while statement is as follows.
do
{
    statement(s)
} while (test condition);
The statements within the {} are executed as long as the test condition expression remains true. However, unlike the while
statement, the test condition is evaluated at the end of the block of statements not at the beginning of the block of statements.
Hence the statements in the block will be executed at least once. We will illustrate the usage of the do ..while statement with
an example.
Example Program 3.2.2 Repetition using the do ..while Statement
Problem Statement: Write a program to compute the sum of the first N integers, i.e. $\sum_{i=1}^{N} i$.

main.cpp


The loop counter is declared and initialized in line 21. In line 26, the loop counter is used in the test condition. Similarly, the
sum is defined and initialized in line 20 and is updated in line 24.
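A minimal sketch of the same computation with the do .. while statement is shown below. As before, this is not the listing from the text; the function name sumWithDoWhile is a hypothetical one chosen for the illustration.

```cpp
// Hypothetical function: sum of the first nN integers using do .. while.
int sumWithDoWhile(int nN)
{
    int nSum = 0;   // variable to store the sum
    int i = 1;      // loop counter
    do
    {
        nSum += i;
        i++;
    } while (i <= nN);  // tested only after the block has executed
    return nSum;
}
```

Since the block executes at least once, this function returns 1 when nN is 0, whereas a while loop with the same test condition would return 0. The do .. while form is therefore appropriate only when nN is known to be at least 1.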

Increment and Decrement Operators


C++ provides increment and decrement operators as follows.
++ Increment
-- Decrement
These unary operators are used in conjunction with a variable as the operand. When these variables are integers or floating-
point numbers, a 1 is added to or subtracted from the operand. Consider the following examples where the operators are used.
int nP = 12; // nP is initialized to 12
nP++; // nP is now 13. same as nP = nP + 1;

int nP = 44; // nP is initialized to 44
nP--; // nP is now 43. same as nP = nP - 1;

int nP = 44; // nP is initialized to 44
--nP; // nP is now 43. same as nP = nP - 1;

When the operators are used before the operand they are known as prefix operators as in --nP and as postfix operators if they
are used after the operand as in nP++. One must be careful in using these operators. Consider the case where we have a vector
nVA of length 3 that is to contain the three values 11, 65 and 70. The correct statements are as follows.
int nVA[3];
int nP = 0; // nP is 0
nVA[nP++] = 11; // nP is 1. nVA[0] = 11
nVA[nP++] = 65; // nP is 2. nVA[1] = 65
nVA[nP] = 70; // nP is still 2. nVA[2] = 70

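The following statements will not produce the desired result since nVA[0] is never assigned a value and the value 65 stored in
nVA[2] is overwritten by 70.
int nVA[3];
int nP = 0; // nP is 0
nVA[++nP] = 11; // nP is 1. nVA[1] = 11
nVA[++nP] = 65; // nP is 2. nVA[2] = 65
nVA[nP] = 70; // nP is still 2. nVA[2] = 70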

The difference is that when nP++ is executed, nP is incremented after the value of nP is used in the expression whereas ++nP
means increment nP first and then use nP in the expression.
In the above examples, the value of the variable associated with the increment and decrement is changed by one. Finally, it
should be noted that the unary increment and decrement operators can be used in more sophisticated contexts as we will see
later in the text.
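The difference between the two forms can be seen by looking at the value that the expression itself yields. The two functions below are a small sketch; their names are our own assumptions. Each one updates nP through a reference and returns the value that would be used in an enclosing expression.

```cpp
// Hypothetical helper: postfix form. The old value of nP is used in the
// expression and nP is incremented afterwards.
int usePostfix(int& nP)
{
    return nP++;
}

// Hypothetical helper: prefix form. nP is incremented first and the new
// value is used in the expression.
int usePrefix(int& nP)
{
    return ++nP;
}
```

Starting with nP equal to 5, usePostfix returns 5 and usePrefix returns 6. In both cases nP is 6 after the call.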


for statement
The general syntax of the for loop statement is as follows.
for (initialization; test condition; update)
{
    statement(s)
}
The basic idea is to repeatedly execute the statement(s) in sequence as long as the test condition is true. The initialization
part is executed once, before the loop starts, and is used to initialize the values of the loop control variable and if necessary,
other variables. The test condition must evaluate to true for the statements to execute. The update part is executed at the end
of each pass through the loop and is typically used to change the value of the loop control variable.
Example Program 3.2.3 Repetition using the for Statement
Problem Statement: Write a program to compute the sum of the first N integers, i.e. $\sum_{i=1}^{N} i$.

main.cpp

Note the initialization that takes place in line 20 and the body of the for loop between lines 21 and 24. As we will see next, it is
possible to carry out the initialization of the sum in the initialization part of the for loop.
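A minimal sketch of the computation with the for statement is given below. It is not the listing from the text; the function name sumWithFor is our own assumption. The result can be checked against the closed-form expression N(N+1)/2.

```cpp
// Hypothetical function: sum of the first nN integers using a for loop.
int sumWithFor(int nN)
{
    int nSum = 0;                   // initialized before the loop
    for (int i = 1; i <= nN; i++)   // initialization; test; update
    {
        nSum += i;
    }
    return nSum;
}
```

Note how the initialization, the test condition and the update of the loop counter, which were spread over several statements in the while version, are now collected in one place.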

Comma Operator
The comma operator , is often used in conjunction with some of the statements that we have seen before. The operator is used
to separate a list of expressions and returns the value of the last expression that is evaluated. Consider the following statement.
nSum = (nX = 2, nY = nX + 3);

After the statement is executed, the value of nX is 2, the value of nY is 5 and finally, the value of nSum is 5 since the last expression
that is evaluated is nY = nX + 3.
An appropriate location to use the comma operator is in the initialization section of the for statement. For example, we could
rewrite the for loop in Example 3.2.3 as follows.
for (nSum = 0, i=1; i <= nN; i++)
{
    nSum += i;
}

In the initialization part, nSum is set to zero and i is set to 1. The execution of the for loop then begins.
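The comma operator can also be used in the update part of the for statement. The sketch below, with the hypothetical function name reverseInPlace, uses two loop indices that move towards each other to reverse the order of the values in a vector.

```cpp
// Hypothetical helper: reverses the values stored in nV in place.
void reverseInPlace(int nV[], int nSize)
{
    int i, j;
    // the comma operator is used in both the initialization and update parts
    for (i = 0, j = nSize - 1; i < j; i++, j--)
    {
        int nTemp = nV[i];  // swap the values at locations i and j
        nV[i] = nV[j];
        nV[j] = nTemp;
    }
}
```

The indices i and j are declared before the loop on purpose. In a statement such as for (int i = 0, j = 5; ...), the comma separates two declarations and is not the comma operator.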

Nested for Loops


Finally, it is possible to have a nested form of for loops. Consider the following statements.
int nCount = 0;
for (i=1; i <= n; i++)
{
    for (j=1; j <= m; j++)
    {
        nCount++;
    }
}
cout << "Value of nCount is : " << nCount << "\n";

The two loops controlled by the loop indices i and j execute the sole statement nCount++. If n=10 and m=5, then the value of
nCount after the two loops is 50. One should be careful in making sure that the test condition becomes false at some stage of
the loop execution; otherwise, the program will be stuck in an infinite loop. Consider the following statements.
n=50; fSum = 0.0f;
for (i=1; i <= n; i++)
{
    fSum += static_cast<float>(pow(i,2.0));
    if (i > n/2) i=1; // dangerous
}

The execution will be stuck in the for loop since the value of i will always be less than n. Consider the following statements.
n=m=10; fSum = 0.0f;
for (i=1; i <= n; i++)
{
    for (j=1; j <= m; i++) // dangerous
    {
        fSum += static_cast<float>(i+j);
    }
}
Once again, the execution is stuck in the inner for loop since the value of j remains at 1!
Tip: It is a common programming error to get stuck in infinite loops because the termination condition is never met or because
of typing the incorrect loop index variable name or because of misplaced semicolons!

3.3 Other Control Statements


There are other mechanisms to control the execution flow in a C++ program.

break syntax
The general syntax of the statement is
break;
Placing this statement at an appropriate location ensures that the control (or program flow) is immediately shifted to the first
statement after the innermost loop (or switch statement) that encloses the break statement.
Consider the following example. We wish to compute the sum of the ages of all the students in a class. We will assume that we
do not know (or have not been told) how many students are in the class. The user is expected to input the ages of the students
one at a time. When there are no more students, the user will input the age as zero or a negative number. We will first write the
relevant statements using the do ..while statement.
int nAge; // to store the student’s age
int nSum = 0; // to store the sum of the ages
do
{
    cout << "Enter the age (zero or negative to end): ";
    cin >> nAge;
    if (nAge > 0)
        nSum += nAge;
} while (nAge > 0);
cout << "The sum of the ages is " << nSum << ".\n";

We will rewrite the program segment using the break statement.
int nAge; // to store the student’s age
int nSum = 0; // to store the sum of the ages
for (;;) // an infinite loop!
{
    cout << "Enter the age (zero or negative to end): ";
    cin >> nAge;
    if (nAge <= 0)
        break;
    nSum += nAge;
}
cout << "The sum of the ages is " << nSum << ".\n";

The for statement with nothing specified for its three components mimics an infinite loop. The break statement makes it
possible to exit the loop.

continue syntax
The continue statement is closely associated with the break statement. The general syntax of the statement is
continue;
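Placing this statement at an appropriate location ensures that the statements between the continue statement and the end of
the nearest enclosing loop are not executed in the current pass. Execution resumes with the test condition of the loop (in a for
loop, the update part is executed first).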
We will rewrite the example shown earlier with the break statement now using the continue statement.
int nAge; // to store the student’s age
int nSum = 0; // to store the sum of the ages
do
{
    cout << "Enter the age (zero or negative to end): ";
    cin >> nAge;
    if (nAge <= 0)
        continue;
    nSum += nAge;
} while (nAge > 0);
cout << "The sum of the ages is " << nSum << ".\n";
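The continue statement is perhaps more commonly seen in a loop that must skip some values but keep going. The following sketch, with the hypothetical function name sumValidAges, sums only the valid (positive) ages stored in a vector and ignores the rest.

```cpp
// Hypothetical helper: sums only the positive entries of nVAges.
int sumValidAges(const int nVAges[], int nSize)
{
    int nSum = 0;
    for (int i = 0; i < nSize; i++)
    {
        if (nVAges[i] <= 0)
            continue;       // skip the rest of this pass; i++ still executes
        nSum += nVAges[i];
    }
    return nSum;
}
```

Unlike the break statement, the continue statement does not end the loop. With the values 18, -1, 20, 0 and 22 the function returns 60.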

switch syntax
One could look at the switch statement as a specialized case of if … else statement. The statement provides different execution
paths based on the value of an expression that evaluates to an integer. The syntax of the statement is as follows.
switch (integral expression)
{
    case ConstantExpression1:
        statement(s)
        break;
    case ConstantExpression2:
        statement(s)
        break;
    default:
        statement(s)
        break;
}
The entire block of statements is enclosed between the braces { …}. Each of the different execution paths begins with the
keyword case that is followed by a unique constant integer expression and the colon symbol. The break keyword at the end of
an execution path is optional. If the break keyword is used, then the subsequent statements are not executed, and the control is
transferred to the first statement after the switch block. If it is omitted, execution continues (falls through) into the statements
of the next case label. The keyword default is optional and the statements in that subblock are executed only if the integral
expression is not equal to any one of the constant expressions associated with the different case labels.
A few things to note about the switch statement. The switch expression and the expressions associated with the case labels
must be of an integral data type. Recall that the primitive data types int, short, long, and char are all integral data types.
In addition, the expressions associated with the case labels must be constants.
Let us now look at an example. We will write a program segment to compute the grade point average (GPA) of a student’s
grades in a semester. We will assume that we have a 4-point grading system with all the courses carrying equal credits.
char cGrade; // to store the student’s grade
int nCourses = 0; // to store the total number of courses
float fGPA = 0.0; // to store the GPA
do
{
    cout << "Enter the Grade (Type S to end): ";
    cin >> cGrade;
    switch (cGrade)
    {
        case 'A': nCourses++;
                  fGPA += 4.0;
                  break;
        case 'B': nCourses++;
                  fGPA += 3.0;
                  break;
        case 'C': nCourses++;
                  fGPA += 2.0;
                  break;
        case 'D': nCourses++;
                  fGPA += 1.0;
                  break;
        case 'E': nCourses++;
                  fGPA += 0.0;
                  break;
        case 'S':
                  break;
        default:
                  cout << "Invalid grade.\n";
                  break;
    }
} while (cGrade != 'S');

if (nCourses != 0)
    fGPA /= nCourses;
cout << "Student has taken " << nCourses << " courses and the GPA is "
     << fGPA << ".\n";
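Since the break keyword is optional, several case labels can share the same statements. The sketch below is an illustration under our own assumptions (the function name gradePoints and the use of -1 to flag an invalid grade are hypothetical). It accepts both upper-case and lower-case grades by letting one case label fall through to the next.

```cpp
// Hypothetical helper: numerical grade for a letter grade, -1 if invalid.
float gradePoints(char cGrade)
{
    float fPoints;
    switch (cGrade)
    {
        case 'A':           // no break: falls through to case 'a'
        case 'a':
            fPoints = 4.0f;
            break;
        case 'B':
        case 'b':
            fPoints = 3.0f;
            break;
        case 'C':
        case 'c':
            fPoints = 2.0f;
            break;
        case 'D':
        case 'd':
            fPoints = 1.0f;
            break;
        case 'E':
        case 'e':
            fPoints = 0.0f;
            break;
        default:
            fPoints = -1.0f;    // flags an invalid grade
            break;
    }
    return fPoints;
}
```

Falling through is useful in a case such as this one, but an accidentally omitted break is a common source of errors. It is a good idea to add a comment wherever the omission is intentional.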

goto syntax
The general syntax of the goto statement is as follows.
goto label;

label: statement(s);


Programmers have a love-hate relationship with the goto statement. Unrestricted use of the goto statement can lead to programs
that are very difficult to read and maintain. C++ allows the goto statement to transfer control only within a function (we will see
what functions are in the next chapter). The execution control is transferred from the point where the goto is used to the statement
marked with the target label. The label is an identifier that must be unique within the function and is followed by a colon. We will
now rewrite the program segment used earlier in illustrating the break and continue statements using the goto statement.
int nAge; // to store the student’s age
int nSum = 0; // to store the sum of the ages
begin:
cout << "Enter the age (zero or negative to end): ";
cin >> nAge;
if (nAge > 0)
{
    nSum += nAge;
    goto begin;
}
cout << "The sum of the ages is " << nSum << ".\n";

Note that the label can appear in any location in the function – before or after it is referenced in a goto statement.
For the remainder of this chapter, we will look at several examples illustrating more selection and repetition concepts,
programming styles, algorithm development, etc.

Example Program 3.3.1 Using Control Statements for Computing Student GPA
The next example completes the problem of computing the student GPA discussed earlier in this chapter.
Program Statement: Develop a program to compute the semester GPA and the cumulative GPA of a student taking courses in a
four-point grading system. The program should ask the user to input the (a) current GPA and the number of semester hours
taken so far, and (b) the grade and the semester hours for each course taken this semester. It should display as a final report the
GPA and the semester hours before the current semester and for the current semester, and the cumulative GPA and semester
hours.
Solution: We will first develop the terminology and then the algorithm to solve the given problem. We will use the term
cumulative to signify the time period prior to the current semester, semester to signify the current semester and new to signify
the state at the end of the current semester. The equation to compute the GPA is
$$\mathrm{GPA} = \frac{\sum_{i=1}^{n} g_i s_i}{\sum_{i=1}^{n} s_i} \qquad (3.3.1)$$
where $n$ is the total number of courses, $s_i$ is the number of semester hours (usually between 1 and 4) for the ith course and $g_i$
is the numerical grade (A=4, B=3, C=2, D=1, E=0) for the ith course. The above equation can also be written as
$$\mathrm{GPA} = \frac{\sum_{i=1}^{n_{cum}} g_i s_i + \sum_{j=1}^{n_{sem}} g_j s_j}{\sum_{i=1}^{n_{cum}} s_i + \sum_{j=1}^{n_{sem}} s_j} \qquad (3.3.2)$$
where $n_{cum}$ is the (cumulative) number of courses taken so far and $n_{sem}$ is the number of courses taken this semester. Note that
$$\sum_{i=1}^{n_{cum}} g_i s_i = (\mathrm{GPA})_{cum} \sum_{i=1}^{n_{cum}} s_i \qquad (3.3.3)$$
where $(\mathrm{GPA})_{cum}$ is the cumulative GPA.


Now to the algorithm.
1. Read the cumulative GPA and semester hours. Are these values valid?
2. Ask the user for the course grade. Loop starts here.
3. If the grade is S exit this loop. Is the grade valid? If yes, ask the user for the number of semester hours for the course.
Is the value valid?
4. Track or update $\sum_{i=1}^{n_{sem}} s_i$ and $\sum_{j=1}^{n_{sem}} g_j s_j$.
5. End loop.
6. Use Eqns. (3.3.2) and (3.3.3) to compute the required quantities.
7. Display the results.
The developed program is closely (but not entirely faithfully) based on the above algorithm and is shown below.
main.cpp


A few things to note about the program. First, the error checks are minimal. The TODO section identifies what needs to be
done. This is left as an exercise for the reader. Second, the program shows how different data types can be handled appropriately
through a process of type casting or explicit data conversion. While GPA can have a fractional component, the number of
semester hours is an integer. As we saw in Chapter 2, the result of purely integer arithmetic contains no fractional component:
4/3 is 1, as is 5/3. We look at this issue in detail in the last section of the chapter.
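The two points made above, the use of Eqns. (3.3.2) and (3.3.3) and the need for explicit data conversion, can be sketched as follows. This is only an illustrative sketch and not the program from the text; all the function and parameter names are our own assumptions. The parameter fSemPoints is assumed to hold the sum of (numerical grade x semester hours) over the courses taken this semester.

```cpp
// Hypothetical helper: new cumulative GPA at the end of the semester.
float newCumulativeGPA(float fCumGPA, int nCumHours,
                       float fSemPoints, int nSemHours)
{
    int nTotalHours = nCumHours + nSemHours;
    if (nTotalHours == 0)
        return 0.0f;    // no semester hours: GPA cannot be computed
    // Eqn. (3.3.3): grade points earned before the current semester
    float fCumPoints = fCumGPA * static_cast<float>(nCumHours);
    // Eqn. (3.3.2): the integer denominator is converted explicitly
    return (fCumPoints + fSemPoints) / static_cast<float>(nTotalHours);
}

// Purely integer arithmetic: the fractional component is discarded.
int integerQuotient(int nA, int nB)
{
    return nA / nB;
}

// Converting one operand forces floating-point division.
float realQuotient(int nA, int nB)
{
    return static_cast<float>(nA) / nB;
}
```

As an example, consider a student with a cumulative GPA of 3.0 over 30 semester hours who earns an A and a B in two 3-hour courses this semester. The semester grade points are 4(3) + 3(3) = 21, and the new cumulative GPA is (90 + 21)/36, or approximately 3.083.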

We will now look at another example involving several control statements.


Example Program 3.3.2 Using Control Statements for Data Analysis
Program Statement: Develop a program to analyze data obtained from a simple Tension Test experiment. The test is carried out
to construct what is known as a Load-Deflection diagram. For most materials that are subjected to this test, the initial portion
of the graph is linear before the response becomes nonlinear. It is assumed that a maximum of 10 pairs of load-deflection values
will be obtained from the test with the starting pair assumed to be (0,0). The load values are assumed to be in the ascending
order. The entire graph is to be approximated as piecewise linear segments, i.e. the points on the graph will be connected to
each other via straight lines (see Fig. 3.3.1). The slope of each line is to be monitored to determine the load value at which the
graph becomes nonlinear. Display a table showing the load-deflection pairs and the slope of each segment. Also display the load
value at which the response becomes nonlinear.
Fig. 3.3.1 A sample load-deflection diagram (Load plotted against Deflection)


Solution: This problem requires that we use the vector data type to store the load values, deflection values and slope values.
The slope, $s$, of a straight line connecting points $(x_1, y_1)$ and $(x_2, y_2)$ is given as
$$s = \frac{y_2 - y_1}{x_2 - x_1} \qquad (3.3.4)$$
Now to the algorithm.
1. Initialize all variables.
2. Ask the user for the load and deflection values. The loop starts here.
3. If the load value is zero, exit the loop.
4. Are the values valid? Are the load values in ascending order? Is the load value greater than the previous value?
5. Store the load-deflection pair in the appropriate vectors.
6. Have we reached the limit on the number of pairs of data (10)? If yes, go to step 7. Otherwise go to step 2.
7. Loop through all segments. Number of segments is one less than the number of points.
8. Compute the slope of the line in the segment using Eqn. (3.3.4).
9. End loop.
10. Loop through all segments but the last one.
11. Compare the slope in the current segment with the slope in the next segment. If the slopes are not equal within a
specified tolerance, the response has become nonlinear. The load at the end point of the current segment is the load
at which the response has become nonlinear.
12. Display the output.
The developed program based on the above algorithm is shown below.
main.cpp


A total of no more than 11 data points is assumed and the memory to store the values is allocated at compile time. Memory is
wasted if only 5 data points are defined, and the program cannot handle more than 11 data points.
This is one of the problems with static memory allocation that we will address later in the book. Lines 37 through 62 are used
to obtain the input from the user.
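Steps 7 through 11 of the algorithm can be sketched as follows. This is a minimal sketch under several assumptions of our own and is not the program from the text: the function name nonlinearLoad is hypothetical, the deflection is taken as the x value and the load as the y value in Eqn. (3.3.4), and a return value of -1 is used to indicate that no nonlinearity was detected (or that the input is unusable).

```cpp
#include <cmath>

// Hypothetical helper: returns the load at which the response becomes
// nonlinear, or -1.0f if all segments have the same slope within fTol.
float nonlinearLoad(const float fVLoad[], const float fVDefl[],
                    int nPoints, float fTol)
{
    const int MAXVALUES = 11;       // same limit as in the discussion
    float fVSlope[MAXVALUES - 1];   // one slope per segment
    if (nPoints < 2 || nPoints > MAXVALUES)
        return -1.0f;
    int nSegments = nPoints - 1;    // Step 7

    // Steps 7-9: slope of each segment using Eqn. (3.3.4)
    for (int i = 0; i < nSegments; i++)
    {
        float fDX = fVDefl[i+1] - fVDefl[i];
        if (fDX == 0.0f)
            return -1.0f;           // guard against divide by zero
        fVSlope[i] = (fVLoad[i+1] - fVLoad[i]) / fDX;
    }

    // Steps 10-11: compare the slopes of adjacent segments
    for (int i = 0; i < nSegments - 1; i++)
    {
        if (std::fabs(fVSlope[i+1] - fVSlope[i]) > fTol)
            return fVLoad[i+1];     // load at the end of segment i
    }
    return -1.0f;
}
```

For the points (0, 0), (10, 0.1), (20, 0.2), (30, 0.3) and (50, 0.35), given as (deflection, load) pairs, the first three segments have a slope of 0.01 and the last one has a slope of 0.0025. With a tolerance of 0.001 the function reports 0.3 as the load at which the response becomes nonlinear.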


A few things to note about the program.
(1) First is the definition and usage of vector data types. Vectors of the primitive data types (int, float, double and char) are
easy to define and use with the following syntax.
datatype variablename[const int value];
Since a character string is not a primitive data type, we use the standard string class when it is required to define and use a vector
of character strings. A string variable can store a character string of unknown or variable length. For example,
string strName;
string strDate ("Dec 12, 2001");
are example declarations of string variables. strName initially contains no characters whereas strDate contains the string
(including blanks or spaces) defined between the " " symbols. If later in the program, the name is to be used or defined, one
can use strName in several different ways as shown below.
strName = "John Doe";
strName = string(" John ") + " Doe "; // at least one operand of + must be a string object
std::cin >> strName;
std::cout << "Name is: " << strName;
The above statements show that a standard string variable behaves very much like a variable of a primitive data type. We will
learn more about the standard string class throughout the text. Once again, note that when a vector variable is defined to contain n
elements, the indexing to access these elements starts at 0 and ends at n-1. For example, with the following declarations
string strVDays[7];
float fVX[3];
it would be incorrect (even though the program may compile without any error messages!) to have the following statements:
strVDays[7] = "Saturday";
fVX[3] = 10.45;
The results or behavior are unpredictable if these statements are executed in the program. When vectors are initialized as in line
14, it is not necessary to explicitly state the size of the vector. The compiler can figure out the required size.
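One way to guard against an invalid index is to test it before the vector is accessed. The following sketch is an illustration under our own assumptions: the function name dayName, the constant NUMDAYS and the choice of returning an empty string for an invalid index are all hypothetical.

```cpp
#include <string>

// Hypothetical helper: name of the day at location nIndex (0 = Sunday),
// or an empty string if nIndex is not between 0 and 6.
std::string dayName(int nIndex)
{
    const int NUMDAYS = 7;
    const std::string strVDays[NUMDAYS] = {"Sunday", "Monday", "Tuesday",
        "Wednesday", "Thursday", "Friday", "Saturday"};
    if (nIndex < 0 || nIndex >= NUMDAYS)    // valid indices are 0 to n-1
        return "";
    return strVDays[nIndex];
}
```

With seven elements in the vector, Saturday is stored at location 6, and an index of 7 is rejected by the test instead of accessing memory outside the vector.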
(2) MAXVALUES is defined as 11 not 10 since the first pair of values is assumed to be (0,0) and will be stored at [0] locations of the
load and deflection data vectors.
(3) The nPoint variable is used to keep track of the current point on the load-deflection diagram. It is initialized to zero in line
32, and the load and deflection are initialized at the first location to zero in lines 33 and 34. Note how nPoint value is updated
as nPoint++ (nPoint becomes 1) in line 34 so that the load value can be correctly stored in line 55. This strategy is used once
again in line 56.


(4) Logical expressions are used to check the validity of the user input in line 47 and line 50.
(5) The break statement is used to exit the loop in line 45 if the user input for the load value is 0 and in line 60 if ten values are
defined by the user.
(6) Eqn. (3.3.4) is implemented in lines 68 and 69. A safety check is not implemented to avoid a divide by zero error. This can
be easily done and is left as an exercise.
(7) Step 11 of the algorithm is implemented in the loop in lines 73 through 82. A tolerance value of $10^{-3}$ is used in checking
whether slopes of adjacent segments are equal or not. The use of fabs math function requires the use of cmath header file. Once
again, the break statement (line 80) is used to exit the loop.
(8) Finally, let us look at the statements to format and display a table – a number of columns with a header and a fixed (column)
width where data are displayed in each row. The three column headers (“Load”, “Deflection” and “Slope”) are defined in lines
87-88. We will left-justify the entries in each column via the std::setiosflags(std::ios::left) statement. Next, we will assume
that a field width of 7 is adequate to represent the load values. The column width is chosen as the larger of two numbers – the
number of digits required to represent the value and the number of characters in the column header. The << operator formats
the floating value so that the display is as compact and efficient as possible. Hence with a field width of 7 we should be able to
display (positive) values between 0 and 999999 including values such as 0.15, 0.16667, 3000.12 etc. A field width of 10 is used with
the deflection values since the column header Deflection contains 10 characters. The same logic applies in formatting the last
column containing slope values. We will see more about formatted output in the last section of this chapter.
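Referring back to note (1), the following is a minimal sketch of string concatenation and of valid vector indexing. The function names MakeFullName and SumValues are hypothetical and are not part of the example program.

```cpp
#include <string>

// Builds a full name from two parts. At least one operand of + must be a
// std::string; adding two string literals directly does not compile.
std::string MakeFullName (const std::string& strFirst,
                          const std::string& strLast)
{
    return strFirst + " " + strLast;
}

// Sums the n elements of a vector. Valid indices run from 0 to n-1.
float SumValues (const float fVX[], int n)
{
    float fSum = 0.0f;
    for (int i = 0; i < n; i++)   // using i = n would be out of bounds
        fSum += fVX[i];
    return fSum;
}
```

The loop condition i < n keeps every access inside the range 0 to n-1, which is the property the compiler does not check for us.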

The following example is a precursor to programs involving numerical analysis and solution techniques.
Example Program 3.3.3 Exhaustive Search or Trial-and-error
Program Statement: The coefficient of restitution, $c$, is a measure of the elasticity of the collision between two objects, one of which
is usually at rest. For example, if a ball is dropped from a height $H$ onto a floor and is observed to bounce to a height $h$, the
coefficient of restitution can be computed as
$$c = \sqrt{\frac{h}{H}} \qquad (3.3.5)$$
Clearly, if the conservation of energy principle is followed, $0 \le c \le 1.0$.
A ball is dropped from a height of 3.5 m onto a floor. The coefficient of restitution between the ball and floor is 0.9. Compute
how many bounces occur before the ball bounces to a height as close to 1 m as possible.
Solution: Numerical solution is an attractive approach if the analytical solution is difficult to compute. However, in this problem,
even though we know the analytical solution, we will use trial-and-error approach (or exhaustive search) to find the solution.
The motivation is to gain confidence in the development of the trial-and-error approach by comparing the trial-and-error
solution to the analytical solution.
The basic idea is to increment $i$, compute the new height $h_i$ and compare the new height to the target height, $h_{target}$, that is 1
m. We will keep track of the difference between the computed height and the target height
$$h_{diff} = \left| h_i - h_{target} \right|$$
The difference should decrease with increasing $i$, and then start increasing. Only for certain sets of values will this difference
be zero or nearly zero – it is unlikely that the bounced height will be exactly the target height.
The analytical solution to this problem can be found as follows. Let the height to which the ball bounces after every subsequent
bounce be denoted as $h_1, h_2, h_3, \ldots$. Note that
$$h_1 = c^2 H, \quad h_2 = c^2 h_1 = c^4 H, \quad \ldots, \quad h_i = c^{2i} H \qquad (3.3.5a)$$
from which one could solve for $i$ as
$$2i \log(c) = \log\left(\frac{h_i}{H}\right)$$
or,
$$i = \frac{\log\left(h_i / H\right)}{2 \log(c)} \qquad (3.3.6)$$
Algorithm: Here is the developed algorithm. As an added safety, we keep track of the number of iterations (or bounces). If the
number exceeds a predefined maximum number, we exit the loop. This is one more way we can avoid getting stuck in an infinite
loop.
1. Obtain user input for initial height, coefficient of restitution and the target height. Set $(h_i)_{new} = (h_i)_{old} = H$,
$(h_{diff})_{new} = (h_{diff})_{old} = H$, and $i = 0$.
2. Loop to compute the new height.
3. Increment $i$. If $i >$ Max iterations, exit the loop and print an error message.
4. Compute $(h_i)_{new} = c^2 (h_i)_{new}$, and $(h_{diff})_{new} = \left| (h_i)_{new} - h_{target} \right|$.
5. If $(h_{diff})_{new} > (h_{diff})_{old}$ we have found the solution. Set $(h_i)_{new} = (h_i)_{old}$, $i = i - 1$ and exit the loop.
6. Set $(h_{diff})_{old} = (h_{diff})_{new}$ and $(h_i)_{old} = (h_i)_{new}$.
7. Go to step 2.
8. Print the results. The number of bounces is $i$ and the bounced height closest to the target height is $(h_i)_{new}$.
main.cpp

Line 15 establishes the upper bound on the iterative algorithm. It is a good idea to use an upper bound so that in case the
program is unable to find a solution in a reasonable number of iterations, the iterative loop can be exited. The user input is
obtained in lines 28 through 33. No error checking is done, and the reader is encouraged to modify the program and carry out
checks to ensure that the input values are physically possible, e.g. initial height is positive, target height is less than the initial
height, etc.

Steps 2 through 7 in the algorithm are implemented in lines 40 through 71. Initialization of the float variables that store the
old and the new heights is made in lines 36 and 37. The math library functions pow and fabs are overloaded functions. In other
words, the argument to the functions can have different data types and the C++ compiler at compile time is able to ascertain
the data type of the argument and call the appropriate version of the pow and fabs functions. Otherwise type casting will have
to be used to convert double values to float values via the static_cast<float> expression. We will learn more about overloaded
functions in Chapter 4.
Two checks are made to exit the infinite loop – in line 49 and in line 65. There are several challenges to writing a general-purpose
computer program. One of the challenges is error detection so that the program is not carrying out calculations that are never
ending. The reader is encouraged to specify the input that would lead to the maximum number of iterations (currently 1000)
being exceeded.


The program follows the algorithm except for statements 73 through 85 where we compute the analytical solution. Eqn. (3.3.6)
is implemented in lines 75-76. However, since the number of bounces is an integer, we use the ceil function to find the next
higher integer (nUpper). To find the point that is the closest to the target height, we also use the next lowest bounce (nLower).
We use these two integer values to compute the height as per Eqn. (3.3.5a), and then find the bounce that is closest to the target
height.
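The trial-and-error steps and the analytical check of Eqn. (3.3.6) can be sketched compactly as follows. This is an illustrative sketch and not the main.cpp listing discussed above. The function names FindClosestBounce and AnalyticalBounce, the parameter names, and the use of -1 to signal that the iteration limit was exceeded are all assumptions made here.

```cpp
#include <cmath>

// Trial-and-error search following the algorithm steps 1 through 7.
// Returns the bounce number whose height is closest to fTarget, or -1 if
// more than nMaxIter bounces would be needed.
int FindClosestBounce (float fH, float fC, float fTarget, int nMaxIter)
{
    float fHNew = fH;                       // step 1: initial values
    float fDiffOld = fH, fDiffNew = fH;
    int i = 0;
    for (;;)                                // step 2: loop
    {
        i++;                                // step 3
        if (i > nMaxIter)
            return -1;
        fHNew = fC*fC*fHNew;                // step 4: h_i = c^2 h_(i-1)
        fDiffNew = std::fabs (fHNew - fTarget);
        if (fDiffNew > fDiffOld)            // step 5: difference grew,
            return i-1;                     // previous bounce was closest
        fDiffOld = fDiffNew;                // step 6
    }
}

// Analytical bounce count from Eqn. (3.3.6); generally not an integer.
double AnalyticalBounce (double dH, double dC, double dTarget)
{
    return std::log (dTarget/dH) / (2.0*std::log (dC));
}
```

For the data in the program statement (initial height 3.5 m, c = 0.9, target 1 m) the analytical value lies between 5 and 6, and the search returns 6 because the height after the sixth bounce is closer to 1 m than the height after the fifth.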

3.4 Tying Loose Ends


Before we finish the chapter, we will look at a few C++ features that will enable us to write better programs.
Enumerated Types
C++ provides for a user-defined type called enumeration. The general syntax is as follows.
enum enumeration_type {list of integer constants};

Let us consider the example of the different types of polygons. We could define the enumeration type as follows.
enum Polygons {TRIANGLE=1, QUADRILATERAL, PENTAGON, HEXAGON, HEPTAGON,
OCTAGON, NONAGON, DECAGON};
Polygons is now treated as a data type just like int, float etc. The compiler assigns integer values such that TRIANGLE has a value
of 1, QUADRILATERAL has a value of 2, PENTAGON has a value of 3, and so on. Note that by default, if TRIANGLE is not assigned a value,
it is taken as zero.
Here is an example usage of this enumeration type in a program following the above definition.
Polygons PolyType;          // variable to store the polygon type
std::string strPolyType;

// get the polygon type from the user
std::cout << "Polygon shape (lower case)? ";
std::cin >> strPolyType;
if (strPolyType == "triangle")
    PolyType = TRIANGLE;
else if (strPolyType == "square" || strPolyType == "rectangle")
    PolyType = QUADRILATERAL;
else
    std::cout << "Unsupported polygon type.\n";

Because of the usage of the enumeration type, the program is more readable than using simple integer constants to differentiate
the types of polygons.
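As a further illustration, here is a minimal sketch that uses the same enumeration in a switch statement and in an arithmetic expression. The helper functions NumSides and PolygonName are hypothetical names introduced only for this sketch.

```cpp
#include <string>

enum Polygons {TRIANGLE=1, QUADRILATERAL, PENTAGON, HEXAGON, HEPTAGON,
               OCTAGON, NONAGON, DECAGON};

// Number of sides. Relies on the enumerators having the consecutive
// integer values 1, 2, 3, ... assigned by the compiler.
int NumSides (Polygons PolyType)
{
    return static_cast<int>(PolyType) + 2;
}

// Maps a few of the enumerators to a readable name.
std::string PolygonName (Polygons PolyType)
{
    switch (PolyType)
    {
        case TRIANGLE:      return "triangle";
        case QUADRILATERAL: return "quadrilateral";
        case PENTAGON:      return "pentagon";
        default:            return "other polygon";
    }
}
```

The case labels read as words rather than as the integers 1, 2 and 3, which is the readability gain the enumeration provides.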
Type Coercion and Casting
How do we handle expressions where different data types are involved? In line 46, the variable fSemGPA on the left side of the
assignment (symbol) is a floating value. On the right side of the assignment, we have a floating point constant, 4.0, and an
integer variable, nCourseSemHrs. In order to ensure that the computations are done correctly so that fractional components are
preserved, we need to convert (or promote) the integer variable nCourseSemHrs. This can be done in three different ways. The keyword
float can be used as a qualifier as
float (expression)
or (float) expression
The preferred style is to use the unary cast operator
static_cast<float>(expression)
The value of the expression in each case is stored temporarily as a floating point number that is then used in the subsequent
computations. Note that we could have written statement 46 as
fSemGPA += static_cast<float>(4*nCourseSemHrs);
Here are some rules governing type coercion when dealing with arithmetic and relational expressions and assignment operations.
Note that promotion or widening of a data type involves conversion of a value from a “lower” type to a “higher” type according
to a programming language’s precedence of data types.
(1) Step 1: Each char, short, bool, or enumeration value is promoted to int. If both operands are now int, the result
is an int expression.
(2) Step 2: If Step 1 still leaves a mixed type expression, the following precedence of types is used:
Lowest → highest
int, unsigned int, long, unsigned long, float, double, long double
The value of the operand of the lower type is promoted to that of the higher type, and the result is an expression of
that type.
Consider the expression (nValue + 4.0) where nValue is defined as an int. Step 1 leaves this as a mixed expression (an int and a
double). Hence, by Step 2, nValue is temporarily coerced to a double, and the entire expression is evaluated as a double.
Similarly, in relational expressions of the form nValue1 > fValue1, the value of nValue1 is temporarily coerced to a float before
the comparison takes place.
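The effect of these rules can be seen in a short sketch. The function names below are hypothetical and are used only to contrast integer arithmetic with a mixed expression.

```cpp
// int / int is evaluated in integer arithmetic, so the fractional part is
// lost before the result is converted to float.
float RatioTruncated (int nNum, int nDen)
{
    return static_cast<float>(nNum / nDen);
}

// Casting one operand first makes the expression mixed. The other operand
// is then promoted to float and the fractional part is preserved.
float RatioPreserved (int nNum, int nDen)
{
    return static_cast<float>(nNum) / nDen;
}

// int + double: nValue is promoted to double (Step 2 of the rules above).
double AddFour (int nValue)
{
    return nValue + 4.0;
}
```

With arguments 7 and 2, the first function yields 3 and the second yields 3.5, which shows why the position of the cast matters.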
Formatted Output
C++ provides the programmer with sufficient controls to tailor or format the output to a file or the screen. In Section 2.5 we
looked at the very basics of formatted output using precision, setprecision, width and setw commands. In this section, we
will see a few more ways of controlling the output format. Every output stream has a member function called setf that can be
used to specify the type of desired output. In conjunction with the member constants in the ios class, various operations can
be carried out. Note that the ios class is described (hence included) in the <iostream> and <fstream> header files.
Table 3.4.1 Formatting with setf
| Flag | Effect of setting the flag | Default |
|---|---|---|
| ios::showpos | Shows a plus sign before positive integers. | Not active |
| ios::showpoint | For floating point numbers, the decimal point and the trailing zeros are shown. If this flag is not set, a floating point number without a fractional component may not have the decimal point and the trailing zeros displayed. | Not active |
| ios::fixed | The scientific notation (using e) is not used. Instead the floating-point numbers are output completely. Unset the ios::scientific flag first if it is active. | Not active |
| ios::uppercase | An uppercase E is used instead of the lowercase e in scientific notation. | Not active |
| ios::scientific | Floating point numbers are written in scientific notation using the e symbol. | Not active |
| ios::right | If the field width is specified (using the width function or setw manipulator), the next item output is right justified. | Default |
| ios::left | If the field width is specified (using the width function or setw manipulator), the next item output is left justified. | Not active |
| resetiosflags | A manipulator (declared in <iomanip>) rather than a flag. Clears the named flags so that a new setting can be specified. For example, if the current output is left justified, then to have right justification be applicable, use resetiosflags(ios::adjustfield) before using ios::right. | |

The flags invoked using the setf function can be removed using the unsetf function as we will see in the following example.
Example Program 3.4.1 C++ Stream Input/Output
Program Statement: Develop a simple program to understand how formatting statements work with floating point numbers.
Solution: We will develop an interactive program in which the user will be asked to input a floating-point number and the output
precision. The number will be read in and then displayed on the screen in various output styles. The program will run in a
continuous loop with the program termination possible by typing in CTRL-C.
The resulting program is shown below.
main.cpp


The program enters an infinite loop in line 16. The user input is obtained in line 20. The input number is displayed in four
different styles. First, the default style is displayed. This style may be compiler dependent. We use the user-specified precision
to control how many digits after the decimal point are displayed. In the first user-defined style, we set the style as not showing
the value in the exponent (scientific) form and always showing the decimal point as well as the trailing zeros. This is carried out
in lines 27-29. Using the bitwise | operator, we can carry out in one statement what can also be done in two statements. In other
words, that single setf statement is equivalent to the following two statements
std::cout.setf (std::ios::fixed);
std::cout.setf (std::ios::showpoint);
The bitwise operator can be used to stack as many different types of output specifications as required. Once a style (or flag) is
set, it remains in effect until it is reset or unset. In the second user-defined style, in addition to the existing style, we specify that
the + or the – sign be displayed before the number. This is specified in line 32. Finally, in the third user-defined style, we want
to display the numbers in the scientific notation (line 37). Line 36 is necessary to unset the current style – fixed or non-exponent
way of displaying float values. Finally, the default settings are restored in lines 41-42 by unsetting the existing flags, and restoring
the default floating point settings in line 43.
A sample output from the program is shown in Fig. 3.4.1.

Fig. 3.4.1 Sample output
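The flag handling described in this example can be tried with a minimal sketch such as the one below. It is not the main.cpp listing. The function names are hypothetical, and std::ostringstream is used here (an assumption of this sketch) only so that the formatted text can be inspected as a string; the same setf, unsetf and precision calls apply to std::cout.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Fixed (non-exponent) style with the decimal point and trailing zeros.
std::string FormatFixed (float fValue, int nPrecision)
{
    std::ostringstream oss;
    oss.precision (nPrecision);
    oss.setf (std::ios::fixed | std::ios::showpoint);  // two flags at once
    oss << fValue;
    return oss.str();
}

// Scientific style with an explicit sign. The fixed flag is set first to
// mimic an earlier style, and is unset before scientific is requested.
std::string FormatScientific (float fValue, int nPrecision)
{
    std::ostringstream oss;
    oss.precision (nPrecision);
    oss.setf (std::ios::fixed);
    oss.unsetf (std::ios::fixed);
    oss.setf (std::ios::scientific | std::ios::showpos);
    oss << fValue;
    return oss.str();
}
```

For example, FormatFixed (2.5f, 3) produces 2.500. The number of exponent digits in the scientific style may depend on the compiler, so it is not shown here.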

Example Program 3.4.2 Creating a Formatted Table


Program Statement: Obtain not more than 10 floating point numbers from the user. Display the numbers on the screen as a two-
column table – the first column should show the index and the second column the value of the floating point numbers.
Solution: The program development is quite straightforward. We will define a float vector of size 10. The user will be prompted
to input the number of input values. This will be followed by the user inputting one value at a time. Once all the values are read,
we will output the table as per the specifications – two columns with the first column showing the index and the second column
showing the corresponding value.
The resulting program is shown below.
main.cpp


Lines 19-23 show the code to obtain a valid value for the number of user-defined values. These values are then obtained and
stored in lines 26-30. To display the float values in the scientific style, the setf function is used in line 33. Next the column
headings are displayed in lines 35-36. The number of blank spaces used before displaying both the index and the value is a
function of the field width used to display both these values. The field width is set to 5 for displaying the index. Unlike other
formatting flags, the field width manipulator, setw is valid only for the next item that follows the specification. This is set in line
39. Similarly, in line 41, the field width is set to 10 to display the value. Is this field width adequate to display any floating point
value?
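A minimal sketch of the two-column layout is given below. It is not the main.cpp listing: the function name MakeTable is hypothetical, the value column width of 15 is an assumption chosen to leave room for the scientific style, and the table is written to a string stream so that the result can be examined.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Builds a two-column table: index (width 5) and value (width 15).
std::string MakeTable (const float fVValues[], int nValues)
{
    std::ostringstream oss;
    oss.setf (std::ios::scientific);        // stays in effect until unset

    // setw applies only to the next item, so it is repeated for each one
    oss << std::setw(5) << "Index" << std::setw(15) << "Value" << '\n';
    for (int i = 0; i < nValues; i++)
        oss << std::setw(5) << i+1 << std::setw(15) << fVValues[i] << '\n';

    return oss.str();
}
```

Because right justification is the default, both the headings and the entries line up on the right edge of their columns.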


Summary
In this chapter we saw most of the important C++ control structures. At this stage with the knowledge gathered from Chapters
2 and 3, we should be able to write moderately complex programs.
Below we summarize the important facts learnt in this chapter.
• Declarations and definitions must precede usage and execution. A declaration typically associates a name and a type
with a symbol. A definition specifies how and where the symbol is used. A symbol (or, an identifier) can be declared
more than once, but must be defined exactly once. Think of the C++ compiler and linker as having access to a C++
dictionary and a thesaurus that are augmented by a user-defined dictionary and thesaurus. The compiler looks first at
these documents when it is compiling a program and the linker collects all the object files to create the executable. A
compiler or linker error message is issued if there is missing information.
• Selection is achieved through (a) if … else if … else statement, and (b) switch statement. The selection conditions
can be formed through expressions involving relational and logical operators.
• Repetition is achieved through (a) while statement, (b) do … while statement, and (c) for statement.
• In addition, the break, continue and goto statements provide other forms of control. Later in the book we will see
the return and the exit statements.
• We saw examples where the size of a vector-related variable needed to be known in advance for memory allocation
to take place, e.g. the use of MAXVALUES in Example 3.4.2. This is a hindrance to writing a general-purpose program
and later in the book we will see the various solutions to this problem.
• At the end of the chapter, we saw (a) the enumerated type that helps in making programs easier to read, and (b) the
dangers in mixed type arithmetic (type coercion).
• C++ provides a rich set of formatting controls for both screen as well as file outputs. We looked at a few of them in
this chapter. In later chapters we will see the rest.

Where to go from here?


Practicing to write small computer programs is the best way to learn the capabilities of any programming language. The exercise
problems that follow are designed to slowly introduce the reader to generating the solution to numerical problems that would
be meaningful to engineers and scientists. Most of the programs are less than 50 statements and can be completed in a short
period of time.


Exercises
When writing programs simple or complex, it is a good programming habit to check for errors in input and inform
via clear error messages as to why the input has an invalid value. When a problem specifies exhaustive search (or
trial and error), assume that you do not know the analytical solution. You may use the analytical solution to check
the computer program generated solution.

Appetizers
Problem 3.1
Write a program to display as a table the values of the function $y(x) = ax^3 + bx^2 + cx + d$ for the range $10 \le x \le 15$ using
an increment supplied by the user. Obtain the values of the coefficients of the cubic polynomial and the increment from the
user.
Problem 3.2
Write a program to obtain a set of up to a maximum of 10 positive float numbers. For this set of numbers, compute its (a)
average, (b) minimum, (c) maximum, and (d) standard deviation. Prompt the user to enter the numbers one at a time. A negative
value signals the end of the user input. Check for some basic input errors such as the first number being negative etc.
Problem 3.3
Write a program to obtain a set of ( x , y ) coordinates for 5 points. Find the two points that are (a) closest to each other, and
(b) farthest apart from each other.
Problem 3.4
Write a program to accept (a) an integer input and print out the number with the digits reversed (for example, if the input is
-18080, the output should be 08081-), and (b) a string input and print out the string with the characters reversed (for example, if
the input is arizona, the output is anozira).
Problem 3.5
What is the output from the following statements?
int nV=0;
for (int i=1; i <= 50; i = 3*i)
nV++;
cout << "nV is " << nV << "." << endl;

Main Course
Problem 3.6
Write a program to compute and display the value of sine of an angle using the following formula.
$$\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \ldots + \frac{(-1)^{i-1} x^{2i-1}}{(2i-1)!} + \ldots$$
Terminate the series if the difference between two consecutive numbers in the series is less than $10^{-4}$. Compare your results
with the value provided by the sin function.
Problem 3.7
Rewrite Example Program 3.2.4 with the following changes. The user input should be as follows - (a) the initial input should be
the student last name, first name, current GPA and the current number of semester hours, and (b) for the current semester, the
input should be the course number, the course semester hours, and the raw score on the course (between 0 and 100). No more
course input is required if the course number is STOP. The raw score should be translated to a letter grade as follows – A : 91-100,
B : 81-90, C : 71-80, D : 61-70, and E : 0-60. Assume that a student can take at most 6 courses per semester. Print a Grade
Report showing the student’s name, details of the current semester, and the new (cumulative) semester hours and GPA.


Problem 3.8
The differential equation for the transverse deflection of a beam is given by $\frac{d^4 y}{dx^4} = \frac{w(x)}{EI}$. The solution for a simply-supported
beam of length $L$ subjected to a uniform loading, $w$, is given as
$$y(x) = \frac{wx}{24EI}\left(x^3 - 2Lx^2 + L^3\right)$$
Write a program to compute the largest deflection and its location using trial and error. Obtain the values of $L$, $E$, $I$, $w$ and the units for force and
length from the user. Assume that the user input is in consistent units and no unit conversion is to be carried out.
Problem 3.9
What is the output from the following statements?
int nV=0;
for (int i=1; i <= 50; i = 3*i);
nV++;
cout << "nV is " << nV << "." << endl;

Problem 3.10
What is the output from the following statements?
int nV=0;
for (int i=0; i <= 50; i = 3*i);
nV++;
cout << "nV is " << nV << "." << endl;

C++ Concepts
Problem 3.11
Newton’s Law of Cooling is given as $\frac{dT}{dt} = -k\left(T - T_\infty\right)$ where $T$ is the temperature that is a function of time, $k$ is a positive
constant, $T_0$ is the temperature at time $t = t_0$, and $T_\infty$ is the ambient temperature. Solution of the above equation is given as
$$T(t) = T_\infty + \left(T_0 - T_\infty\right) e^{-kt}$$
A coroner is called to a crime site (a warehouse) at 1 am on Jan 15, 2009. She finds a corpse whose temperature then is 80 °F.
The temperature in the warehouse is maintained at 62 °F. After two hours the temperature of the corpse drops to 72 °F.
Write a program that uses trial and error to find the date and time of death. (Hint: The coroner when questioned says that for
most situations $0.1\ \mathrm{hr}^{-1} \le k \le 0.5\ \mathrm{hr}^{-1}$.)
Problem 3.12
A cable suspended between two posts hangs in the form of a catenary (Fig. P3.12) whose equation is given as
$$y(x) = \frac{T}{w} \cosh\left(\frac{w}{T} x\right) + c$$
where $y(x)$ is the height of the cable above the ground, $T$ is the tension in the cable, $w$ is the weight of the cable per unit
length, and $c$ is a constant. By measuring the height of the cable, the following conditions are known: $y(x = -100) = 750$,
$y(x = 0) = 75$ and $y(x = 100) = 750$. Write a program that uses trial and error to find the values of the parameters in the
catenary equation so as to satisfy the given conditions as closely as possible. (Hint: Take $450 \le T \le 500$, $10 \le w \le 20$, and
$40 \le c \le 60$).

[Figure: cable profile $y(x)$ plotted against $x$. The cable height is 750 at the two posts, located 100 on either side of the origin, and 75 at $x = 0$.]

Fig. P3.12

Problem 3.13
Write a program to obtain 7 values from the user. Store and display (a) these seven original values, (b) the seven sorted values,
and (c) their mean, median and standard deviation. Your approach should be general enough so that it is applicable for any set
of numbers. (Hint: Use Selection Sort. The basic idea is to determine the minimum (or maximum) of the list and swap it with the
element at the index where it is supposed to be. The process is repeated such that the nth minimum (or maximum) element is
swapped with the element at the (n-1)th index of the list. Here is an example involving 8 integer numbers and sorting in ascending
order.)
Initial 8 6 10 3 1 2 5 4
Pass 1 1 6 10 3 8 2 5 4
Pass 2 1 2 10 3 8 6 5 4
Pass 3 1 2 3 10 8 6 5 4
Pass 4 1 2 3 4 8 6 5 10
Pass 5 1 2 3 4 5 6 8 10
Pass 6 1 2 3 4 5 6 8 10
Pass 7 1 2 3 4 5 6 8 10

Chapter 4

Modular Program Development


“Ambition is a lust that is never quenched, but grows more inflamed and madder by enjoyment.” Thomas Otway

“Build a better mousetrap and the world will beat a path to your door.” Ralph Waldo Emerson

“It is not the greatness of a man's means that makes him independent, so much as the smallness of his wants.” William Cobbett

We saw several C++ constructs in Chapters 2 and 3. Using these constructs we can write moderately complex programs.
However, as the complexity of the tasks increases, developing a single long program contained entirely in the main program is
neither efficient nor practical. In this chapter, we will learn about the elements of modular program development starting with
functions, scope of variables and finally, program development with multiple source files.

Objectives
• To understand and practice the concept of functions.
• To understand the concept of scope of variables.
• To understand the concept of modular program development.
• To understand and practice good programming styles.


4.1 Functions
We have in the previous chapters used functions without being formally introduced to them. In Example 2.5.2, we used the sin
function to compute the sine of an angle. As the documentation shows in Fig. 2.7.5, the function expects as an argument, the
angle in radians as a double precision value and returns the sine of the angle as a double precision value. One can look at a
function as a component in a program (an independent unit that can be separately compiled) that is defined because either it
has a very specific functionality or because this functionality is used in several locations in the program or both. Let us look at
the sin example again.
double dAngle = 0.2;            // angle in radians
double dValue = sin(dAngle);
The sin function expects to see a single parameter – a double precision value. In the above example, the variable dAngle provides
that value in radians. The sin function uses that value as the angle and evaluates the sine of the angle. The computed value is
returned to the calling program. The returned value can be stored and used via a variable. As shown above, the returned value
is stored in the variable dValue.
Functions can be called with no arguments or with one or more arguments, and can return either nothing or a single value. The
general syntax is as follows.
returnvalue functionname (argument 1, argument 2, …)
// argument list is optional
{

return somevalue; // optional. required only if
// returnvalue is not void.
}

In fact, the main program is a special function whose return value is an int. The functionname just like variable names must be
unique. The only exception is when a function is overloaded. We will see overloaded functions in Section 4.2. Here are some
examples of functions.
Example 1
int IntSquare (int n) // computes the square of an integer number
{
    int nValue = n*n;
    return nValue;
}

Example 2
float SumSquares (float x, float y) // computes the sum of the squares
{
    return (x*x + y*y);
}

Example 3
bool IsEven (int n) // determines if a number is even or not
{
    if ((n % 2) == 0)
        return true; // even number
    else
        return false; // odd number
}

In the three examples, the number of function arguments is either one or two, and one value is returned to the calling program.
The variable name in the argument list as used in the calling program need not be the same as the corresponding variable name
as used in the function (see example below where the calling program uses nNumber and the function uses n). Note that each of
the function definitions states that the argument must evaluate to an integer (IntSquare and IsEven) or a float (SumSquares). For
example, in the case of an integer argument, in the calling program the argument can be an integer constant, an integer variable
or an expression that evaluates to an integer. In the IsEven function we have two return statements. Sometimes the return
statement returns no value but merely terminates the execution of a function and returns control to the calling program. Before
functions are used anywhere in a program, they must be declared.
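The point about the kinds of arguments that a calling program may supply can be illustrated with a short sketch. IntSquare is Example 1 above. SumOfThreeSquares is a hypothetical calling function introduced only for this illustration.

```cpp
int IntSquare (int n)   // computes the square of an integer number
{
    int nValue = n*n;
    return nValue;
}

// The argument passed to IntSquare may be a constant, a variable, or an
// expression that evaluates to an integer. The caller's variable name
// (nNumber) need not match the parameter name (n) used in the function.
int SumOfThreeSquares (int nNumber)
{
    return IntSquare (3)               // integer constant
         + IntSquare (nNumber)         // integer variable
         + IntSquare (nNumber + 1);    // integer expression
}
```

With nNumber equal to 2, the three calls return 9, 4 and 9.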

Example Program 4.1.1 Function to Determine if a Number is Even or Odd


Let us see how one would use the IsEven function in a program. The function can be developed quite easily if one recognizes
that an even number is exactly divisible by 2. In other words, the remainder is zero when an even number is divided by 2.
main.cpp

The function IsEven is declared in line 10 before being used in line 19. This (line 10) is called a function prototype. The prototype
establishes the return data type, the number of arguments and the data type associated with each argument. The names of the
variables used in the argument list can be provided but are not necessary. In other words, one could have written the prototype
as
bool IsEven (int n);
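The role of the prototype can be sketched as follows. This is not the main.cpp listing referred to above, and CountEven is a hypothetical function that plays the part of the caller.

```cpp
bool IsEven (int);      // function prototype, parameter name omitted

// IsEven can be called here because the prototype above declares it,
// even though its definition appears later in the file.
int CountEven (const int nVNumbers[], int nSize)
{
    int nCount = 0;
    for (int i = 0; i < nSize; i++)
        if (IsEven (nVNumbers[i]))
            nCount++;
    return nCount;
}

bool IsEven (int n)     // definition: determines if a number is even
{
    return (n % 2) == 0;
}
```

Removing the first line would typically cause a compiler error at the call inside CountEven, since IsEven would be used before being declared.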

Since the function needs to be declared before being used, we could have written the program differently as follows.
#include <iostream>

bool IsEven (int n) // determines if a number is even or not
{
    if ((n % 2) == 0)
        return true; // even number
    else
        return false; // odd number
}

int main ()
{
    ….
    return 0;
}


In this version, the function IsEven is defined at the top of the file before the main program. Hence the function prototype is
not necessary. This solution does not work if the source program exists in several files. As we will see in the next section,
prototyping is a generic solution when functions are used.
How do we define functions where there is no need to return a value or if there are no arguments to be passed? C++ has a
keyword called void that can be used. Here is an example of a function that neither returns a value nor has a function argument.
Example 4
void PrintInputError (void)
{
    std::cout << "Your input is invalid.\n";
}

OR
void PrintInputError ()
{
    std::cout << "Your input is invalid.\n";
}

Example 5
void DisplayCoordinates (float x, float y)
{
    std::cout << "X Coordinate: " << x << ". Y Coordinate: " << y << '\n';
}

To correctly use a function with multiple parameters, one must be careful in ordering the arguments. Consider the following
example that uses the DisplayCoordinates function.
fXC = 1.1; fYC = -3.02;
….
DisplayCoordinates (fYC, fXC); // incorrect usage

While the coordinates will be displayed, the display is not correct. There is a one-to-one correspondence between the arguments
in the calling program and the parameters in the defined function.
When function arguments are used, the arguments can be passed by value, by reference, or via a pointer. In this chapter we
will discuss the first two.

Calling by Value: Consider line 19 in Example 4.1.1 where the function IsEven is invoked as IsEven(nNumber). When the
control is passed to the IsEven function, a copy of the variable nNumber (initialized with its current value) is created on the stack
before the statements in the function are executed. The stack is a special area of memory that the compiler uses for storing
named variables. The programmer is relieved of the responsibility of managing this space. Once all the statements in the function
are executed and control is passed back to the calling function, the variables stored in the stack associated with the function are
destroyed automatically. What is the implication of such a behavior? Consider the following program segments.
void AnalyzeData ()
{
….
double dXCoor = 1.2, dYCoor = 2.4, dMaxCoor;
dMaxCoor = MaxCoor (dXCoor, dYCoor);
std::cout << "Max of " << dXCoor << " and " << dYCoor << " is "
<< dMaxCoor;

}
….
double MaxCoor (double d1, double d2) // badly written function
{
if (d2 > d1) d1=d2;
return d1;
}


Will we see the output as


Max of 1.2 and 2.4 is 2.4
or as
Max of 2.4 and 2.4 is 2.4?

We will see the correct result despite the badly written MaxCoor function. This is because in C++, by default, arguments are
passed by value. In this example, copies of dXCoor and dYCoor are created on the stack as d1 and d2 just before the MaxCoor
function is executed. These copies are then used in the MaxCoor function and their final values (whether they are changed or
not) are discarded before control is passed back to the AnalyzeData function. In other words, dXCoor itself is never modified; it
still holds 1.2 when the std::cout statement in AnalyzeData is executed. How can we encourage better programming
practices in situations like this? A better way of defining and writing the MaxCoor function is to use the const qualifier.
double MaxCoor (const double d1, const double d2)
{
if (d2 > d1)
return (d2);
else
return d1;
}

If the function is defined with the const qualifier for both d1 and d2, the values of these two variables cannot be changed
anywhere within the function. In other words, the statement
if (d2 > d1) d1=d2;

will not compile!
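The call-by-value behavior described above can be checked with a short sketch. The function names HalveCopy, MaxOf and CallerIsUnchanged are hypothetical and are introduced only for this illustration; they are not part of the example programs in this chapter.

```cpp
#include <iostream>

// Hypothetical helper: modifies its parameter, which is only a copy
// of the caller's variable because the argument is passed by value.
double HalveCopy (double dValue)
{
    dValue = dValue / 2.0;       // changes the local copy only
    return dValue;
}

// const parameters cannot be modified inside the function
double MaxOf (const double d1, const double d2)
{
    // d1 = d2;                  // would not compile: d1 is const
    return (d2 > d1) ? d2 : d1;
}

// Returns true if the caller's variable is unchanged after the call
bool CallerIsUnchanged ()
{
    double dX = 1.2;
    double dHalf = HalveCopy (dX);
    std::cout << "dX = " << dX << ", half = " << dHalf << "\n";
    return (dX == 1.2);
}
```

Calling CallerIsUnchanged() from a main program should report that dX still holds 1.2, since HalveCopy works on its own copy.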

Calling by Reference: A reference is an alias for an existing variable and is typically implemented using the memory address of
that variable. When a function is called by reference, the function works directly on the caller's variable rather than on a copy
of its value. Consider a function swap that should swap the values of two variables and the following program segment.

void swap (int n1, int n2); // prototype – call‐by‐value



int nA=10, nB=20;
std::cout << "Before swap A is " << nA << " and B is " << nB << ".\n";
swap (nA, nB);
std::cout << "After swap A is " << nA << " and B is " << nB << ".\n";

The outputs before and after the call to the swap function are identical since the usage is call-by-value. To ensure that the values
are swapped in the function, we need to change the program as follows.

void swap (int& n1, int& n2); // prototype – call‐by‐reference



int nA=10, nB=20;
std::cout << "Before swap A is " << nA << " and B is " << nB << ".\n";
swap (nA, nB);
std::cout << "After swap A is " << nA << " and B is " << nB << ".\n";

The ampersand character, &, placed after the data type in a parameter declaration denotes a reference to a variable and hence
denotes that the function usage is call-by-reference. The reference symbol is used in the function declaration or prototype and
in the function definition but NOT in the function call. In other words, to figure out whether the function usage is
call-by-value or call-by-reference, one must look at the function prototype or function definition. The swap function can be
written as follows.

void swap (int& n1, int& n2) // (int &n1, int &n2) is also correct
{
    int nTemp = n1;
    n1 = n2;
    n2 = nTemp;
}
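The difference between the two prototypes can be seen side by side in the following minimal sketch. The names SwapByValue and SwapByReference are hypothetical and are used here only to keep both versions in one file.

```cpp
#include <iostream>

// call-by-value: n1 and n2 are copies, so the caller sees no change
void SwapByValue (int n1, int n2)
{
    int nTemp = n1;
    n1 = n2;
    n2 = nTemp;
}

// call-by-reference: n1 and n2 are aliases for the caller's variables
void SwapByReference (int& n1, int& n2)
{
    int nTemp = n1;
    n1 = n2;
    n2 = nTemp;
}

// Prints the values after each kind of call
void ShowSwap ()
{
    int nA = 10, nB = 20;
    SwapByValue (nA, nB);
    std::cout << "After SwapByValue A is " << nA
              << " and B is " << nB << ".\n";
    SwapByReference (nA, nB);
    std::cout << "After SwapByReference A is " << nA
              << " and B is " << nB << ".\n";
}
```

Note that both calls look identical at the call site; only the prototype tells the reader which mechanism is in use.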

Passing Vectors: So far, we have seen how to pass scalar variables to a function. How do we pass a vector? C++ treats arrays
differently. When an array is used as an argument (in a calling function), the array is not copied; the function receives the
location of the caller's array and works directly on its elements. Let's look at an example of a function that can be used to
print the elements of an integer vector.
void PrintVector (int nV[], int nSize)
{
for (int i=0; i < nSize; i++)
std::cout << "Element " << i << " : " << nV[i] << "\n";
}

The notation nV[] (without a constant integer within the square brackets) signifies a vector. To use this function, we can
define and use a vector as follows.
int nVBlocks[5], nVHeights[10];
….
PrintVector (nVBlocks, 5);

PrintVector (nVHeights, 10);

Note that with this example, it is possible to modify the values of the vector within the function. Since this function merely
prints the values, a better function definition is
void PrintVector (const int nV[], int nSize)

that prevents modification of the elements of nV.
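A short sketch can make the effect of const on a vector argument concrete. The functions ScaleVector and SumVector below are hypothetical and are not among the functions used in the example programs.

```cpp
// Multiplies every element by nFactor. The change is visible to the
// caller because the function works on the caller's array, not a copy.
void ScaleVector (int nV[], int nSize, int nFactor)
{
    for (int i=0; i < nSize; i++)
        nV[i] *= nFactor;
}

// Read-only access: const prevents modification of the elements
int SumVector (const int nV[], int nSize)
{
    int nSum = 0;
    for (int i=0; i < nSize; i++)
        nSum += nV[i];
    // nV[0] = 0;                // would not compile: elements are const
    return nSum;
}
```

If a vector holding 1, 2, 3 is passed to ScaleVector with a factor of 2, the caller's vector holds 2, 4, 6 afterwards, and SumVector then returns 12.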


We will consolidate what we have learnt about functions with an example.
Example Program 4.1.2 Using Vectors as Function Arguments
Problem Statement: Obtain several floating-point values from the user. Compute and display the average value, minimum value
and maximum value of these input numbers.
Solution: An examination of the problem statement shows the following components. First, we need to store several floating-
point values. We will store these values in a vector. Second, we need to compute three distinct values – the average, the minimum
and the maximum values. We will accomplish these four tasks using three functions. The function GetValues will be used to
obtain the floating-point values from the keyboard and store the values in the vector. The function AvgValue will be used to
compute the average value of all the entries in the vector. Finally, we will use a single function MinMaxValues to compute the
minimum and maximum values of all the entries in the vector. However, in terms of implementation, we will write two different
functions MaxValue and MinValue to compute the maximum and minimum values.
main.cpp


The six function prototypes are declared in lines 13-20. The return type is clearly defined as are the function arguments, if any.
The const qualifier is used to tell the compiler that the argument should not be modified in the function. The main program
calls four functions (ShowBanner, GetValues, AvgValue and MinMaxValues). The arguments in these calls are the variables
defined in the main program, whose names are not necessarily the names used in the prototypes. The other two functions
(MaxValue and MinValue) are called from the MinMaxValues function.

Let’s examine the implementation of the functions one at a time.

The ShowBanner function merely displays a banner when the program execution starts. We have used special characters in the
program as shown in lines 57-59. Their meanings along with other special characters are shown in Table 4.1.1.
Table 4.1.1 Special character sequences
Character Sequence    Remarks
\n                    Newline. Positions the cursor to the beginning of the next line.
\t                    Tab character. Positions the cursor to the next tab location.
\"                    Double quote. To output the " character verbatim.
\\                    Backslash. To output the \ character verbatim.
\r                    Carriage return. Positions the cursor to the beginning of the current line.
\a                    Alert. Sounds the bell using the computer's speakers.


The GetValues function has two arguments. The first argument is a floating-point vector passed as reference. The second
argument is an integer (passed as value) that is the size of the vector. The const qualifier is used since it would be invalid to
change the size of the vector in the function.

In the AvgValue function, the input arguments to the function are the vector containing the values and the size of the vector,
and the output from the function is the average value. The first two (input) arguments are declared with the const qualifier.
Hence, the values of the elements of the vector fVEntries and the size in nSize cannot be modified in the function. However,
the last argument is passed as a reference since the average value is computed in the function and the value must be set in the
function. In line 34, this parameter is declared as fAvgValue and in the function the corresponding argument is declared as fAvg.

The MinMaxValues function has as input arguments the vector of entries (first argument) and the size of the vector (second
argument), and as output arguments the minimum and the maximum values. Since the minimum and maximum values are set in
the MinValue function and the MaxValue function, they are passed as references. The MinMaxValues function calls two other
functions MinValue and MaxValue to obtain the actual computed min and max values.

One can also pass partial contents of a vector to a function. Consider the following example.
int nVBlocks[5] = {1,2,3,4,5};
….
PrintVector (nVBlocks, 5); // prints all the five values

On the other hand, if one wishes to print only the last three values, the corresponding call would be as follows.
PrintVector (&nVBlocks[2], 3); // prints the last three values

Note the & before nVBlocks[2]; memory addressability is discussed further in Chapter 8. There is a difference between the
above call and

PrintVector (nVBlocks[2], 3); // will not compile!

If a specific element of the vector is used, e.g., nVBlocks[2], then the implication is that just the third element of that vector is
to be used. For example, if we wished to swap the contents of the third and the fifth elements, then the swap function can be
used as follows.
void swap (int& n1, int& n2); // prototype – call‐by‐reference

swap (nVBlocks[2], nVBlocks[4]);

4.2 More about Functions


As we have seen in this chapter, functions are extremely useful especially if either the functionality of a task can be clearly
identified or if functionality is required at several places in a program.
Default arguments: When one or more arguments are used in call by value, a default value can be specified for one or more of the
arguments. Let us assume that we are computing the weight of a rectangular beam. The function prototype is as follows.
float BeamWeight (float fLength, float fHeight, float fWidth,
float fDensity=28000.0f);
What this prototype specifies is that if the function is called as follows
fWeight = BeamWeight (fLength, 0.03f, 0.01f);
then the function will automatically use the density value as 28000.0. The defined function is as follows.
float BeamWeight (float fLength, float fHeight, float fWidth,
float fDensity)
{
return (fLength*fHeight*fWidth*fDensity);
}

Note that the function definition does not have the default value defined unlike the prototype. As an added restriction, the
default values must be on the rightmost parameters. In other words
float BeamWeight (float fLength, float fHeight, float fWidth=0.03f,
float fDensity=28000.0f);
is valid but
float BeamWeight (float fLength, float fHeight=0.03f, float fWidth,
float fDensity=28000.0f);
is not. Use of default arguments must be done with care.
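The prototype and definition from the text can be placed in one file to see the default value in action. This is a minimal sketch that reuses the BeamWeight function and the density value quoted above; nothing else is assumed.

```cpp
// Prototype: carries the default value for the rightmost parameter
float BeamWeight (float fLength, float fHeight, float fWidth,
                  float fDensity=28000.0f);

// Definition: the default value is not repeated here
float BeamWeight (float fLength, float fHeight, float fWidth,
                  float fDensity)
{
    return (fLength*fHeight*fWidth*fDensity);
}
```

With these declarations, BeamWeight (2.0f, 0.03f, 0.01f) and BeamWeight (2.0f, 0.03f, 0.01f, 28000.0f) compute the same value, while a call such as BeamWeight (2.0f, 0.03f, 0.01f, 7850.0f) overrides the default density.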

Overloading functions: It is possible to have two or more different definitions of functions with the same name. As we will see in
the next example, we can write different functions that compute the minimum value of all the elements of a vector as we saw
in Example 4.1.2 but also obtain the values of the elements of the vector via keyboard input. Here are the function prototypes
for the GetValues function.
void GetValues (int nVV[], const int nSize);
void GetValues (float fVV[], const int nSize);

Note that while the function names are the same, the argument list is not identical. If two functions have the same name, the
compiler is able to differentiate between their usage in a program based solely on the argument list (not the return value) – either
the data types of the arguments must be different, or the number of arguments must be different.
The primary advantage of function overloading is program readability – we do not have to concoct new names for functions
that perform essentially the same task on different data types. However, the disadvantage is that we need to maintain several
versions of the same function. Later, we will see how to overcome this disadvantage using templates. We illustrate these new
ideas about functions in the following examples.
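As a minimal sketch of overloading, consider two functions that share a name but differ in the data type of the vector argument. The signatures below are assumptions made for this illustration and may differ from those used in the example programs.

```cpp
// Version for integer vectors (assumes nSize >= 1)
int MinValue (const int nV[], int nSize)
{
    int nMin = nV[0];
    for (int i=1; i < nSize; i++)
        if (nV[i] < nMin) nMin = nV[i];
    return nMin;
}

// Version for double vectors; same name, different argument list
double MinValue (const double dV[], int nSize)
{
    double dMin = dV[0];
    for (int i=1; i < nSize; i++)
        if (dV[i] < dMin) dMin = dV[i];
    return dMin;
}
```

The compiler selects the version to call from the data type of the first argument; the return type plays no part in the selection.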


Example Program 4.2.1 Overloaded Functions


Problem Statement: Write a program to obtain a number of integer values and a number of double values. Store the numbers in
(separate) vectors, and compute the minimum value in each vector.
Solution: We will essentially reuse the functions from Example Program 4.1.2, making appropriate changes.
main.cpp

As we have seen before, the function prototypes (lines 12-15) need to be declared before they are referenced in the main
program. Each GetValues function is designed to obtain no more than 5 user inputs from the keyboard. These functions are
called in line 24 and 31. These calls are followed by calls to the MinValue functions to obtain the minimum value input by the
user.

Note that both the forms of the function GetValues are essentially the same except for the usage of different data types. Similarly,
for the MinValue functions.


There are other issues that we need to be aware of concerning function overloading. For example, what will happen if the
following statements are used in the program to invoke the MinValue as
float fValues[NUMELEMENTS], fMinV;
fMinV = MinValue (fValues, NUMELEMENTS);

We will get a compilation error since a version of the function that supports float values does not exist.

Type Coercion in Argument Passing and Return Value from a Function


In Chapter 3 we saw data coercion issues dealing with expression evaluations. Such issues can also arise with passing arguments
to a function and return values from functions. While data promotion implies widening of the data stored in a constant or a
variable, data demotion is just the opposite. Consider the following assignment.
nA = fX;
where we have an int on the left and a float on the right. In this case, the fractional part of fX is lost when the value is stored
in nA, provided the integral part can be stored in nA. Similar loss of data can arise when storing the value of a long variable in
an int, or a double in a float.
Now, consider the following function defined in a program.
double MassDensity (double dMass, double dVolume)
{
return (dMass/dVolume);
}

If this function is called as follows


std::cout << "Density of new material : " << MassDensity(5601,2)
<< "\n";
then the C++ compiler will automatically promote the two arguments to 5601.0 and 2.0, and then carry out division with real
numbers as 5601.0/2.0.
One must be careful, however, if functions are overloaded. Suppose we define another MassDensity function incorrectly as follows.
int MassDensity (int nMass, int nVolume)
{
return (nMass/nVolume);
}

With this new overloaded function in the program, the call MassDensity(5601,2) matches the int version exactly, integer
division is carried out, and the result is 2800 instead of 2800.5!
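The effect can be reproduced with the two MassDensity functions placed in the same file, as in the following sketch.

```cpp
// Intended version: real-number division
double MassDensity (double dMass, double dVolume)
{
    return (dMass/dVolume);
}

// Incorrectly added overload: integer division truncates the result
int MassDensity (int nMass, int nVolume)
{
    return (nMass/nVolume);
}
```

With both versions present, MassDensity (5601, 2) selects the int version since both arguments are integers, and the fractional part is lost. Writing the call as MassDensity (5601.0, 2.0) selects the double version. A mixed call such as MassDensity (5601, 2.0) is ambiguous and is rejected by the compiler.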


Let us look at another example involving exhaustive search that we have seen before.
Example Program 4.2.2 Exhaustive Search (Revisited)
Problem Statement: The fluid flow through a trapezoidal channel is given by the following equation:
$$P = \frac{A}{H} - \frac{H}{\tan\theta} + \frac{2H}{\sin\theta} \qquad (4.2.1)$$
where $P$ is the wetted perimeter, $A$ is the cross-sectional area of the fluid, and the rest of the parameters are shown in Fig.
4.2.1.

Fig. 4.2.1 Flow through a trapezoidal channel
Given the values of the wetted perimeter and the height, develop a computer program to find the values of cross-sectional area
of the fluid, $A$ and $\theta$.
Solution: The approach is not much different than what we used in Chapter 3. However, in this example, we will be using
functions. We can rewrite Eqn. (4.2.1) as
$$P - \frac{A}{H} + \frac{H}{\tan\theta} - \frac{2H}{\sin\theta} = 0 \qquad (4.2.2)$$
The basic idea is to try different values (combinations) of the unknown parameters, $A$ and $\theta$, and find the combination that
gives the value of the left-hand side of Eqn. (4.2.2) closest to zero. We can essentially call the magnitude of the left-hand side
of Eqn. (4.2.2) the error, $\varepsilon$.
Algorithm: Here is the developed algorithm.
1. Set the values (or obtain user input for) $P$ and $H$. Initialize the error, $\varepsilon$, to a large value.
2. Loop through $5^\circ \le \theta \le 85^\circ$ in increments of $0.01^\circ$.
3. Loop through $0.1\ \text{m}^2 \le A \le 2.0\ \text{m}^2$ in increments of $0.001\ \text{m}^2$.
4. Compute the left-hand side of Eqn. (4.2.2). If its absolute value is smaller than $\varepsilon$ then set $\varepsilon$ to this value and
save the values of $A$ and $\theta$.
5. End loop for $A$.
6. End loop for $\theta$.
7. Print the results.
One could make the lower and upper bounds for the values of $A$ and $\theta$, and their increments, user input. In the above
algorithm, the increments are deliberately chosen to be very small. The algorithm is implemented below.
main.cpp


In lines 39-40 we initialize the value of the smallest error to the largest double value, the value that is available from the math
library via climits. Nested for loops are used to vary the values of $\theta$ and $A$ in lines 48 through 58. The function
UserFunction is called in line 44 to compute the current error in the solution. In the next line, the function UpdateError is
called to check for the smallest error (so far) and to save the solution – values of $\theta$ and $A$.


The final computed values are displayed in lines 61-63. The timing information for the program is obtained first in line 27
(when program computations start) and then again in line 66 when all the computations have taken place. The program uses
the functions from C++'s time library (#include <ctime>) to obtain the clock time. While not technically correct, the clock
time will be a good indicator of the computational effort expended in the program. The data type time_t is an arithmetic type
defined in <ctime> that typically holds a count of seconds; it is not a struct, unlike the related struct tm (structs are discussed
in Chapter 7). The call to function time takes place as
time (&time_tVariable);
as this call involves call-by-pointer (discussed in Chapter 8). The difftime function is called in line 68 to compute the difference
between two time_t variables.
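The steps of the exhaustive search algorithm can also be sketched compactly as shown below. This is not the main.cpp listing discussed above. It assumes that Eqn. (4.2.2) reads P - A/H + H/tan(theta) - 2H/sin(theta) = 0, and the function names ChannelResidual and ExhaustiveSearch, as well as the choice of passing the increments as arguments, are hypothetical.

```cpp
#include <cmath>
#include <limits>

// Left-hand side of Eqn. (4.2.2) as assumed above; dTheta is in radians
double ChannelResidual (double dP, double dH, double dA, double dTheta)
{
    return dP - dA/dH + dH/std::tan(dTheta) - 2.0*dH/std::sin(dTheta);
}

// Exhaustive search over 5 <= theta <= 85 degrees and 0.1 <= A <= 2.0 m^2.
// Returns the smallest absolute residual found and stores the
// corresponding values in dBestA and dBestThetaDeg.
double ExhaustiveSearch (double dP, double dH,
                         double dThetaStepDeg, double dAStep,
                         double& dBestA, double& dBestThetaDeg)
{
    const double PI = 3.14159265358979323846;
    double dMinError = std::numeric_limits<double>::max();

    // integer loop counters avoid accumulating round-off in the increments
    const int nThetaSteps =
        static_cast<int>(std::floor ((85.0-5.0)/dThetaStepDeg + 0.5));
    const int nASteps =
        static_cast<int>(std::floor ((2.0-0.1)/dAStep + 0.5));

    for (int i=0; i <= nThetaSteps; i++)
    {
        const double dThetaDeg = 5.0 + i*dThetaStepDeg;
        const double dTheta = dThetaDeg*PI/180.0;   // degrees to radians
        for (int j=0; j <= nASteps; j++)
        {
            const double dA = 0.1 + j*dAStep;
            const double dError =
                std::fabs (ChannelResidual (dP, dH, dA, dTheta));
            if (dError < dMinError)      // best combination so far
            {
                dMinError = dError;
                dBestA = dA;
                dBestThetaDeg = dThetaDeg;
            }
        }
    }
    return dMinError;
}
```

Since there is one equation and two unknowns, many combinations of A and theta can give a residual close to zero; the search reports only the first combination with the smallest residual on the chosen grid.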

Recursive Functions: A function that calls itself is termed a recursive function. It should be noted that any problem that can be
solved using recursion can also be solved using iterations. Recursion as an option over iterations should be chosen with care. It
is the preferred approach if the algorithm to solve the problem lends itself more naturally to recursive calls. Recursive functions
can be computationally expensive since every recursive call incurs the overhead of a function call.
As an example of recursion, let us assume that we are interested in computing the factorial of a number, $n$. From the definition
of a factorial, we have
$$n! = (1)(2)(3)\cdots(n) \qquad (4.2.3)$$
with $0! = 1$. Once we rewrite the above equation as
$$n! = n\,(n-1)! \qquad (4.2.4)$$
we can immediately see why recursion can be used to compute the factorial. The resulting code is simple.
long Factorial (int n)
{
    if (n == 0) // recursion ends here
        return 1L;
    else
        return (n*Factorial (n-1));
}

Factorial can be programmed using iteration as follows.


long Factorial (int n)
{
long lFact = 1L;
for (int i=2; i <= n; i++)
lFact *= i;

return (lFact);
}


In this example, both the function forms are equally elegant. However, there are some situations where recursion is a clear
choice.
Function calls come with execution-time overhead. The stack frame is used to keep track of the functions and the associated
arguments that are used during function calls. Hence, one should be careful in using functions especially if a few functions are
used repeatedly in a program.
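The two factorial implementations can be checked against each other with the following sketch. The names FactorialRecursive and FactorialIterative are hypothetical, and long long is used here (instead of long) as an assumption so that values up to 20! can be stored; on platforms where long is 32 bits wide, 13! already overflows.

```cpp
// Recursive form, based on n! = n (n-1)!
long long FactorialRecursive (int n)
{
    if (n <= 1)                  // recursion ends here (0! = 1! = 1)
        return 1LL;
    return n*FactorialRecursive (n-1);
}

// Iterative form, based on n! = (1)(2)(3)...(n)
long long FactorialIterative (int n)
{
    long long llFact = 1LL;
    for (int i=2; i <= n; i++)
        llFact *= i;
    return llFact;
}
```

For a given n, the recursive form makes n function calls, whereas the iterative form makes none beyond the initial call.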

4.3 Scope of Variables


Before we discuss modular program development, it is necessary to understand what is meant by scope of a variable. Consider
the following program (all statements are not included for the sake of brevity).
int main ()
{
float fLength;
cin >> fLength;
float fArea = ComputeAreaRectangle (fLength, fLength);

return 0;
}

float ComputeAreaRectangle (float f1, float f2)


{
float fLength = f1;
float fWidth = f2;
return (fLength*fWidth);
}

Will the compiler issue an error message because it is unable to differentiate between the variable fLength declared and used
in the main program with the variable fLength declared and used in the function ComputeAreaRectangle? The answer is “No”.
Let’s review what we have learnt so far with regards to variables. A variable’s name needs to be unique within a program segment
– main program or a function. In other words, two variables declared in the same segment cannot have the same name, even
if they have different data types.
The scope (or life) of the variable fLength in the main program is from the time it is defined in the main program until the end
of the main program signified by the return 0 statement. Similarly, the scope of fLength in ComputeAreaRectangle is once again
from its definition float fLength = f1; to the end of the function signified by the return (fLength*fWidth); statement. Such
variables are called local variables. For example, fWidth is a local variable in the function ComputeAreaRectangle. Let’s look at a
modified form of the function.
float ComputeAreaRectangle (float fLength, float fWidth)
{
return (fLength*fWidth);
}

Now is the function parameter fLength a local variable in the ComputeAreaRectangle function? Yes, since it is declared in that
function. Once again, the compiler will treat the two fLength variables (in the main program and the ComputeAreaRectangle
function) differently since their scope limits are different.
Now consider a modified form of the previous program. Assume that the following statements appear in a single source file
and compile correctly.
float fLength;

int main ()
{
cin >> fLength;
float fArea = ComputeAreaRectangle (fLength, fLength);

return 0;
}
float ComputeAreaRectangle (float f1, float f2)
{
    float fLength = f1;
    float fWidth = f2;
    return (fLength*fWidth);
}
Note that the variable fLength is declared outside of both the main program and the function ComputeAreaRectangle. In the
main program, it appears that the variable fLength is used without being declared. However, according to the scope rules, a
variable that is declared outside the body of the main program and all the functions, and precedes these functions, has a global
scope within the file. Hence, the variable fLength in the main program is the global variable declared at the top of the source
file. However, the variable with the same name declared in the ComputeAreaRectangle function is a local variable whose scope
is restricted to the function. A more subtle scope rule deals with the use of variables in a compound statement. A compound
statement (or block) contains one or more C++ statements within the {} braces. Consider the following statements.
int main ()
{
float fA, fB;
int n, m;
….
if (fA > fB)
{
int i;
i = 2*abs(n-m);

}
i += 2;
….
return 0;
}

The program will not compile; the compiler flags the statement i += 2; as invalid because the variable i is undefined there.
The reason is that the scope of the variable i is entirely in the if (fA > fB) block. Once again it is important to remember that
the scope of any variable in any block is from the time it is defined in the block until the end of the block defined by the closing
brace }.
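The global, local and block scope rules can be seen together in the following minimal sketch. The variable and function names are hypothetical and are chosen only for this illustration.

```cpp
int nValue = -1;                 // global scope within this file

// No local declaration here, so nValue refers to the global variable
int ReadGlobal ()
{
    return nValue;
}

int ShadowInFunction ()
{
    int nValue = 10;             // local variable hides the global one
    {
        int nValue = 20;         // block-scope variable hides the local one
        nValue += 1;             // modifies the block-scope variable only
    }                            // block-scope nValue ends here
    return nValue;               // the local variable, still 10
}
```

Reusing a name in nested scopes is legal, as shown, but it makes programs harder to read and is best avoided.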
We now present an example program to illustrate the scope rules discussed so far.
Example Program 4.3.1 Understanding Scope Rules
Problem Statement: Obtain the perimeter and area of a segment of a circle.
Solution: The length of the arc, $a$, of a circle is given by
$$a = r\theta \qquad (4.3.1)$$
where $r$ is the radius of the circle and $\theta$ is the arc angle in radians. Similarly, the area of a segment of a circle is given by
$$A_s = \pi r^2 \left( \frac{\theta}{360} \right) \qquad (4.3.2)$$
where the arc angle in Eqn. (4.3.2) is expressed in degrees.
The program is shown below.
main.cpp


There are two declarations with initialization of the variable fRadius – one on line 12 and the other on line 19. The one on line
12 has a global scope meaning that the initial value of –123.4 is available throughout the file. However, the local variable declared
on line 19 overrides the global variable in the main program. Hence in the main program, fRadius has an initial value of 10.0
not –123.4. The variable fRadius is also declared as function argument in the two functions. Once again, these are local variables
that take precedence over the global definition. The actual value that the fRadius variable assumes when the function executes
comes from the corresponding argument in the function call (lines 30, 31, 40 and 41).
Both the variables fArcLength and fArea are declared twice in the main program. The variables declared in lines 28 and 29, have
their scopes limited to the block between lines 27 and 35. However, the scope of the variables with the same names declared
on lines 38 and 39, start at line 38 through the end of the main program.

Let’s look at a slightly revised program. What will happen if line 19 is moved after line 35? The expression on line 26 evaluates
as false and the if block (lines 27 through 35) does not execute. Moreover, when the function calls are made in statements 32
and 33, the results are incorrect since fRadius has a negative value!


4.4 Developing Modular Programs


The divide and conquer philosophy is effective whether a program has 500 lines of code or 5 million lines of code. Software
engineering is an evolving science, and in this section, we will see one of the numerous components of software engineering
and modular program development. We will look at developing a program where there is more than one source file.
Preprocessing Directives: C++ provides several preprocessing directives that aid in program development and maintenance. The
purpose of these directives in the source file is to tell the preprocessor (or compiler) what specific actions to perform. As per
syntax rules for these directives, the number sign (#) must be the first nonwhite-space character on the line containing the
directive; white-space characters can appear between the number sign and the first letter of the directive. Some directives include
arguments or values. Lines containing preprocessor directives can be continued by immediately preceding the end-of-line
marker with a backslash (\). Preprocessor directives can appear anywhere in a source file, but they apply only to the remainder
of the source file. We will discuss the useful ones in this section.

#define identifier token_string


The token_string component is optional. One or more white spaces (or blanks) must separate the identifier from the
token_string. When token_string is defined, the #define directive specifies the token_string that is substituted for all
subsequent occurrences of the identifier in the source file. Here are a couple of examples.
#define PI 3.1415926f
#define MAX(a1, a2) (a1 > a2? a1 : a2)
In the first case, every occurrence of PI in the source file is substituted with 3.1415926f. In other words, if the file contains the
following statement
fArea = PI*fRadius*fRadius;
then the preprocessor turns the statement into
fArea = 3.1415926f*fRadius*fRadius;
Similarly, with the second directive, the statement
fUpperBound = MAX (f1, f2);
is converted into
fUpperBound = (f1 > f2? f1 : f2);
In C++ terminology, MAX is called a macro, and the preprocessor directive expands the macro. When the #define directive is
used without a token_string, then all the occurrences of identifier are removed from the source file. The identifier remains
defined until the source file ends or if it is undefined using the #undef directive. The identifier can be tested using the #if
defined, #ifdef or #ifndef directives.

#undef identifier
Once an identifier is defined using the #define directive, the identifier can be undefined using the #undef directive. In other
words, the #define … #undef directives are paired together in a zone of the source file where the identifier has a special meaning.
For example,
#define TOLERANCE 0.0001
…..
#undef TOLERANCE

There is also a related preprocessor operator, defined, whose syntax is as follows.

defined identifier

It is not a directive by itself; it is used within an #if or #elif directive, as in the following example:


#if defined PI
#define TWOPI 2*PI
#endif
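The #define, #undef and defined constructs can be combined as in the following sketch. The functions Circumference and IsNearlyZero and the constant bToleranceVisible are hypothetical; TWOPI is written with parentheses here, a common precaution so that the macro expands safely inside larger expressions.

```cpp
#define PI 3.1415926f

#if defined PI
#define TWOPI (2*PI)             // defined only because PI is defined
#endif

float Circumference (float fRadius)
{
    return TWOPI*fRadius;        // expands to (2*3.1415926f)*fRadius
}

#define TOLERANCE 0.0001
bool IsNearlyZero (double dValue)
{
    return (dValue < TOLERANCE && dValue > -TOLERANCE);
}
#undef TOLERANCE                 // TOLERANCE has no meaning beyond this line

// Records whether TOLERANCE is still defined at this point in the file
#ifdef TOLERANCE
const bool bToleranceVisible = true;
#else
const bool bToleranceVisible = false;
#endif
```

Since TOLERANCE is undefined before the #ifdef test, bToleranceVisible is set to false, while IsNearlyZero, which appears inside the #define … #undef zone, still uses the value 0.0001.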


We will now examine a set of directives that are closely linked to each other.

#ifndef identifier
#ifdef identifier
#if constant_expression
#else
#elif
#endif

The #ifndef (if not defined), #ifdef (if defined) and #if are all paired with the #endif directive. For example,
#ifndef HAPPY
#define HAPPY
….
#endif

The constant_expression is an integer constant expression and is made up of integer constants, character constants and the
defined operator. For example,
#if DEBUGLEVEL == 0
cout << "The value of x is " << fX << endl;
#endif

The #else and #elif (else if) directives go with matching #if and #endif directives. For example,
#if DEBUGLEVEL == 1
cout << "The value of x is " << fX << endl;
#elif DEBUGLEVEL == 2
cout << "The value of x is " << fX << endl;
cout << "The value of y is " << fY << endl;
cout << "The value of z is " << fZ << endl;
#else
cout << "Safe execution so far." << endl;
#endif
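The following sketch shows how the #if … #elif … #else … #endif directives select which statements are compiled. The function ReportState and the value assigned to DEBUGLEVEL are assumptions made for this illustration.

```cpp
#include <iostream>

#ifndef DEBUGLEVEL
#define DEBUGLEVEL 2             // assumed value for this sketch
#endif

// Writes debugging output and returns the number of values reported.
// Only one of the three branches is compiled into the program.
int ReportState (std::ostream& out, float fX, float fY)
{
#if DEBUGLEVEL == 1
    out << "The value of x is " << fX << "\n";
    return 1;
#elif DEBUGLEVEL == 2
    out << "The value of x is " << fX << "\n";
    out << "The value of y is " << fY << "\n";
    return 2;
#else
    out << "Safe execution so far.\n";
    return 0;
#endif
}
```

Changing the value of DEBUGLEVEL (for example, through a compiler option that defines the identifier) and recompiling changes the amount of output without any other edits to the source file.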

What better way to illustrate all the ideas that we have discussed so far than through an example?
Example Program 4.4.1 A Simple Statistics Library
Problem Statement: Develop a statistics library containing functions that support the following measures (a) arithmetic mean, $\bar{X}$,
(b) median, $X_m$, (c) standard deviation, $\sigma$, (d) variance, $\sigma^2$, and (e) covariance, $\sigma_{xy}$. We will assume that real numbers are
used in all the computations.
Solution: Any book on statistics will provide the following formulae for the abovementioned measures.
$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} \qquad (4.4.1)$$
$$X_m = \frac{x_{n/2} + x_{n/2+1}}{2} \quad \text{if } n \text{ is even} \qquad (4.4.2a)$$
$$X_m = x_{(n+1)/2} \quad \text{if } n \text{ is odd} \qquad (4.4.2b)$$
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} \left( x_i - \bar{X} \right)^2}{n}} \qquad (4.4.3)$$
$$\sigma^2 = \frac{\sum_{i=1}^{n} \left( x_i - \bar{X} \right)^2}{n} \qquad (4.4.4)$$
$$\sigma_{xy} = \frac{\sum_{i=1}^{n} \left( x_i - \bar{X} \right) \left( y_i - \bar{Y} \right)}{n} \qquad (4.4.5)$$
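Before looking at the library files, the formulae can be sketched directly in code. The following is not the stat.cpp file of this example; the Sketch-prefixed function names are hypothetical, the formulae are assumed to divide by n as written above, the median is assumed to be taken over the sorted values, and nSize is assumed to be at least 1.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Arithmetic mean, Eqn. (4.4.1)
float SketchMean (const float fV[], int nSize)
{
    float fSum = 0.0f;
    for (int i=0; i < nSize; i++)
        fSum += fV[i];
    return fSum/nSize;
}

// Median, Eqns. (4.4.2a) and (4.4.2b); a sorted copy is used so that
// the caller's vector is left untouched
float SketchMedian (const float fV[], int nSize)
{
    std::vector<float> fVSorted (fV, fV+nSize);
    std::sort (fVSorted.begin(), fVSorted.end());
    if (nSize % 2 == 0)          // even: average of the two middle values
        return 0.5f*(fVSorted[nSize/2-1] + fVSorted[nSize/2]);
    return fVSorted[nSize/2];    // odd: the middle value
}

// Covariance, Eqn. (4.4.5)
float SketchCoVariance (const float fVX[], const float fVY[], int nSize)
{
    const float fMeanX = SketchMean (fVX, nSize);
    const float fMeanY = SketchMean (fVY, nSize);
    float fSum = 0.0f;
    for (int i=0; i < nSize; i++)
        fSum += (fVX[i]-fMeanX)*(fVY[i]-fMeanY);
    return fSum/nSize;
}

// Variance, Eqn. (4.4.4): the covariance of the data with itself
float SketchVariance (const float fV[], int nSize)
{
    return SketchCoVariance (fV, fV, nSize);
}

// Standard deviation, Eqn. (4.4.3)
float SketchStandardDeviation (const float fV[], int nSize)
{
    return std::sqrt (SketchVariance (fV, nSize));
}
```

For the eight values 2, 4, 4, 4, 5, 5, 7, 9 these functions give a mean of 5, a median of 4.5, a variance of 4 and a standard deviation of 2.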
This statistical library (of functions) is similar to C++ math library that supports a host of functions such as sqrt, fabs,
sin etc. These math functions can be used in any program if the function prototypes are included in the program via the
<cmath> header file. In this example, we will create the following source files.
statpak.cpp    program to use and check the functions in the statistics library
stat.h         header file containing the function prototypes for the library
stat.cpp       file containing the statistical functions
The advantages of splitting the entire C++ source statements into three files are many. First, we capture the entire functionality
of the statistical library into two physical files – stat.h and stat.cpp. There are no extraneous issues to deal with. Second, these
functions can be used in any program. One needs to include the function prototypes, typically, as
#include "stat.h"

at the top of the source file(s) in which the functions are used. In fact, we can share these functions with other programmers.
Third, the process of testing and debugging becomes simpler since we have split the library functions from the actual usage of
the functions. In this example, we will embed the test functionality in the source file statpak.cpp. Imagine that we have a
large program containing several thousand lines of source code distributed into several physical files. If a few corrections are
made in a single file, is it necessary to compile all the source files? With modular program development, we need to recompile
only those files in which the source statements have changed. Finally, note one drawback: the set of values used to test the
functions will be hardcoded into the test program. This is inefficient, since trying a new set of test values involves editing the
source, recompiling and linking. We will overcome this drawback once we learn how to deal with
external files (Chapter 12).
The source files are presented and discussed below.
stat.h

Potential conflicts may be created when several source files (.cpp) are used in a program and each file includes the same header
file (.h). The #pragma once directive tells the compiler to include the header file only once. Compilers that do not support
#pragma once can be made to work in a different way. To avoid including a header file more than once when a file
is compiled, C++ provides the following mechanism. Define a unique identifier to be associated with the header file. The
preprocessor uses this identifier to detect whether the header file has already been included, and loads the contents of the header
file only once. For example, the following statements will essentially do what #pragma once has done in this example.
#ifndef __STATLIB_H__
#define __STATLIB_H__
// function prototypes
float StatMean (const float fV[], int nSize);
float StatMedian (const float fV[], int nSize);
float StatStandardDeviation (const float fV[], int nSize);
float StatVariance (const float fV[], int nSize);
float StatCoVariance (const float fVX[], const float fVY[],
                      int nSize);
#endif

In line 1, the preprocessor directive #ifndef is used. We have already seen and used the #include preprocessor directive. The
#ifndef signifies if not defined. In other words, if the identifier following #ifndef has not been defined and used so far, then
the C++ compiler is directed to load and parse the statements that follow until the #endif directive is encountered. In line 2,
the identifier associated with the stat.h header file is defined using the #define preprocessor directive. It is a good idea to make
the identifier unique to avoid conflicts with other identifiers defined in other source files. When appropriate, in this book the
following naming convention will be used in defining the identifier – the first two characters are the underscore character “_”.
Similarly, the last three characters are H__.
stat.cpp

The functions in the library are defined in this file. Note that in line 7, the #include "stat.h" directive is used so that the
function prototypes are available to the compiler. Compare this to the way we have seen include directives before. The difference
between having the file name in the angle brackets <..> and between the double quotes “..” is that when the angle brackets
are used, the compiler looks for the file in a special directory (typically these are files that use various elements of the C++
library), whereas when the double quotes are used, the compiler looks for the file in the current directory (these are defined by
the programmer).

The rest of the file is a strict implementation of Eqns. (4.4.1) through (4.4.5). The functions are reused as much as possible to
avoid replicating the code – the standard deviation function (StatStandardDeviation) calls the function that computes the
mean (StatMean). Some error checks are carried out. For example, every function checks to see whether the size of the vector
is positive (non-zero). We also assume that the vectors are already sorted for the StatMedian function to work correctly.
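
As a rough illustration of how Eqns. (4.4.1) through (4.4.5) might translate into code, here is a minimal sketch that uses the prototypes from stat.h. It is not the stat.cpp listing from this example. Two choices are assumptions made for the sketch: every function returns zero when nSize is not positive, and the standard deviation is obtained from StatVariance (which in turn calls StatMean) rather than by calling StatMean directly.

```cpp
#include <cmath>

// Arithmetic mean, Eqn. (4.4.1). Assumed behavior: returns 0 if nSize <= 0.
float StatMean (const float fV[], int nSize)
{
    if (nSize <= 0) return 0.0f;

    float fSum = 0.0f;
    for (int i=0; i < nSize; i++)
        fSum += fV[i];

    return fSum / static_cast<float>(nSize);
}

// Median, Eqns. (4.4.2a) and (4.4.2b). The vector must already be sorted.
// The equations use 1-based indices; the array indices here are 0-based.
float StatMedian (const float fV[], int nSize)
{
    if (nSize <= 0) return 0.0f;

    if (nSize % 2 == 0)   // even: average of the two middle values
        return 0.5f * (fV[nSize/2 - 1] + fV[nSize/2]);

    return fV[nSize/2];   // odd: the middle value
}

// Variance, Eqn. (4.4.4)
float StatVariance (const float fV[], int nSize)
{
    if (nSize <= 0) return 0.0f;

    const float fMean = StatMean (fV, nSize);
    float fSum = 0.0f;
    for (int i=0; i < nSize; i++)
    {
        const float fDiff = fV[i] - fMean;
        fSum += fDiff * fDiff;
    }

    return fSum / static_cast<float>(nSize);
}

// Standard deviation, Eqn. (4.4.3): square root of the variance
float StatStandardDeviation (const float fV[], int nSize)
{
    return std::sqrt (StatVariance (fV, nSize));
}

// Covariance, Eqn. (4.4.5)
float StatCoVariance (const float fVX[], const float fVY[], int nSize)
{
    if (nSize <= 0) return 0.0f;

    const float fMeanX = StatMean (fVX, nSize);
    const float fMeanY = StatMean (fVY, nSize);
    float fSum = 0.0f;
    for (int i=0; i < nSize; i++)
        fSum += (fVX[i] - fMeanX) * (fVY[i] - fMeanY);

    return fSum / static_cast<float>(nSize);
}
```

For the sorted vector {1, 2, 3, 4}, these functions give a mean of 2.5, a median of 2.5 and a variance of 1.25, values that are easy to confirm by hand.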

Creation of a library is the first step in using the library. The second equally important step is to develop a program to test the
functions in the library. This process is called unit testing, which we will examine in more detail in Chapters 9 and 10. Listed below
is the main program containing code to test the statistical functions.
statpak.cpp

The prototypes of the functions from the statistical library are included in line 11. In line 13 we see the prototype of a local
function. Three vectors are declared and defined in lines 17 through 24. The statistical measures are computed for each vector
via a call to the ComputeStats function, where the mean, median, standard deviation, and variance are computed and displayed.
Finally, in line 42 the covariance function is tested using vectors A and B.

Programs invariably contain bugs, and the bugs are likely to be detected via unit testing. Readers are encouraged to use their
calculators to compute the statistical values before running the program.
[Figure: values from the program compared with values from Microsoft Excel]

Finally, we will look at one more form of scope that cuts across different files using the extern keyword. Consider the following
problem. A program is being developed and the main program and all the other functions are contained in two source files.
Suppose we have a variable nPoints in the first file (called main.cpp), and that we wish to use this variable (access and/or modify
the value) in the second file (called draw.cpp). The schematic diagram as to how to achieve this is shown in Fig. 4.4.1.
main.cpp

....
int nPoints;
....
int main ()
{
    ....
    cin >> nPoints;
    ....
    return 0;
}

void MaxCoordinate (...)
{
    ....
}

draw.cpp

....
extern int nPoints;
...
int Scale (...)
{
    for (i=1; i <= nPoints; i++)
    {
        ....
    }

    return 0;
}

void Display (...)
{
    ....
}

Fig. 4.4.1 An example showing how the extern keyword is used


A variable can have a global scope if it is declared outside all the blocks and functions (and the qualifier static does not appear
before the variable). This must be done once and only once in all the files that are a part of the program. In the above example,
this is done in the file main.cpp. If the variable needs to be used in another source file, then the extern qualifier must appear
before the variable. In the above example, this is done in file draw.cpp and hence nPoints is visible in all the functions that
appear in that file.

Example Program 4.4.2 Global Variables via extern Keyword


Problem Statement: Develop a library of vector functions – dot product and length of a vector. Keep track of the total number of
floating-point operations in the program.
Solution: The dot product, $d$, between two vectors $\mathbf{a}_{n \times 1}$ and $\mathbf{b}_{n \times 1}$ is defined as follows.

$$d = \sum_{i=1}^{n} a_i b_i \qquad (4.4.6)$$

The number of floating-point operations is $n$. The length, $l$, of a vector $\mathbf{a}_{n \times 1}$ is defined as

$$l = \sqrt{a_1^2 + a_2^2 + \ldots + a_n^2} \qquad (4.4.7)$$

The number of multiplications is $n$; we will assume that the square root is equal to one multiplication (strictly speaking the
square root is evaluated using an iterative technique where each iteration involves several multiplications).
We will develop the program in three source files. The first file (main.cpp) will contain the main program. In the second file
(VectorOps.h), we will declare the two function prototypes – VectorDotProduct and VectorLength. In the third file
(VectorOps.cpp), we will define the two functions. In order to track the number of floating-point operations (FLOPS), we will
use a global variable nFLOPS that we will initialize to zero in the main program. In the two functions, we will update the value of
nFLOPS.
main.cpp

Note the declaration of the global variable in line 14. The variable is initialized in line 23. To use the vector functions, we will
define and declare vectors fX and fY on lines 19 and 20. The length of these two vectors and the dot product between these
vectors are computed in the for loop (lines 26 through 31). These operations are executed 10000 times.

VectorOps.h

To ensure that the header file containing the prototypes is included during the program compilation only once, the #pragma
once statement is used in line 7. Note that the vectors are passed by reference and their size is the last argument in the argument
list. The reader should reflect on what would happen if the incorrect value of the size of the vector is passed to the two functions.
Later in the book (Chapter 10) we will see a more robust way of handling these operations.

VectorOps.cpp

The global variable nFLOPS is declared in line 9 but note that the extern qualifier is used. The nFLOPS value is updated in line 20
in function VectorDotProduct and in line 36 in function VectorLength. As a note of caution, we will minimize (if not eliminate)
the use of global variables (see Programming Tip #12).
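
The arrangement described above can be sketched in a single block as follows. This is only an illustration, not the listings from this example: the function signatures are assumed (array argument(s) followed by the size), nFLOPS is assumed to be an int, and the two files are collapsed into one so that the sketch compiles on its own. The FLOP counts follow the text: n for the dot product and n + 1 for the length.

```cpp
#include <cmath>

// --- In main.cpp: the one and only definition of the global variable ---
int nFLOPS = 0;            // total number of floating-point operations

// --- In VectorOps.cpp: a declaration only, no new storage is created ---
extern int nFLOPS;

// Dot product, Eqn. (4.4.6). Counted as nSize operations.
float VectorDotProduct (const float fA[], const float fB[], int nSize)
{
    float fDot = 0.0f;
    for (int i=0; i < nSize; i++)
        fDot += fA[i] * fB[i];

    nFLOPS += nSize;
    return fDot;
}

// Length, Eqn. (4.4.7). Counted as nSize multiplications plus one
// operation for the square root.
float VectorLength (const float fA[], int nSize)
{
    float fSum = 0.0f;
    for (int i=0; i < nSize; i++)
        fSum += fA[i] * fA[i];

    nFLOPS += nSize + 1;
    return std::sqrt (fSum);
}
```

With fX = {1, 2, 3} and fY = {4, 5, 6}, the call VectorDotProduct (fX, fY, 3) returns 32 and adds 3 to nFLOPS.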
Storage Classes
C++ has several storage classes (not to be confused with scope). When a variable is declared and used, memory needs to be
allocated and subsequently deallocated when the variable goes out of scope. In other words, the storage class determines the
lifetime in memory of a variable or a function. We will discuss some of the storage classes (automatic and static) here and
others (such as extern and mutable) later.

automatic variables
A local variable’s life is defined by its scope – the variable is created when and where the variable is declared in a block and ends
when execution exits the block. However, C++ automatically assumes the responsibility of allocating and deallocating the
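memory for the variable. Hence the name for the storage class – automatic. Before C++11, the keyword auto could be used to
state this explicitly, as shown below. Since C++11, auto instead asks the compiler to deduce the type of a variable, and the
following statement no longer compiles.
auto int nPopulation;   // accepted only by pre-C++11 compilers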

static variables
Global and static variables and functions exist from the time a program starts execution till the time the program finishes
execution. We saw the use of the extern keyword before in connection with global variables. The keyword can also be used
with functions. Similarly, the keyword static can be used with variables and functions. Here is an example of a static variable
used in a function.
void DoNothing ()
{
    static int nCount = 0;
    nCount++;
    std::cout << "Count is : " << nCount << "\n";
}

We will use this function from another function as follows.


for (int i=1; i <= 4; i++)
    DoNothing ();

The output that is generated when the program is executed is shown in Fig. 4.4.2.

Fig. 4.4.2 Output from a function using a static variable
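
Because nCount is static, it is initialized only once and keeps its value between calls, so the four calls to DoNothing report counts of 1, 2, 3 and 4. The contrast with an automatic variable can be seen in the following minimal sketch. The two function names are hypothetical and are used only for this illustration.

```cpp
// Counter kept in a static local variable: initialized once, and the
// value survives from one call to the next.
int CountWithStatic ()
{
    static int nCount = 0;
    nCount++;
    return nCount;
}

// Counter kept in an automatic local variable: created and set to zero
// on every call, so the function returns 1 every time.
int CountWithAutomatic ()
{
    int nCount = 0;
    nCount++;
    return nCount;
}
```

Both variables have local scope, since neither can be accessed outside its function. Only the storage class, and therefore the lifetime, is different.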


Tip: Finally, let us review the type of scope rules as they apply to identifiers – local scope is within a block contained between
the curly braces {..}, function scope is only within a function not outside the function, file scope is applicable if the identifier
appears outside all the blocks and functions (if the qualifier static does not appear before the identifier, then the identifier is
global), and prototype scope begins and ends with a prototype statement.

4.5 Function Templates


We were introduced to the concept of function overloading in Section 4.2. We immediately saw the advantages to naming
functions based on their functionalities rather than the data type associated with the function parameters. As we saw in Example
Program 4.2.1, the two functions to compute the minimum value of a set of numbers were exactly the same except for the data
type associated with the set of numbers – one was int and the other double. Clearly if we wanted to support the other primitive
data types, then we would need a version for short, long, and float data types – a total of 5 different functions. This can lead
to severe program maintenance problems. For example, even a small change in the algorithm to compute the minimum value
would involve making the same change to all the 5 functions. C++ provides a cleaner approach using a function template.
A function template is a complete function where the basic functionality is coded without explicitly specifying the data type of
some or all the function parameters, so as to create a family of functions. We will now discuss the template syntax. The first line
for a template function is as follows.
template <class T>

The line is called the template prefix. The keywords template and class appear as shown above. The parameter T is the type
parameter. The compiler will substitute the appropriate data type for T based on the function call. The parameter name can be
any legal identifier. We have chosen T for convenience’s sake. Also, one could have more than one parameter separated by
commas. The general syntax is
template < [typelist] [, [ arglist ]] > declaration
and we will see this declaration and usage later when template classes are introduced in Chapter 9. Let’s go back to Example
Program 4.2.1 and the int version of the function MinValue.
int MinValue (const int nVV[], int nSize)
{
    // set minimum value to the first entry
    int nMinV = nVV[0];

    // now compare against the rest of the entries
    for (int i=1; i < nSize; i++)
    {
        if (nVV[i] < nMinV) nMinV = nVV[i];
    }

    return nMinV;
}

We could easily rewrite this function substituting the int data type associated with the vector of number with a generic data
type T as follows.
template <class T>
T MinValue (const T TVV[], int nSize)
{
    // set minimum value to the first entry
    T TMinV = TVV[0];

    // now compare against the rest of the entries
    for (int i=1; i < nSize; i++)
    {
        if (TVV[i] < TMinV) TMinV = TVV[i];
    }

    return TMinV;
}

Compare the differences between the two versions. In the template version, we have the special first line. The function definition
for
int MinValue (const int nVV[], int nSize)

now reads
T MinValue (const T TVV[], int nSize)

Beyond this, the appropriate int declarations are replaced with T! All the other changes are style changes (writing TVV instead of
nVV etc.).
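
To see how the compiler substitutes the data type for T, consider the following short sketch. It repeats the MinValue template and adds a hypothetical driver function, ShowMinimums, that is not part of the example programs. The std::string case is included only to illustrate that the template works for any type that supports the < operator.

```cpp
#include <iostream>
#include <string>

template <class T>
T MinValue (const T TVV[], int nSize)
{
    // set minimum value to the first entry
    T TMinV = TVV[0];

    // now compare against the rest of the entries
    for (int i=1; i < nSize; i++)
    {
        if (TVV[i] < TMinV) TMinV = TVV[i];
    }

    return TMinV;
}

// Hypothetical driver: the same template serves three data types
void ShowMinimums ()
{
    const int         nV[]   = {7, 3, 9};
    const double      dV[]   = {2.5, -1.0, 4.0};
    const std::string strV[] = {"pear", "apple", "fig"};

    std::cout << MinValue (nV, 3)   << "\n";   // T is deduced as int
    std::cout << MinValue (dV, 3)   << "\n";   // T is deduced as double
    std::cout << MinValue (strV, 3) << "\n";   // T is deduced as std::string
}
```

The compiler generates a separate version of MinValue for each data type that is actually used in a call. A change to the algorithm is now made in one place only.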

Example Program 4.5.1 Function Templates (Example Program 4.2.1 Revisited)


Problem Statement: Write a program to obtain several integer values and a number of double values. Store the numbers in (separate)
vectors, and compute the minimum value in each vector.
Solution: We will make the original program much shorter and cleaner by using function templates for both functions,
GetValues and MinValue. In addition, we will split the source files by putting the functions in a separate file. The main program
is shown below.
main.cpp

The source statements for the main program are identical to Example Program 4.2.1 except for line 10 where the #include
statement is used to declare the two functions used in the program.

Tip: If the template functions are defined in a separate file, then that file should be a header file (not a C++ source file), because
the compiler generally needs to see the complete template definition wherever the template is used. The header file is shown below.
templates.h

The template functions are almost identical to the non-template functions except for the differences pointed out earlier in
addition to being defined in a header file. The entire program is now more compact and perhaps, easier to maintain.

4.6 Case Study: A 4-Function Calculator


In this section we will develop a simple four-function calculator. The user will be prompted as
Enter +-*/CS or value [???]

where ??? is the current value that one would see on the calculator. The four functions are add, subtract, multiply and divide as
signified by the symbols +, -, * and /. To clear the display (reset the value to zero), we will use C or c. To power off the
calculator (stop the program), we will use S or s. Let us look at an example of using the calculator. To compute 12(5.1 + 6.3),
the user of the program would do the following.
Enter +-*/CS or value [0] 5.1
Enter +-*/CS or value [5.1] +
Enter +-*/CS or value [5.1] 6.3
Enter +-*/CS or value [11.4] *
Enter +-*/CS or value [11.4] 12
Enter +-*/CS or value [136.8] S

We will try to resolve an important problem before starting to develop the algorithm and program structure. One of the biggest
problems that new C++ programmers have with the standard input class is the way input is read and interpreted. For example,
what happens with the following code:

#include <iostream>

int main ()
{
    int nPoints;

    std::cout << "Input number of points: ";
    std::cin >> nPoints;
    std::cout << "You have input: " << nPoints << "\n";

    return 0;
}

if the user types an invalid value such as 1w3 instead of 123? The output from the program built with the Microsoft Visual C++
compiler is shown in Fig. 4.6.1.

Fig. 4.6.1 Output generated by an invalid input


The standard input class parses the input buffer and stops when it encounters an invalid character. What is worse is that the rest
of the input, w3, still remains in the input buffer and will be read in by any subsequent cin statements. There are several
mechanisms available within the standard input class to check whether the read statement executed successfully. We have
created one such function that is described below.
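
Before looking at that function, here is a minimal sketch of one way such a check could be written. It is not the GetInteractive implementation. The helper name TryReadInt is hypothetical, and the approach (read a whole line, then accept it only if the entire line is an integer) is an assumption chosen for illustration. The stream is passed as a parameter so that the function works with std::cin as well as with other input streams.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical helper (not the book's GetInteractive function).
// Reads one line and accepts it only if the whole line is a valid integer.
// Returns true and sets nValue on success; nValue is unchanged otherwise.
bool TryReadInt (std::istream& in, int& nValue)
{
    std::string strLine;
    if (!std::getline (in, strLine)) return false;   // no more input

    std::istringstream iss (strLine);
    int nTemp;
    if (!(iss >> nTemp)) return false;   // line does not start with an integer

    char cExtra;
    if (iss >> cExtra) return false;     // leftover characters such as "w3"

    nValue = nTemp;
    return true;
}
```

Because the whole line is consumed on every call, an invalid entry such as 1w3 is rejected and nothing is left behind in the input buffer for the next read. A program could call TryReadInt (std::cin, nPoints) in a loop until it returns true.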
GetInteractive Function: The prototypes for the GetInteractive functions are shown below.
// int version
void GetInteractive (const string& strString, int& V);
void GetInteractive (const string& strString, int& V,
int nL, int nU);
// long version
void GetInteractive (const string& strString, long& V);
void GetInteractive (const string& strString, long& V,
long lL, long lU);
// float version
void GetInteractive (const string& strString, float& V);
void GetInteractive (const string& strString, float& V,
float fL, float fU);
// double version
void GetInteractive (const string& strString, double& V);
void GetInteractive (const string& strString, double& V,
double dL, double dU);
// string version
void GetInteractive (const string& strString, string& V,
int n);

// raw access and conversion functions


int GetLongValue (string& strInput, long& lV);
int GetDoubleValue (string& strInput, double& dV);

For example, we could rewrite the previous program as follows. The main program is shown below, and the entire program
would be formed by an additional source code contained in getinteractive.cpp.
#include <iostream>
#include "getinteractive.h"

int main ()
{
    int nPoints;

    GetInteractive ("Input number of points: ", nPoints);
    std::cout << "You have input: " << nPoints << "\n";

    return 0;
}

The overloaded versions can be used if range checking is to be performed. For example, if the value of nPoints must lie
between 1 and 100, we would rewrite the call to GetInteractive as follows.
GetInteractive ("Input number of points: ", nPoints, 1, 100);

The details of the GetInteractive function are not discussed here. Interested readers can explore the source code
(getinteractive.cpp).
And now on to the development of the algorithm. We will store the displayed value in a variable called dMemory and the next
input value in a variable called dNext. We will track the operation to be performed using a variable called nOper that will have a
value of 1, 2, 3 and 4 if binary addition, subtraction, multiplication and division is to be performed; otherwise, the value will be
set to 0. To obtain and act on the user command, we will use a variable called nCommand that will have the following values – 0
to exit the program, 1 for addition, 2 for subtraction, 3 for multiplication, 4 for division, 5 to clear the display to zero, and 6 if
the user has input a number.
Algorithm
1. Initialize dMemory, dNext and nOper to zero.
2. Loop until user terminates the program.
3. Get user command. Limit number of input characters to 10.
4. Is the input one of +-*/cCsS or is it a number? Try to read the number as a double precision value. If the input is
   invalid, ask the user for a valid input.
5. If the input is to stop the calculations, then exit the program.
6. If the input is to clear the memory, then set dMemory and nOper to zero.
7. If the input is one of the four operators, update nOper.
8. Else the input is a number. Check nOper. If nOper has been defined (nOper is not zero), then carry out the operation,
   update dMemory and set nOper to zero. Else store this number in dNext.
9. End loop.
10. Terminate program.
The program is developed in a modular form with all the steps except Steps 3 and 4 implemented in the main program
(main.cpp). Steps 3 and 4 are implemented in several functions (utility.h and utility.cpp) and the details are presented
below.
Example Program 4.6.1 A Four-Function Calculator
main.cpp

There are seven utility functions. The ShowBanner and ShowGoodBye functions are used once at the beginning and at the end of
the program. The UserCommand function parses the user input and returns only if the input is valid. The binary operations are
carried out in functions Add, Subtract, Multiply and Divide. The function prototype file is shown below.
utility.h

The implementation of the utility functions is shown below.

utility.cpp

An interesting part of the implementation deals with formatting the user prompt. Recall that the user prompt must be of the
form:
Enter +-*/CS or value [???]
The first part of the string is straightforward. The only sticky part is how to get the current value of dMemory instead of ??? and
store the entire prompt as a standard string. We will study more about strings in Chapter 7. However, we will explain what is
necessary to format strings. First, we need to include the header file for the classes that format strings. Recall that we have been
using the iostream header for input and output. Similarly, the sstream header provides classes that link strings with streams. In
other words, it is possible to read from or write to a string using the formatting capabilities that we have seen in the stream
classes. In lines 7 and 8, we include the string stream header and indicate that we wish to use the standard output string stream
class (ostringstream) in the program. The actual variable strPrompt associated with this class is declared on line 29. Note how the
variable (or object) is used to format the string on lines 35 and 36. If we had wanted to write the string to the standard output,
we would have used the following statement:
std::cout << "\nEnter +-*/CS or value [" << dMemory
          << "] ";

The first parameter of the GetInteractive function is declared as const string&. The conversion from ostringstream into a
const string& requires the use of the str member function from the ostringstream class, as we see in line 39. As we have
mentioned before, sometimes it is easier to use and then understand why. We have used the advanced features here only because
there is no other alternative. The “why” will be tackled in Chapter 7.
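
The idea can be tried out in isolation with the following minimal sketch. The function MakePrompt is hypothetical and is not part of utility.cpp. It builds the prompt in an ostringstream object and returns the result as a standard string through the str member function.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: formats the calculator prompt as a std::string
std::string MakePrompt (double dMemory)
{
    std::ostringstream strPrompt;                 // an output stream that writes to a string

    strPrompt << "\nEnter +-*/CS or value ["      // same << syntax as std::cout
              << dMemory << "] ";

    return strPrompt.str ();                      // copy of the formatted text
}
```

A call such as MakePrompt (5.1) returns the text of the second prompt in the sample session, and the returned string can be passed directly as the first argument of GetInteractive.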

Second, we have used an effective but not elegant approach to taking care of the divide by zero problem in the Divide function.
In line 95, if the denominator is zero, we merely return a zero value as the result instead of flagging the error. What is an
alternative? We could issue an error message and terminate the program. We will see more about error handling in the next
section and throughout the rest of the book.

4.7 Exception Handling


C++’s facility for handling errors or exceptional situations is called exception handling. Examples of exceptional situations include
(a) errors that take place in C++ libraries, e.g. result of a math computation that is outside the range of numbers that can be
represented (underflow or overflow), dividing by zero, invalid math operation such as taking the square root of a negative
number, reading from a file that does not exist etc., (b) errors that take place in standard libraries or third-party libraries, e.g.
accessing an invalid element of a vector or string, etc., and (c) errors that take place in user-generated code. The legacy nature
of C/C++ makes this task a little more difficult.

try/catch Block
Exception handling is made up of the try/catch block. The try block contains C++ statements that may potentially generate
an exception as well as statements that should be executed if no exception occurs. The process of generating an exception is
called throwing an exception. One or more catch blocks immediately follow the try block. The general format is as follows.
try
{
    // statements including at least one of the following
    throw expression;
}
catch (datatype_1 identifier)
{
    // exception handling statements
}
catch (datatype_n identifier)
{
    // exception handling statements
}
catch (...)
{
    // exception handling statements
}

Each catch block specifies a unique datatype that it can catch and contains an exception handler. The last catch block contains
three dots (and no data type) and can be used to catch any type of exception not caught by the preceding catch block(s). The
identifier associated with the catch block is the catch block parameter and is designed to catch the exception thrown by the try
block.
Example Program 4.7.1 Exception Handling
The following program is a simple one. The user inputs a floating-point number, and the program computes its reciprocal and
its square root. In this example we will see how to catch three types of errors and handle them. The first is catching an invalid
input – user typing a number that is not a floating-point number. The second error is if the user inputs a zero value since it is
not possible to compute its reciprocal. The last error is if the user inputs a negative number since we are interested in computing
a real square root. The try keyword is in line 19 and the try block is contained in lines 20 through 32. The user input is read in
line 22. The validity of the input is checked in line 24 through the fail() member function. If the statement is true (user entered
an invalid input), an exception with a std::string data type is thrown in line 25. The other two errors are detected in line 26
and the associated exception is thrown in line 27 with a double data type. If no error is detected, the reciprocal and square root
values are computed and displayed on the console in lines 30 and 31.

The first catch block with a double data type is contained in lines 33 through 39. Appropriate statements are output to the
console indicating the type of error (zero value or a negative value). The second catch block with a std::string data type is
contained in lines 40 through 43. The exception handler outputs an error message that the user input is invalid. Finally, the
catch-all block is contained in lines 44 through 47 and should never be executed. It is merely shown in this program to illustrate
how a catch-all block may be used in a program.

We will take a deeper look at the exception classes provided by C++ and how to handle errors in a typical program in later chapters.

Summary
Material from the first four chapters should enable a programmer to write complete, moderately complex programs. However,
most programmers do not write complete programs. Object-oriented programming makes it possible to use and modify
components written by others. We will start the study of numerical analysis in the next chapter.
While there is no universal programming style, we will set the guidelines for some good programming practices.
Programming Style Tip 4.1: Naming variables, constants and functions
Variable names in C++ start with a letter or an underscore and may contain the following characters: 0-9, a-z, A-Z and _
(some compilers also accept $ as an extension). In this book, we will maintain the following convention. The objective is to
make the naming of the variables uniform and predictable. C++
provides the following basic scalar types – integer, float, double, boolean, and character. In addition, C++ also has support for
vectors and matrices as a basic type and through STL (standard template library). However, we will use our own vector and
matrix templates, as we will see in Chapter 9. Whenever appropriate, we will use the string class to store character strings.
Data Type          Variable Name Prefix   Examples
Integer scalar     n?                     nX, nIterations, nJoints
Float scalar       f?                     fY, fTolerance
Double scalar      d?                     dArea, dP123
Boolean scalar     b?                     bDone, bConverged
Integer vector     nV?                    nVScores, nVSSN
Integer matrix     nM?                    nMVertices, nMShapes
Float vector       fV?                    fVRHS, fVForces
Float matrix       fM?                    fMCoef, fMElementForces
Double vector      dV?                    dVRHS, dVGPA
Double matrix      dM?                    dMCoef, dMCoordinates
String             str?                   strNames, strStates
Class name         C?                     CNode, CPoint
Pointer            p?                     pVCoor
Member of class    m_?                    m_nRows, m_fVCoordinates

We will denote constants with upper-case letters. For example
const float PI=3.1415926f;
We will denote function names starting with a capital letter, e.g. void ReadInput (…).
Tread lightly with this tip. Taken to an extreme, it may lead to undecipherable variable names such as
pVdMCoordinates.
Programming Style Tip 4.2: Comments
A good programming habit is to liberally sprinkle the source code for a computer program with comments that are meaningful.
One approach is to make some of the comments mirror the steps from a detailed algorithm or from an associated theory. We
have seen examples of this approach in this chapter.
Programming Style Tip 4.3: Use of blank spaces and blank lines
Use blank spaces whenever it makes the statements easier to read. For example, blanks positioned before and after the
assignment sign, and after C++ keywords, make the following statements easier to read.
fXDifference = fX2 - fX1; // difference in the x coordinates
// update max value
if (nV1 > nV2) nMaxValue = nV1;

A blank line before and after a body of statements makes the body easier to read and understand.
// initialize all parameters to default values
Initialize ();

// now draw all lines
for (int i=1; i <= nLines; i++)
{
    GetLine (i);
    DrawLine (i);
}

// put point labels
ShowPointLabels ();

Programming Style Tip 4.4: Indentation and Braces


Indentation of the statements makes the program easier to follow. In the following statements, indentation is used to distinguish
the different if-else blocks of statements, as well as the statements belonging to the for loops. Each block of statements has
matching braces {..}. In addition, the use of braces makes the for loop easier to read, though it is not necessary for correct
program execution.
if (bComputeAverage)
{
    fAvg = 0.0f;
    for (int i = 1; i <= nPoints; i++)
    {
        fAvg += fValues(i);
    }
    fAvg /= float(nPoints);
}
else
{
    fSD = 0.0f;
    for (int i = 1; i <= nPoints; i++)
    {
        fSD += fValues(i)*fValues(i);
    }
    fSD = float(sqrt(fSD));
}

Programming Style Tip 4.5: Simple Statements


This is somewhat subjective because what is simple to an experienced programmer may be difficult to follow for the beginner
programmer. However, we will develop programs that are easy to read and follow. For example, we can take a complex
statement and break it into several easy-to-understand simple statements. The sequence of statements
float fOI, fII; // to store moment of inertia
fOI = (pow(wH+2.0f*fT, 3.0) * fW)/12.0f; // outer moment of inertia
fII = 2.0f*(pow(wH, 3.0) * (0.5f*(fW-wT)))/12.0f; // inner MOI
m_fSyy = (fOI - fII)/(0.5f*wH+fT); // section modulus
is easier to read than
m_fSyy = ((pow(wH+2.0f*fT, 3.0) * fW)/12.0f - 2.0f*(pow(wH, 3.0) *
         (0.5f*(fW-wT)))/12.0f) /(0.5f*wH+fT);

Programming Style Tip 4.6: Naming iterator variables


Use i, j, k etc. as the variables to denote iterators. For example,
for (int i=1; i <= nLimits; i++)
{

}

Programming Style Tip 4.7: Modular Program Development 1


Whenever possible, we will develop program components (functions and classes) that can be reused. As a further refinement,
template functions and classes will be developed and used whenever appropriate.
Programming Style Tip 4.8: Modular Program Development 2
We will store the modules in separate files. This will make the task of program development and maintenance easier.

Programming Style Tip 4.9: Use std::string to store character strings


Although we have not yet learnt how to define and use classes, we will use the standard string class (std::string) to store a string of characters whenever possible. A formal introduction to the standard string class appears in Chapter 7.
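As a small taste of what the string class offers, the following sketch builds a label from a name and a number. The function name MakeLabel is hypothetical and is used only for this illustration.

```cpp
#include <string>

// Build a label such as "Node-12" from a name and an integer ID.
// (MakeLabel is a hypothetical helper, not one of the book's functions.)
std::string MakeLabel (const std::string& szName, int nID)
{
    // operator+ concatenates strings; std::to_string converts the integer
    return szName + "-" + std::to_string (nID);
}
```

No character arrays or explicit length bookkeeping are needed; the string object manages its own storage.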
Programming Style Tip 4.10: Create easy to read tabular output
As we have seen with the simple and moderately complex examples so far in the book, it is much easier for one to digest the
information if presented in a visually attractive form. Tabular output is one such form (see Example Programs 3.2.5). Graphical
output is another attractive form and we will see more about this in Chapter 19.
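One possible way of producing aligned columns is with the stream manipulators from the iomanip header. The sketch below formats one row of a table; the function name TableRow and the column widths are assumptions made for this illustration.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Format one row of a table: a point number followed by its x and y values.
// (TableRow and the chosen widths are hypothetical.)
std::string TableRow (int nPoint, double dX, double dY)
{
    std::ostringstream os;
    os << std::setw(5) << nPoint                   // right-justified in 5 columns
       << std::fixed << std::setprecision(3)       // 3 digits after the decimal point
       << std::setw(12) << dX
       << std::setw(12) << dY;
    return os.str();
}
```

Sending the returned string to cout for each point, one per line, yields columns that line up regardless of the magnitude of the values.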
Programming Style Tip 4.11: Preprocessor directives and macros
We strongly discourage the use of both manifest constants and macros. For example, instead of defining a constant as
#define PI 3.1415926
define the constant as
const float PI=3.1415926f;
With this declaration, the compiler generates an error if an attempt is made to modify the constant, and the debugger can access the constant.
Macros are cumbersome to maintain and can lead to subtle errors. We will see that using templates (Chapter 9) provides a much better alternative.
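The kind of subtle error a macro can introduce is illustrated in the following sketch. The names SQUARE_MACRO, Square and ShowMacroPitfall are hypothetical and exist only for this illustration.

```cpp
#include <iostream>

// A function-like macro: the argument text is pasted in as-is.
#define SQUARE_MACRO(x) x * x

// A typed alternative written as a template function.
template <class T>
T Square (T x)
{
    return x * x;
}

// Shows how the macro and the template differ for the argument 1 + 2.
void ShowMacroPitfall ()
{
    int nMacro = SQUARE_MACRO(1 + 2);  // expands to 1 + 2 * 1 + 2, which is 5
    int nFunc  = Square (1 + 2);       // argument is evaluated first, giving 9
    std::cout << "Macro   : " << nMacro << "\n";
    std::cout << "Template: " << nFunc << "\n";
}
```

The macro silently computes the wrong value because it substitutes text rather than values, whereas the template function is type-checked and behaves like any other function.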
Programming Style Tip 4.12: Global variables
Once again, we strongly discourage the use of global variables declared with the extern qualifier. Once we understand how to define and use classes, we will see how to organize the program to minimize, if not eliminate, the need for global variables.
Programming Style Tip 4.13: Templates
We strongly urge the use of templates whenever appropriate. Templates not only reduce the size of the source statements in a program and make program maintenance easier, but they also make software reuse through the development of libraries possible. As we have mentioned before, one of the strengths of C++ is the ease with which libraries have been developed and made available to programmers. The Standard Template Library (STL) is one such example, and is discussed in Appendix B.
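As a brief illustration, the sketch below shows a single template function that works with both float and double values stored in an STL vector. The name Mean is hypothetical; it is not one of the library functions developed in this chapter.

```cpp
#include <cstddef>
#include <vector>

// Arithmetic mean of the values in a vector; T may be float, double, etc.
// (Mean is a hypothetical helper used only for illustration.)
template <class T>
T Mean (const std::vector<T>& VValues)
{
    T sum = T(0);
    for (std::size_t i = 0; i < VValues.size(); i++)
    {
        sum += VValues[i];
    }
    // guard against an empty vector
    return (VValues.empty() ? T(0) : sum / T(VValues.size()));
}
```

Only one version of the source code needs to be maintained; the compiler generates the float and double versions as they are needed.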

Where to go from here?


The reader who has diligently read and practiced the ideas from these first four chapters is at an important crossroad. There are several challenging problems in the exercises that follow that will test the reader’s problem-solving skills. Note that not only is there no unique way of writing a program, but testing a program for its correctness is also a challenge. The author cannot emphasize enough the importance of learning how to use the debugger. Read Appendix A to see how to use the debugger in Microsoft’s Visual Studio environment. It is simple and easy to learn. As with any other tool, you determine how effectively it can be used to detect and fix problems in your program.


Exercises
Most of the problems below involve the development of one or more functions. In each case (a) develop a plan to test the function(s), and (b) implement the plan in a main program. The functions should not use cin or cout unless specified. Put the main program in a separate file and the function(s) in separate file(s).

Appetizers
Problem 4.1
Write a function IsOdd to determine whether an integer n is an odd number as
bool IsOdd (int n);

Problem 4.2
The thermal efficiency, $e$, of a heat engine is defined as the ratio of the net work done to the thermal energy absorbed at the highest temperature during one cycle as
$e = 1 - \dfrac{Q_c}{Q_h}$
where $Q_h$ is the amount of heat absorbed by the engine and $Q_c$ is the amount of heat given up. Write a function to compute the thermal efficiency. The function prototype is given as follows.
float Efficiency (float fHeatAbsorbed, float fHeatLost);

Problem 4.3
The half-life of radon is 3.8 days. Write a function to compute the concentration of radon given the initial concentration (in mol L$^{-1}$) and the elapsed time (in days). The function prototype is given as
float RadonConc (float fInitialConc, int nDays);

Problem 4.4
The Arrhenius Equation relates the rate at which a reaction proceeds with its temperature and is given as
$k = A e^{-E_a/(RT)}$
where $k$ is the rate coefficient, $A$ is a constant, $E_a$ is the activation energy, $R$ is the universal gas constant, and $T$ is the temperature in kelvin. Write a function to compute the rate coefficient (stored in a vector) for a given number of values of temperatures (stored in a vector). nPoints is the number of temperature values.
void Arrhenius (float fA, float fEa, const float fVT[], float fVK[],
int nPoints);

Problem 4.5
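Write a function to compute the state of stress ($\sigma_{x'}$, $\tau_{x'y'}$) on a plane given ($\sigma_x$, $\sigma_y$, $\tau_{xy}$) and $\theta$ in degrees (Fig. P4.5).
void StressTransform (float fSigx, float fSigy, float fTauxy, float& fSigxP,
                      float& fTauxPyP, float fTheta);
$\sigma_{x'} = \dfrac{\sigma_x + \sigma_y}{2} + \dfrac{\sigma_x - \sigma_y}{2} \cos 2\theta + \tau_{xy} \sin 2\theta$
$\tau_{x'y'} = -\dfrac{\sigma_x - \sigma_y}{2} \sin 2\theta + \tau_{xy} \cos 2\theta$

[Figure: stress element with axes x, y and rotated axes x', y' at angle $\theta$]
Fig. P4.5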


Main Course
Problem 4.6
Develop a library of functions to compute the surface area and volume of the following three-dimensional objects – (1) Cube,
(2) Tetrahedron, (3) Right pyramid, (4) Right circular cylinder, (5) Right circular cone, (6) Sphere. The function prototypes are
presented below.
void CubeProp (float& fSurfArea, float& fVolume, float fSide);
void TetrahedronProp (float& fSurfArea, float& fVolume, const float fVX[4],
const float fVY[4], const float fVZ[4]);
void PyramidProp (float& fSurfArea, float& fVolume, float fSide,
float fHeight);
void CylinderProp (float& fSurfArea, float& fVolume, float fRadius,
float fHeight);
void ConeProp (float& fSurfArea, float& fVolume, float fRadius,
float fHeight);
void SphereProp (float& fSurfArea, float& fVolume, float fRadius);
Problem 4.7
This problem deals with the conduction of heat through solid objects of different shapes. The total heat transfer rate $Q$ through the body is related to the temperature difference $\Delta T$ and a quantity called the conduction shape factor $S$ by $Q = S k \Delta T = S k (T_1 - T_2)$, where $k$ is the thermal conductivity of the body and $T_1$ and $T_2$ are the boundary surface temperatures across which heat flow takes place. The shape factor $S$ is related to the thermal resistance of the body and is given as shown in the following table.

Shape | $S$
Slab of thickness $t$ and cross-sectional area of heat flow $A$ (see Fig. P4.7(a)) | $A/t$
Long hollow cylinder of length $L$, inner radius $r_1$ at temperature $T_1$, outer radius $r_2$ at temperature $T_2$ | $\dfrac{2\pi L}{\ln(r_2/r_1)}$
A sphere of radius $R$ maintained at temperature $T_1$ placed in a semi-infinite medium at a distance $z$ from a surface maintained at temperature $T_2$ (see Fig. P4.7(b)) | $\dfrac{4\pi R}{1 - R/(2z)}$
A sphere of radius $R$ maintained at temperature $T_1$ placed in a semi-infinite medium maintained at temperature $T_2$ and placed at a distance $z$ from an insulated surface | $\dfrac{4\pi R}{1 + R/(2z)}$
A cylinder of radius $R$ and length $L$ maintained at temperature $T_1$ placed horizontally in a semi-infinite medium at a distance $z$ from a surface maintained at temperature $T_2$ | $\dfrac{2\pi L}{\cosh^{-1}(z/R)}$
Circular hole of radius $R$ centered in a square solid of side $a$ and length $L$ | $\dfrac{2\pi L}{\ln(0.54\, a/R)}$

[Figure: (a) slab of thickness $t$ and area $A$ with surface temperatures $T_1$ and $T_2$; (b) sphere of radius $R$ at temperature $T_1$ at a distance $z$ from a surface at temperature $T_2$]
Fig. P4.7
Develop a program to obtain the values of the different parameters and compute the value of $Q$. Construct a separate function for each of the shapes shown in the table.


Problem 4.8
Write a function to compute the area under a curve that is approximated as straight lines between adjacent points as shown in
Fig. P4.8. With this scheme, the shape under the curve between adjacent points can either be a trapezoid or a rectangle or a
triangle (note: area can be positive or negative or zero).
[Figure: piecewise-linear curve in the x-y plane through the points $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$, ..., $(x_i, y_i)$, ..., $(x_{n-1}, y_{n-1})$, $(x_n, y_n)$]
Fig. P4.8
The prototype of this function is as follows.
float AreaUnderCurve (const float fVX[],const float fVY[], int nPoints,
int& nTriangles, int& nRectangles, int& nTrapezoids);

The inputs to the function are the vector of x and y values of the points. The output is the return value that is the area under
the curve and the last three arguments – the number of triangles, rectangles and trapezoids detected during the computation of
the area. Develop three other functions:
float AreaRightTriangle (float fHeight, float fBase);
float AreaRectangle (float fHeight, float fBase);
float AreaTrapezoid (float fBase, float fHeightLeft, float fHeightRight);
to compute the area of a right triangle, rectangle and a trapezoid. Call these functions from the AreaUnderCurve function.
Problem 4.9
Fibonacci numbers are defined as the sequence of the following integers 0, 1, 1, 2, 3, 5, 8, … In other words, $F_1 = 0$, $F_2 = 1$ and $F_i = F_{i-1} + F_{i-2}$, $i = 3, 4, \ldots$ Write two functions to compute the Fibonacci numbers with the first using iterations and the other using recursion.
Problem 4.10
Develop a library of functions to operate on points in the ( x , y , z ) space that are stored as a double vector of length 3. The
following functions are needed.
Distance: The straight-line distance between points 1 and 2.
DistanceFromOrigin: The straight-line distance between the point and the origin of the coordinate system.
UnitVector: Unit vector between points 1 and 2.
DistanceFromLine: Shortest distance from the point to the straight-line connecting points 1 and 2.
The function prototypes are given below.
double Distance (const double dV1[3], const double dV2[3]);
double DistanceFromOrigin (const double dVP[3]);
void UnitVector (const double dV1[3], const double dV2[3],
double dVUnitV[3]);
double DistanceFromLine (const double dVP[3], const double dV1[3],
                         const double dV2[3]);

Once you have tested the functions, convert them to template functions so that they can be used with either float or double
data types.

C++ Concepts
Problem 4.11
Enhance the library of statistical functions shown in Example Program 4.4.1. Develop these functions as template functions.
The functions should compute the following statistical measures – (1) Arithmetic mean, (2) Geometric Mean, (3) Median, (4)
Mean Deviation, (5) Standard Deviation, (6) Variance, and (7) Covariance.
Problem 4.12 (See Problems 1.9 and 1.10)
Several engineering problems are solved using heuristics. One can loosely define heuristics as the employment of the solution
techniques that are based on experience rather than on a rigorous theory. The use of rule-based procedures is an example of
heuristics. We all use heuristics on a regular basis especially when playing games. Write a computer program for playing tic-tac-
toe. Start with an initial screen that looks like the grids shown in Fig. P4.12.

1 2 3 1 2 3
4 5 6 4 5 6
7 8 9 X 8 9
User plays first Computer plays first
Fig. P4.12



Chapter 5

Numerical Analysis: Introduction


“Learning without thought is useless, thought without learning is dangerous.” Confucius

“You can use an eraser on the drafting table or a sledgehammer on the construction site.” Frank Lloyd Wright

“If a little knowledge is dangerous, where is the man who has so much as to be out of danger?” Thomas H. Huxley

In the preceding chapters we saw how to write simple yet useful programs. As we have repeatedly seen, good computer programs can be used, among other things, to automate mundane, repetitive tasks and make them as error-free as possible. In this chapter we will start to look at the basics of numerical analysis. We will see how integer and floating-point numbers are represented and stored. We will look at accuracy, sources of errors and how best to deal with computer arithmetic.

Objectives
• To understand the basics of numerical analysis starting with numerical representation.
• To understand what is meant by numerical approximation and numerical errors.
• To understand Taylor series expansion and function approximation.


5.1 Approximations and Errors


In this introductory section we will see how numbers are represented in computers, how computer arithmetic takes place, and what the sources of numerical errors are.
Significant Digits
Scientific notation is the representation of floating-point numbers as
$\pm a.bcd \times 10^{\pm xyz}$
where $a, b, c, d, x, y, z$ are integers between 0 and 9. The significant digits in a number are the digits that are known to be correct or that can be used with confidence. For example, 0.0054 and 0.054 are both known to two significant digits. In these examples, the zeros appearing immediately after the decimal point (leading zeros) only help to locate the first nonzero digit and hence are not significant. On the other hand, how many significant digits are there in 0.0540 or in 5400, where we have trailing zeros? A better way of answering that question is to write the numbers in scientific notation as $5.40 \times 10^{-2}$ and $5.4 \times 10^{3}$. With this notation, 0.0540 has three significant digits and 5400 has two significant digits. If, on the other hand, 5400 is expressed as $5.400 \times 10^{3}$, then the number has four significant digits.
A number $\hat{x}$ is an approximation to its true value $x$ to $s$ significant digits if $s$ is the largest positive integer that satisfies the inequality
$\dfrac{|x - \hat{x}|}{|x|} \le \dfrac{10^{-s}}{2}$ (5.1.1)
One could ask “Find $x_a$ to approximate 500 to 3 significant digits.” Using the above definition, we can express the solution as
$\dfrac{|500 - x_a|}{500} \le \dfrac{10^{-3}}{2}$
from which $499.75 \le x_a \le 500.25$.
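The definition in Eqn. (5.1.1) can be turned into a short function. The following is a minimal sketch; the function name SignificantDigits and the cap of 15 digits (roughly the precision of a double) are assumptions made for this illustration.

```cpp
#include <cmath>

// Largest s (capped at nMaxDigits) for which
// |x - xhat|/|x| <= 0.5 * 10^(-s); returns 0 if no positive s qualifies.
// (SignificantDigits is a hypothetical helper.)
int SignificantDigits (double dTrue, double dApprox, int nMaxDigits = 15)
{
    if (dTrue == 0.0) return 0;   // relative error is undefined for x = 0
    double dRelErr = std::fabs (dTrue - dApprox) / std::fabs (dTrue);
    double dBound = 0.05;         // 0.5 * 10^(-1), the bound for s = 1
    int s = 0;
    while (s < nMaxDigits && dRelErr <= dBound)
    {
        s++;                      // the bound for s is satisfied
        dBound /= 10.0;           // bound for the next value of s
    }
    return s;
}
```

For example, SignificantDigits (500.0, 500.2) returns 3 since 500.2 lies inside the interval obtained above, whereas SignificantDigits (500.0, 499.0) returns 2.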
Numerical Representation
We as human beings are taught to recognize and manipulate numbers using the decimal system where the digits go from 0 to 9. For example,
$(4096)_{10} = 6 \times 10^0 + 9 \times 10^1 + 0 \times 10^2 + 4 \times 10^3$.
Numbers in computer systems are represented as binary (or sometimes octal or hexadecimal) numbers. In the binary system, the digits are either 0 or 1. For example,
$(10011)_2 = 1 \times 2^0 + 1 \times 2^1 + 0 \times 2^2 + 0 \times 2^3 + 1 \times 2^4 = (19)_{10}$
In computer terminology, a bit is the smallest unit of storage. In other words, a bit can store either a 0 or a 1. Bits are grouped together to form bytes; 8 bits form a byte. Bytes can then be grouped together to form words. On most 32-bit hardware systems such as those manufactured by Intel and AMD, 4 bytes (or 32 bits) form an integer word and a single-precision word. 8 bytes (or 64 bits) form a double-precision word.
Consider a hypothetical computer system where integers are stored in 4 bits. Since integers are signed meaning that they can be
either negative or positive, one of these bits is used to store the sign (Fig. 5.1.1).

sign bit
Fig. 5.1.1 Bit representation for a 4-bit integer storage
A 1 in the sign bit usually signifies a negative number. The other three bits are then used to store the value of the integer. The largest value occurs when all three bits are 1’s. In other words, the largest value that can be stored is $(111)_2 = 1 \times 2^0 + 1 \times 2^1 + 1 \times 2^2 = (7)_{10}$. Hence the range of integer numbers, $n$, that can be represented in a 4-bit representation is $-7 \le n \le 7$. In general, the range of (decimal) numbers, $n$, that can be represented in a $p$-bit representation is $-(2^{p-1} - 1) \le n \le (2^{p-1} - 1)$. Going back to Section 2.2, we can now see why a short number using 16 bits can be used to store values in the range $-(2^{16-1} - 1) \le n \le (2^{16-1} - 1)$, or $-32767 \le n \le 32767$.
When a floating-point number is represented in the decimal system, we can continue to think of the number as we did with integers. For example,
$(4.203)_{10} = 4 \times 10^{0} + 2 \times 10^{-1} + 0 \times 10^{-2} + 3 \times 10^{-3}$
Similarly, for the binary representation of floating-point numbers, we have the following example.
$(1.101)_2 = 1 \times 2^{0} + 1 \times 2^{-1} + 0 \times 2^{-2} + 1 \times 2^{-3} = (1.625)_{10}$
The computer representation of a floating-point number is a little more complex. Typically, a floating-point number has three
components (Fig. 5.1.2).
[Figure: 32-bit word with the sign bit in bit 0, the exponent in bits 1 to 8, and the mantissa in bits 9 to 31]
Fig. 5.1.2 IEEE representation of single precision floating point number
In other words, the number $x$ is stored as a binary approximation as
$x \approx q \times 2^{n}$ (5.1.2)
where $q$ is the mantissa and $n$ is the exponent. The IEEE single precision floating point standard representation requires a 32-bit word¹, whose bits may be numbered from 0 to 31, left to right. The first bit is the sign bit, 'S', the next eight bits are the exponent bits, 'E', and the final 23 bits are the fraction 'F':
S EEEEEEEE FFFFFFFFFFFFFFFFFFFFFFF
0 1      8 9                    31
The value V represented by the word may be determined as follows:

• If E=255 and F is nonzero, then V=NaN ("Not a number")
• If E=255 and F is zero and S is 1, then V=-Infinity
• If E=255 and F is zero and S is 0, then V=Infinity
• If 0<E<255 then V=(-1)**S * 2 ** (E-127) * (1.F) where "1.F" is intended to represent the binary number created by prefixing F with an implicit leading 1 and a binary point.
• If E=0 and F is nonzero, then V=(-1)**S * 2 ** (-126) * (0.F). These are "unnormalized" values.
• If E=0 and F is zero and S is 1, then V=-0
• If E=0 and F is zero and S is 0, then V=0
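The three fields S, E and F can be extracted from a float variable with a few bit operations. The following is a minimal sketch that assumes float is a 32-bit IEEE 754 type on the machine in use; the names FloatFields and Decompose are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Fields of a 32-bit IEEE 754 single-precision word
struct FloatFields
{
    std::uint32_t nSign;      // S: 1 bit
    std::uint32_t nExponent;  // E: 8 bits (biased by 127)
    std::uint32_t nFraction;  // F: 23 bits
};

// Split a float into S, E and F. Assumes float is a 32-bit IEEE 754 type.
FloatFields Decompose (float fValue)
{
    static_assert (sizeof(float) == sizeof(std::uint32_t),
                   "float must be 32 bits");
    std::uint32_t nBits;
    std::memcpy (&nBits, &fValue, sizeof(nBits)); // copy the raw bit pattern

    FloatFields ff;
    ff.nSign     = nBits >> 31;            // leftmost bit
    ff.nExponent = (nBits >> 23) & 0xFFu;  // next 8 bits
    ff.nFraction = nBits & 0x7FFFFFu;      // last 23 bits
    return ff;
}
```

Calling Decompose (-6.5f) gives S = 1, E = 129 and a fraction whose three leading bits are 101 followed by zeros, so that V = (-1)**1 * 2**(129-127) * (1.101) in binary, or -6.5.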

1 ANSI/IEEE Standard 754-1985, Standard for Binary Floating Point Arithmetic


The exponent can either be negative or positive. A bias is subtracted from the exponent in order to get the actual exponent.
This bias value is 127 for single-precision floats. As an example, an exponent (E) value of 134 means that the actual exponent
is (134-127), or 7.
The mantissa represents the precision bits of the number. It is composed of an implicit leading bit and the fraction bits. To
maximize the quantity of representable numbers, floating-point numbers are typically stored in normalized form. This basically
puts the radix point after the first non-zero digit. Thus, we can just assume a leading digit of 1, and don't need to represent it
explicitly. As a result, the mantissa has effectively 24 bits of resolution, by way of 23 fraction bits.
For example,
1 10000001 10100000000000000000000 = -1 * 2**(129-127) * (1.101)_2 = -(6.5)_10
Since the number of bits for the exponent is 8, the approximate range of numbers that can be represented is $2^{(2^8 - 1 - 127)} \approx 3.4(10^{38})$. There are five distinct numerical ranges that single-precision floating-point numbers are not able to represent.
(1) Negative overflow: Negative numbers less than $-(2 - 2^{-23}) \times 2^{127}$.
(2) Negative underflow: Negative numbers greater than $-2^{-149}$.
(3) Zero (see below).
(4) Positive underflow: Positive numbers less than $2^{-149}$.
(5) Positive overflow: Positive numbers greater than $(2 - 2^{-23}) \times 2^{127}$.
Overflow means that values have grown too large for the representation, much in the same way that you can overflow integers. Underflow is a less serious problem because it just denotes a loss of precision; the result is guaranteed to be closely approximated by zero.
The procedure to store and interpret double precision numbers is very similar to single precision numbers (Fig. 5.1.3).
[Figure: 64-bit word with the sign bit in bit 0, the exponent in bits 1 to 11, and the mantissa in bits 12 to 63]
Fig. 5.1.3 IEEE representation of double precision floating point number
There are some special values that one should be aware of. IEEE reserves exponent field values of all 0s and all 1s to denote
special values in the floating-point scheme.
Zero
As mentioned before, zero is not directly representable in the straight format, due to the assumption of a leading 1 (we'd need
to specify a true zero mantissa to yield a value of zero). Zero is a special value denoted with an exponent field of zero and a
fraction field of zero. Note that -0 and +0 are distinct values, though they both compare as equal.
Unnormalized
If the exponent is all 0s, but the fraction is non-zero (else it would be interpreted as zero), then the value is a denormalized number, which does not have an assumed leading 1 before the binary point. Thus, this represents a number $(-1)^s \times 0.f \times 2^{-126}$, where $s$ is the sign bit and $f$ is the fraction. For double precision, denormalized numbers are of the form $(-1)^s \times 0.f \times 2^{-1022}$. From this you can interpret zero as a special type of unnormalized or denormalized number.
Infinity
The values +infinity and -infinity are denoted with an exponent of all 1s and a fraction of all 0s. The sign bit distinguishes
between negative infinity and positive infinity. Being able to denote infinity as a specific value is useful because it allows
operations to continue past overflow situations. Operations with infinite values are well defined in IEEE floating point.


Not A Number
The value NaN (Not a Number) is used to represent a value that does not represent a real number. NaN's are represented by a
bit pattern with an exponent of all 1s and a non-zero fraction. There are two categories of NaN: QNaN (Quiet NaN) and
SNaN (Signalling NaN).
A QNaN is a NaN with the most significant fraction bit set. QNaN's propagate freely through most arithmetic operations.
These values pop out of an operation when the result is not mathematically defined.
An SNaN is a NaN with the most significant fraction bit clear. It is used to signal an exception when used in operations. SNaN's
can be handy to assign to uninitialized variables to trap premature usage.
Semantically, QNaN's denote indeterminate operations, while SNaN's denote invalid operations.
Converting a Decimal Number to a Non-Decimal Number (Base b)
So as to generalize the procedure for both integers and floating-point numbers, we will split the given number into its integral
and fractional parts. Note that an integer has no fractional part.
Integral Part
(1) Divide the integral part by the base b. This yields a quotient and a remainder. The remainder is the rightmost digit of
the integral part of the new number.
(2) Divide the quotient again by b. The remainder is the next digit of the integral part.
(3) Repeat step (2) until the quotient is zero. The (last) remainder is the leftmost digit of the new number.
Fractional Part
(1) Multiply the fractional part of the decimal number by base b. The integral part of the product constitutes the leftmost
digit of the fractional part of the new number.
(2) Multiply the fractional part of the product by base b. The integral part of the product constitutes the next digit of the
fractional part of the new number.
(3) Repeat step (2) until a zero fractional part or a duplicate fractional part occurs. The integer part of the (last) product
is the rightmost digit of the fractional part of the new number. A duplicate fractional part is an indication that the digit
(or sequence) is a repeating one.
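The two procedures above can be sketched as a single function. This is a minimal sketch, not a definitive implementation: the function name ToBase is hypothetical, bases 2 through 16 are assumed, and the number of fractional digits is simply capped instead of testing for a duplicate fractional part (such a test is unreliable when the fraction is itself stored as a binary floating-point number).

```cpp
#include <algorithm>
#include <cmath>
#include <string>

// Convert a decimal number to a string of digits in base nBase (2..16).
// At most nMaxFracDigits digits are generated after the radix point.
std::string ToBase (double dValue, int nBase, int nMaxFracDigits)
{
    const char szDigits[] = "0123456789ABCDEF";
    std::string szResult = (dValue < 0.0 ? "-" : "");
    dValue = std::fabs (dValue);

    // split into integral and fractional parts
    double dIntPart;
    double dFrac = std::modf (dValue, &dIntPart);
    long long nInt = static_cast<long long>(dIntPart);

    // integral part: repeated division; remainders are digits, right to left
    std::string szInt;
    do
    {
        szInt += szDigits[nInt % nBase];
        nInt /= nBase;
    } while (nInt > 0);
    std::reverse (szInt.begin(), szInt.end());
    szResult += szInt;

    // fractional part: repeated multiplication; integral parts of the
    // products are digits, left to right
    if (dFrac > 0.0 && nMaxFracDigits > 0)
    {
        szResult += '.';
        for (int i = 1; i <= nMaxFracDigits && dFrac > 0.0; i++)
        {
            dFrac *= nBase;
            int nDigit = static_cast<int>(dFrac);
            szResult += szDigits[nDigit];
            dFrac -= nDigit;
        }
    }
    return szResult;
}
```

For instance, ToBase (12.0, 2, 8) produces the string 1100, and ToBase (0.625, 2, 8) produces 0.101 since the fractional part becomes zero after three multiplications.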
Example 5.1
Problem Statement: Represent each of the following decimal numbers as a binary number.
(a) 12 (b) -24 (c) -1.45
Solution: For each of the numbers we present a table showing the calculations.
(a) $(12)_{10} = (1100)_2$
Division    (Quotient, Remainder)    Binary Number
12/2        (6,0)                    0
6/2         (3,0)                    00
3/2         (1,1)                    100
1/2         (0,1)                    1100
(b) $(-24)_{10} = -(11000)_2$ ²
Division    (Quotient, Remainder)    Binary Number
24/2        (12,0)                   0
12/2        (6,0)                    00
6/2         (3,0)                    000
3/2         (1,1)                    1000
1/2         (0,1)                    11000

2 Negative numbers are usually represented as 2’s complement. See Problem 5.9.


(c) $(-1.45)_{10} = -(1.01\overline{1100}\ldots)_2$


Integral Part
Division (Quotient, Remainder) Binary Number
1/2 (0,1) 1

Fractional Part
Multiplication (Product, Integral Part) Binary Number
0.45 x 2 (0.90,0) 0
0.90 x 2 (1.80,1) 01
0.80 x 2 (1.60,1) 011
0.60 x 2 (1.20,1) 0111
0.20 x 2 (0.40,0) 01110
0.40 x 2 (0.80,0) 011100
0.80 x 2 (1.60,1) 011100…

As we can see from the last row in the table, the pattern begins to repeat itself; hence the calculations are terminated. This simple example shows that a number that can be represented exactly as a decimal number may not be represented exactly as a binary number with a finite number of bits.

Types of Errors
Before we discuss the various types of errors that can result from computer arithmetic, let us first define two very important terms: absolute error and relative error. Let $x_t$ be the true value of a quantity whose computed approximate value is denoted as $x_a$. The absolute error, $E_{abs}$, is then given as
$E_{abs} = |x_t - x_a|$ (5.1.3)
and the relative error, $E_{rel}$, is defined as
$E_{rel} = \dfrac{|x_t - x_a|}{|x_t|}$ (5.1.4)
While the signs are sometimes useful, usually we are more concerned with the magnitude of the error. Hence the absolute values
are used in both the error definitions. Both these error measures tell us something about how accurate the approximate value
is.
Example 5.2
Problem Statement: (a) The weight of a certain object is 15.0 N. A store clerk weighs the object and reports the weight as 15.5 N.
What are the absolute and relative errors in the clerk’s measurement?
(b) A student astronomer using a telescope estimates the distance to a celestial object as 15,500,000 miles. It is known that the
celestial object is in fact 15,000,000 miles away. What are the absolute and relative errors in the student’s measurement?
Solution:
(a) From the problem data we have $x_t = 15.0$ and $x_a = 15.5$. Using Eqn. (5.1.3) we have
$E_{abs} = |x_t - x_a| = |15.0 - 15.5| = 0.5$ N
Using Eqn. (5.1.4) we have
$E_{rel} = \dfrac{|x_t - x_a|}{|x_t|} = \dfrac{|15.0 - 15.5|}{15.0} = \dfrac{0.5}{15.0} = 0.0333$
(b) From the problem data we have $x_t = 15000000$ and $x_a = 15500000$. Using Eqn. (5.1.3) we have
$E_{abs} = |x_t - x_a| = |15000000 - 15500000| = 500000$ miles
Using Eqn. (5.1.4) we have
$E_{rel} = \dfrac{|x_t - x_a|}{|x_t|} = \dfrac{|15000000 - 15500000|}{15000000} = \dfrac{500000}{15000000} = 0.0333$
This simple example shows why both error measures are necessary to draw conclusions. The absolute error is 0.5 N in the weight estimate and 500000 miles in the distance estimate. One may incorrectly conclude that the weight measurement is more accurate. However, when we compare the relative errors, both errors are exactly the same: 3.33%.
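Both error measures are straightforward to code. The sketch below follows Eqns. (5.1.3) and (5.1.4) with absolute values; the function names AbsoluteError and RelativeError are hypothetical, and the true value is assumed to be nonzero when the relative error is computed.

```cpp
#include <cmath>

// Absolute error, Eqn. (5.1.3)
double AbsoluteError (double dTrue, double dApprox)
{
    return std::fabs (dTrue - dApprox);
}

// Relative error, Eqn. (5.1.4); assumes dTrue is not zero
double RelativeError (double dTrue, double dApprox)
{
    return std::fabs (dTrue - dApprox) / std::fabs (dTrue);
}
```

With the data from Example 5.2, the two absolute errors differ by six orders of magnitude, while both calls to RelativeError return the same value of about 0.0333.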
Now we will look at errors resulting from computer arithmetic – round-off errors and truncation errors.
Round-Off Errors
Round-off errors occur because a fixed number of bits is used to represent numbers. For example, the fraction 1/3 cannot be represented exactly as a decimal when only a fixed number of digits is available. Similarly, as we saw with Example 5.1(c), not all decimal numbers can be represented exactly as binary numbers using a finite number of bits. Round-off errors can occur at all stages of numerical computations and the cumulative effect can be large.
A floating-point number can be represented as
$x = x_a \times 10^{n} + x_e \times 10^{n-d} = \text{approx } x + \text{error}$
where $d$ is the number of digits available for the mantissa. For example, consider a computer representation of floating-point numbers with a fixed word length of 6 digits. Suppose we want to represent the number 199.05678 using such a representation. The number can be represented as
$199.05678 = 0.19905678 \times 10^{3} = (0.199056 + 0.78 \times 10^{-6}) \times 10^{3}$
In this example, the digits after the sixth digit are dropped (chopped off). When this procedure is used, we can bound the error as
$|\text{Error}| \le 10^{n-d}$ (5.1.5)
Here the chopping error is $x_e \times 10^{n-d} = 0.00078$. On the other hand, if symmetric roundoff is used, the error is usually smaller. In symmetric roundoff, the last significant digit is rounded up by 1 if the first dropped digit is larger than or equal to 5. Hence, going back to our example, we have
$199.05678 = 0.19905678 \times 10^{3} \approx 0.199057 \times 10^{3}$
Note that the first dropped digit is 7 and the last retained digit, 6, is rounded up to 7. We can express the error for symmetric roundoff as
$\text{Error} = x_e \times 10^{n-d}$ if $x_e < 0.5$
$\text{Error} = (x_e - 1) \times 10^{n-d}$ if $x_e \ge 0.5$
Hence,
$|\text{Error}| \le 0.5 \times 10^{n-d}$ (5.1.6)
which, in the worst case, is half as large as the bound in Eqn. (5.1.5).
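Chopping and symmetric roundoff to a fixed number of decimal digits can be imitated with the following sketch. The function name KeepDigits is hypothetical and the value is assumed to be positive. Since the result is itself stored as a binary double, it only approximates the decimal value being illustrated.

```cpp
#include <cmath>

// Keep nDigits significant decimal digits of a positive number.
// bRound = false chops the extra digits; bRound = true uses symmetric roundoff.
// (KeepDigits is a hypothetical helper; assumes dValue > 0.)
double KeepDigits (double dValue, int nDigits, bool bRound)
{
    // n is the exponent in dValue = 0.d1d2d3... x 10^n
    int n = static_cast<int>(std::floor (std::log10 (dValue))) + 1;
    double dScale = std::pow (10.0, nDigits - n);
    double dScaled = dValue * dScale;   // digits to keep are left of the point
    double dKept = (bRound ? std::floor (dScaled + 0.5)
                           : std::floor (dScaled));
    return dKept / dScale;
}
```

With the number used above, KeepDigits (199.05678, 6, false) gives approximately 199.056 (an error of about 0.00078), while KeepDigits (199.05678, 6, true) gives approximately 199.057 (an error of about 0.00022).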
Truncation Errors
Truncation errors occur when an approximation is used in place of an exact representation. For example, in the evaluation of a transcendental function involving an infinite series, a truncated series is used in the numerical evaluation of its value. Consider the following series to evaluate the sine value.
$\sin x = x - \dfrac{x^3}{3!} + \dfrac{x^5}{5!} - \dfrac{x^7}{7!} + \dfrac{x^9}{9!} - \cdots$ (5.1.7)
Since all the terms in the infinite series cannot be evaluated, the series is truncated after a certain number of terms. Hence, the term truncation error.


Consider the case where $\sin 0.5$ is evaluated. When the first three terms are used, we have
$\sin(0.5) \approx 0.5 - \dfrac{0.5^3}{3!} + \dfrac{0.5^5}{5!} = 0.47942708$
and when the first four terms are used we have
$\sin(0.5) \approx 0.5 - \dfrac{0.5^3}{3!} + \dfrac{0.5^5}{5!} - \dfrac{0.5^7}{7!} = 0.47942553$
A more accurate value computed using extended precision is 0.47942553860420300027328793521557.
Machine Epsilon or Precision
Another important quantity is known as machine epsilon. Machine epsilon, $\varepsilon$, is the upper bound on the relative error that occurs when a nonzero real number, $x$, is represented as a floating-point number $x_a$. In other words
$\dfrac{|x - x_a|}{|x|} \le \varepsilon$ (5.1.8)
We can customize the above expression for computer systems that use base $b$ with a $d$-digit mantissa. When truncation is used
$\varepsilon = b^{-d+1}$ (5.1.9a)
and when symmetric rounding is carried out
$\varepsilon = 0.5 \times b^{-d+1}$ (5.1.9b)
There is another way of defining machine epsilon (also known as unit roundoff). Let $x_a$ be the smallest number representable in the machine arithmetic that is greater than 1 (in the machine). The machine epsilon is then defined as
$\varepsilon = x_a - 1$ (5.1.10)
We will use this definition to estimate the machine epsilon and show that Eqns. (5.1.9) and (5.1.10) are equivalent.
Example Program 5.1.1 Computing Machine Precision
In the example shown below, the machine epsilon is estimated using Eqn. (5.1.10). In other words, we will add a floating-point
number to 1.0 and check to see if the sum is 1.0. If not, we will divide the number by 2 until the number becomes so small that
adding it to 1.0 will yield a result of 1.0.
main.cpp


In line 14, the floating-point number (that is added to 1.0) is itself initialized to 1.0. In line 26, this value is halved. The process
is repeated until adding the number to 1.0 (line 20) does not change the result from 1.0. The program output is shown in Fig.
5.1.4.

Fig. 5.1.4 Machine epsilon in single and double precision
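The halving procedure described above can be sketched as a template function. This is a minimal sketch under stated assumptions and is not necessarily identical to the main.cpp listing: the name MachineEpsilon is hypothetical, and the loop stops at the last value whose addition to 1.0 still changes the result, which corresponds to Eqn. (5.1.10).

```cpp
#include <limits>

// Estimate machine epsilon for type T (float or double) by repeated halving.
// (MachineEpsilon is a hypothetical helper.)
template <class T>
T MachineEpsilon ()
{
    T eps = T(1);
    while (true)
    {
        // volatile discourages the compiler from keeping the sum
        // in a register with extended precision
        volatile T sum = T(1) + eps / T(2);
        if (sum == T(1)) break;   // half of eps no longer changes 1.0
        eps /= T(2);
    }
    return eps;                   // smallest power of 2 with 1 + eps > 1
}
```

On IEEE hardware with the default rounding mode, the values returned for float and double would be expected to match std::numeric_limits<float>::epsilon() and std::numeric_limits<double>::epsilon() from the standard library.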

Computer Arithmetic
As we have seen earlier, floating point numbers can be represented to a finite precision. In the following examples, we will
illustrate errors resulting from computer arithmetic.


Example Program 5.1.2 Effect of Precision


In this example we will examine the effect of finite precision that exists in any computer system. The problem is to compute the sum of the series $\sum_{i=1}^{n} \dfrac{1}{i^2}$. The accuracy of the evaluation should increase with increasing values of $n$. Note that the analytical result as $n \to \infty$ is $\pi^2/6$.


main.cpp

The program computes the partial sum of the series using both single and double precision after having obtained the value of
$n$ from the user. A sample output is shown in Fig. 5.1.5 for a relatively large value of $n$, 100 million. For both single and
double precision, the sum of the series is computed in two different ways as follows
$$S_{forward} = \frac{1}{1^2} + \frac{1}{2^2} + \frac{1}{3^2} + \cdots + \frac{1}{n^2}$$ (5.1.11a)
$$S_{reverse} = \frac{1}{n^2} + \frac{1}{(n-1)^2} + \frac{1}{(n-2)^2} + \cdots + \frac{1}{2^2} + \frac{1}{1^2}$$ (5.1.11b)


One would expect that there would be no difference between the two procedures. In Eqn. (5.1.11b), the sum is obtained by
adding from the smallest to the largest number. However, the sample output shows that $S_{reverse}$ is more accurate in the single
precision version and about the same in the double precision version.

Fig. 5.1.5 Sample output generated by using n as 100 million


Atkinson [1978] shows that if truncation is used rather than rounding, and if all numbers are positive, the strategy of adding
from smallest to largest minimizes the effect of these chopping errors.

Example Program 5.1.3 Numerical Errors


As we discussed earlier, there are different types of numerical errors that can take place in a computer program. In this example,
we will look at some of them. The source code artificially generates these errors – divide by zero, illegal operation, floating point
overflow and underflow, and an operation that yields a NaN.
main.cpp


Fig. 5.1.6 shows the output generated by the program. In line 35, the program uses Microsoft-specific function _isnan to check
whether the result of the division (from line 34) stored in fE yields a valid number or not.
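Since C++11, the standard header <cmath> provides std::isnan, std::isinf and std::isnormal, which can serve as a portable alternative to the Microsoft-specific _isnan. The sketch below is an illustration only; the function name Classify and the returned labels are hypothetical.

```cpp
#include <cmath>
#include <string>

// Describe a floating-point result using the standard classification tests.
std::string Classify (double dValue)
{
    if (std::isnan(dValue))
        return "NaN";               // e.g. the result of 0.0/0.0
    if (std::isinf(dValue))
        return "infinite";          // e.g. overflow or division by zero
    if (dValue != 0.0 && !std::isnormal(dValue))
        return "subnormal";         // nonzero but below the normalized range
    return "finite";
}
```

For example, Classify(std::sqrt(-1.0)) is expected to report a NaN on IEEE 754 hardware, provided the compiler is not run with options that relax floating-point semantics.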

Fig. 5.1.6 Output generated by Microsoft C++ compiler


Finally, we will look at what C++ provides us with respect to integer and floating-point numbers.

Example Program 5.1.4 Numerical Limits


This example program shows how to extract the numerical limits for integer and floating point numbers on a particular operating
system and hardware.
main.cpp


Since these constants are machine dependent, C++ provides a very convenient mechanism to find these machine dependent
values. A sample output is shown in Fig. 5.1.7.

Fig. 5.1.7 Machine-dependent values for Windows 10 using MSVS 2019 C++ compiler

5.2 Series Expansion


Having introduced the concept of errors including truncation error, we will now see one of the reasons why an understanding
of errors is helpful. We will look at Taylor Series expansion that will in later chapters be used in problems such as root finding,
numerical differentiation, minimization and maximization, ordinary differential equations etc. The basic motivation is to use
polynomials to approximate continuous functions.
The Weierstrass Theorem
The theorem states that if $f(x)$ is a continuous function for $a \le x \le b$ and $\epsilon > 0$, then there is a polynomial $p(x)$ for which
$$|f(x) - p(x)| \le \epsilon, \qquad a \le x \le b$$
Polynomials are attractive because computation of their derivatives and indefinite integrals is easy and the results are also
polynomials.


Taylor Series Expansion


Taylor’s Theorem states that if the function $f(x)$ and its first $n+1$ derivatives are continuous in the interval containing $a$, then the
function can be approximated as a polynomial of the form
$$f(x) = f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \frac{f'''(a)}{3!}(x-a)^3 + \cdots + \frac{f^{(n)}(a)}{n!}(x-a)^n + R_n$$ (5.2.1)
where the remainder is given by
$$R_n = \int_a^x \frac{(x-t)^n}{n!} f^{(n+1)}(t)\,dt$$ (5.2.2)
Eqn. (5.2.1) represents an infinite series, and the theorem provides a mechanism for constructing different types of
approximations (with finite number of terms) that can then be used in an effective manner to construct numerical solutions.
Consider the following three approximations.
(a) $f(x) \approx f(a)$ (5.2.3)
(b) $f(x) \approx f(a) + f'(a)(x-a)$ (5.2.4)
(c) $f(x) \approx f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2$ (5.2.5)
Eqn. (5.2.3) represents a zero-order approximation. If points $x$ and $a$ are sufficiently close to one another, then the approximation
is reasonably good. Eqns. (5.2.4) and (5.2.5) represent the first-order and second-order approximations. In general, both provide better
approximations than the zero-order approximation but at the additional cost of having to evaluate more terms. We can rewrite
Eqn. (5.2.1) as
$$f(x_{i+1}) = f(x_i) + f'(x_i)(x_{i+1}-x_i) + \frac{f''(x_i)}{2!}(x_{i+1}-x_i)^2 + \frac{f'''(x_i)}{3!}(x_{i+1}-x_i)^3 + \cdots + \frac{f^{(n)}(x_i)}{n!}(x_{i+1}-x_i)^n + R_n$$ (5.2.6)
and
$$R_n = \frac{f^{(n+1)}(\xi)}{(n+1)!}(x_{i+1}-x_i)^{n+1}$$ (5.2.7)
where $x_i$ and $x_{i+1}$ are two different points. If we denote $h = x_{i+1} - x_i$, then we can write the infinite series as
$$f(x_{i+1}) = f(x_i) + f'(x_i)h + \frac{f''(x_i)}{2!}h^2 + \frac{f'''(x_i)}{3!}h^3 + \cdots + \frac{f^{(n)}(x_i)}{n!}h^n + R_n$$ (5.2.8)
and
$$R_n = \frac{f^{(n+1)}(\xi)}{(n+1)!}h^{n+1}$$ (5.2.9)
where $x_i \le \xi \le x_{i+1}$.
Example 5.2.1 Function Approximation
Here are some examples of well-known functions using Taylor Series expansion.

$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$$
$$\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots$$
$$\cos(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots$$
Sometimes Taylor series approximation is not particularly efficient. Consider a fourth-degree approximation of $e^x$ in the interval
$[-1, 1]$ expanding about $x = 0$. Then
$$P_4(x) = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \frac{x^4}{24}$$
and the error estimate is given as

$$\frac{x^5}{120} \le e^x - P_4(x) \le \frac{e}{120}x^5, \qquad 0 \le x \le 1$$
$$\frac{e^{-1}}{120}|x|^5 \le |e^x - P_4(x)| \le \frac{1}{120}|x|^5, \qquad -1 \le x \le 0$$
The error increases with increasing $|x|$ and

$$\max_{-1 \le x \le 1} |e^x - P_4(x)| \le \frac{e}{120}$$


Summary
This chapter is an introduction to numerical analysis – the business of finding approximate solutions via a numerical technique
that is implemented as a computer program. As we saw in the first four chapters, C++ provides the tools for implementing
numerical techniques as robust, fast and accurate computer programs. It is important to note that usually numerical solutions
are approximate for several reasons. In this chapter we looked at two such sources of error – truncation and round-off errors.
In the later chapters, we will see how these errors affect different numerical techniques.

Where to go from here?


This is the first chapter where we have formally looked at some of the underlying principles of numerical analysis. Numerical
solutions are not a panacea for problems where analytical solutions are tedious to find or do not exist. As we have seen, there
are different sources of numerical errors. Hence, one must be careful in generating numerical solutions using any programming
language including C++. Readers should note that generating solutions by hand (paper and pencil?) should precede developing
and writing the algorithm, and implementing the algorithm in the form of a computer program. There are numerous resources
available on the internet that discuss nuances and pitfalls in numerical calculations. Carry out a search and find out what the
messages are.


Exercises
Most of the problems below involve the development of one or more functions. For each applicable case, the function
prototype is given. In each case (a) develop a plan to test the function(s), and (b) implement the plan in a main
program. The functions should not use cin or cout unless specified. Put the main program in a separate file and
the function(s) in separate files.

Appetizers
Problem 5.1
(a) Compute the binary form of the following decimal numbers. (i) -187 (ii) 3009 (iii) -199 (iv) 5789.
(b) Compute the decimal equivalent for the following binary numbers. (i) (-10011)2 (ii) (101010)2 (iii) (11111100001)2.
Problem 5.2
Fibonacci numbers, $F_i$, are defined as $F_0 = F_1 = 1$ and $F_{i+2} = F_{i+1} + F_i$, $i = 0, 1, 2, 3, \ldots$ The ratio $x_n = \frac{F_{n+1}}{F_n}$ approaches the Golden Ratio
as $n \to \infty$. It is known that $\lim_{n \to \infty} x_n = \frac{1 + \sqrt{5}}{2}$. Determine the relative error in approximating $x_\infty$ for $n = 1, 5, 10$.
Function prototype
double REGoldenRatio (int n);
Problem 5.3
Expand the function f ( x )  2 x 4  1.5x 2  33.4 x  10.5 in Taylor’s series about x  0.5 . Use the resulting expression to
estimate the value of f ( x  1) by retaining 1, 2, 3 and 4 terms of the expansion. Determine the absolute and the relative errors
for each case.
Function prototype
void TSExpansion (int nTerms, double& dEst, double& dAbsError,
double& dRelError);
Problem 5.4
Compute the roots of the quadratic polynomial 3.3x 2  40.5x  1.8  0 using (a) float data type and (b) double data
type. Check the absolute error for each case by substituting the root back into the equation.

Main Course
Problem 5.5
Write a function to convert an integer value to its binary representation. Display the binary representation on the screen. Use
data from Problem 5.1 to test your function.
Function prototype
void ToBinary (int nInteger);
Problem 5.6
Write a function to convert a binary number to an integer value. Display the integer value on the screen. Use data from Problem
5.1 to test your function.
Function prototype
void ToInteger (int nBinary);
Problem 5.7
Consider the problem of estimating the area of a circle of radius $R$. One approach is to split the circle into a collection of uniform
triangles as shown in Fig. P5.7(a). A typical triangle that subtends an angle $\theta$ at the center of the circle is shown in Fig. P5.7(b).

Fig. P5.7(a) Mesh A    Fig. P5.7(b) (labels: $b$, $h$, $\theta$, $R$)

Since $b = R\sin(\theta/2)$, $h = R\cos(\theta/2)$ and $\theta = \frac{2\pi}{n}$, the area of one triangle and the estimate of the area of the
circle are given by
$$a_e = \frac{R^2}{2}\sin\left(\frac{2\pi}{n}\right)$$
$$A_A^{(n)} = \sum_{e=1}^{n} a_e = \frac{nR^2}{2}\sin\left(\frac{2\pi}{n}\right)$$

Function prototype
double AreaTriangleA (int n);
Problem 5.8
Another approach to solving the problem discussed in Problem 5.7 is shown in Fig. P5.8. Derive the expression for the estimate
of the area $A_B^{(n)}$.

Fig. P5.8 Mesh B


Function prototype
double AreaTriangleB (int n);

Numerical Analysis Concepts


Problem 5.9
Most computer systems represent negative numbers as 2’s complement. Find out what 2’s complement is and how arithmetic
operations are carried out using 2’s complement.


Problem 5.10
Consider the area estimate problem discussed in Problems 5.7 and 5.8. Develop a procedure by which you can estimate the area
of a circle using either Mesh A, or Mesh B or both. The input to the procedure is (a) the radius of the circle and (b) the desired
accuracy. Assume that you do NOT know the exact area of the circle. The procedure must compute the estimate for the area
(within the prescribed accuracy) using the least computational effort. The computational effort, $E$, is defined as
$$E = 100 + \sum_{i=1}^{q} n_i^2$$

where $q$ is the number of times the procedure uses the formula from either Mesh A or Mesh B, and $n_i$ is the number of
triangles in the mesh. For example, if your procedure uses Mesh A twice with $n = 10$ and $n = 20$, and Mesh B thrice with
$n = 10$, $n = 25$ and $n = 50$, the total effort would be 3825.
The output from the procedure is the estimate of the area and the computational effort.
Function prototype
double TriangleAreaEstimate (double dR, double dError,
double& dComputeEffort);
Problem 5.11
In this problem we will investigate what happens with floating-point operations where finite representation and rounding come
into play in what appear to be unpredictable ways. Let us assume that you are working with decimal arithmetic involving finite
precision and 4 significant digits. Compute the following expressions and show why the evaluated expressions are not equal.
(a) 1/3 + 2/3 + 2/3
(b) 2/3 + 2/3 + 1/3


References
Atkinson, An Introduction to Numerical Analysis, Wiley, 1978.
Burden and Faires, Numerical Analysis, PWS-Kent, 1988.
Press, Flannery, Teukolsky and Vetterling, Numerical Recipes in C, Cambridge Press, 1988.
Mathews and Fink, Numerical Methods Using Matlab, Prentice-Hall, 1999.
Chapra and Canale, Numerical Methods for Engineers, McGraw-Hill, 2002.
Schilling and Harris, Applied Numerical Methods for Engineers Using Matlab and C, Brooks/Cole, 2000.
Rao, Applied Numerical Methods for Engineers and Scientists, Prentice Hall, 2002.



6
R O O T F I N D I N G , D I F F E R E N T I A T I O N & I N T E G R A T I O N

Chapter

Root Finding, Differentiation and Integration


“Errors using inadequate data are much less than those using no data at all.” Charles Babbage

“There is a joke that your hammer will always find nails to hit. I find that perfectly acceptable.” Benoit Mandelbrot

“Mathematical reasoning may be regarded schematically as the exercise of a combination of two faculties, which we
may call intuition and ingenuity.” Alan Turing

We will look at three numerical problems commonly encountered by engineers and scientists. First, we will examine the various
techniques to compute the roots of nonlinear functions. Note that the roots are obtained by solving the problem $f(x) = 0$.
Next, we will look at numerical techniques to compute the (first) derivative of functions, i.e. $\frac{dy(x)}{dx}$, which is particularly useful
when analytical derivatives are difficult to compute. And, finally, we will learn how to carry out numerical integration, once again
useful when analytical integrations are difficult to compute. We will look at methods to compute single integrals, i.e. $\int_a^b f(x)\,dx$,
and double integrals, i.e. $\int_c^d \int_a^b f(x, y)\,dx\,dy$.

Objectives
 To understand how to find the roots of nonlinear functions.
 To understand the concepts associated with numerical differentiation.
 To understand the concepts associated with numerical integration.


6.1 Roots of Equations


The roots of an equation are the values of the parameters for which the equation is satisfied. Many different types of
problems can be represented as root finding problems. For instance, determining the numerical value of $\sqrt{n}$ is equivalent to
finding the positive root of the equation $x^2 - n = 0$. Drawing a circle of radius $r$ is equivalent to plotting the roots of the
equation $x^2 + y^2 - r^2 = 0$. Determining when a projectile undergoing uniform acceleration will hit the ground is equivalent
to finding the (larger) root of the equation $\frac{a}{2}t^2 + v_0 t + h_0 = 0$, where $a$ is the acceleration, $v_0$ is the initial velocity, and $h_0$ is
the initial height. Mathematicians have devised several methods, each with its own advantages and disadvantages, to compute
the roots of equations. Here we will focus on equations of the form $f(x) = 0$. Roots of this equation are called zeros of the
function.

Bracketing, Bisection Method, and False Position Method


The bisection and false position methods rely on a general concept called bracketing. If the function whose zero is to be found
is continuous on an interval $[a, b]$, then according to the Intermediate Value Theorem, a zero exists in $[a, b]$ if
$f(a) f(b) < 0$. The latter condition means that the function takes on values of opposite signs at $a$ and $b$. Conceptually,
bracketing works as follows, provided that at least one root exists in $[a, b]$.

Step 1: Pick a value in $[a, b]$ (called $c$) and construct two intervals, $[a, c]$ and $[c, b]$. At least one root exists in one of these
intervals.
Step 2: To determine which interval, use the Intermediate Value Theorem. A root exists in $[a, c]$ if $f(a) f(c) < 0$, and in
$[c, b]$ if $f(c) f(b) < 0$. One and only one of these conditions will be satisfied, because $f(a) f(b) < 0$ (unless $f(c) = 0$, in which case $c$ is itself a root).
Step 3: We have reduced the problem of finding a root in an interval $[a, b]$ to the problem of finding a root in a smaller
interval. Therefore, depending on the result of Step 2, we can set either $b$ or $a$ to $c$ and iterate. Essentially, we ‘bracket’ the root in smaller
and smaller intervals, hence the name of the method.
Step 4: Repeat Steps 1-3 until the desired precision is reached. Clearly, if the final interval is $[a, b]$, then the root obtained by
bracketing cannot differ from the actual root by more than $b - a$ in magnitude. The convergence criterion can be either the size
of the interval $[a, b]$ or the magnitude of the function at $c$.
The bisection and false position methods both use the pseudo-algorithm of Steps 1-4, but differ in their choice of $c$.
The bisection method picks $c$ to be the midpoint of $[a, b]$; therefore, we halve the search interval at every iteration.
Consequently, of all methods using Steps 1-4, the bisection method’s worst case behavior is best.
The false position method uses the more promising idea of constructing a linear approximation to the function, and picking $c$
to be the root of the line. The line connecting $(a, f(a))$ and $(b, f(b))$ is
$$y = f(a) + \frac{f(b) - f(a)}{b - a}(x - a)$$ (6.1.1)
Setting $y$ to zero and solving yields
$$c = a - f(a)\frac{b - a}{f(b) - f(a)}$$ (6.1.2)
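Steps 1-4 with the two choices of $c$ can be sketched in a single function. This is a minimal illustration under stated assumptions: the names BracketRule and BracketRoot, the iteration limit, and the combined stopping test on interval size and function magnitude are choices made here for illustration.

```cpp
#include <cmath>
#include <functional>
#include <stdexcept>

enum class BracketRule { Bisection, FalsePosition };

// Find a root of f in [a, b] by bracketing; f(a) and f(b) must differ in sign.
double BracketRoot (const std::function<double(double)>& f,
                    double a, double b, double dTol,
                    BracketRule rule, int nMaxIter = 200)
{
    double fa = f(a), fb = f(b);
    if (fa == 0.0) return a;
    if (fb == 0.0) return b;
    if (fa * fb > 0.0)
        throw std::invalid_argument("root is not bracketed");

    double c = a;
    for (int k = 0; k < nMaxIter; ++k)
    {
        // Step 1: choose c (midpoint, or Eqn. 6.1.2 for false position)
        c = (rule == BracketRule::Bisection)
                ? 0.5 * (a + b)
                : a - fa * (b - a) / (fb - fa);
        const double fc = f(c);

        // Step 4: stop on a small function value or a small interval
        if (std::fabs(fc) < dTol || std::fabs(b - a) < dTol)
            break;

        // Steps 2 and 3: keep the half that still brackets the root
        if (fa * fc < 0.0) { b = c; fb = fc; }
        else               { a = c; fa = fc; }
    }
    return c;
}
```

With false position one end of the interval often stays fixed, so the function-magnitude test is usually the one that ends the iteration in that case.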
Example 6.1.1
We will find the roots of the quadratic equation $x^2 - 3 = 0$. We can use $[1, 2]$ as the interval, as $f(x = 1)$ is negative and
$f(x = 2)$ is positive. The true root lying in the given interval is 1.73205081.


Iteration   Bisection                    False Position
            a             b              a             b
1           1             2              1             2
2           1.5           2              1.66666666    2
3           1.5           1.75           1.66666666    1.83333333
4           1.625         1.75           1.66666666    1.75
5           1.6875        1.75           1.70833333    1.75
6           1.71875       1.75           1.72916666    1.75
7           1.71875       1.734375       1.72916666    1.73958333
8           1.7265625     1.734375       1.72916666    1.734375
9           1.73046875    1.734375       1.73177083    1.734375
10          1.73046875    1.73242187     1.73177083    1.73307291
20          1.73204994    1.73205184     1.73205057    1.73205184
We terminate the search process when the size of the interval $[a, b]$ becomes less than a specified tolerance of $10^{-5}$.
Bracketing is a useful root finding method because it is guaranteed to find a root, and is easy to implement. However, due to its
slow speed of convergence, mathematicians have devised faster methods. The drawback of these methods is that convergence
to a root may not be guaranteed.

Approximation Methods, Secant Method, and Newton-Raphson Method


The Newton-Raphson method and the Secant Method determine roots by constructing linear approximations of the function
in question. Provided that the approximation is good, the root of the linear approximation should be close to the true root. The
pseudo-algorithm for all approximation methods is provided below.
Step 1: Pick $m$ points, $(x_1, f(x_1))$ … $(x_m, f(x_m))$, where $x_1$ … $x_m$ are reasonably close to the true root.
Step 2: Construct an approximation to the function in the neighborhood of the root using the $m$ points. Determine the root of
this approximation, and call this root $x_{m+1}$.
Step 3: If the approximation was good, $x_{m+1}$ should be closer to the root than any of $x_1$ … $x_m$. Therefore, we can discard $x_1$,
and repeat Step 2 with $x_2$ … $x_{m+1}$.
Step 4: Continue iterating until the desired precision is reached. A typical convergence criterion is to use the magnitude of the
function at the current point and compare against the desired precision. It is helpful to use a recursive formula, if one exists, for
implementation. The formula should give the root of the approximation in terms of the points used to construct the
approximation: that is, $x_n = g(x_{n-m}, \ldots, x_{n-1})$.
The Secant Method and Newton-Raphson method each use Steps 1-4, with different choices of $m$, and different
approximations. Consequently, the recursive formula also differs.
The Secant Method uses the secant-line approximation to the function. Hence, $m = 2$, and the secant-line is
$$y = f(x_1) + \frac{f(x_2) - f(x_1)}{x_2 - x_1}(x - x_1)$$ (6.1.3)
The root of this approximation is $x_1 - f(x_1)\frac{x_2 - x_1}{f(x_2) - f(x_1)}$, and therefore the recursive formula is given by
$$x_n = g(x_{n-2}, x_{n-1}) = x_{n-2} - f(x_{n-2})\frac{x_{n-1} - x_{n-2}}{f(x_{n-1}) - f(x_{n-2})}$$ (6.1.4)
The Newton-Raphson method, on the other hand, uses calculus to construct a tangent-line approximation to the function.
Hence, $m = 1$. The tangent-line is
$$y = f(x_1) + f'(x_1)(x - x_1)$$ (6.1.5)

The root of the tangent-line is $x_1 - \frac{f(x_1)}{f'(x_1)}$, so the recursive formula is
$$x_n = g(x_{n-1}) = x_{n-1} - \frac{f(x_{n-1})}{f'(x_{n-1})}$$ (6.1.6)
Note that the Secant Method can be considered as a variant of the Newton-Raphson Method, where the derivative
$f'(x_{n-1})$ is replaced by the approximation $\frac{f(x_2) - f(x_1)}{x_2 - x_1}$.
The advantage of the approximation methods is that if they converge, they generally converge with a greater speed than the
bracketing methods. This is because as $x_{n-m}$ … $x_{n-1}$ approach the true root, the approximation matches the function more
and more closely in the neighborhood of the true root. In fact, both the Secant Method and the Newton-Raphson Method
have better than linear convergence in most cases.
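The recursive formula of Eqn. (6.1.4) can be sketched as follows. The function name SecantRoot, the iteration limit and the stopping test on the function magnitude are illustrative assumptions.

```cpp
#include <cmath>
#include <functional>
#include <stdexcept>

// Secant iteration starting from two points x0 and x1 near the root.
double SecantRoot (const std::function<double(double)>& f,
                   double x0, double x1, double dTol, int nMaxIter = 50)
{
    double f0 = f(x0), f1 = f(x1);
    for (int k = 0; k < nMaxIter && std::fabs(f1) > dTol; ++k)
    {
        const double dDenom = f1 - f0;
        if (dDenom == 0.0)
            throw std::runtime_error("secant line is horizontal");

        // Eqn. (6.1.4) with x0 playing the role of x_{n-2} and x1 of x_{n-1}
        const double x2 = x0 - f0 * (x1 - x0) / dDenom;

        // Step 3: discard the oldest point and keep the two newest
        x0 = x1; f0 = f1;
        x1 = x2; f1 = f(x2);
    }
    return x1;
}
```

Unlike the bracketing sketch, nothing here keeps the iterates inside an interval, so a poor pair of starting points can send the iteration away from the root.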
The drawback of the approximation methods is that, unlike the bracketing methods, they do not guarantee convergence because
the root is not confined to an interval. Hence a poor initial guess or a badly behaved function may cause the methods to fail to
converge to a root. Additionally, convergence is poorer if the root is repeated (i.e., if the function can be written as
$f(x) = (x - \alpha)^p g(x)$, $p > 1$, where $\alpha$ is the root in question). For a simple root, the order of convergence of the Newton-Raphson Method
is quadratic (i.e., 2), whereas the order of convergence of the Secant Method is the golden ratio $\frac{1 + \sqrt{5}}{2} \approx 1.618$. Calculation of
the derivative for the Newton-Raphson Method, either analytically or numerically, requires extra computation.
Example 6.1.2
We will find the roots of the quadratic equation $x^2 - 3 = 0$ using the Newton-Raphson Method. Note that $f(x) = x^2 - 3$
and $f'(x) = 2x$. We will start with the initial guess for the root as $x_0 = 1$.
n    x_n        f'(x_n)    f(x_n)               x_{n+1} = x_n - f(x_n)/f'(x_n)
0    1          2          -2                   2
1    2          4          1                    1.75
2    1.75       3.5        0.0625               1.73214
3    1.73214    3.46428    0.00030898           1.73205
4    1.73205    3.4641     -2.7975(10^-6)       1.73205
We terminate the iterations when the magnitude of the function is close to zero. It should be noted that the process can fail if
at any time the value of the derivative of the function is very close to zero.
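The iteration tabulated above can be sketched as follows. The function name NewtonRoot, the iteration limit and the threshold used to detect a vanishing derivative are illustrative assumptions.

```cpp
#include <cmath>
#include <functional>
#include <stdexcept>

// Newton-Raphson iteration, Eqn. (6.1.6), starting from the guess x0.
double NewtonRoot (const std::function<double(double)>& f,
                   const std::function<double(double)>& df,
                   double x0, double dTol, int nMaxIter = 50)
{
    double x = x0;
    for (int k = 0; k < nMaxIter; ++k)
    {
        const double fx = f(x);
        if (std::fabs(fx) < dTol)
            return x;                       // converged on function magnitude

        const double dfx = df(x);
        if (std::fabs(dfx) < 1.0e-14)       // assumed threshold for a flat tangent
            throw std::runtime_error("derivative is close to zero");

        x -= fx / dfx;                      // root of the tangent line
    }
    return x;
}
```

Calling NewtonRoot with $f(x) = x^2 - 3$, $f'(x) = 2x$ and $x_0 = 1$ follows the same sequence of iterates as the table.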

Van Wijngaarden-Dekker-Brent Method


Perhaps the most widely used method for finding roots without the use of derivatives is the Brent Method. It combines the
advantages of the Bisection and Secant methods while avoiding their disadvantages. As with other techniques, it requires a bracketing
interval and the desired tolerance, $[a, b]$ and $\delta$. The algorithm can be summarized as follows. Note that $b$ is the best current estimate of
the root, $a$ is the prior value of $b$, $c$ is such that the root lies between $b$ and $c$ (initially, $c = a$), and $m = \frac{c - b}{2}$.
Step 1: Obtain values for $a$, $b$ and $\delta$. Stop if $f(a) f(b) > 0$ since the interval does not contain the root.
Step 2: Check if $|f(b)| < \delta$ or $|m| < \delta$. If true, set the root to $b$ and terminate the search.
Step 3: Compute $i = b - f(b)\frac{b - a}{f(b) - f(a)}$.
Step 4: Set
$$b'' = \begin{cases} i & \text{if } i \text{ lies between } b \text{ and } b + m \\ b + m & \text{otherwise} \end{cases}$$
$$b' = \begin{cases} b'' & \text{if } |b - b''| > \delta \\ b + \delta\,\mathrm{sign}(m) & \text{if } |b - b''| \le \delta \end{cases}$$

Step 5: Set $b_{new} = b'$, $a = b$. If $f(b_{new})$ and $f(b)$ have the same sign, $c$ is unchanged. Otherwise, $c = b$. Set $b = b_{new}$. Go to
Step 2.
Example 6.1.3

We will find the root of the equation f ( x )   x  1 1   x  1  using the Brent Method taking a tolerance value of
2

t  10 5 . We will take  a , b    3, 0 .

main.cpp

We will write a simplified version of Brent’s Method. The function containing the equations whose roots will be evaluated is in
lines 24 through 36. There are four sample equations with the focus on the first one.
f ( x )   x  1 1   x  1 
2

f (x )  x 2  1
f ( x )  1  x 3  x  x  3  
 1 
 2
 ( x 1) 
f ( x )  ( x  1)e


The initial bracket is set in line 48 and checked in line 55. The convergence tolerance is set in line 49. The iterative loop starts in
line 67. Convergence check takes place in line 84. Steps 3, 4 and 5 are implemented in lines 94 through 117. Execution is
terminated with the iteration limit check in line 74. The number of function evaluations is tracked as an indicator of the
computational effort. The results from the execution of the program are shown in Fig. 6.1.1.

Fig. 6.1.1 Results from Example 6.1.3

6.2 Numerical Differentiation


As you may have seen from a calculus course, the derivative, or the rate of change of one variable with respect to another,
provides an understanding of the relationship between the problem variables. Numerical differentiation provides estimates for
first order and higher order derivatives when explicit differentiation is not readily available to generate the derivative expressions.

Taylor Series Approach


Taylor’s Theorem provides a means of approximating a non-polynomial function by a polynomial function. If the former
function is differentiable of order $n + 1$ on an interval $[a, b]$, then it can be written as a sum of a known polynomial of degree
$n$ and an indeterminate error term, a polynomial of degree $n+1$. That is,
$$f(x) = P_n(x) + R_{n+1}(x)$$ (6.2.1a)
$$P_n(x) = f(x_0) + f'(x_0)(x - x_0) + \frac{1}{2!}f''(x_0)(x - x_0)^2 + \cdots + \frac{1}{n!}f^{(n)}(x_0)(x - x_0)^n$$ (6.2.1b)
$$R_{n+1}(x) = f^{(n+1)}(\xi)\frac{(x - x_0)^{n+1}}{(n+1)!}$$ (6.2.1c)
where $x, x_0 \in [a, b]$, and where $\xi \in [x_0, x]$. The function $P_n(x)$ approximates $f(x)$ more and more accurately in the
neighborhood of $x_0$ as $n$ increases, assuming higher order derivatives are negligible compared to $(n+1)!$. This is evident
when considering the remainder term; if $\frac{f^{(n+1)}(\xi)}{(n+1)!}$ is bounded by a small number, and if $x$ is close to $x_0$, then the error in the
approximation is bounded by a small number as well. Consequently, we can use a Taylor polynomial to approximate a function
in the neighborhood of a point for which the values of derivatives are known and for which the term $\frac{f^{(n+1)}(\xi)}{(n+1)!}$ can be ignored.
The connection of Taylor’s polynomial to numerical differentiation is that if higher order derivatives are negligible compared to
$(n+1)!$, we can approximate derivatives of any order using Taylor series. For instance, expanding $f(x_0 + h)$ and $f(x_0 - h)$
in terms of a Taylor’s polynomial of degree 2, we obtain
$$f(x_0 + h) = f(x_0) + f'(x_0)h + \frac{h^2}{2}f''(x_0) + \frac{h^3}{6}f'''(\xi), \quad \xi \in [x_0, x_0 + h]$$
$$f(x_0 - h) = f(x_0) - f'(x_0)h + \frac{h^2}{2}f''(x_0) - \frac{h^3}{6}f'''(\zeta), \quad \zeta \in [x_0 - h, x_0]$$ (6.2.2)
Subtracting one equation from the other, we get
$$f'(x_0) = \frac{f(x_0 + h) - f(x_0 - h)}{2h} - \frac{h^2}{12}f'''(\xi) - \frac{h^2}{12}f'''(\zeta), \quad \xi \in [x_0, x_0 + h], \; \zeta \in [x_0 - h, x_0]$$ (6.2.3)
This is known as the central difference formula, because it uses two function values, $f(x_0 + h)$ and $f(x_0 - h)$, at points that are centered about
$x_0$ to calculate the derivative there. We can also derive forward difference and backward difference formulas by expanding
$f(x_0 + h)$ and $f(x_0 - h)$ in terms of a Taylor’s polynomial of degree 1. We have
$$f'(x_0) = \frac{f(x_0 + h) - f(x_0)}{h} - \frac{h}{2}f''(\xi), \quad \xi \in [x_0, x_0 + h]$$
$$f'(x_0) = \frac{f(x_0) - f(x_0 - h)}{h} + \frac{h}{2}f''(\zeta), \quad \zeta \in [x_0 - h, x_0]$$ (6.2.4)
These formulas can be used to approximate the numerical derivative by assuming that second-order derivatives are negligible,
for the forward and backward difference formulas, and that third-order derivatives are negligible, for the central difference
formula.
Similarly, using the third-degree Taylor’s polynomials
$$f(x_0 + h) = f(x_0) + f'(x_0)h + \frac{h^2}{2}f''(x_0) + \frac{h^3}{6}f'''(x_0) + \frac{h^4}{24}f^{(4)}(\xi), \quad \xi \in [x_0, x_0 + h]$$
$$f(x_0 - h) = f(x_0) - f'(x_0)h + \frac{h^2}{2}f''(x_0) - \frac{h^3}{6}f'''(x_0) + \frac{h^4}{24}f^{(4)}(\zeta), \quad \zeta \in [x_0 - h, x_0]$$ (6.2.5)
and adding these equations gives
$$f''(x_0) = \frac{f(x_0 + h) - 2f(x_0) + f(x_0 - h)}{h^2} - \frac{h^2}{24}f^{(4)}(\xi) - \frac{h^2}{24}f^{(4)}(\zeta), \quad \xi \in [x_0, x_0 + h], \; \zeta \in [x_0 - h, x_0]$$ (6.2.6)
which is the central difference formula for the second derivative. Using the second-degree Taylor’s polynomials

$$f(x_0 + h) = f(x_0) + f'(x_0)h + \frac{h^2}{2}f''(x_0) + \frac{h^3}{6}f'''(\xi), \quad \xi \in [x_0, x_0 + h]$$
$$f(x_0 + 2h) = f(x_0) + f'(x_0)(2h) + \frac{(2h)^2}{2}f''(x_0) + \frac{(2h)^3}{6}f'''(\zeta), \quad \zeta \in [x_0, x_0 + 2h]$$ (6.2.7)
and subtracting twice the first formula from the second, we get:
$$f''(x_0) = \frac{f(x_0 + 2h) - 2f(x_0 + h) + f(x_0)}{h^2} + \frac{h}{3}f'''(\xi) - \frac{4h}{3}f'''(\zeta), \quad \xi \in [x_0, x_0 + h], \; \zeta \in [x_0, x_0 + 2h]$$ (6.2.8)
The general tactic to find the forward-finite-difference approximation for $f^{(n)}(x_0)$ is to manipulate the Taylor’s polynomials
of $f(x_0 + h)$ ... $f(x_0 + nh)$, so that the terms $f'(x_0)$ ... $f^{(n-1)}(x_0)$ cancel out. Similarly, manipulation of Taylor’s
polynomials for $f(x_0 - h)$ ... $f(x_0 - nh)$ and the cancellation of $f'(x_0)$ ... $f^{(n-1)}(x_0)$ will result in the backward-finite-difference
approximation formulas. Finally, for the central-finite-difference approximation for $f^{(n)}(x_0)$,
$f(x_0 - mh)$ ... $f(x_0 + mh)$ should be manipulated so that lower-order derivatives disappear, where $m = \frac{n}{2}$ for $n$ even, and
$m = \frac{n+1}{2}$ for $n$ odd. Clearly, these manipulations become increasingly more tedious as $n$ increases.
2

Difference Formulas
Using difference operators provides a much easier approach to constructing the same formulas as above. The drawback is that
the formulas are approximations, and no explicit error term is present. The main idea is to use difference operators to
approximate the derivative operator, $\frac{d}{dx}$. The three approximations are:
$$\frac{d}{dx} \approx \frac{\Delta}{\Delta x} \text{ (forward)}, \quad \frac{d}{dx} \approx \frac{\delta}{\delta x} \text{ (central)}, \quad \frac{d}{dx} \approx \frac{\nabla}{\nabla x} \text{ (backward)}$$ (6.2.9)
where the operators themselves mean:
$$\Delta f(x_i) = f(x_{i+1}) - f(x_i)$$
$$\delta f(x_i) = f(x_{i+1/2}) - f(x_{i-1/2})$$ (6.2.10)
$$\nabla f(x_i) = f(x_i) - f(x_{i-1})$$
We will assume that the distance between successive $x$ values (spacing) is constant, so $\Delta x_i = \delta x_i = \nabla x_i = h$. Using these
operators, we can easily obtain the approximations to the first derivative
$$\text{Forward:} \quad \frac{df}{dx} \approx \frac{\Delta f}{\Delta x} = \frac{f(x_{i+1}) - f(x_i)}{\Delta x} = \frac{f(x_i + h) - f(x_i)}{h}$$ (6.2.11)
$$\text{Central:} \quad \frac{df}{dx} \approx \frac{\delta f}{\delta x} = \frac{f(x_{i+1/2}) - f(x_{i-1/2})}{\delta x} \approx \frac{f(x_i + h) - f(x_i - h)}{2h}$$ (6.2.12)
$$\text{Backward:} \quad \frac{df}{dx} \approx \frac{\nabla f}{\nabla x} = \frac{f(x_i) - f(x_{i-1})}{\nabla x} = \frac{f(x_i) - f(x_i - h)}{h}$$ (6.2.13)
The reason that the step size is doubled for the central-finite-difference approximation is that the values of the function at
$x_i + h/2$ and $x_i - h/2$ may not be known.
Similarly, for the second derivative approximations,
$$\text{Forward:} \quad \frac{d^2 f}{dx^2} \approx \frac{\Delta}{\Delta x}\left(\frac{\Delta f}{\Delta x}\right) = \frac{f(x_{i+2}) - 2f(x_{i+1}) + f(x_i)}{(\Delta x)^2} = \frac{f(x_i + 2h) - 2f(x_i + h) + f(x_i)}{h^2}$$ (6.2.14)
$$\text{Central:} \quad \frac{d^2 f}{dx^2} \approx \frac{\delta}{\delta x}\left(\frac{\delta f}{\delta x}\right) = \frac{f(x_{i+1}) - 2f(x_i) + f(x_{i-1})}{(\delta x)^2} = \frac{f(x_i + h) - 2f(x_i) + f(x_i - h)}{h^2}$$ (6.2.15)
$$\text{Backward:} \quad \frac{d^2 f}{dx^2} \approx \frac{\nabla}{\nabla x}\left(\frac{\nabla f}{\nabla x}\right) = \frac{f(x_i) - 2f(x_{i-1}) + f(x_{i-2})}{(\nabla x)^2} = \frac{f(x_i) - 2f(x_i - h) + f(x_i - 2h)}{h^2}$$ (6.2.16)
In general, the error for the forward difference and backward difference formulae is of the order $O(h)$. The error for the central difference formula is of the order $O(h^2)$. Consequently, the central difference formula is more accurate than the other two techniques, as we will see in the following examples. It should be noted that the accuracy can be increased by using additional sampling points around the point of interest.
Example 6.2.1 Forward Difference
Compute the derivative of function $f(x) = x^3 - 2x^2 + 10x - 5$ at $x = 2$.
Note that the analytical derivative is $f'(x) = 3x^2 - 4x + 10$ and $f'(x=2) = 14$. The table below shows the calculations using Eqn. (6.2.11). The relative error is defined as $\dfrac{f'_{FD} - f'_{exact}}{f'_{exact}}$ where $f'_{exact} = 14$.
h f(x+h) f(x) f'(x) Rel Error
1.000000E-15 1.5000000000000E+01 1.5000000000000E+01 1.065814E+01 -2.387042E-01
1.000000E-10 1.5000000001400E+01 1.5000000000000E+01 1.400000E+01 8.274037E-08
1.000000E-08 1.5000000140000E+01 1.5000000000000E+01 1.400000E+01 6.610792E-09
1.000000E-05 1.5000140000400E+01 1.5000000000000E+01 1.400004E+01 2.857156E-06
1.000000E-02 1.5140401000000E+01 1.5000000000000E+01 1.404010E+01 2.864286E-03
1.000000E-01 1.6441000000000E+01 1.5000000000000E+01 1.441000E+01 2.928571E-02

Example 6.2.2 Backward Difference
Redo Example 6.2.1. The table below shows the calculations using Eqn. (6.2.13). The relative error is defined as $\dfrac{f'_{BD} - f'_{exact}}{f'_{exact}}$ where $f'_{exact} = 14$.
h f(x-h) f(x) f'(x) Rel Error
1.000000E-15 1.5000000000000E+01 1.5000000000000E+01 1.421085E+01 1.506105E-02
1.000000E-10 1.4999999998600E+01 1.5000000000000E+01 1.400000E+01 8.274037E-08

1.000000E-08 1.4999999860000E+01 1.5000000000000E+01 1.400000E+01 6.610792E-09
1.000000E-05 1.4999860000400E+01 1.5000000000000E+01 1.399996E+01 -2.857130E-06
1.000000E-02 1.4860399000000E+01 1.5000000000000E+01 1.396010E+01 -2.850000E-03
1.000000E-01 1.3639000000000E+01 1.5000000000000E+01 1.361000E+01 -2.785714E-02

Example 6.2.3 Central Difference
Redo Example 6.2.1. The table below shows the calculations using Eqn. (6.2.12). The relative error is defined as $\dfrac{f'_{CD} - f'_{exact}}{f'_{exact}}$ where $f'_{exact} = 14$.
h f(x-h) f(x+h) f'(x) Rel Error
1.000000E-15 1.5000000000000E+01 1.5000000000000E+01 1.243450E+01 -1.118216E-01
1.000000E-10 1.4999999998600E+01 1.5000000001400E+01 1.400000E+01 8.274037E-08
1.000000E-08 1.4999999860000E+01 1.5000000140000E+01 1.400000E+01 6.610792E-09
1.000000E-05 1.4999860000400E+01 1.5000140000400E+01 1.400000E+01 1.289521E-11
1.000000E-02 1.4860399000000E+01 1.5140401000000E+01 1.400010E+01 7.142857E-06
1.000000E-01 1.3639000000000E+01 1.6441000000000E+01 1.401000E+01 7.142857E-04

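The following table compares the relative errors from the three techniques. For the larger spacings ($h \ge 10^{-5}$) the central difference technique is by far the most accurate. For the very small spacings, round-off error in the subtraction of nearly equal function values dominates, the three techniques perform comparably, and with $h = 10^{-15}$ none of the estimates is reliable.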

h FD BD CD Best
1.000000E-15 -2.39E-01 1.51E-02 -1.12E-01 BD
1.000000E-10 8.27E-08 8.27E-08 8.27E-08 All
1.000000E-08 6.61E-09 6.61E-09 6.61E-09 All
1.000000E-05 2.86E-06 -2.86E-06 1.29E-11 CD
1.000000E-02 2.86E-03 -2.85E-03 7.14E-06 CD
1.000000E-01 2.93E-02 -2.79E-02 7.14E-04 CD

Example Program 6.2.1 Difference-based Numerical Differentiation


Develop a computer program to implement the forward, backward and central difference schemes to compute the derivative
of the function listed in Example 6.2.1.

The first step is to create the function prototypes for the three methods.
NumDerivative.h

A new concept is used here: function pointers. We will discuss details of this approach at the end of this chapter. Each function has 4 arguments: the location at which the derivative is computed, the selected function (number) from a list of functions, the spacing, and a pointer to a user-defined function. The last argument, double(*userfunc)(int fnc, double dX), denotes a function whose return type is double and that accepts two arguments (the selected function from a list of functions whose derivative is to be computed and the location at which the derivative is to be computed). The syntax *userfunc indicates a pointer to a function. We will see more about pointers in Chapter 8. The next step is to define the three difference methods.
NumDerivative.cpp

Eqn. (6.2.11) is coded in lines 7-15, Eqn. (6.2.12) in lines 27-35, and Eqn. (6.2.13) in lines 17-25. Finally, the main program in
which the problem is defined and the solution techniques are called, is developed.
main.cpp


Lines 16-25 are used to define the two functions supported in the program. The function, MyFunction, will be called in the main
program when the evaluation f ( x ) needs to take place. The two arguments are the function to be used and the location, x ,
at which the function is being evaluated.

Examples 6.2.1-6.2.3 are replicated in the program. Lines 29-32 are used to help track the method being used. Line 34 shows
the different values of the spacing variable, h used in the program. The location at which the function is evaluated is specified
in line 35. The exact solution that is used later in the program to assess the accuracy, is defined in line 36.

The method that yields the best estimate of the derivative and the associated values are tracked via variables dBest, dBestDeriv, dBesth and BestMethod (lines 45, 46, and 48). The number of different spacing values stored in vector dhTrial is found simply by dividing the storage size of the vector (found by using the sizeof operator) by the storage size of one of the elements of the vector in line 47. Alternatively, we could have defined the number of elements simply as const int nElements = 6. Note how the evaluation of the derivative takes place in lines 52, 61 and 70 via the calls to the three different techniques. The fourth argument is the name of the function in which the evaluation $f(x)$ needs to take place.
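The element-count calculation described above can be sketched as follows. The names dhTrial and nElements come from the text; the six values are the spacings listed in Examples 6.2.1 to 6.2.3, and the static_cast is an assumption added to avoid a narrowing warning.

```cpp
#include <cstddef>

// Trial spacings used in Examples 6.2.1-6.2.3.
const double dhTrial[] = {1.0e-15, 1.0e-10, 1.0e-8, 1.0e-5, 1.0e-2, 1.0e-1};

// Number of elements = bytes in the whole array / bytes in one element.
const int nElements = static_cast<int>(sizeof(dhTrial) / sizeof(dhTrial[0]));
```

This idiom works only where dhTrial is visible as a true array. Once the array is passed to a function it decays to a pointer, and sizeof then reports the size of the pointer instead.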

Details of each evaluation are displayed (lines 79-91). Finally, in lines 94-97, the best estimate is displayed. The program output
is shown in Fig. 6.2.1.

Fig. 6.2.1 Output from Example 6.2.1

6.3 Numerical Integration


Consider the problem of evaluating the integral
$$I = \int_a^b F(x)\,dx \tag{6.3.1}$$

We will assume that we either know the function F ( x ) that is difficult to integrate exactly or that we have a set of discrete data
(points where the function has been evaluated numerically). In either case, we need to develop a numerical technique to evaluate
the integral.
The basic idea in numerical integration is to construct another function $P(x)$ (usually a polynomial) that is a suitable approximation of $F(x)$ and is simple to integrate. The interpolating polynomial of degree $n$, denoted $P_n$, is such that it interpolates the integrand at $(n+1)$ points in the interval $[a, b]$. While there exist errors, $E = F(x) - P_n(x)$, the error may not always be of the same sign so that the overall error is small.

[Figure: $F(x)$ and $P_n(x)$ plotted against $x$ through the points $a = x_0, x_1, x_2, x_3, x_4, b = x_5$]
Fig. 6.3.1 Original function and its polynomial approximation
Hence the equivalent integral is given by
$$I = \int_a^b F(x)\,dx \approx \int_a^b P_n(x)\,dx \tag{6.3.2}$$

Fig. 6.3.1 shows the situation where the discrete values are known at 6 points and the approximate function Pn ( x ) is made to
pass through those six points. In more general terms, we could have a number of scenarios.
(1) We know the data at exactly (n+1) points and we fit a polynomial of degree n that passes through those points (as
shown in Fig. 6.3.1).
(2) We know the data at more than (n+1) points and we fit a polynomial of degree n using a concept such as least-squares
fit. We will look at least-squares fit in Chapter 11.
(3) If, on the other hand, we know the function F ( x ) , then we can evaluate the function at (n+1) points and use the
approach associated with scenario (1).

Newton-Cotes
When the function to be integrated is known at equally spaced points, we can use the forward difference polynomial (see Section 6.2) and fit the data. Recall that
$$P_n(x_0 + sh) = f_0 + s\,\Delta f_0 + \frac{s(s-1)}{2}\,\Delta^2 f_0 + \cdots + \frac{s(s-1)(s-2)\cdots[s-(n-1)]}{n!}\,\Delta^n f_0 + \text{error} \tag{6.3.3}$$
where
$$x = x_0 + sh \tag{6.3.4a}$$
$$\text{error} = \binom{s}{n+1} h^{n+1} f^{(n+1)}(\xi), \qquad x_0 \le \xi \le x_n \tag{6.3.4b}$$
Hence,
$$I = \int_a^b F(x)\,dx \approx \int_a^b P_n(x)\,dx = h\int_{s(a)}^{s(b)} P_n(s)\,ds \tag{6.3.5}$$

We can match the limits of integration by recognizing that the point $x = a$ corresponds to $s = 0$ and $x = b$ corresponds to $s = s(b) = (b-a)/h$. Hence using Eqn. (6.3.5), we have
$$I = h\int_0^{s(b)} P_n(x_0 + sh)\,ds \tag{6.3.6}$$

The value of n determines the different Newton-Cotes scheme and hence the obtained precision.

Trapezoidal Rule
We obtain this rule by fitting a linear polynomial to two discrete points. From Fig. 6.3.2, we need to compute the shaded area. The upper limit of integration $x_1$ corresponds to $s = 1$. We can rewrite Eqn. (6.3.6) as
$$I_1 = h\int_0^1 (f_0 + s\,\Delta f_0)\,ds = h\left[s f_0 + \frac{s^2}{2}\,\Delta f_0\right]_0^1 \tag{6.3.7}$$
where the left-hand side represents the integral only for the first interval. Denoting $\Delta f_0 = f_1 - f_0$, we can rewrite the above equation as
$$I_1 = \frac{1}{2}h\,(f_0 + f_1) \tag{6.3.8}$$
and represent the entire integral as
$$I = \sum_{i=1}^{n} I_i = \sum_{i=1}^{n} \frac{1}{2}h_i\,(f_{i-1} + f_i) \tag{6.3.9}$$
where
$$h_i = x_i - x_{i-1} \tag{6.3.10}$$
We can further simplify the formula if we assume that the points are equally spaced.
$$I = \frac{1}{2}h\,(f_0 + 2f_1 + 2f_2 + \cdots + 2f_{n-1} + f_n) \tag{6.3.11}$$
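[Figure: $F(x)$ and the piecewise-linear fit $P_1(x)$ plotted against $x$ through the points $x_0, x_1, x_2, x_3, \ldots, x_{n-1}, x_n$]
Fig. 6.3.2 Trapezoidal Rule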
We can compute the error by integrating Eqn. (6.3.4b) as follows.
$$\text{Error} = h\int_0^1 \frac{s(s-1)}{2}\,h^2 f''(\xi)\,ds = -\frac{1}{12}h^3 f''(\xi) = O(h^3) \tag{6.3.12}$$
The total error for equally spaced data is given by
$$\sum_{i=1}^{n} \text{Error} = \sum_{i=1}^{n} \left(-\frac{1}{12}h^3 f''(\xi_i)\right) = n\left(-\frac{1}{12}h^3 f''(\xi)\right) \tag{6.3.13}$$
where $x_0 \le \xi \le x_n$. The number of increments $n = (x_n - x_0)/h$. Therefore
$$\text{Total Error} = -\frac{1}{12}(x_n - x_0)\,h^2 f''(\xi) = O(h^2) \tag{6.3.14}$$

Example 6.3.1 Trapezoidal Rule
Evaluate $\int_{-1}^{1} x^4\,dx$. The exact answer is 0.4. We will compute the integral for different values of sampling points, $n+1$. Note that $h = (b-a)/n$.
(a) $n = 1$, $h = \frac{1-(-1)}{1} = 2$. $I = \frac{1}{2}(2)(f_0 + f_1) = \left[(-1)^4 + (1)^4\right] = 2$
(b) $n = 2$, $h = \frac{1-(-1)}{2} = 1$. $I = \frac{1}{2}(1)(f_0 + 2f_1 + f_2) = \frac{1}{2}\left[(-1)^4 + 2(0)^4 + (1)^4\right] = 1$
(c) $n = 4$, $h = \frac{1-(-1)}{4} = 0.5$. $I = \frac{1}{2}(0.5)(f_0 + 2f_1 + 2f_2 + 2f_3 + f_4) = \frac{1}{4}\left[(-1)^4 + 2(-0.5)^4 + 2(0)^4 + 2(0.5)^4 + (1)^4\right] = 0.5625$

Simpson’s Rule
We obtain this rule by fitting a quadratic polynomial through three equally spaced points. Fig. 6.3.3 shows the shaded area arising from this computation. We can write the integral as
$$I_1 = h\int_0^2 \left(f_0 + s\,\Delta f_0 + \frac{s(s-1)}{2}\,\Delta^2 f_0\right) ds = \frac{1}{3}h\,(f_0 + 4f_1 + f_2) \tag{6.3.12}$$
where the left-hand side represents the integral only over the first two intervals. As before we can represent the entire integral (with $n$ even) as
$$I = \frac{1}{3}h\,(f_0 + 4f_1 + 2f_2 + 4f_3 + \cdots + 2f_{n-2} + 4f_{n-1} + f_n) \tag{6.3.13}$$
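[Figure: $F(x)$ and the piecewise-quadratic fit $P_2(x)$ plotted against $x$ through the points $x_0, x_1, x_2, x_3, \ldots, x_{n-1}, x_n$]

Fig. 6.3.3 Simpson’s Rule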


Example Program 6.3.2 Trapezoidal and Simpson’s Rule
Evaluate $\int_{-1}^{1} x^4\,dx$. We will write a computer program to evaluate the integral for different values of sampling points, $n$, $1 \le n \le 1024$. Note that $\int_{-1}^{1} x^4\,dx = \left[\dfrac{x^5}{5}\right]_{-1}^{1} = 0.4$.

main.cpp

We will write a slightly more general program to integrate two functions that can then be extended by the reader with support
for other functions. The function enhancement is shown in lines 16 through 24. The second integral is
$$\int_1^2 \frac{2x^5 - x + 3}{x^2}\,dx = 9 - \ln 2 = 8.30685$$

Initialization of the program and the two methods takes place in lines 29 through 46. Line 29 can be edited to select the function
for integration. The number of supported functions is in line 30. Line 31 is used to specify the number of points used in
evaluating the integral. The limits of integration are stored in dVLow and dVHigh, and used in lines 45-46.
The main loop where the integral is computed starts in line 47 and extends to line 71 with the evaluation of the function taking
place first at the left end (line 51), then at the interior points (line 57 for Trapezoidal Rule and line 59 for Simpson’s Rule), and
finally at the right end (line 62 for Trapezoidal Rule and line 65 for Simpson’s Rule).


The output from the computer program is shown in Fig. 6.3.4.

Fig. 6.3.4 Output from Example 6.3.2


As the results show, for the same computational effort, Simpson’s Rule is more accurate than Trapezoidal Rule.

Gauss-Legendre Quadrature
The traditional Newton-Cotes techniques such as Trapezoidal Rule or Simpson’s Rule are not as efficient or accurate as Gauss-
Legendre Quadrature (G-LQ).
In the G-LQ technique, the base points $x_i$ and the weights $w_i$ are chosen so that the sum of the $(n+1)$ appropriately weighted values of the function yields the integral exactly when $F(x)$ is a polynomial of degree $(2n+1)$ or less.
$$\int_a^b F(x)\,dx = \int_{-1}^{1} \hat{F}(\xi)\,d\xi \approx \sum_{i=1}^{n} w_i\,\hat{F}(\xi_i) \tag{6.3.14}$$
where $\xi_i$ are the base points (or, roots of the Legendre polynomial $P_{n+1}(\xi)$), and
$$F(x)\,dx = F(x(\xi))\,\frac{dx}{d\xi}\,d\xi = F(x(\xi))\,J(\xi)\,d\xi = \hat{F}(\xi)\,d\xi \tag{6.3.15}$$
where J is the Jacobian. The following two points should be noted.
(a) Gauss-Legendre is more efficient because it requires fewer base points to achieve the same level of accuracy as the
Newton-Cotes methods, and

(b) The error is zero if the $(2n+2)$th derivative of the integrand vanishes. Or, a polynomial of degree $n$ is integrated exactly by employing $(n+1)/2$ Gauss points.
To understand how Gauss-Legendre works, consider a rewrite of Eqn. (6.3.14)
$$I = \int_{-1}^{1} f(\xi)\,d\xi \approx w_1 f(\xi_1) + w_2 f(\xi_2) + \cdots + w_n f(\xi_n) \tag{6.3.16}$$

One-Point Formula
$$I = \int_{-1}^{1} f(\xi)\,d\xi \approx w_1 f(\xi_1) \tag{6.3.17}$$
The integration is exact if $f(\xi)$ is a linear polynomial, i.e. $f(\xi) = a_0 + a_1\xi$. Hence the error, $e$, is given by
$$e = \int_{-1}^{1} (a_0 + a_1\xi)\,d\xi - w_1 f(\xi_1) = 0 \tag{6.3.18a}$$
$$\text{or,} \quad e = 2a_0 - w_1(a_0 + a_1\xi_1) = a_0(2 - w_1) - w_1 a_1 \xi_1 = 0 \tag{6.3.18b}$$
The error is zero if
$$2 - w_1 = 0 \tag{6.3.18c}$$
$$\text{and} \quad w_1 a_1 \xi_1 = 0 \tag{6.3.18d}$$
Solving, we obtain $w_1 = 2$ and $\xi_1 = 0$.

Two-Point Formula
$$\int_{-1}^{1} f(\xi)\,d\xi \approx w_1 f(\xi_1) + w_2 f(\xi_2) \tag{6.3.19}$$
The integration is exact if $f(\xi) = a_0 + a_1\xi + a_2\xi^2 + a_3\xi^3$ is a cubic polynomial. Hence the error, $e$, is given by
$$e = \int_{-1}^{1} (a_0 + a_1\xi + a_2\xi^2 + a_3\xi^3)\,d\xi - \left[w_1 f(\xi_1) + w_2 f(\xi_2)\right] = 0 \tag{6.3.20}$$
$$\text{or,} \quad e = 2a_0 + \frac{2}{3}a_2 - w_1(a_0 + a_1\xi_1 + a_2\xi_1^2 + a_3\xi_1^3) - w_2(a_0 + a_1\xi_2 + a_2\xi_2^2 + a_3\xi_2^3) = 0$$
$$\text{or,} \quad e = a_0(2 - w_1 - w_2) - a_1(w_1\xi_1 + w_2\xi_2) + a_2\left(\frac{2}{3} - w_1\xi_1^2 - w_2\xi_2^2\right) - a_3(w_1\xi_1^3 + w_2\xi_2^3) = 0$$
The error is zero if
$$w_1 + w_2 = 2 \tag{6.3.21a}$$
$$w_1\xi_1 + w_2\xi_2 = 0 \tag{6.3.21b}$$
$$w_1\xi_1^2 + w_2\xi_2^2 = \frac{2}{3} \tag{6.3.21c}$$
$$w_1\xi_1^3 + w_2\xi_2^3 = 0 \tag{6.3.21d}$$
Solving, $w_1 = w_2 = 1$ and $\xi_1 = -\xi_2 = \dfrac{1}{\sqrt{3}}$. The results of the derivation and more are summarized in Table 6.3.1.


Table 6.3.1 Gauss points and weights


Order, n Weight Location
1 2.0 0.0
2 1.0 0.57735 02691
1.0 -0.57735 02691
3 0.55555 55555 0.77459 66692
0.55555 55555 -0.77459 66692
0.88888 88888 0.0
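The order-2 row of Table 6.3.1 follows from the four conditions (6.3.21a)-(6.3.21d) of the two-point formula in a few steps. The worked solution below is a sketch of one way to solve them; the symbol $\xi$ for the natural coordinate is assumed here.

```latex
\begin{align*}
\text{(6.3.21b)} &\;\Rightarrow\; w_1\xi_1 = -w_2\xi_2 \\
\text{(6.3.21d)} &\;\Rightarrow\; (w_1\xi_1)\,\xi_1^2 + (w_2\xi_2)\,\xi_2^2
   = w_2\xi_2\left(\xi_2^2 - \xi_1^2\right) = 0 \\
 &\;\Rightarrow\; \xi_1 = -\xi_2
   \quad\text{(distinct points; } w_2\xi_2 = 0 \text{ would violate (6.3.21c))} \\
\text{(6.3.21b)} &\;\Rightarrow\; (w_1 - w_2)\,\xi_1 = 0 \;\Rightarrow\; w_1 = w_2 \\
\text{(6.3.21a)} &\;\Rightarrow\; w_1 = w_2 = 1 \\
\text{(6.3.21c)} &\;\Rightarrow\; 2\,\xi_1^2 = \tfrac{2}{3}
   \;\Rightarrow\; \xi_1 = \tfrac{1}{\sqrt{3}} \approx 0.57735
\end{align*}
```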
Example 6.3.3
Evaluate $\int_{-1}^{1} x^4\,dx$. The exact answer is 0.4.

We have $F(x) = F(\xi) = x^4$ with $x = \xi$. No Jacobian is needed.

Using $n = 1$: $w_1 = 2.0$ and $\xi_1 = 0.0$. Hence $I = (2.0)(0.0)^4 = 0.0$.
Using $n = 2$: $(w_1, \xi_1) = (1.0, 0.5773502691)$ and $(w_2, \xi_2) = (1.0, -0.5773502691)$.
Hence $I = (1.0)(0.5773502691)^4 + (1.0)(-0.5773502691)^4 = 0.222222222$.
Using $n = 3$: $(w_1, \xi_1) = (0.5555555555, 0.7745966692)$
$(w_2, \xi_2) = (0.5555555555, -0.7745966692)$
$(w_3, \xi_3) = (0.8888888888, 0.0)$
Hence $I = (0.5555555555)(0.7745966692)^4 + (0.5555555555)(-0.7745966692)^4 + (0.8888888888)(0.0)^4 = 0.4$
The original function is a polynomial of degree 4 and can be integrated exactly by the Gauss Quadrature rule of order $\dfrac{n+1}{2} = \dfrac{4+1}{2} \rightarrow 3$ (note that the rounding takes place to the next highest integer). The results bear out the rule.
Example 6.3.4
Evaluate $\int_{-1}^{1} \left(\dfrac{2x - 1}{x^2 - 6x + 13}\right) dx$.
The integrand is not a polynomial. The exact solution is $-0.1119$. We will solve with $n = 2$ and $n = 3$ rules as shown below.
Order, n   $w_i$            $\xi_i$            $f(\xi_i)$     $w_i f(\xi_i)$
2          1.0               0.57735 02691      0.015675       0.015675
           1.0              -0.57735 02691     -0.128276      -0.128276
                                                TOTAL          -0.112601
3          0.55555 55555     0.77459 66692      0.0613458      0.034081
           0.55555 55555    -0.77459 66692     -0.1397        -0.0776113
           0.88888 88888     0.0               -0.0769231     -0.0683761
                                                TOTAL          -0.111906

Example 6.3.5
Evaluate $\int_1^7 \dfrac{1}{x}\,dx$. Exact: $I = \int_1^7 \dfrac{1}{x}\,dx = \ln(7) = 1.94591$
To use the G-LQ rule we must first construct a mapping function to map the given domain $[1, 7]$ to the required domain $[-1, 1]$. The mapping function is
$$x = \frac{1}{2}(1-\xi)\,x_1 + \frac{1}{2}(1+\xi)\,x_2 = \frac{1}{2}(1-\xi)(1) + \frac{1}{2}(1+\xi)(7) = 4 + 3\xi, \qquad \frac{dx}{d\xi} = 3$$
Hence,
$$\int_1^7 \frac{1}{x}\,dx = \int_{-1}^{1} \frac{1}{4 + 3\xi}\,3\,d\xi \approx \sum_{i=1}^{n} w_i f(\xi_i)$$
Using the $n = 4$ rule, we have the following results.
$\xi_i$           $w_i$          $f(\xi_i)$     $w_i f(\xi_i)$
-0.861136312      0.34785484     2.11776        0.736673
-0.339981044      0.65214515     1.006692       0.65651
 0.339981044      0.65214515     0.597616       0.389733
 0.861136312      0.34785484     0.455691       0.158514
                                 Sum =          1.94143
Note $I_1 = 1.5$, $I_2 = 1.846$, $I_3 = 1.9245$ and $I_5 = 1.944981413$.

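Two-Dimensional Functions: Functions that involve two independent natural coordinates are handled in a manner similar to one-dimensional functions.
$$\int_c^d\!\!\int_a^b F(x, y)\,dx\,dy = \int_{-1}^{1}\!\!\int_{-1}^{1} F\big(x(\xi,\eta),\, y(\xi,\eta)\big)\,J\,d\xi\,d\eta \approx \sum_{j=1}^{n}\sum_{i=1}^{n} w_i\,w_j\,f(\xi_i, \eta_j) \tag{6.3.16a}$$
$$\text{where} \quad f(\xi_i, \eta_j) = F\big(x(\xi_i,\eta_j),\, y(\xi_i,\eta_j)\big)\,J \tag{6.3.16b}$$
$$J = \det(\mathbf{J}) \quad \text{where} \quad \mathbf{J}_{2\times 2} = \begin{bmatrix} \dfrac{\partial x}{\partial \xi} & \dfrac{\partial y}{\partial \xi} \\[2mm] \dfrac{\partial x}{\partial \eta} & \dfrac{\partial y}{\partial \eta} \end{bmatrix} \tag{6.3.16c}$$
The values of the weights and natural coordinates are the same as shown in Table 6.3.1 except that $\xi$ in the table refers to both $\xi$ and $\eta$.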
Example 6.3.6
Evaluate $\int_{-1}^{1}\!\int_{-1}^{1} x^2\,dx\,dy$. The exact answer is $\tfrac{4}{3}$.

Using the $n = 2$ rule, we have four Gauss points.
$\xi_i$           $\eta_j$          $w_i$   $w_j$   $f(\xi_i, \eta_j)$   $w_i w_j f(\xi_i, \eta_j)$
-0.5773502691     -0.5773502691     1.0     1.0     1/3                  1/3
 0.5773502691     -0.5773502691     1.0     1.0     1/3                  1/3
 0.5773502691      0.5773502691     1.0     1.0     1/3                  1/3
-0.5773502691      0.5773502691     1.0     1.0     1/3                  1/3
                                                    TOTAL                4/3
The answer obtained is the exact answer. Once again, the appropriate quadrature order is $\dfrac{n+1}{2} = \dfrac{2+1}{2} \rightarrow 2$ and using the rule leads to the exact answer.
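Example 6.3.7
Evaluate $I = \iint_{R_{xy}} (x+y)^3\,dx\,dy$ over the quadrilateral region $R_{xy}$.
[Figure: region $R_{xy}$ in the $x$-$y$ plane]
Analytical Approach: The sides of the domain are not parallel to the axes, making it difficult to set the limits of integration. However, the sides are such that
$$x + y = c_1 \quad \text{and} \quad x - 2y = c_2$$
Therefore, we can introduce two new variables and set up a mapping such that
$$u = x + y \quad \text{and} \quad v = x - 2y$$
The transformed domain is shown below.
[Figure: transformed region $R_{uv}$ in the $u$-$v$ plane, the rectangle $1 \le u \le 4$, $-2 \le v \le 1$]
Hence,
$$I = \iint_{R_{xy}} (x+y)^3\,dx\,dy = \iint_{R_{uv}} u^3\,\left|\det(\mathbf{J})\right|\,du\,dv$$
$$\left|\det(\mathbf{J})\right| = \left|\frac{\partial(x, y)}{\partial(u, v)}\right| = \frac{1}{\left|\dfrac{\partial(u, v)}{\partial(x, y)}\right|} = \frac{1}{\left|\det\begin{bmatrix} 1 & 1 \\ 1 & -2 \end{bmatrix}\right|} = \frac{1}{3}$$
Substituting,
$$I = \iint_{R_{uv}} u^3\,\left|\det(\mathbf{J})\right|\,du\,dv = \int_{-2}^{1}\!\!\int_{1}^{4} \frac{u^3}{3}\,du\,dv = \frac{765}{12} = 63.75$$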
Numerical Approach: We will construct the appropriate mapping functions to map the given integration domain into a square $\xi \in [-1, 1]$, $\eta \in [-1, 1]$. Using the following mapping functions
$$x = \sum_{i=1}^{4} \phi_i(\xi, \eta)\,x_i \qquad y = \sum_{i=1}^{4} \phi_i(\xi, \eta)\,y_i$$
where
$$\phi_1 = \frac{1}{4}(1-\xi)(1-\eta) \qquad \phi_2 = \frac{1}{4}(1+\xi)(1-\eta)$$
$$\phi_3 = \frac{1}{4}(1+\xi)(1+\eta) \qquad \phi_4 = \frac{1}{4}(1-\xi)(1+\eta)$$
The given domain is such that $(x_1, y_1) = (1, 0)$, $(x_2, y_2) = (3, 1)$, $(x_3, y_3) = (2, 2)$ and $(x_4, y_4) = (0, 1)$. The Jacobian can be constructed as
$$\mathbf{J}_{2\times 2} = \frac{1}{4}\begin{bmatrix} 4 & 2 \\ -2 & 2 \end{bmatrix} \qquad \det(\mathbf{J}) = \frac{3}{4}$$
(a) We will use the one-point rule first.
$$i = j = 1 \qquad w_i = w_j = 2.0 \qquad (\xi_i, \eta_j) = (0, 0)$$
$$x + y = \sum_{i=1}^{4} \phi_i x_i + \sum_{i=1}^{4} \phi_i y_i = \frac{1}{4}(1 + 3 + 2 + 0) + \frac{1}{4}(0 + 1 + 2 + 1) = \frac{5}{2}$$
$$I = (2)(2)\left(\frac{5}{2}\right)^3\left(\frac{3}{4}\right) = 46.875$$
(b) Now the two-point rule. The details of the calculations are shown below.
$\xi_i$     $\eta_j$    $w_i$  $w_j$  $\phi_1$    $\phi_2$    $\phi_3$    $\phi_4$    $x$       $y$       $(x+y)^3$
-0.57735    -0.57735    1.0    1.0    0.622008    0.166667    0.044658    0.166667    1.21133   0.42265   4.362508
 0.57735    -0.57735    1.0    1.0    0.166667    0.622008    0.166667    0.044658    2.36603   1         38.13748
 0.57735     0.57735    1.0    1.0    0.044658    0.166667    0.622008    0.166667    1.78868   1.57735   38.13748
-0.57735     0.57735    1.0    1.0    0.166667    0.044658    0.166667    0.622008    0.63398   1         4.362508
$I = \det(\mathbf{J}) \sum w_i w_j (x+y)^3 = 63.75$

Example Program NumericalIntegration


A detailed project called NumericalIntegration is available to the readers to understand the numerical integration methods
discussed in this section and more. The features in the program include the following:
(a) Gauss-Legendre quadrature for integration over one-dimensional space, two-dimensional space (quadrilateral,
triangular domains), three-dimensional space (wedge, tetrahedral and hexahedral domains), with multiple examples,
(b) Gauss-Laguerre quadrature (reader can use the framework to add test cases), and
(c) Using the numerical integration techniques to integrate over other shapes such as circles and spheres.
The reader is encouraged to examine, execute and enhance the capabilities of the program.

6.4 User-Defined Functions


In the preceding sections we saw three numerical analysis topics. Each topic has multiple solution techniques that one would
like to see implemented in a numerical analysis library in as general a manner as possible. Clearly we would like to separate the
implementation of the numerical analysis technique from the user provided or generated information. For example, Newton-
Raphson is a general technique for finding a root of any function not a specific function. If we develop the source code for
Newton-Raphson technique, then how do we compute the function and gradient values that are needed to obtain the solution?


The solution is in the use of function pointers and callback functions. We will see pointers in Chapter 8 and understand how
best to use them. However, at this stage it should not prevent us from using the concept in developing and writing effective,
general-purpose source code. We will illustrate the idea using the Newton-Raphson technique and an example.

Example 6.4.1
We will illustrate the ideas for a general-purpose numerical analysis interface for obtaining values from user-defined code
through the use of Newton-Raphson technique. The specific example is to compute a root of the function
f ( x )  ( x  2.3)( x  4.56)( x  3.7)
Step 1: Construct the gateway to Newton-Raphson technique. By this we mean, we will develop the interface or function
prototype.
void NewtonRaphson (double& dRoot, int& nMaxIter,
const double dConvTol, int fnc,
void(*userfunc)(double dX, double& dFX, double& dDX));
The function arguments are as follows.
dRoot Root. Input is the initial guess, and the returned value is the estimate of the root.
nMaxIter Input: Maximum number of iterations; Output: Actual number of iterations taken.
dConvTol Convergence tolerance.
fnc Function from a list of functions to solve.
*userfunc Pointer to the function that will be called to compute the function value (dFX) and derivative value (dDX) at
the current point (dX). One must pay particular attention to this function – (a) the function prototype is like
any other function, and (b) it helps to have the minimum number of arguments in the function call. In this
case, the function is passed the value of the current point and the function returns the function and the first
derivative values at the current point.
The function prototype is as follows (newtonraphson.h):

Note the enumerated type in line 6 that will be used to handle exceptional situations connected with various aspects of the
technique – invalid user input, encountering a zero (or numerically very small) derivative during one of the evaluations of the
derivative, and not being able to converge to the root within the specified number of iterations. As we will see, adding the class
keyword with the enumerated type declaration will make it easier to refer to a specific exception especially in large programs.
Step 2: Create the function that implements the Newton-Raphson technique (newtonraphson.cpp):


The first check for correctness of input is in lines 17-18 where an exception is thrown if the maximum number of iterations is less than or equal to zero. The iterative loop is between lines 21 and 44. If the program is unable to compute the root within those statements, an exception (non-convergence) is thrown in line 47. The user-defined function is called in line 24; it uses the current value of the root, dRoot, as $x$ and computes $f(x)$ and $df(x)/dx$. The convergence check is carried out in lines 34-35 and if one of the conditions is met, the results are stored in dRoot and nMaxIterations, and control is returned to the calling program (main).
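Since the listing is not reproduced here, the following is a minimal sketch of a function with the prototype given in Step 1. Several details are assumptions made for illustration: the enumeration name NRException and its members, the two convergence conditions (small function value or small step), the zero-derivative threshold of 1.0e-14, and the stand-in user function $f(x) = x^2 - 2$. The argument fnc is carried in the signature as in the prototype but is not used in this sketch.

```cpp
#include <cmath>

// Hypothetical exception codes for the three situations described in the text.
enum class NRException { InvalidInput, ZeroDerivative, NoConvergence };

void NewtonRaphson(double& dRoot, int& nMaxIter,
                   const double dConvTol, int fnc,
                   void (*userfunc)(double dX, double& dFX, double& dDX))
{
    (void)fnc;   // not used in this sketch
    if (nMaxIter <= 0 || dConvTol <= 0.0 || userfunc == nullptr)
        throw NRException::InvalidInput;

    for (int nIter = 1; nIter <= nMaxIter; ++nIter)
    {
        double dFX = 0.0, dDX = 0.0;
        userfunc(dRoot, dFX, dDX);           // f(x) and f'(x) at current point
        if (std::fabs(dDX) < 1.0e-14)        // assumed threshold
            throw NRException::ZeroDerivative;

        const double dStep = dFX / dDX;      // Newton-Raphson update
        dRoot -= dStep;

        // Assumed convergence test: small residual or small step.
        if (std::fabs(dFX) <= dConvTol || std::fabs(dStep) <= dConvTol)
        {
            nMaxIter = nIter;                // report iterations actually used
            return;
        }
    }
    throw NRException::NoConvergence;
}

// Stand-in user function (an assumption): f(x) = x^2 - 2, f'(x) = 2x.
void MyFunction(double dX, double& dFX, double& dDX)
{
    dFX = dX * dX - 2.0;
    dDX = 2.0 * dX;
}
```

Starting from dRoot = 1.0 with dConvTol = 1.0e-10, the call NewtonRaphson(dRoot, nMaxIter, 1.0e-10, 1, MyFunction) should leave dRoot close to $\sqrt{2}$ after a handful of iterations. A calling program would wrap the call in a try block and catch NRException values alongside the standard exceptions.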

Step 3: Create the code to test the technique (main.cpp). The first sub-step is to create the user-defined function, MyFunction.


Recall that the user-defined function is used as the fifth argument in the call to the NewtonRaphson function. We have selected
the name of the function as MyFunction.

The main program is divided into three parts. The first part is the initialization part contained in lines 34-37. These lines should
be edited as required. The second part is the try block. In this block, first the user input is obtained in lines 42-47. Next the
NewtonRaphson function is called. If the execution is successful, the results are shown on the screen (lines 52-54). The third part
contains the catch blocks – a block to handle system-generated errors (logical errors such as invalid argument, out-of-range
when using containers, etc., run time errors such as overflow, underflow, etc., and many more).

Here is a brief list of the improvements that can be made to the program:
(1) Are there other user input checks that can be carried out in the NewtonRaphson function? Implement them by throwing
an appropriate exception.
(2) Are there checks that can be carried out in the MyFunction function? Implement them by throwing an appropriate
exception.
(3) Are there checks that can be carried out in the main function? Implement them by throwing an appropriate exception.

Section 13.4 has more advanced scenarios involving object-oriented design that we will see and use at the appropriate time.


Summary
We looked at three very useful numerical techniques – finding roots of nonlinear equations, numerical differentiation, and
numerical integration. Note that the techniques are quite general with minimal restrictions. The Bisection and the False Position
methods require that the function be continuous in the interval containing the root. The Newton-Raphson method requires
that the function and its first derivative be continuous in the interval containing the root. The Brent Method is a derivative free
method and only requires that the function be continuous in the interval containing the root. Similarly, all the numerical
derivative techniques and integration techniques require that the function be continuous. This generality in terms of handling
any function requires that the program be written in such a manner that the user of the technique has little difficulty in providing
the value(s) needed by the technique. While some of the more advanced C++ concepts that facilitate this usage have not been
discussed so far, Section 6.4 shows how this can be done now.

Where to go from here?


The object-oriented ideas and techniques will be introduced and refined starting in the next chapter. It is important that the
reader revisit this chapter after learning how to write C++ programs using object-oriented ideas. The ideal time would be after
completing Chapter 13 so that reusable classes incorporating these numerical techniques can be developed and (re)used.


Exercises
Most of the problems below involve the development of one or more functions. In each case (a) develop a plan to
test the function(s), and (b) implement the plan in a main program. The functions should not use cin or cout
unless specified. Put the main program in a separate file and the function(s) in separate files.

Appetizers
Problem 6.1
Solve for the roots of the following functions using Bisection Method.
(a) Solve for all the positive roots of the function f ( x )  x 3  2 x 2  6x  x  6 .
(b) Solve for the negative root of the function f ( x )  ( x  1)( x  2)( x  3) .
(c) Solve for the roots of the function f ( x )  0.3  cos( x ) .

Problem 6.2
Solve for the roots of the functions given in Problem 6.1 using the Newton-Raphson Method.
Problem 6.3
Compare the first derivatives computed using forward difference, central difference and backward difference for the following
functions.
(a) f ( x )  x 3  2 x 2  10 x  5 at x  4 .
(b) f ( x )  x 3  2 x 2  6x  x  6 at x  2 .

2x 3  4 x  10
(c) f (x )  at x  5 .
 x  1 x  2 
Problem 6.4
Compute the following integrals using Trapezoidal Rule with n  2, 4,8,16,...256 .
2 4

 x  2x  10  dx
1
 x
2
(a) (b) dx
0 2
2
 2 x  10 

Problem 6.5
Compute the integrals given in Problem 6.4 using Simpson’s Rule.

Main Course
Problem 6.6
Program the Bisection Method. Use the problems in Problem 6.1 as test cases.
Problem 6.7
Program the forward, backward and central difference techniques. Compare their performances by using the problems in
Problem 6.3 as test cases.
Problem 6.8
Modify the program Example6_3_2 by adding the Gauss-Legendre Quadrature technique. Use the program to compare the
performances of Trapezoidal Rule, Simpson’s Rule and the Gauss-Legendre Quadrature by using the functions in Problem 6.4
as test cases.


Numerical Analysis Concepts


Problem 6.9
The Brent Method presented in Section 6.1 is an implementation following the original work discussed in R. Brent, Algorithms for
Minimization Without Derivatives, Prentice-Hall, 1973. Improvements have been made in two publications.
Zhang, Zhengqiu (2011), "An Improvement to the Brent's Method", International Journal of Experimental Algorithms, 2 (1): 21–26.
Stage, Steven A. (2013), "Comments on An Improvement to the Brent's Method", International Journal of Experimental Algorithms,
4 (1): 1–16.
Use the algorithms in the two publications and implement the improved Brent Method in the project Example6_1_3. Compare
the performances of the original and the improved implementations.
Problem 6.10
As we saw in Section 6.2, there are several techniques to compute numerical derivatives. One method that improves on the
forward, central and backward difference techniques is a fourth-order approximation of the derivative given as
$$f'(x) = \frac{-f(x+2h) + 8f(x+h) - 8f(x-h) + f(x-2h)}{12h} + O(h^4)$$
Implement the new approximation and compare its performance (compute effort and accuracy) with the backward difference
technique using a number of different functions.
Problem 6.11
Most techniques that are robust for finding a root of a nonlinear function, f(x), require that the root be bracketed. For
example, a root exists between a and b if f(a) f(b) < 0. Use heuristics to develop an algorithm to bracket the root.

References
Atkinson, An Introduction to Numerical Analysis, Wiley, 1978.
Burden and Faires, Numerical Analysis, PWS-Kent, 1988.
Press, Flannery, Teukolsky and Vetterling, Numerical Recipes in C, Cambridge Press, 1988.
Mathews and Fink, Numerical Methods Using Matlab, Prentice-Hall, 1999.
Chapra and Canale, Numerical Methods for Engineers, McGraw-Hill, 2002.
Schilling and Harris, Applied Numerical Methods for Engineers Using Matlab and C, Brooks/Cole, 2000.
Rao, Applied Numerical Methods for Engineers and Scientists, Prentice Hall, 2002.

Chapter 7

Classes: Objects 101


“The person who knows “how” will always have a job. The person who knows “why” will always be his boss.” Diane Ravitch

“Real learning comes about when the competitive spirit has ceased.” J. Krishnamurti

“Computer science is no more about computers than astronomy is about telescopes.” Edsger Dijkstra

Classes and objects were introduced in Chapter 1. It cannot be overemphasized that proper definition and use of classes can
lead to increased productivity with all aspects of software engineering. In this chapter, we will begin the long, systematic process
of understanding what classes are and how to use them effectively in simple and complex computer programs. In other words,
we will try to understand why object-oriented programming (OOP) is the choice for developing useful programs. More
advanced concepts will be covered in Chapters 8, 9 and 13.

Objectives
• To understand what objects and classes are.
• To understand what data abstraction and encapsulation are.
• To learn how to define and use classes.
• To leverage the functionalities provided by the functions in the book’s library directory to write more robust computer programs.
• To learn more about the standard string class.
• To understand and begin to use exception handling in OOP.

7.1 A Detour – Why OOP?


It is unlikely that a program refers to a single object or entity. Most practical situations involve several different objects that
interact with each other. To understand what this means, we will look at a simple example.
Problem Statement: Given the relevant data of a typical planar truss, we are required to develop the procedure to draw the truss
on a graphing canvas. A truss is made up of one or more members. These members are straight and slender. Each member is
connected to two joints. Each joint is located in the coordinate space such that each joint has $(x, y)$ coordinate values. To
identify these members and joints in a truss, we use simple integers starting at 1 so that the joints are numbered 1, 2, 3, … and
the members are numbered 1, 2, 3, … It is further stipulated that no two joints have the same identification number, and
the same applies to the members.
To simplify the issue, we can look at the truss as a collection of points (joints) and straight lines (members). A graphical
representation of the truss in this manner is known in computer graphics as a wireframe representation as shown in Fig. 7.1.1(b).
[Figure: panel (a) shows the wood truss; panel (b) shows the computer model and its wireframe representation on a graphical canvas.]
Fig. 7.1.1 (a) A wood truss (b) A structural model suitable for a computer solution and its wireframe representation

Solution: We will now develop the general algorithm to meet our objectives.
Variable Dictionary
nPoints Number of points
nLines Number of lines
Algorithm
(1) Obtain the total number of points, nPoints, and total number of (straight) lines, nLines.
(2) Obtain the $(x, y)$ coordinates of each point. Obtain the start point and end point numbers for each line. Compute the
smallest and the largest $(x, y)$ values as $(x_{min}, x_{max})$ and $(y_{min}, y_{max})$ using all the points.
(3) Compute the scale for the graph based on the $(x_{min}, x_{max})$ and $(y_{min}, y_{max})$ values and the dimensions of the graphing
canvas $a \times a$ so that the truss would fit completely on the graphing canvas. For an anisotropic scaling, the scaling values
are different in the x and the y directions. For an isotropic scaling, there is one value that is the smaller of the two scaling
values.
(4) Loop through all the lines, $i$.
(5) For the current line, obtain the start point number. Compute the graph coordinates $(x_{gs}, y_{gs})$. Move to $(x_{gs}, y_{gs})$.
(6) Obtain the end point number. Compute the graph coordinates $(x_{ge}, y_{ge})$. Draw the line to $(x_{ge}, y_{ge})$.
(7) End loop $i$.
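Steps 2 and 3 of the algorithm can be sketched as follows. This is a minimal sketch under stated assumptions, not the ComputeScale function referred to in the example program: the function name IsotropicScale, its parameter list, and the way a zero coordinate range is handled are our own choices for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper (an assumption, not the book's ComputeScale function).
// Returns the isotropic scale factor that fits all the points on a square
// graphing canvas of size dCanvas x dCanvas.
double IsotropicScale (const std::vector<double>& dVX,
                       const std::vector<double>& dVY, double dCanvas)
{
    assert (!dVX.empty() && dVX.size() == dVY.size());

    // Step 2: smallest and largest coordinate values over all the points
    const auto XMinMax = std::minmax_element (dVX.begin(), dVX.end());
    const auto YMinMax = std::minmax_element (dVY.begin(), dVY.end());
    const double dXRange = *XMinMax.second - *XMinMax.first;
    const double dYRange = *YMinMax.second - *YMinMax.first;

    // Step 3: a direction with zero range places no limit on the scale
    if (dXRange <= 0.0 && dYRange <= 0.0) return 1.0; // all points coincide
    if (dXRange <= 0.0) return dCanvas/dYRange;
    if (dYRange <= 0.0) return dCanvas/dXRange;

    // anisotropic scaling would use the two values separately;
    // isotropic scaling uses the smaller of the two
    return std::min (dCanvas/dXRange, dCanvas/dYRange);
}
```

For example, for points (0, 0), (4, 3) and (8, 0) on a 400 by 400 canvas, the two scaling values are 400/8 = 50 in the x direction and 400/3 in the y direction, so the isotropic scale is 50.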

We will now tackle the problem of translating this algorithm into a computer program. An examination of the algorithm shows
that there are two major pieces of information (entities) that we must handle – data associated with points, and data associated
with lines. Several questions arise naturally. The most obvious one is “What data structure should be used?”, or “How should
the problem data be stored?”. Let us assume that we store the point and line data in arrays (vectors to be specific) as shown
below. We will name this approach Array-based Solution. The index of the vector will provide the data access mechanism.

Array-based Solution
Array    Size       Remarks
fVX      nPoints    Vector containing the global x-coordinates of all the points
fVY      nPoints    Vector containing the global y-coordinates of all the points
nVSP     nLines     Vector containing the start point number of all the lines
nVEP     nLines     Vector containing the end point number of all the lines

Example Program 7.1.1 Solution using Array-Based Approach


main.cpp

Since we have not defined the details of the drawing-related functions (scaling, computing the graph coordinates, move and
draw), we will simply provide their prototypes as shown in lines 18-23. Recall how C++ treats variables and functions: they
must be declared before they can be used. We will store the arrays in C++’s vector container. A container, as the name suggests,
not only provides storage space to store different data types but also provides operations that make data manipulation easy to
implement. The primary advantage of the vector container over the array declarations that we have seen earlier, e.g. double x[N];
is that the size of a vector can be set when the program is executed. In contrast, the array x is statically allocated with a size
that must be known when the program is written, so N must be an integer constant. This is not desirable when writing
general-purpose programs where the size of the model is an unknown quantity. The vector class is used to store the four vectors
as shown in lines 41 and 59. Note that vector indexing starts at 0 and goes up to N-1, e.g. see lines 49 and 54.
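The two points made above (a vector is sized at run time, and its index starts at 0) can be illustrated with a short sketch. The function names MakeCoordinateVector and SetCoordinate are hypothetical and are not part of the example program.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns a vector of nPoints coordinates, all set to
// zero. nPoints is an ordinary run-time value (e.g. read from the user),
// not a compile-time constant as required by double x[N];
std::vector<float> MakeCoordinateVector (int nPoints)
{
    const std::size_t nSize = (nPoints > 0) ? static_cast<std::size_t>(nPoints) : 0;
    return std::vector<float> (nSize, 0.0f);
}

// Hypothetical helper: stores the coordinate of point number nPoint, where
// the points are numbered 1, 2, 3, ... Returns false (and stores nothing)
// if the point number is not valid.
bool SetCoordinate (std::vector<float>& fV, int nPoint, float fValue)
{
    if (nPoint < 1 || static_cast<std::size_t>(nPoint) > fV.size())
        return false;
    fV[nPoint-1] = fValue; // point number i is stored at index i-1
    return true;
}
```

The explicit range check is one way to guard against the off-by-one errors that arise when point numbers start at 1 and vector indices start at 0.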

As we have seen before, the values of the point coordinates and the start and end points of lines are obtained interactively using
the GetInteractive function. Some amount of error checking is carried out, but by no means is the error checking
comprehensive.

Once the input data is in place, the scale factor is computed (Step 3 of the algorithm) as seen in line 81 using a call to (as yet
undeveloped) function ComputeScale.

Steps 4 through 7 are implemented in lines 84 through 98. Three other undeveloped functions GraphCoordinates, Move and
Draw are used to compute the graph coordinates given the point coordinates, move to a location on the graph, and draw to a
location on the graph. One can imagine the Move function as moving to a point on the graph with the pen up (or, raised) and
the Draw function as moving to a point on the graph with the pen down.
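The drawing loop (Steps 4 through 7) might look like the following sketch. It is not the listing from the example program. The Move and Draw functions here are stand-ins that simply print their arguments, the function name DrawWireframe is hypothetical, and the graph coordinates are assumed to be the point coordinates multiplied by a single scale factor (no translation), which is a simplification of what a GraphCoordinates function would do.

```cpp
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <vector>

// Stand-ins (assumptions) for the undeveloped drawing functions
void Move (float fXG, float fYG)
{
    std::cout << "Move to (" << fXG << ", " << fYG << ")\n"; // pen up
}
void Draw (float fXG, float fYG)
{
    std::cout << "Draw to (" << fXG << ", " << fYG << ")\n"; // pen down
}

// Steps 4-7: loops through all the lines and returns the number of lines
// drawn. Point numbers start at 1; vector indices start at 0.
int DrawWireframe (const std::vector<float>& fVX, const std::vector<float>& fVY,
                   const std::vector<int>& nVSP, const std::vector<int>& nVEP,
                   float fScale)
{
    const int nPoints = static_cast<int>(std::min (fVX.size(), fVY.size()));
    int nDrawn = 0;
    for (std::size_t i = 0; i < nVSP.size() && i < nVEP.size(); ++i)
    {
        const int nSP = nVSP[i]; // start point number
        const int nEP = nVEP[i]; // end point number
        if (nSP < 1 || nSP > nPoints || nEP < 1 || nEP > nPoints)
            continue; // skip a line that refers to an undefined point
        Move (fScale*fVX[nSP-1], fScale*fVY[nSP-1]);
        Draw (fScale*fVX[nEP-1], fScale*fVY[nEP-1]);
        ++nDrawn;
    }
    return nDrawn;
}
```

Note that every function that touches the model must be handed all four vectors, which hints at the weaknesses of this approach.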
Strengths of the Array-Based Solution
• Easy to understand, visualize and code.
• Vector data structure requires minimal execution time and storage.
Weaknesses of the Array-Based Solution
• Typical engineering programs have tens if not hundreds of arrays. The management of all the arrays can be problematic.
• Extension of the program from two to three dimensions by the simple addition of z coordinate values is likely to cause
ripple effects throughout the program. Line 41 needs to be changed. The block of three lines to obtain the z-coordinate
needs to be added. The z coordinate values need to be considered in the scaling factor function. Similarly, the
GraphCoordinates function needs to be changed. The process of transforming a three-dimensional object to an equivalent
view in a two-dimensional plane involves several more steps than just scaling and translation.
• Similarly, if newer objects such as curved beams (described by three points, not two) are brought into the program, the
programmed drawing logic needs drastic changes.
• Finally, for engineering applications, it is awkward, to put it mildly, to count starting at zero. This is one of the drawbacks
of the vector class.

Data Abstraction
As program logic becomes more complex, it is certainly helpful to break the logic into smaller pieces. These smaller pieces are
typically implemented in well-defined functions. Modularization in program development also has other beneficial effects –
testing and debugging of smaller components is much easier, several different programmers can work simultaneously on a
project with minimal interaction or overlap, and program reuse is possible. However, as the complexity of data flow in a program
increases, the task of program development and maintenance can become expensive and fraught with dangers. It is now
necessary to think about program development in a completely different way. This does not imply that whatever we have learnt
in the preceding chapters is incorrect or useless. As we will learn in this chapter, it is easy to build on what we have learnt by
simply reorganizing our thought process and learning new language constructs.
Ideally, programmers would like to define their own data types depending on the type of application program. These data types
would be built using C++ built-in data types. The thought process to create these user-defined types is known as data
abstraction. Data abstraction is defined as separating the overall properties of a data type from its implementation. The
mechanism to implement the data abstraction is called encapsulation. Encapsulation, also referred to as information hiding, is
the technique by which data attributes and behavior-related operations are linked together such that the data can be
manipulated only through these operations, not directly. The program or code that stores the data and implements the behavior
is called the server code.
Let us look at a simple example that involves storing and manipulating information dealing with points defined in a
two-dimensional space. Let this space be defined as the usual Cartesian x-y space. Each point will be defined in terms of its x
and y coordinates. To store the two coordinates, we will use the standard float type. Recall from Chapter 1 that objects are
identified by a name, have attributes (defined as having properties) and behavior (capabilities to do something). The attributes of
the point object are its (x,y) coordinates. What are some of the data manipulations that we may want to carry out with points in a
two-dimensional space? For example, we may want to store and retrieve these values. We may also want to check whether the
point is at the origin of the coordinate system, to compute the distance from this point to another point, etc. This is the type of
information that a programmer needs to know to write a program that uses the point-related information. Such a code (that
uses the point information) is called the client code. The client code needs to know how to use the information but not how
the information is stored or how the functionalities (behavior) are implemented.
In C++ terminology, point (or whatever name we assign) is a class. This class is a user-defined data type similar to the built-in
data types that C++ provides such as int, float, etc. We associate objects with a class similar to the way we have been
associating variables to the built-in data types. The client code declares objects associated with classes and uses the operations
permitted by the class definition and implemented in the server code, to manipulate the information stored in the objects. One
can now appreciate why information hiding is useful. First, the programming task can be very nicely divided. Programmers who
have the knowledge and expertise in writing the client code can concentrate on getting their job done without having to worry
about the implementation details. These are left to the experts who know more about the class attributes and its behavior.
Second, the client code cannot inadvertently or otherwise make errors in setting or changing the values of the data. For example,
if the point class is designed for points in the positive (x,y) space, then the error detection can be easily implemented in the
server code and an appropriate action can be taken. Third, by separating the behavior from implementation, we minimize the
impact that changes in behavior would have in the maintenance of a program. Let’s go back to the point example. Let us assume
that users of the class now request that they would like to look at points in a cylindrical coordinate system through two attributes
$(r, \theta)$. Does this imply that we would have to rewrite the existing client code, or can we make changes to the server code (point
class) such that this new functionality is available without compromising the existing functionalities?
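One possible answer to this question is sketched below. The class name CPoint2D and the member functions SetPolar and GetPolar are hypothetical. The sketch assumes that the point continues to be stored as (x, y) and that the angle is measured in radians. The member functions are defined inside the class only to keep the sketch short.

```cpp
#include <cmath>

// Hypothetical server code. The data are still stored as (x, y); the
// (r, theta) view is added purely through new member functions.
class CPoint2D
{
    public:
        CPoint2D () { m_fX = 0.0f; m_fY = 0.0f; }
        // existing functionality: unchanged, so existing client code
        // does not need to be rewritten
        void SetValues (float fX, float fY) { m_fX = fX; m_fY = fY; }
        void GetValues (float& fX, float& fY) const { fX = m_fX; fY = m_fY; }
        // new functionality: theta is in radians
        void SetPolar (float fR, float fTheta)
        {
            m_fX = fR*std::cos (fTheta);
            m_fY = fR*std::sin (fTheta);
        }
        void GetPolar (float& fR, float& fTheta) const
        {
            fR = std::hypot (m_fX, m_fY);     // distance from the origin
            fTheta = std::atan2 (m_fY, m_fX); // angle from the x axis
        }
    private:
        float m_fX; // stores x-coordinate
        float m_fY; // stores y-coordinate
};
```

Since the private data and the existing public functions are untouched, client code written before the change compiles and behaves as before. The server code could equally well store (r, θ) internally; the client code would not be able to tell the difference.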
In the rest of the chapter, we explain the “how and why” of object-oriented programming using C++.

7.2 Components of a Class


What is the difference between a class and an object? An instance of a class is an object. Hence, we have one and only one class
definition in a program and we could have several objects that are different instances of that class. The details of class definitions
are presented next.
7.2.1 Defining Classes
In C++, the simplest class definition is shown below.
class class_name {
public:
….
private:
….
};
A class can have public and private member variables and functions1. The public variables and functions are available outside
of the class, i.e., by functions that use the class-related objects. The private variables and functions cannot be accessed directly
outside of the class. A compiler error is generated if an attempt is made to access these variables and functions. Note that a
complete class definition requires that the member functions and variables be defined within the {}; including the semicolon
at the end.
We will go back to the point class discussed before. To manipulate the coordinate values, we will need member functions
(functions that are members of a class) that will define, redefine and access the coordinate values. Function or functions that
help initially define the values are called constructors (ctor for short). Similarly, the functions to redefine (or modify) the values
are called modifier (or mutator) functions, and the functions to access the values are called accessor functions. Here are the
statements to define the CPoint class. These statements are usually contained in a header file, e.g., point.h.
#pragma once
#include <string>

class CPoint
{
    public:
        // constructors
        CPoint ();                 // default
        CPoint (float, float);     // overloaded
        // helper function
        void Display (const std::string&);
        // modifier function
        void SetValues (float, float);
        // accessor function
        void GetValues (float&, float&);
    private:
        float m_fXCoor; // stores x-coordinate
        float m_fYCoor; // stores y-coordinate
};

1 There is a third type, protected, that we will see in Chapter 9.

Constructors are special member functions that do not require a return type. They have the same name as the class name. They
are automatically called when an object associated with the class is declared and created. They can be overloaded just as regular
functions. They are optional. If you do not define a constructor in your class definition, then C++ provides a constructor that
essentially does nothing as far as your specifications are concerned. Constructors are typically used to initialize the member
variables of the class. There is another special member function called the destructor (dtor for short). The destructor has the
same name as the class, does not have a return type, cannot be overloaded, is not required to be declared or defined, and is
automatically invoked when the object associated with the class goes out of scope. As an example, the destructor for the CPoint
class can be declared as follows.
public:
// constructors
CPoint (); // default
CPoint (float, float); // overloaded
// destructor
~CPoint ();

The ~ (tilde) symbol is used before the class name to denote the destructor. The default specification for member variables and
functions is private. In other words, if the keyword public or private is not used, the compiler assumes that the member function
or member variable is private. In the CPoint class, two constructors are used. The first (without any parameters) is called the
default constructor. We will use this constructor to set both the coordinate values to zero. In addition to the two constructors,
there are three public member functions. The Display function is designed to display the coordinates using standard output.
The SetValues and the GetValues functions are designed to redefine the coordinates and to obtain the coordinates, respectively.
There are no public member variables. As we saw with data abstraction, class declarations typically do not have public member
variables. The two variables that store the coordinates are declared as private variables. They cannot be accessed in any program
component outside of the five member functions in the CPoint class. We could create exceptions to this rule as we will see in
Chapter 9. There are no private member functions in the CPoint class declaration. The keywords public and private appear
only in the class definition. To implement (or define) these functions, we need to create the appropriate C++ statements. Here
are those statements (usually contained in a source file, say point.cpp).
#include <iostream>
#include <string>
#include "point.h"

// default constructor
CPoint::CPoint ()
{
    // coordinates initialized to zero
    m_fXCoor = 0.0f;
    m_fYCoor = 0.0f;
}

// overloaded constructor
CPoint::CPoint (float fX, float fY)
{
    // coordinates set to fX and fY
    m_fXCoor = fX;
    m_fYCoor = fY;
}

// modifier function
void CPoint::SetValues (float fX, float fY)
{
    // coordinates set to fX and fY
    m_fXCoor = fX;
    m_fYCoor = fY;
}

// accessor function
void CPoint::GetValues (float& fX, float& fY)
{
    // coordinates returned in fX and fY
    fX = m_fXCoor;
    fY = m_fYCoor;
}

// helper function
void CPoint::Display (const std::string& strBanner)
{
    // display the current coordinates
    std::cout << strBanner
              << "[X,Y] Coordinates = ["
              << m_fXCoor << ","
              << m_fYCoor << "].\n";
}

Note the difference between the definition of a regular function and a member function that belongs to a class. The member
function definition needs a qualifier - the name of the class and the scope operator ::. The statements in each member function
are just as they would appear in any function with the difference that the member variables are declared in the class definition
and hence should not be defined in the body of the member function. It would be incorrect to write the function SetValues
as follows.
void CPoint::SetValues (float fX, float fY)
{
    float m_fXCoor; // local to this function!
    float m_fYCoor; // local to this function!
    m_fXCoor = fX;
    m_fYCoor = fY;
}

With the (incorrectly defined) function shown above, the values of fX and fY are assigned to the local variables m_fXCoor
and m_fYCoor, not to the member variables (with the same names) that are a part of the CPoint class! The local declarations
hide the member variables. Recall the scope rules from Chapter 4.
The next obvious question is how can the class be used in an application program? We illustrate the usage using a simple
example.
Example Program 7.2.1 Using the CPoint class
Here is the main program that illustrates the usage of the CPoint class. We will use two objects associated with the CPoint class
– Origin and CarCoords. Origin will be declared using the default constructor and CarCoords will use the overloaded constructor.
The user will be prompted to enter the x, y coordinates and these values will be used with the SetValues member function to
set the user-defined coordinate values. The verification of the coordinate values will take place using the Display member
function. The usage of the GetValues function is left as an exercise.
main.cpp

An object is declared just as any other variable in a program. In line 18, the object Origin is declared, and the coordinate values
are initialized to zero in the default constructor, a convenience that should not be overlooked. Because the CPoint class defines
an overloaded constructor, this statement would not compile if the default constructor were not also defined; it is good practice
to define a default constructor for every class one codes. In line 21, the object CarCoords is declared, and the coordinate values
are initialized using the overloaded constructor. Note how the member functions are used in the program. In line 24, the Display
member function is called. The general usage in using a member function is
object.memberfunction (parameter list);
not
memberfunction (parameter list);

The . is referred to as the member selection operator and is used in accessing the member variables and member functions
outside the class. Every object is closely tied to the variables associated with the class. In other words, unless otherwise specified,
an individualized copy of the member variables is created for every object. However, only one copy of the member function is
created that can then be used by all the objects associated with that class.
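The following sketch shows client code that uses the member selection operator with two separate objects. It is not the main program of the example. The CPoint class is repeated in a compact form (with the Display function omitted and the member functions defined inside the class) only so that the sketch is self-contained, and the client function Distance is hypothetical.

```cpp
#include <cmath>

// Compact copy of the CPoint class (Display omitted) so that the sketch
// stands on its own
class CPoint
{
    public:
        CPoint () { m_fXCoor = 0.0f; m_fYCoor = 0.0f; }
        CPoint (float fX, float fY) { m_fXCoor = fX; m_fYCoor = fY; }
        void SetValues (float fX, float fY) { m_fXCoor = fX; m_fYCoor = fY; }
        void GetValues (float& fX, float& fY) { fX = m_fXCoor; fY = m_fYCoor; }
    private:
        float m_fXCoor; // stores x-coordinate
        float m_fYCoor; // stores y-coordinate
};

// Hypothetical client function: distance between two points. The objects
// are passed by value because GetValues is not a const member function.
float Distance (CPoint A, CPoint B)
{
    float fXA, fYA, fXB, fYB;
    A.GetValues (fXA, fYA); // object.memberfunction (parameter list)
    B.GetValues (fXB, fYB); // same function, different object's data
    return std::sqrt ((fXB-fXA)*(fXB-fXA) + (fYB-fYA)*(fYB-fYA));
}
```

If Origin is declared with the default constructor and CarCoords is declared as CPoint CarCoords (3.0f, 4.0f);, then Distance (Origin, CarCoords) is 5. Calling CarCoords.SetValues (6.0f, 8.0f); changes only the copy of the member variables that belongs to CarCoords; the values stored in Origin are not affected.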
The use of private variables precludes their access outside the class. For example, we cannot write the following statements in the
main function
CarCoords.m_fXCoor = fV[0]; CarCoords.m_fYCoor = fV[1];
instead of the original statement (line 31)
CarCoords.SetValues (fV[0], fV[1]);
Similarly, private member functions cannot be accessed outside the class. Let us review the important facts about class
definitions and usage.
1. The class definition is usually contained in a header file. A complete class definition requires that the member functions
and variables be defined within the {}; including the semicolon at the end.
2. Member variables and functions are, by default, private.
3. Definition of the constructor is optional. C++ defines a constructor if one is not defined. The constructor does not
have a return type and has the same name as the name of the class.
4. It is a good idea to define a default constructor.
5. Declaring a public member variable should be done with care. Unless the design of the class calls for a public member
variable, declare all member variables as private. Private member functions and variables cannot be accessed outside
of the class.
6. Member functions are defined using the scope resolution operator (::), and they are referenced outside the class using
the member selection operator (.).
We will now look at another example where (a) public and private member functions are used, and (b) some rudimentary error
trapping is necessary for the program to work correctly.

Example Program 7.2.2 A Time-related Class


We will look at defining and using a time-related class. This class is described in terms of three private member variables: hours,
minutes and seconds. The client code will have access to these member variables via accessor and modifier functions. The
accessor functions will provide access to the hour, the minute, the second, or all three member variables. The modifier functions
are organized in the same way. In addition, we will also have publicly available helper functions that will print the current
time and compute the elapsed time between two given time values.
The header file that contains the class declaration is shown next.
time.h

Internally the time is stored in a 24-hour format. To help in computing the time difference, we have defined a public helper
function TimeDifference that uses two private helper functions to convert the time from the 24-hour format to seconds and
back (ConvertToSeconds and ConvertFromSeconds). In line 17 the output styles for displaying the time, e.g. hh:mm:ss, and hh
Hour(s) mm Minute(s) ss Second(s), are defined. The four public modifier functions have a return type of int. In the program
it is assumed that a return value of 0 denotes no error and a return value of 1 denotes an error. As we will see, this mode of
error checking is not desirable, and there are several improvements that can be made to the program.

The implementation of the class-related functions is shown next.

time.cpp

The default constructor is listed in lines 13 through 18. The default time is set as midnight. The overloaded constructor (lines
20-26) has three parameters for the hour, minute and seconds. The constructor does not check the input for errors since it is
not clear what to do if the specified time values are not valid. The destructor has no statements and can be deleted both from
the header and the source files. However, it is recommended that the destructor be included with every class just in case the
destructor may be needed for future enhancements to the class.

The Print member function prints the time as hour:minute:second (if Style is USINGCOLONS) or as hour Hour(s) minute
Minute(s) second Second(s) (if Style is USINGDESCRIPTORS).
Time difference between the two time objects is computed in the TimeDifference function where it is assumed that if the time
difference is negative, then the end time has crossed midnight. For example, if the start time is 20:10:10 and the end time is
01:10:10, then the elapsed time is 5 hours. The TimeDifference member function illustrates how private member functions can
be used. In lines 49 and 50, the start time (or from time) and the end time (or to time) are converted from the 24 hour values
into seconds elapsed since midnight. If the two time values span midnight, then the adjustment is made in lines 54-55. Finally,
in line 58, the time difference in seconds is converted back to the 24-hour format. The private member functions
(ConvertToSeconds and ConvertFromSeconds) are defined in lines 61 through 73. An important question is how do lines 49 and
50 work since the ConvertToSeconds is a private member function? When an argument in a class member function is of the
same class type, the function can access the argument’s private member components (instead of using the public accessor
functions). In this example, the objects TFrom and TTo are both CTime objects. Hence it is legal to use
TFrom.ConvertToSeconds()and TTo.ConvertToSeconds().

Examples of public accessor functions are shown in lines 76-86.

Examples of public modifier functions are shown in lines 99-119. The input values are checked for validity. If they are, the
values are used to modify the (private) member variables and a zero value is returned. If they are not, the current values of the
object are left intact, and a nonzero value (value of 1) is returned. As we will see in the main program, the user of the CTime class
(herein referred to as the client code) will have to assume the responsibility of taking the appropriate action upon error detection.
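The ideas described above can be collected in a compact sketch. This is not the time.h and time.cpp listing from the example. The class and function names follow the text, but the parameter lists of SetTime and ConvertFromSeconds, the use of the const qualifier, and the in-class definitions of the accessor functions are assumptions made for illustration.

```cpp
// Sketch of a CTime class (assumed signatures; time stored in 24-hour format)
class CTime
{
    public:
        CTime () { m_nHour = 0; m_nMinute = 0; m_nSecond = 0; } // midnight
        int  SetTime (int nH, int nM, int nS);  // returns 0 (ok) or 1 (error)
        void TimeDifference (const CTime& TFrom, const CTime& TTo);
        int  GetHour () const   { return m_nHour; }
        int  GetMinute () const { return m_nMinute; }
        int  GetSecond () const { return m_nSecond; }
    private:
        int  ConvertToSeconds () const;
        void ConvertFromSeconds (int nSeconds);
        int  m_nHour, m_nMinute, m_nSecond;
};

int CTime::SetTime (int nH, int nM, int nS)
{
    // invalid input: current values are left intact and 1 is returned
    if (nH < 0 || nH > 23 || nM < 0 || nM > 59 || nS < 0 || nS > 59)
        return 1;
    m_nHour = nH; m_nMinute = nM; m_nSecond = nS;
    return 0;
}

int CTime::ConvertToSeconds () const
{
    // seconds elapsed since midnight
    return m_nHour*3600 + m_nMinute*60 + m_nSecond;
}

void CTime::ConvertFromSeconds (int nSeconds)
{
    m_nHour   = nSeconds/3600;
    m_nMinute = (nSeconds%3600)/60;
    m_nSecond = nSeconds%60;
}

void CTime::TimeDifference (const CTime& TFrom, const CTime& TTo)
{
    // private functions of TFrom and TTo are accessible here because
    // both arguments are CTime objects
    int nDiff = TTo.ConvertToSeconds() - TFrom.ConvertToSeconds();
    if (nDiff < 0)
        nDiff += 24*3600; // the end time has crossed midnight
    ConvertFromSeconds (nDiff);
}
```

With a start time of 20:10:10 and an end time of 01:10:10, the difference in seconds is 4210 - 72610 = -68400. Adding 86400 gives 18000 seconds, i.e. an elapsed time of 5 hours. The client code must test the value returned by SetTime; a call such as SetTime (25, 0, 0) returns 1 and leaves the object unchanged.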

We will now see a sample main program, contained in file main.cpp, which uses the CTime class.

main.cpp

The objects StartTime and EndTime are declared in lines 19 and 20 and are set to midnight since the default constructor is called.
The overloaded constructor would be used if the following declaration were used instead.
CTime StartTime (2, 15, 30);

Using the GetInteractive function, the start time and end time values are obtained. The modifier function SetTime is used to
set the time values. An error message is displayed if this function returns a nonzero value.

In line 43, a new object TimeDiff is declared. This is the object that will store the time difference between StartTime and
EndTime. The use of the TimeDifference function is illustrated in line 34. Finally, the Print member functions are used in lines
45 through 48 to display the results of the time computations.

Sample program outputs are shown in Fig. 7.2.1.

Fig. 7.2.1 Program output from Example 7.2.2


Now that we have seen how to define and use simple classes, it is time for us to start formalizing this process.

7.3 Developing and Using Classes


There is much more to classes than we have seen in the previous section. In this section, we will look at a few of the advanced
features that will be useful in the development of numerical analysis-based computer programs.

Defining functions in header files


Sometimes member functions in a class have very few statements. Sometimes they have just one statement! In situations such
as this, it may be better to define the function in the class definition itself. Consider the CTime class from Section 7.2. We can
define the body of the function GetHour in the header file as
int GetHour () {return m_nHour;};

Note that a semicolon follows the closing brace in this definition. It is optional here, just as it is not required when the function
is defined in the class source file. We can write the functions GetMinute, GetSecond, SetHour, SetMinute and SetSecond in a
similar manner.

Inline Functions
As we saw in Chapter 4, functions make program development easier but at an added price of using additional resources of
storage and execution time. C++ provides a mechanism by which this execution time overhead can be reduced through the
use of inline functions identified by the inline keyword. For example, if the GetHour function is repeatedly used in a program,
then a more efficient way of executing the function would be to declare it as
inline int GetHour () {return m_nHour;};

The inline qualifier is merely a suggestion to the compiler. The compiler is able to decide how best to optimize the function
definition. One of the disadvantages of inlining a function is that the compiler inserts the same code at multiple locations where
the function is called thereby making the executable code larger. The inline qualifier can also be used with regular (non-
member) functions.
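The two ideas above can be put together in a minimal sketch. The class CClock and the function ToSeconds below are hypothetical names used only for illustration (they are not part of the CTime class from Section 7.2). The accessors are defined inside the class definition, and a regular non-member function carries the inline qualifier. The const qualifier on the accessors is discussed in the next subsection.

```cpp
// CClock is a hypothetical, stripped-down stand-in for CTime
class CClock
{
    public:
        CClock (int nH, int nM, int nS)
        {
            m_nHour = nH; m_nMinute = nM; m_nSecond = nS;
        }

        // defined inside the class definition: implicitly inline
        int GetHour () const   { return m_nHour; }
        int GetMinute () const { return m_nMinute; }
        int GetSecond () const { return m_nSecond; }

    private:
        int m_nHour, m_nMinute, m_nSecond; // 24-hour format
};

// a regular (non-member) function can also carry the inline qualifier
inline int ToSeconds (const CClock& C)
{
    return C.GetHour()*3600 + C.GetMinute()*60 + C.GetSecond();
}
```

Whether the compiler actually expands these calls in place is its own decision. The sketch only shows where the definitions and the qualifier may be placed.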

Using const Qualifier with Member Functions


As we have mentioned in the past, the const qualifier provides a safe mechanism in using and passing objects. For example, we
used the const qualifier in the CPoint class with the string parameter in the Display member function. We can extend this safety
mechanism when declaring and defining member functions. For example, with the CTime class, we should have declared the
TimeDifference member function as
void TimeDifference (const CTime&, const CTime&);

With this declaration, the two time objects that appear as the function parameters are passed as const references and hence
cannot be modified within the function. When a member function is declared as const as follows
return_value class_name::function_name (parameter list) const;

then it cannot modify the object on which it is called (that is, the member variables of that object) within the function body.
Here is the modified CTime class definition that uses the const qualifier (note that not all statements are displayed).
class CTime
{
    public:
        CTime ();                // default constructor
        CTime (int, int, int);   // constructor
        ~CTime ();               // destructor

        // helper functions
        void Print (PStyle Style=PStyle::USINGCOLONS) const;
        void TimeDifference (const CTime& TFrom, const CTime& TTo);
        void TimeDifference (const CTime& TTo);

        // modifier functions
        int SetTime (const CTime&);

        // accessor functions
        void GetTime (int&, int&, int&) const;
        void GetTime (CTime&) const;
        int GetHour () const;
        int GetMinute () const;
        int GetSecond () const;

    private: // store in 24-hour format
        ….
        // helper functions
        int ConvertToSeconds() const;
        void ConvertFromSeconds (int, int&, int&, int&) const;
};

Here is the definition of one of the const member functions.


int CTime::GetHour () const
{
    return m_nHour;
}

We have also made a simple improvement to one of the member functions. An overloaded version of the TimeDifference member
function now takes only one argument. This version can be written as follows.
void CTime::TimeDifference (const CTime& TTo)
{
    // find the difference between the two times in seconds
    int nTFrom = ConvertToSeconds();
    int nTTo = TTo.ConvertToSeconds();
    int nDiff = nTTo - nTFrom;
    // adjust if time crosses midnight
    if (nDiff < 0)
        nDiff = nDiff + 86400;
    // now convert from seconds back to hr:min:s
    ConvertFromSeconds (nDiff, m_nHour, m_nMinute, m_nSecond);
}

With this definition, the client code that uses this member function can be written as
CTime Time1(10, 55, 0), Time2(12, 45, 10);
Time1.TimeDifference (Time2);
The elapsed time between Time1 and Time2 is computed, and the results are stored in Time1. It should be noted that while C++
does not allow the const qualifier to be used with constructors and destructors, const objects can be initialized using the
constructor.
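As a hedged illustration of how const member functions and const objects interact, the following sketch uses a hypothetical, reduced class called CTimeLite (an assumption made for this example, not the CTime class itself). The accessors are const and can be called on a const object, while TimeDifference is not const and can be called only on an object that may be modified.

```cpp
// CTimeLite is a hypothetical, reduced version of the CTime class
class CTimeLite
{
    public:
        CTimeLite (int nH, int nM, int nS)
        {
            m_nHour = nH; m_nMinute = nM; m_nSecond = nS;
        }

        // const member functions: may be called on const objects
        int GetHour () const   { return m_nHour; }
        int GetMinute () const { return m_nMinute; }
        int GetSecond () const { return m_nSecond; }
        int ConvertToSeconds () const
        {
            return m_nHour*3600 + m_nMinute*60 + m_nSecond;
        }

        // non-const member function: modifies the calling object
        void TimeDifference (const CTimeLite& TTo)
        {
            int nDiff = TTo.ConvertToSeconds() - ConvertToSeconds();
            if (nDiff < 0)          // time crosses midnight
                nDiff += 86400;
            m_nHour   = nDiff/3600;
            m_nMinute = (nDiff%3600)/60;
            m_nSecond = nDiff%60;
        }

    private:
        int m_nHour, m_nMinute, m_nSecond; // 24-hour format
};

// returns the elapsed whole hours from Start to the const object End
int ElapsedHours (CTimeLite Start, const CTimeLite& End)
{
    Start.TimeDifference (End);    // Start is a local copy; End is untouched
    // End.TimeDifference (Start); // would not compile: End is const
    return Start.GetHour();
}
```

Passing Start by value is a deliberate choice in this sketch: the function needs an object it can modify, and the copy leaves the caller's object unchanged.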


Composite Classes: Classes within Classes


As we have seen so far, class variables can be any of the standard C++ data types. In this chapter, we have started defining our
own data types using the concept of classes. The class objects can be manipulated similar to the standard C++ data types. Hence
a logical question is whether a class can contain objects belonging to other classes. Yes! When an object belonging to one
class is declared as a member variable in another class, the object has a local scope within the enclosing class. When used correctly,
one can build class hierarchies that have complex and useful functionalities. This process is known as class composition.
A class that has objects of other classes as its member variables is called a composite class. We will illustrate the concepts
associated with class composition in the following example.

Example Program 7.3.1 Example of Class Composition


Problem Statement: The City of Urban monitors temperature distribution in the city to help understand the “heat island” effect. A
truck with the appropriate sensor-based electronic gear is sent out to obtain the temperature at different locations in the city at
random times. Urban is divided into a grid and the precise location where the temperature reading is taken is obtained as an
(x,y) pair. Develop a computer program to help the person collecting the data, enter and store the data suitable for processing
at a later time.
Solution: We will defer the development of a complete solution that includes storing and retrieving the gathered data. In this
example, however, we will develop the class declarations and definitions that will enable storage and retrieval at a later stage. In
addition, we will assume that no more than 4 readings will be taken and stored for later retrieval and display. Choosing 4 is done
mainly since this is a small number and can readily be used to illustrate the ideas behind the example.
The problem statement identifies the following attributes with every temperature reading – the location in the form of (x,y)
values, the time value and the temperature value. We have already defined the CPoint class to hold and manipulate data point-
related information. We have also defined the CTime class to hold and manipulate time information. We could use these two
classes in our solution. What is the advantage? The advantage is that this is a good example of class reuse. We will essentially
leverage the usefulness of the two developed classes in defining a new class that will contain objects belonging to those two
classes. For example, we can define three member variables as follows.
CTime m_TimeStamp; // to store time temperature taken
CPoint m_Location; // to store location temperature taken
float m_Temperature; // to store the temperature value

The objects m_TimeStamp and m_Location will be contained in the yet undefined class. What is the disadvantage? The
disadvantage is that one needs to be familiar with the CPoint and CTime classes to use them. In other words, instead of developing
and learning about one class that would hold all the information, we now have to deal with three classes.
The process of including objects from other classes into a class is known as class composition and the newly defined class is
known as a composite class. In this problem, we will define a new (composite) class called CSensor. This class will contain the
three member variables listed above. We will also define the usual accessor, modifier and helper functions to manipulate the data.
No additional features are necessary at this stage. The detailed algorithm is not required since the main program will merely get
the sensor reading information from the user and display the information as a confirmation that the overall procedure is good.
The program uses slightly modified versions of previously defined classes – CTime and CPoint. The interested reader is urged to
look at the source code to see what the changes are. The header file that contains the CSensor class declaration is shown next.


sensor.h

There are three private member variables that store the three pieces of information associated with every reading. These variables
are declared in lines 28-30. To manipulate the information, we use the usual modifier and accessor functions as shown in lines
22 and 25. The Display function is used to display the stored information. The class member functions are shown next.
sensor.cpp

The SetData and GetData member functions are straightforward. Both use the publicly available functions from the CPoint and
CTime classes to modify and to access the data.


The Display member function leverages the display functions from the CPoint and the CTime classes to display the entire sensor
data – location, time and temperature.
Now we are ready to look at the main program used to obtain and store the information.
main.cpp

The first thing to notice is that the sensor data is stored as a vector – line 17. For each of the locations where the data is
supposedly acquired, the data is obtained interactively in lines 32-34. The CPoint object is created in line 23 and the CTime object
in line 24. Finally, in line 43, the values of the ith CSensor object data (SensorData) are set. The newly created data is displayed
one location at a time in line 49.


This simple example serves to illustrate what we have achieved. Instead of having one big class with several components, we
have three smaller classes, each with fewer components, that together provide the same functionality and that can be extended
as the need arises.

Fig. 7.3.1 Program output from Example 7.3.1


One must be aware of the C++ rules that govern the initialization of objects contained inside other classes. For example, the
following (within sensor.h) is invalid and will not compile since a parenthesized initializer cannot be used in a member declaration.
class CSensor
{
    CTime m_TimeStamp (10, 0, 0); // invalid initialization
    ….
};

Since C++11, a brace initializer, as in CTime m_TimeStamp {10, 0, 0};, can be used in the declaration instead. Alternatively, the
value can be set through an executable statement. For example, to set the m_TimeStamp variable we could modify the CSensor
constructor as follows.
CSensor::CSensor ()
{
m_TimeStamp.SetTime (10, 0, 0);
}

Another approach would be to overload the constructor.


CSensor::CSensor (const CTime& Time, const CPoint& Point)
{
m_TimeStamp.SetTime (Time);
m_Location.SetValues (Point);
}

Or, if appropriate copy constructors exist for the CTime and CPoint classes, then the following construct (a member initializer list) can be used.
CSensor::CSensor (const CTime& Time, const CPoint& Point) :
m_TimeStamp (Time), m_Location(Point)
{
}
Note that the constructors of the data members (CTime and CPoint) are called, in the order in which the members are declared
in the class, before the body of the CSensor constructor is executed. We will see more about composition in Chapter 9.
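The order of construction in a composite class can be observed with a small sketch. All the names below (CStamp, CSpot, CReading and the global string g_strLog) are hypothetical and are introduced only for this illustration; they are simplified stand-ins for CTime, CPoint and CSensor. Each constructor body appends its class name to the log. Since CStamp and CSpot have no default constructor, the composite class has to use a member initializer list.

```cpp
#include <string>

// records the order in which constructor bodies run (illustration only)
std::string g_strLog;

class CStamp   // hypothetical stand-in for CTime
{
    public:
        CStamp (int nHour) : m_nHour(nHour) { g_strLog += "CStamp "; }
        int GetHour () const { return m_nHour; }
    private:
        int m_nHour;
};

class CSpot    // hypothetical stand-in for CPoint
{
    public:
        CSpot (float fX, float fY) : m_fX(fX), m_fY(fY) { g_strLog += "CSpot "; }
        float GetX () const { return m_fX; }
        float GetY () const { return m_fY; }
    private:
        float m_fX, m_fY;
};

class CReading // hypothetical composite class, similar in spirit to CSensor
{
    public:
        // members are initialized in the initializer list, not assigned
        CReading (int nHour, float fX, float fY, float fTemp)
            : m_TimeStamp(nHour), m_Location(fX, fY), m_fTemperature(fTemp)
        {
            g_strLog += "CReading ";
        }
        int GetHour () const          { return m_TimeStamp.GetHour(); }
        float GetTemperature () const { return m_fTemperature; }
    private:
        CStamp m_TimeStamp;     // constructed first (declared first)
        CSpot  m_Location;      // constructed second
        float  m_fTemperature;
};
```

After one CReading object is created, the log reads CStamp, CSpot, CReading in that order: the member objects are constructed first, and the body of the composite constructor runs last.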


Copy Constructors
Earlier we saw two versions of the constructor – the default ctor and the overloaded ctor. A copy constructor is a special case
of the overloaded constructor where an existing object of the same class is used to create a new object as its copy. Here is an
example of a copy constructor for the CPoint class.
class CPoint
{
    public:
        CPoint (const CPoint&);
        ….
};
CPoint::CPoint (const CPoint& P) // copy constructor
{
    m_fXCoor = P.m_fXCoor;
    m_fYCoor = P.m_fYCoor;
}

Copies of an object are made under the following conditions.


(a) When a declaration is made with initialization from another object. In the following example, P2 and P3 are created as copies of P1.
CPoint P1 (1.0f, 2.0f); // overloaded constructor.
CPoint P2 = P1; // copy constructor.
CPoint P3 (P1); // copy constructor.

(b) Parameters are passed by value.


(c) An object is returned by a function (more on that in Chapter 9).

C++ calls the copy constructor to make the copy. If there is no copy constructor defined for the class, C++ uses a compiler-generated
(default) copy constructor that copies each member variable in turn. The copy constructor takes a reference to a const parameter.
It is const to guarantee that the copy constructor doesn't change the original object, and it is a reference parameter since a value
parameter would itself require making a copy!
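A simple way to see when copies are made is to count the calls to the copy constructor. The sketch below uses a hypothetical class CPt with a static counter (static member variables are an assumption here, used only to keep a class-wide count) and two hypothetical functions, one taking its parameter by value and one by const reference.

```cpp
// CPt is a hypothetical point class that counts its copy constructor calls
class CPt
{
    public:
        CPt (float fX, float fY) : m_fX(fX), m_fY(fY) {}
        CPt (const CPt& P) : m_fX(P.m_fX), m_fY(P.m_fY)   // copy constructor
        {
            ++s_nCopies;
        }
        float GetX () const { return m_fX; }
        float GetY () const { return m_fY; }

        static int s_nCopies;   // number of copies made so far
    private:
        float m_fX, m_fY;
};
int CPt::s_nCopies = 0;

// parameter passed by value: a copy of the argument is made
float XByValue (CPt P)
{
    return P.GetX();
}

// parameter passed by const reference: no copy is made
float XByReference (const CPt& P)
{
    return P.GetX();
}
```

Declaring two copies of an existing point raises the counter by two, a call to XByValue with that point raises it by one more, and a call to XByReference leaves it unchanged. This is one reason why objects are usually passed as const references.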

7.4 Storage with std::vector class


At the beginning of this chapter we briefly saw a reference to the std::vector class (see footnote 2). Now that we know more about
classes, let us look at how to use the standard vector class. This class is a container that stores its elements contiguously, can
grow or shrink on demand, and is usually an efficient choice for storing and manipulating a sequence of values. It is an indexed
container, meaning that one can access the elements of the vector using an integer index. Note that #include <vector> must be
used to make the proper reference to the vector class.
Declaration and Initialization
The following statement declares a vector that holds n integers, each initialized to zero.
std::vector<int> a(n); // integer n > 0

The data type or object to be stored in the vector needs to be specified for a template class object to be created. The elements
of the vector can be initialized as the following example shows.
std::vector<int> a(n, ival); // ival is an int variable

In the above example, all the n elements of vector a are initialized with the value contained in ival.
Accessing vector elements
The operator [] can be used to access elements of the vector. For example, the following code shows how all the elements of
vector a are set to ival.
for (i=0; i < n; i++)
a[i] = ival;
Notably, range checking is not done when the [] operator is used. In other words, the behavior of the following statements is
undefined since the last pass through the loop writes to a[4], an element that does not exist.
std::vector<int> a(4);
for (int i=0; i <= 4; i++) // should be i < 4
    a[i] = 10;

2 std::vector is a template class that we will examine in more detail in Chapter 9.

Other functionalities
There are numerous operations that can be carried out using vectors and some of the more important ones are listed in the
table below.
Operation            Remarks
vector<type> a(b)    Copies the contents of an existing vector b and creates a new vector a, e.g., vector<CPoint> P1(P2);
a.size()             Returns the number of elements in vector a.
a.empty()            Returns true if a is empty, false otherwise.
a = b                Assigns the contents of b to a.
a == b               Returns true if a is equal to b, false otherwise. The other relational operators (!=, <, <=, >, >=) can also be used.
a.at(i)              Returns the element at location i. A std::out_of_range exception is thrown if location i does not exist, e.g., P = P1.at(n);
a.push_back (b)      Increases the number of elements of a by one by appending a copy of b at the end of a.
a.clear()            Removes all the elements in a. The vector is now empty.
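Several of the operations in the table can be seen working together in the following sketch. The function names MakeSquares and SafeGet are hypothetical. The second function relies on the exception thrown by the at member function; exception handling with try and catch is assumed here and is used only to turn an invalid index into a false return value.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// builds the vector {0, 1, 4, ..., (n-1)*(n-1)} one element at a time
std::vector<int> MakeSquares (int n)
{
    std::vector<int> V;               // empty vector
    for (int i=0; i < n; i++)
        V.push_back (i*i);            // size grows by one each time
    return V;
}

// range-checked access: returns false instead of reading out of bounds
bool SafeGet (const std::vector<int>& V, std::size_t nIndex, int& nValue)
{
    try
    {
        nValue = V.at(nIndex);        // throws std::out_of_range if invalid
        return true;
    }
    catch (const std::out_of_range&)
    {
        return false;
    }
}
```

With a vector of four elements, SafeGet succeeds for the indices 0 through 3 and returns false for index 4, whereas the [] operator would silently access memory outside the vector.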

Example Program 7.4.1 Using the std::vector Class


We will write a program to illustrate some of the features of the std::vector class that tie in what has been discussed earlier in
the book. The reader should explore the more advanced features at a later stage.


Lines 10 and 11 are needed so that the features of the vector and string classes can be used in the program. A vector of strings
of zero size is created in line 25. In lines 29-31, three strings are created and stored in the vector. Element with index 0 is created
in line 29 and so on. In line 37, the size of the vector is increased to 4 so that an additional string can be stored. That additional
string is stored in line 38 using the [] and = operators. Lines 37 and 38 could have been simply replaced by an additional call
to the push_back function. Additional usage of the [] and = operators is shown in lines 42-44.

7.5 String manipulation with std::string class


Next, we look at how to use the standard string class that provides a host of useful capabilities needed in manipulating strings.
Firstly, we are able to treat character strings as a single entity, an object. Secondly, we do not have to guess the number of
characters that the string might contain – the class automatically allocates the resources so that the string can grow or shrink on
demand. Thirdly, there are functionalities that are provided through member functions that make string manipulation easy to
carry out. In this section, we will look at the more important characteristics of the string class that will enable us to write useful
programs. Note that #include <string> must be used to make the proper reference to the string class.
Declaration and Initialization
As we have seen earlier in the book, strings can be declared just like a standard C++ data type. Here are some examples.
string strInput1, strInput2; // empty strings
string strHeader = "Welcome to my world."; // initialized string
string strPrompt ("Input an integer: "); // initialized string
string strCpyPrt (strPrompt); // initialized string

Assigning values to strings


Once a string is declared, it can be populated just as a standard C++ type using the assignment operator and expressions. Here
are some examples.
strName = strOldName;
strFullName = strFirstName + strLastName; // concatenation
strFullName = strLastName + ", " + strFirstName; // concatenation
strFullName += strOldName;

The concatenation can also be carried out using the append member function.
strFullName = strFirstName;
strFullName.append (strLastName);

String literals are enclosed in a " " pair even if the string contains a single character. In other words, the following
declaration is invalid
string strGradeA = 'A'; // invalid

but the following assignment
strGradeA = "A"; // valid assignment

is valid.
Accessing the individual components
The string class provides access to individual characters in a string through the use of [] operator similar to accessing an element
of a vector.
strName = "Tony";
strName[3] = 'i'; // name is now Toni (a single character uses single quotes)

One can also find the length of a string using the length member function.
string strHeader = " Hello ";
std::cout << "Length of " << strHeader << " is "
<< strHeader.length() << "\n";

The above statements create the following display.
Length of  Hello  is 7

The member function size has the same functionality as the length member function.
Using as a function parameter
A string variable can be used as a function parameter just as any other variable. For example if the function prototype is
void Display (const string& strHeader, const float fV[], int nSize);

then the function can be called as


float fVA[10];
string strTitle = "Vector A";
Display (strTitle, fVA, 10);

If a string is to be modified by a function then it must be passed by reference, since a string passed by value is a local copy
whose changes are lost when the function returns. An example usage in a program segment would be
string strFileName;
AddExtension (strFileName);

when the function prototype is


void AddExtension (string&);

Here is how the function AddExtension may be written.


void AddExtension (string& strFName)
{
strFName += ".dat";
}

Comparing strings
Strings can be compared to each other using the relational operators (==, !=, >, <, >=, <=) as well as the compare member
function. String comparisons take place in a lexicographical sense, that is, as words are arranged in a dictionary. For example,
the word list precedes listing, and cooling precedes help.
Here are a couple of examples of string comparisons.
if (strName1 < strName2)
cout << strName1 << " occurs before " << strName2 << "\n";
else
cout << strName2 << " occurs before " << strName1 << "\n";
Or
int nResult = strName1.compare(strName2);
if (nResult == 0)
cout << strName1 << " is the same as " << strName2 << "\n";
else if (nResult < 0)
cout << strName1 << " occurs before " << strName2 << "\n";
else if (nResult > 0)
cout << strName2 << " occurs before " << strName1 << "\n";
Working with Substrings
The member function substr can be used to work with substrings. Let’s look at the following example.
string strFirstPart, strLastPart;
string strTitle = "Vector A";
strFirstPart = strTitle.substr (0, 6); // extracts Vector
strLastPart = strTitle.substr (7, 1); // extracts A

The first parameter in the function call is the position from which to extract the substring and the second parameter is the
number of characters to extract.


Finding substrings
Several member functions are provided that help find characters or strings within strings. Here are some of those member
functions.
find: This function can be used to find a string within another string. The return value is the starting location where the string
is found first (lowest position). Otherwise the returned value is a special value that is stored in string::npos. Here is an example.
string strHeadline = "Aliens land on Mars";
string::size_type nPos = strHeadline.find ("land"); // unsigned type that can hold string::npos
if (nPos == string::npos)
cout << "land is not contained in " << strHeadline << "\n";
else
cout << "land occurs at location " << nPos
<< " in string " << strHeadline << "\n";

With the above example, the function looks for the string “land”. The returned value is 7.
find_first_of: This function can be used to find the first occurrence of any character in a given string within another string at
or after a specified location. The default value of this location is 0. The return value is the starting location where the character
is found. Here is an example.
string strHeadline = "Lose weight or loose change";
string::size_type nPos = strHeadline.find_first_of ("os", 5);
if (nPos == string::npos)
    cout << "could not find 'os' after location 5.";
else
    cout << "'os' occurs at location " << nPos << " beyond loc 5.\n";

With the above example, the function looks for the characters o and s beyond location 5. The returned value is 12 corresponding
to the character o.
find_last_of: This function can be used to find the last occurrence of any character in a given string within another string at
or before a specified location. The default value of this location is the end of the string. The return value is the location where
the character is found. Here is an example.
string strHeadline = "Lose weight or loose change";
string::size_type nPos = strHeadline.find_last_of ("se");
if (nPos == string::npos)
cout << "could not find se.";
else
cout << "last se occurs at location " << nPos << ".\n";
With the above example, the function looks for the characters s and e from the end of the string. The returned value is 26
corresponding to the character e.
rfind: This function can be used to find a string within another string backwards. The return value is the location where the
string is found last (highest position). Here is an example.
string strHeadline = "Aliens land on Mars";
string::size_type nPos = strHeadline.rfind ("land");
if (nPos == string::npos)
    cout << "land is not contained in " << strHeadline << "\n";
else
    cout << "land occurs at location " << nPos <<
            " in backward search of string " << strHeadline << "\n";

With the above example, the function looks for the string “land” within “Aliens land on Mars” but starts the search from the
end of the string. The returned value is 7.
find_first_not_of: This function can be used to find the first occurrence (lowest position) at or after a specified location that
matches none of the characters in a given string within another string. The default value of this location is 0. The return value
is the starting location where the character is found. Here is an example.
string strHeadline = "xxxx wwwww xx xxxx change";
string::size_type nPos = strHeadline.find_first_not_of ("wx ", 5);
if (nPos == string::npos)
    std::cout << "could not find anything but 'wx ' after location 5.";
else
    std::cout << "char not one of 'wx ' occurs at location " << nPos
              << " after loc 5.\n";

With the above example, the function looks for the first character that is not one of w, x and a blank space beyond location 5.
The returned value is 19 corresponding to the character c.
find_last_not_of: This function can be used to find the last occurrence (highest position) at or before a specified location that
matches none of the characters in a given string within another string. The default value of this location is the end of the string.
The return value is the starting location where the character is found. Here is an example.
string strHeadline = "xxxx wwwww xx xxxx change";
string::size_type nPos = strHeadline.find_last_not_of ("change ");
if (nPos == string::npos)
    std::cout << "could not find anything but 'change '.";
else
    std::cout << "char not one of 'change ' occurs at location " << nPos
              << " string searched backwards.\n";
With the above example, the function looks for the last character that is not one of the characters in 'change ', starting the
search at the end of the string. The returned value is 17 corresponding to the character x.
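The search functions become more useful when combined with substr. The following sketch shows two small helper functions built from rfind, find_first_not_of, find_last_not_of and substr. The function names SplitFileName and Trim are hypothetical and are not part of the string class or of the programs in this book.

```cpp
#include <string>

// splits "name.ext" at the last period; strExt is empty if there is no period
void SplitFileName (const std::string& strFile, std::string& strBase,
                    std::string& strExt)
{
    std::string::size_type nPos = strFile.rfind ('.');
    if (nPos == std::string::npos)
    {
        strBase = strFile;
        strExt.clear();
    }
    else
    {
        strBase = strFile.substr (0, nPos);   // characters before the period
        strExt  = strFile.substr (nPos+1);    // characters after the period
    }
}

// returns a copy of strInput without leading and trailing blanks
std::string Trim (const std::string& strInput)
{
    std::string::size_type nFirst = strInput.find_first_not_of (' ');
    if (nFirst == std::string::npos)          // nothing but blanks
        return "";
    std::string::size_type nLast = strInput.find_last_not_of (' ');
    return strInput.substr (nFirst, nLast-nFirst+1);
}
```

In both functions the value returned by the search is stored in a string::size_type variable and compared against string::npos before it is used as a location.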
Inserting strings
The insert member function can be used to insert a string at a specified location in another string. Here is an example.
string strDigits = "123321";
string strAlphabets = "abcd";
strDigits.insert (3, strAlphabets); // insert before location 3

The result is that strDigits is now 123abcd321.


Obtaining the address of a string
We will see more about pointers in the next chapter. When a situation requires the memory address where the characters of the
string are stored (as a null-terminated character array), the c_str member function can be used. We will see the usage of this
function in the next example.
Case-related issues
Strings or components of a string can be manipulated with reference to upper and lower case representation of characters using
C++’s character-handling library. The header file to include is <cctype>. The functions toupper and tolower convert a character
to its upper-case representation and lower-case representation respectively, if the representation exists.
Here is a sample function that we can write to convert all the characters in a string to upper case using the toupper library
function.
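void ToUpperString (string& strInput)
{
    for (string::size_type i=0; i < strInput.length(); i++)
    {
        // toupper expects a value that is representable as an unsigned char
        strInput[i] = static_cast<char>(toupper(static_cast<unsigned char>(strInput[i])));
    }
}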

There are other functions available in the C++ library and these are listed and described in Appendix D.
Next, we present an example program that uses the standard string class.
Example Program 7.5.1 Four-Function Calculator Using the std::string Class
We will rewrite the 4-function calculator developed in Section 4.6. We will improve the user input by obtaining an expression
to evaluate rather than bits and pieces. The program will evaluate an expression in its simplest form as
leftnumber operator rightnumber
For example, to evaluate 12 × 1.53, the user is expected to enter the input as follows.
Type an expression or stop to terminate the program: 12*1.53

We will use the following algorithm.

1. Obtain the expression from the user.
2. If the expression is stop, then terminate the program.
3. Look for +-*/ within the expression. If none of the operators is found, then the input expression is invalid.
4. Obtain the left number (from the beginning of the expression to the location of the operator). Convert from character
string to a number.
5. Obtain the operator.
6. Obtain the right number (from the operator to the end of the expression). Convert from character string to a number.
7. Based on the operator carry out the operation between the two numbers and display the result.
The program that implements the above algorithm is presented below. Extensive use of the string library is made.
main.cpp

In line 17, the supported operators (or the four function-related operators) are defined and stored in the strOpers variable. The
string stored in strHelp will be displayed only once, while the string stored in strPrompt will be displayed every time the user is
prompted to enter the expression to evaluate (see lines 21, 31-34). The GetInteractive function is called with the appropriate
prompt to obtain the user input in strUserInput. An input of at most 50 characters is assumed!
The program is terminated if the user input is stop. Otherwise, using the find_first_of member function, the location of the
operator within the string is found in line 45. If none of the characters +-*/ is found, an error message is issued (line 84).


The substr member function (lines 49 and 57) is used to obtain the string form of the left and right numbers. To convert
from a string to a floating-point value, the GetDoubleValue function is used. This function returns a nonzero value if the input
is not a valid number. If either the left number or the right number is invalid, the bError variable is set to true, the expression is
not evaluated, and an error message is displayed (line 81). Otherwise, the operation is carried out (lines 65-75) and the result is
displayed (line 78).

Fig. 7.5.1 Output from Example Program 7.5.1


There are other string related functionalities (such as stream processing) that we will see later in the book.
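The parsing steps of the algorithm used in Example Program 7.5.1 can be sketched in a compact form as follows. This is not the program from the example. The function names ToDouble and Evaluate are hypothetical, the standard library function std::stod is used here in place of the GetDoubleValue function, and two assumptions are made: the search for the operator starts at location 1 so that a leading sign is not mistaken for the operator, and division by zero is treated as an invalid expression. Blank spaces around the numbers and numbers in exponent form are not handled.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// converts an entire string to a double; returns false on failure
bool ToDouble (const std::string& strNumber, double& dValue)
{
    try
    {
        std::size_t nUsed = 0;
        dValue = std::stod (strNumber, &nUsed);
        return (nUsed == strNumber.length());  // reject trailing characters
    }
    catch (const std::exception&)   // invalid_argument or out_of_range
    {
        return false;
    }
}

// evaluates "leftnumber operator rightnumber"; returns false if invalid
bool Evaluate (const std::string& strExpr, double& dResult)
{
    const std::string strOpers = "+-*/";
    // search from location 1 so that a leading sign is not taken as the operator
    std::string::size_type nPos = strExpr.find_first_of (strOpers, 1);
    if (nPos == std::string::npos)
        return false;

    double dLeft = 0.0, dRight = 0.0;
    if (!ToDouble (strExpr.substr (0, nPos), dLeft) ||
        !ToDouble (strExpr.substr (nPos+1), dRight))
        return false;

    switch (strExpr[nPos])
    {
        case '+': dResult = dLeft + dRight; break;
        case '-': dResult = dLeft - dRight; break;
        case '*': dResult = dLeft * dRight; break;
        default:  // the only remaining operator is division
            if (dRight == 0.0)
                return false;
            dResult = dLeft / dRight;
    }
    return true;
}
```

A calling program would first compare the user input against stop, then call Evaluate and display either the result or an error message depending on the returned value.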


7.6 What is struct ?


Those familiar with the C language will recognize the struct keyword to define a structure that loosely resembles a class. It is a
user-defined data type. The syntax has two formats.
struct name_of_structure
{
    type variable1;
    type variable2;
    …;
}; // note the semicolon

struct name_of_structure
{
    type variable1;
    type variable2;
    …;
} variable_name;

The above statement defines the (user-defined) type name that can then be used to define variables that have the defined
structure. For example, we can define a point structure as follows.
struct stPoint
{
    float fXCoor;
    float fYCoor;
};

struct stPoint
{
    float fXCoor;
    float fYCoor;
} P1;

Using either definition, we can declare variables CarCoordinates and BikeCoordinates as follows.
stPoint CarCoordinates, BikeCoordinates;
We can declare and initialize variables as follows.
stPoint CarCoordinates = {-55.1f, 0.0f};
stPoint BikeCoordinates = CarCoordinates;

To access the individual components of the structure, we will have to use the . (dot) operator. For example
CarCoordinates.fXCoor = 12.3f;

We can also equate one structure to another as


CarCoordinates = BikeCoordinates;

Structures provide a mechanism to aggregate data (sometimes referred to as plain old data, POD). By default, unlike a class,
a struct provides public access to its data and methods.
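A short sketch shows the struct features described above in use: initialization with braces, access with the dot operator, and public access by default. The stPoint structure is the one defined in this section, while the function MidPoint is a hypothetical addition made only for this illustration.

```cpp
struct stPoint
{
    float fXCoor;
    float fYCoor;
};

// returns the midpoint of A and B; the members are public by default
// and can be read directly using the . (dot) operator
stPoint MidPoint (const stPoint& A, const stPoint& B)
{
    stPoint M = { 0.5f*(A.fXCoor + B.fXCoor),
                  0.5f*(A.fYCoor + B.fYCoor) };
    return M;
}
```

No constructor, accessor or modifier functions are needed here. That convenience is also the limitation: any part of the program can change fXCoor and fYCoor directly.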

7.7 Object-Oriented Solution


We will now develop an object-oriented solution to the problem posed in Section 7.1.
Object-based Solution
An examination of the problem statement and the details in Fig. 7.1.1 shows that there are two major entities – points and lines.
Identification of the attributes of these entities will help define the variables to store them.
Attributes
Point                                          Line
Is identified by a point number.               Is identified by a line number.
Has two coordinates: x-coordinate and          Is defined in terms of two unique points: a start
y-coordinate.                                  point and an end point.

Behavior
Point                                          Line
Obtain or define the x-coordinate, or the      Obtain or define the start point, or the end point,
y-coordinate, or both.                         or both.
Compute the maximum and minimum coordinate     Compute its length so as to enable the drawing
values, and hence their range.                 process on a two-dimensional plane.

Similarly, identification of the behavior will help define the methods to implement the behavior. This approach forces the
program developer to think in terms of classes and objects. Abstraction and encapsulation make program organization and
development easier. This leads to cleaner data visualization and organization. There are fewer data management issues
compared to the array-based solution. Fig. 7.7.1 shows a pictorial view of the problem solution.
[Figure: the WireFrame object holds the point and line data; each Line is defined by a start point and an end point, and each
Point by its (x, y) coordinates. The methods (1) read and store point and line data, (2) convert model coordinates to canvas
coordinates, (3) move, and (4) draw on a graphical canvas that extends from (0, 0) to (a, a).]

Fig. 7.7.1 Abstraction of the object-oriented solution


One of the more formal methods to abstractly dissect the problem is to use the Unified Modeling Language (UML), which helps
visualize the architecture of the software system. Interested readers can use the references at the end of the chapter to learn
more about UML.
We will use a bottom-up, top-down approach to identify the classes and their attributes and methods. At the top of the diagram
is the wireframe class (CWireFrame) designed to hold all the attributes of the truss being displayed. Using the idea of composition,
the wireframe class contains the two major objects – the points and the lines that are formed by the points. At the bottom is the
point class where the (x, y) coordinates of a typical point are stored. Two unique points are used to define a single line. The collection of
the points and lines is stored in the wireframe class. The point and the line classes provide methods to access (read and write,
get and put) individual points and lines. The overall algorithm is implemented in the wireframe class, as we will see
next.
Example Program 7.7.2 Object-oriented Solution for the Truss Display Problem
The main program is shown first.
main.cpp

The main program is extremely short. In line 17, the CWireFrame object TwoDTruss is declared. Its two public member functions
are called in lines 20 and 26 where the truss model details are read via keyboard input and where the pseudo-drawing takes place
on a graphical canvas, respectively. The call to the function Display is not required but is made to inform the user of the model
details.


The details of the CWireFrame class contained in wireframe.h file are shown next.

The dimensions of the canvas and its margin are declared in line 28. These values are for illustrative purposes only and can be
changed. The point and the line data are stored in std::vector objects (lines 32, 33). To manipulate the model, member
variables store the model limits in the x and y directions (lines 35, 36), the mid-point of the model (line 37), and the
scaling factor that maps the model to the canvas (line 38). Finally, a number of private member functions that help read,
store and manipulate the information are declared in lines 40-45.
Declarations in the two helper classes, CLine and CPoint are shown next. The CPoint class has very minor updates compared
to the version shown in earlier examples. The default constructor, copy constructor, helper, accessor and modifier functions are
declared as public member functions. While the variables that store the (x, y) coordinates are declared private, there are no
private member functions. Similarly, the CLine class has the same type of constructors, helper, accessor and modifier functions,
and uses two private member variables to store the starting point and the ending point numbers that define a line.
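The description above can be sketched as follows. This is not the author's point.h or line.h; it is a minimal sketch under the assumption that each class offers one accessor and one modifier, and the function names GetValues, SetValues, GetPoints and SetPoints as well as the member variable names are hypothetical.

```cpp
#include <cassert>

class CPoint
{
public:
    CPoint () : m_fXCoor (0.0f), m_fYCoor (0.0f) {}            // default ctor
    CPoint (const CPoint& P)                                    // copy ctor
        : m_fXCoor (P.m_fXCoor), m_fYCoor (P.m_fYCoor) {}
    // accessor (hypothetical name)
    void GetValues (float& fX, float& fY) const { fX = m_fXCoor; fY = m_fYCoor; }
    // modifier (hypothetical name)
    void SetValues (float fX, float fY) { m_fXCoor = fX; m_fYCoor = fY; }
private:
    float m_fXCoor, m_fYCoor;    // (x, y) coordinates
};

class CLine
{
public:
    CLine () : m_nStart (0), m_nEnd (0) {}                      // default ctor
    CLine (const CLine& L)                                      // copy ctor
        : m_nStart (L.m_nStart), m_nEnd (L.m_nEnd) {}
    // accessor (hypothetical name)
    void GetPoints (int& nStart, int& nEnd) const { nStart = m_nStart; nEnd = m_nEnd; }
    // modifier (hypothetical name)
    void SetPoints (int nStart, int nEnd) { m_nStart = nStart; m_nEnd = nEnd; }
private:
    int m_nStart, m_nEnd;        // start and end point numbers
};
```

Note that the line stores point numbers rather than CPoint objects, which is why the wireframe class is needed to look up the coordinates of a line's end points.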


point.h

line.h

Finally, we look at the manner in which program execution takes place by examining the CWireFrame member functions. All the
private member variables are initialized in the default constructor. There is no copy constructor since, in this program, it is not
anticipated that a copy of the entire model will be made for subsequent use. The main task of obtaining the user data to create
the model takes place in the SetModelSize function. Once valid values of the number of points and lines are obtained (lines 29
through 38), the sizes of the point and line vectors are set via calls to the resize member function in the std::vector class. The
initial size of each vector is zero. Once the actual sizes are set, the data to populate the vector of CPoint objects (m_PointData) is obtained
in function ReadPointData. This is followed by the call to ReadLineData where the vector of CLine objects (m_LineData) is filled with the
start and the end points of each line.


The ReadPointData function obtains the (x, y) coordinate values and stores them as shown in line 65. Since vector indexing
starts at 0, the index is computed as [i‐1] rather than i, i.e. data for point i is stored in location [i‐1] in the vector.
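The resize-then-store pattern with the [i-1] index shift can be sketched as follows. The struct SPoint is a hypothetical stand-in for CPoint, and StorePoint and BuildThreePoints are illustrative helpers, not functions from the example program.

```cpp
#include <cassert>
#include <vector>

struct SPoint { float fX; float fY; };   // hypothetical stand-in for CPoint

// Store the coordinates of point number nPoint (numbered from 1)
// in location [nPoint-1] of the vector.
inline void StorePoint (std::vector<SPoint>& PointData, int nPoint,
                        float fX, float fY)
{
    assert (nPoint >= 1 && nPoint <= static_cast<int>(PointData.size()));
    PointData[nPoint-1] = SPoint{fX, fY};
}

inline std::vector<SPoint> BuildThreePoints ()
{
    std::vector<SPoint> PointData;       // initial size is zero
    PointData.resize (3);                // size set once the count is known
    StorePoint (PointData, 1, 0.0f, 0.0f);
    StorePoint (PointData, 2, 4.0f, 0.0f);
    StorePoint (PointData, 3, 4.0f, 3.0f);
    return PointData;
}
```

The assert in StorePoint is one inexpensive way to catch a point number that falls outside the allocated range.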


The theory behind the drawing process is discussed next. With reference to Fig. 7.1.1(b), the entire truss irrespective of the units
used to define the point coordinates must not only fit into the drawing canvas without infringing into the margin at the four
edges of the canvas, but be centered in the canvas. Let the coordinates of a specific point i be denoted as $(x_i, y_i)$ and its
corresponding coordinates on the graphing canvas be denoted as $(x_{gi}, y_{gi})$. Then to achieve our drawing objectives, we can
map the point coordinates to its corresponding graph coordinates as
$$\begin{Bmatrix} x_{gi} \\ y_{gi} \end{Bmatrix} = s \left( \begin{Bmatrix} x_i \\ y_i \end{Bmatrix} - \begin{Bmatrix} X_{mid} \\ Y_{mid} \end{Bmatrix} \right) + \begin{Bmatrix} A/2 \\ A/2 \end{Bmatrix} \qquad (7.6.1)$$
where
$$\begin{Bmatrix} X_{mid} \\ Y_{mid} \end{Bmatrix} = \frac{1}{2} \begin{Bmatrix} X_{min} + X_{max} \\ Y_{min} + Y_{max} \end{Bmatrix} \qquad (7.6.2)$$
$$s = \min \left( \frac{A - a}{X_{max} - X_{min}}, \; \frac{A - a}{Y_{max} - Y_{min}} \right) \qquad (7.6.3)$$
Note that (i) $(X_{min}, X_{max})$ are the minimum and maximum x-coordinates from all the defined points, and $(Y_{min}, Y_{max})$ are the
corresponding y-coordinates, (ii) $(X_{mid}, Y_{mid})$ are the (x, y) coordinates of the mid-point of the truss model, (iii) s is the scaling
factor between the drawing canvas and the point coordinate space in which the truss is referenced, and (iv) effective drawing
space in the canvas measures $(A - a) \times (A - a)$. Quantities in Eqns. (7.6.2) and (7.6.3) are computed in the ComputeScale
function.

Note that (i) A = DRAWSIZE and a = MARGIN, and (ii) the apparent complexity of computing the scaling factor is due
to the fact that the truss could simply be one-dimensional, i.e., parallel to the x-axis or the y-axis.
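The mapping in Eqns. (7.6.1) through (7.6.3) can be sketched as follows. This is not the author's ComputeScale or GraphCoordinates code; it is a minimal sketch that assumes the effective drawing length is A - a as stated in the text, and the type names SXY and SMapping, the function ComputeMapping, and the fallback scale of 1.0 when all points coincide are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct SXY      { double dX, dY; };                 // hypothetical stand-in for CPoint
struct SMapping { double dXMid, dYMid, dScale; };   // results of Eqns. (7.6.2), (7.6.3)

// Mid-point and scaling factor, Eqns. (7.6.2) and (7.6.3).
// dA plays the role of DRAWSIZE (A) and da the role of MARGIN (a).
inline SMapping ComputeMapping (const std::vector<SXY>& Pts, double dA, double da)
{
    assert (!Pts.empty());
    double dXMin = Pts[0].dX, dXMax = Pts[0].dX;
    double dYMin = Pts[0].dY, dYMax = Pts[0].dY;
    for (const SXY& P : Pts)
    {
        dXMin = std::min (dXMin, P.dX);  dXMax = std::max (dXMax, P.dX);
        dYMin = std::min (dYMin, P.dY);  dYMax = std::max (dYMax, P.dY);
    }
    const double dXRange = dXMax - dXMin;
    const double dYRange = dYMax - dYMin;

    // Guard against a zero range (truss parallel to the x-axis or the y-axis).
    double dScale = 1.0;    // assumption: fallback when all points coincide
    if (dXRange > 0.0 && dYRange > 0.0)
        dScale = std::min ((dA - da)/dXRange, (dA - da)/dYRange);
    else if (dXRange > 0.0)
        dScale = (dA - da)/dXRange;
    else if (dYRange > 0.0)
        dScale = (dA - da)/dYRange;

    return SMapping{0.5*(dXMin + dXMax), 0.5*(dYMin + dYMax), dScale};
}

// Model-to-canvas mapping, Eqn. (7.6.1).
inline SXY GraphCoordinates (const SXY& P, const SMapping& M, double dA)
{
    return SXY{M.dScale*(P.dX - M.dXMid) + 0.5*dA,
               M.dScale*(P.dY - M.dYMid) + 0.5*dA};
}
```

With this form of the mapping, the mid-point of the model always lands at the center (A/2, A/2) of the canvas, and the three branches show why a one-dimensional truss needs special handling: one of the two ratios in Eqn. (7.6.3) would otherwise involve a division by zero.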


Steps (4)-(7) in the algorithm (see Section 7.1) are implemented in the function DrawModel.

And finally, the computation of the (x, y) coordinates on the graphical canvas using Eqn. (7.6.1) takes place in function
GraphCoordinates first and then the two primitive drawing operations take place in functions Move and Draw.

A sample execution of the program is shown in Fig. 7.7.2 using the model from Fig. 7.1.1(b). The final drawing operations
involve three (move-draw) pairs since there are three lines (or truss members). A simple check should show that the (x,
y) values lie between (a, a) and (A − a, A − a), or with the values used in the program, (1, 1) and (14, 14).


Fig. 7.7.2 A sample truss description and program generated drawing instructions using the program from
Example Program 7.7.2
While this is a much-improved solution, there are some deficiencies that are worth discussing.
(1) This version of the program is for an interactive mode of execution using the keyboard for input and a console window for
display. Creation of data is cumbersome and editing created data is impossible. A more user-friendly program would have a
pure Graphical User Interface (GUI) that would facilitate a graphical view of the truss. Such a program would provide the ability
to interactively add, delete, update, cut, and paste points and lines. It would also support input via other devices such as mouse
and pen, and allow for more advanced graphical operations to take place such as rotation, zooming in and out, etc.
(2) The CLine class does not have direct access to the attributes of its points. So, what would happen if a point were deleted? An
invalid line would exist. In addition, the CLine class should be the logical place to set up the draw function. The CWireFrame
class is used to get both the line and associated point information.
(3) How much additional effort would be needed to support lines in three-dimensional space?
(4) There is minimal error checking in the program. What checks would you carry out to maintain a consistent model?
(5) Would there be a performance penalty if hundreds, thousands, or millions of lines were created and manipulated?


7.8 Exception Handling with Classes


We first saw C++’s exception handling in Chapter 4. It may be helpful to review that short material. We will
conclude this chapter by looking at an example that illustrates exception handling in the context of objects and introduces
other C++ keywords and concepts.
Example Program DateandTime
A class, CDateandTime, will be defined that handles both date and time as a single object. In other words, this object has two
components that go together – a date component consisting of year, month and day, and a time component consisting of hour,
minute and second on a twenty-four-hour clock.
The Julian calendar3 forms the basis of this class. The day, month, year, hour, minute and second are the six variables that define
the complete date and time. A 24-hour time is assumed. The requirement is to support at least these functionalities:
(1) Given two CDateandTime objects, compute the elapsed time in (i) days, hours, minutes and seconds, (ii) only in
seconds, and (iii) only in days.
(2) Find the day-of-the-week (DOW) given the date.
(3) Given the year and month, find the mth DOW, e.g., 2nd Monday of the month.
(4) Given the year and month, find the last DOW, e.g., last Sunday of the month.
(5) Given the year and month, find the DOW for the 1st of the month.
(6) And finally handle the more common questions such as given the year, find if the year is a leap year or not, or given
the year and month, find the number of days in the month.
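Functionality (6) can be sketched as follows. This is a minimal sketch, not the CDateandTime implementation: it assumes the Gregorian leap-year rule used by modern civil calendars (a purely Julian calendar would use only the divisible-by-4 test), and the free functions IsLeapYear and DaysInMonth with an integer month are hypothetical.

```cpp
#include <cassert>

// Assumption: Gregorian rule. Divisible by 4, except century years
// that are not divisible by 400.
inline bool IsLeapYear (int nYear)
{
    return (nYear % 4 == 0 && nYear % 100 != 0) || (nYear % 400 == 0);
}

// nMonth is 1 (January) through 12 (December).
// Returns 0 for an invalid month.
inline int DaysInMonth (int nYear, int nMonth)
{
    static const int DAYS[12] = {31, 28, 31, 30, 31, 30,
                                 31, 31, 30, 31, 30, 31};
    if (nMonth < 1 || nMonth > 12)
        return 0;
    if (nMonth == 2 && IsLeapYear (nYear))
        return 29;
    return DAYS[nMonth-1];
}
```

Returning 0 for an invalid month is only one possible design; the discussion that follows uses exceptions to report invalid dates instead.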
Clearly, mistakes can be made in specifying both the date and time, and the program must be able to detect the error and handle
the error gracefully without exiting the program, as we have done in the past.
The CDateandTime object is created using one of these methods:
CDateandTime (); // default ctor
CDateandTime (const CDateandTime& DT); // copy ctor
CDateandTime (int year, Month month, int day,
int hour=0, int minute=0, int second=0); // overloaded ctor
void SetDateTime (int year, Month month, int day,
int hour, int minute, int second); // modifier function

Even if the default and copy constructors always produce valid objects, an invalid object can still be
created with the overloaded constructor or the modifier function. Hence, two questions need to be asked and answered. First,
where do we detect that an invalid object is being created? Second, what do we do to ensure that the program continues to run
correctly?
First, we will take a look at the initial part of the CDateandTime header file.

3 https://en.wikipedia.org/wiki/Julian_calendar


Enumerated classes are used to help track months (line 11), days of the week (line 12), the two major errors (line 13), and the
manner in which the date and time can be displayed (line 13). In lines 16-21, integer constants are declared for the attributes
associated with date and time. The keyword static is used as a prefix. A static variable exists for the lifetime of the program
unit in which it is defined. In addition, when the keyword is applied to a member of a class, only one instance of the variable
exists for that class. Hence, static const int MONTHS_PER_YEAR = 12; declares a single variable MONTHS_PER_YEAR that is shared
by all the CDateandTime objects. Its initial value is 12 and that value cannot be changed (const qualifier). This makes it possible
to define a vector of strings to store the names of the month as

Without the static qualifier, these vectors cannot be defined in the header file. The C++ compiler is able to recognize that the
length of the vector is 12 and is able to allocate the memory to store these character strings.
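A shared constant and a shared table of month names can be sketched as follows. The original declaration is not reproduced here, so this is an assumption-laden illustration: the class name CMonthNames, the array NAMES, the function Name and the three-letter abbreviations are all hypothetical.

```cpp
#include <cassert>
#include <string>

class CMonthNames    // hypothetical class, for illustration only
{
public:
    // One copy shared by all objects; the value cannot be changed.
    static const int MONTHS_PER_YEAR = 12;
    // Declaration only; the length is known at compile time.
    static const std::string NAMES[MONTHS_PER_YEAR];
    // nMonth is 1 (January) through 12 (December).
    static const std::string& Name (int nMonth) { return NAMES[nMonth-1]; }
};

// Definition of the shared array. In a multi-file program this
// normally goes in exactly one .cpp file.
const std::string CMonthNames::NAMES[CMonthNames::MONTHS_PER_YEAR] =
{
    "Jan", "Feb", "Mar", "Apr", "May", "Jun",
    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
};
```

Because the members are static, they can be used without creating an object, e.g., CMonthNames::Name (3).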

To implement the functionalities (1)-(6), helper functions are declared as follows:

We will now look at how we can handle invalid objects and not break the program. In the client code (main program), the try-
catch block is implemented so that if an error is caught, the program displays an error message and execution continues. A snippet
of the code from the main program illustrates how this can be done.
    try
    {
        if (i == 1)
            // bad date, will throw an exception
            CDateandTime DAT0 (0, CDateandTime::Month::JAN, 1, 0, 0, 0);
        else if (i == 2)
            // other statements follow

    }
    catch (CDateandTime::Error err)
    {
        if (err == CDateandTime::Error::INVALIDDATE)
            std::cout << "main: Caught invalid date.\n";
        else if (err == CDateandTime::Error::INVALIDTIME)
            std::cout << "main: Caught invalid time.\n";
    }
}   // closes an enclosing block that is not shown in this snippet

The try block has calls to the overloaded constructor with invalid date. The plan is to throw an appropriate error before the
object is created and catch the error in the main program and continue execution. Continuing execution means correcting the
error to create a valid object or doing something else that does not involve the invalid object.
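The idea of throwing before the object is created and catching in the client code can be sketched with a much smaller class. This is a self-contained illustration, not the CDateandTime code: the class CMiniDateTime, its simplified validity checks, and the function TryCreate are hypothetical.

```cpp
#include <cassert>
#include <string>

class CMiniDateTime    // hypothetical, much-reduced stand-in for CDateandTime
{
public:
    enum class Error { INVALIDDATE, INVALIDTIME };

    CMiniDateTime (int nYear, int nMonth, int nDay, int nHour = 0)
    {
        // Simplified checks; a complete version would also account for
        // the number of days in each month.
        if (nYear < 1 || nMonth < 1 || nMonth > 12 || nDay < 1 || nDay > 31)
            throw Error::INVALIDDATE;
        if (nHour < 0 || nHour > 23)
            throw Error::INVALIDTIME;
        m_nYear = nYear;  m_nMonth = nMonth;
        m_nDay  = nDay;   m_nHour  = nHour;
    }
    int GetYear () const { return m_nYear; }
private:
    int m_nYear, m_nMonth, m_nDay, m_nHour;
};

// Client-style code: the exception is caught and execution continues.
inline std::string TryCreate (int nYear, int nMonth, int nDay, int nHour)
{
    try
    {
        CMiniDateTime DT (nYear, nMonth, nDay, nHour);
        return (DT.GetYear () == nYear ? "valid" : "unexpected");
    }
    catch (CMiniDateTime::Error err)
    {
        if (err == CMiniDateTime::Error::INVALIDDATE)
            return "invalid date";
        return "invalid time";
    }
}
```

When the constructor throws, the object never comes into existence, so the client code cannot accidentally use a half-built date.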


The code that detects the error and throws an exception in CDateandTime (DateandTime.cpp) is shown below.

The call to the overloaded constructor is channeled through the modifier function, SetDateTime. This is motivated by not having
to create duplicate code. The actual checks are carried out for the date in ValidDate function and for the time in ValidTime
function.

Note the syntax and style. A try‐catch block is introduced. In the try block, checks are made to validate the hour, the minute
and the second values. A time specification of 24:00:00 is not allowed, but 00:00:00 is taken as midnight. The catch block
displays the error message. For a large program, this display is probably not recommended since immediately another throw is
executed in line 140 that is intended to be caught in the client code (main program). A throw without an argument rethrows the
exact same exception that was just caught. This style is extremely efficient in error handling and we will look at the performance
implications of exception handling later in the book.
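The catch-then-rethrow pattern described above can be sketched as follows. The enumeration TimeError and the functions CheckTime and ClientAccepts are hypothetical and only mirror the structure of the ValidTime discussion; they are not the author's code.

```cpp
#include <cassert>
#include <iostream>

enum class TimeError { INVALIDTIME };    // hypothetical error type

// Validates the time; on failure it reports locally and then rethrows
// so that the caller can decide what to do next.
inline void CheckTime (int nHour, int nMinute, int nSecond)
{
    try
    {
        if (nHour < 0 || nHour > 23)        // 24:00:00 is not allowed
            throw TimeError::INVALIDTIME;
        if (nMinute < 0 || nMinute > 59)
            throw TimeError::INVALIDTIME;
        if (nSecond < 0 || nSecond > 59)
            throw TimeError::INVALIDTIME;
    }
    catch (TimeError)
    {
        std::cerr << "CheckTime: invalid time.\n";
        throw;    // rethrows the exception that was just caught
    }
}

// Client-style code: catches the rethrown exception and continues.
inline bool ClientAccepts (int nHour, int nMinute, int nSecond)
{
    try
    {
        CheckTime (nHour, nMinute, nSecond);
        return true;
    }
    catch (TimeError)
    {
        return false;
    }
}
```

A throw with no operand is only meaningful while an exception is being handled; used outside a catch block it terminates the program.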
We will go back to the use of the static keyword in the context of a variable of a class. A member variable is declared as


static Format m_curFmt; // current format in effect


The intent of using this (private) variable, modified via the (public) helper function SetFormat, is to format the date and time
components in four different styles – displaying only the date (DateOnly), showing the date as Day/Month/Year
(DayMonthYear), showing the date as Year/Month/Day (YearMonthDay),
and finally, showing the date as Month/Day/Year (MonthDayYear).
Since there is only one copy of this variable that is shared amongst all the CDateandTime objects, it must be initialized before it
can be used. This is done at the top of the DateandTime.cpp file.

indicating that the default style is MonthDayYear, e.g., Jan/ 1/2000.
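The declaration and the separate definition of a static member variable can be sketched as follows. The class CStamp and the accessor GetFormat are hypothetical; only the names Format, m_curFmt, SetFormat and the four format styles are taken from the text, and the bodies shown are assumptions.

```cpp
#include <cassert>

class CStamp    // hypothetical stand-in for CDateandTime
{
public:
    enum class Format { DateOnly, DayMonthYear, YearMonthDay, MonthDayYear };
    void   SetFormat (Format fmt) { m_curFmt = fmt; }
    Format GetFormat () const     { return m_curFmt; }   // hypothetical accessor
private:
    static Format m_curFmt;    // declaration: one copy shared by all objects
};

// Definition and initialization. In a multi-file program this line
// belongs in the .cpp file, not in the header.
CStamp::Format CStamp::m_curFmt = CStamp::Format::MonthDayYear;
```

Since there is a single shared copy, changing the format through one object changes it for every other object of the class.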


The rest of the source code is not shown and discussed here. The reader is encouraged to go over the entire source code and
understand what are perhaps advanced features of C++, including operator overloading that will be discussed in greater detail
in Chapter 9.

Fig. 7.8.1 Output from Example Program DateandTime


Summary
In this chapter we were introduced to the concept and usage of a simple yet powerful idea – how to visualize the logic and data
encountered in a typical program through abstraction. This process helps in defining classes where attributes are distinct
from behavior. By using classes, we eliminate a host of problems. First, we tie data to data access and manipulation very tightly.
We can make program development more systematic by reducing the proliferation of global functions and variables. Second,
we can have more control over the access to the data using public and private functionalities – information hiding is possible.
Third, we encourage the reuse of software. This is possible only if classes are designed with appropriate specifications. As we
will see in Chapter 13, programmers not involved in the initial design of a class can still add functionalities to an existing class
through the process of inheritance.
Programming Style Tip 7.1: There is no substitute for proper class design
It is essential that the class design take place properly. The interface and the implementation should be separated - this is the
idea behind encapsulation. From the client code development viewpoint, the programmer needs to know only the interface for
a successful implementation.
Programming Style Tip 7.2: Practice defensive programming
Be careful to define public and private variables and functions. Functions that can potentially expose the inner workings of a
class should be hidden from the client code by declaring them as private.
Programming Style Tip 7.3: Define a default constructor and a destructor
Even though C++ does not require a ctor, this is the place where one would initialize all the variables defined in a class. For
similar reasons, it is better to explicitly define a destructor where cleanup operations can take place.
Programming Style Tip 7.4: Define the copy constructor
Even though C++ does not require a copy constructor, you can speed up program execution and avoid run time errors by
providing a copy constructor.


Exercises
Most of the problems below involve the development of one or more classes. In each case (a) develop a plan to test
the class(es), and (b) implement the plan in a main program.

Appetizers
Problem 7.1
Develop a class CRectangle to handle rectangles in a two-dimensional (X, Y) space. In addition to storing the data that describe
the rectangle, this class should be able to (a) compute the area, (b) perimeter, and (c) recognize if the rectangle is a square.
Problem 7.2
Enhance the capabilities of the CPoint class by adding a member variable m_fZCoor (to store the z coordinate) and member
functions to carry out the following tasks. (a) Create the copy constructor as CPoint::CPoint(const CPoint&). (b) A predicate
function bool IsOrigin() to see if the point is at the origin of the coordinate system. (c) The distance to another point as float
DistanceTo (const CPoint&). (d) Unit vector to another point as void UnitVector (const CPoint&, float fVUVector[]).

Problem 7.3
The capabilities of the CTime class discussed in this chapter can be enhanced. Create additional member variables and functions
that will (a) recognize the time zone, and (b) print time in 24-hour format, or in the am or pm format, or with respect to UTC
(coordinated universal time; formerly known as Greenwich Mean Time).

Main Course
Problem 7.4
Develop a CFraction class to store fractions and support the following operations – addition, subtraction, multiplication and
division via Add, Subtract, Multiply, and Divide member functions. The prototype of a typical public member function is as
follows.
void Add (const CFraction&, const CFraction&);
Also, develop a void Display (const std::string& strMessage); public member function that will display the fraction in its
reduced form preceded by the std::string argument.
Problem 7.5
Develop a CTriangle class as a composite class using the CPoint class object. The triangle is described in terms of the (x, y)
coordinates of the three vertices. The class should store information on the triangle such as perimeter, area and the three angles
and have public accessor functions for these attributes. It should also have public predicate functions to test whether the
triangle is an isosceles triangle (bool IsIsosceles()), a right triangle (bool IsRightTriangle()), or an equilateral triangle (bool
IsEquilateral()). Also, develop a void Display (const std::string& strMessage); public member function that
will display all the stored properties of the triangle.
Problem 7.6
Develop a CQuadraticPoly class to find the roots of a quadratic polynomial $ax^2 + bx + c$. Construct the default and an
overloaded constructor that accepts a, b, c. Store a, b, c as private member variables. Construct a public function bool
ComputeRoot (float&, float&) where the return value is true if the roots are real and false if the roots are imaginary. The two
parameters are the roots of the polynomial.

C++ Concepts
Problem 7.7
Develop a CNumDiff class to numerically differentiate a function using a difference formula. You should use the technique
discussed in Section 6.4 to support user-defined functions.
Problem 7.8
Develop a CNumIntegrate class to numerically integrate a known function using one of the Newton-Cotes techniques. You
should use the technique discussed in Section 6.4 to support user-defined functions.


Problem 7.9
Develop two blueprints for a solid geometry program – one with and one without classes. The program should have the features
to compute quantities such as length, interior angles, perimeter, area, surface area, and volume for the following objects –
triangle, quadrilateral, tetrahedron, hexahedron, prism, pyramid, cylinder, and cube. Discuss the pros and cons of the two
programs.


References
Cockburn, Writing Effective Use Cases (The Crystal Collection for Software Professionals), Addison Wesley, 2000.
Bergin, Data Abstraction – The Object-Oriented Approach Using C++, McGraw-Hill, 1994.
Page-Jones, Fundamentals of Object-Oriented Design in UML, Addison Wesley, 2000.
Pressman, Software Engineering: A Practitioner’s Approach, McGraw-Hill, 2001.
Schach, Classical and Object-Oriented Software Engineering with UML and C++, McGraw-Hill, 1999.
Lee and Tepfenhart, UML and C++: A Practical Guide to Object-Oriented Development, Prentice-Hall, 2001.



Chapter 8

Pointers
“A great memory does not make a mind, any more than a dictionary is a piece of literature.” John Henry Newman

“Memory is the second thing to go.”

General-purpose programs handle objects whose size is known only at run time and whose size may change dramatically during
the course of execution. For example, a program that draws an X-Y graph is much less useful if it sets a predefined limit on the
number of points it can handle or, if it does not allow addition or deletion of points. In this chapter and the next, we will see
the basics of how to write programs where memory allocation to store scalars and arrays is handled dynamically at run time.
Nothing is free! The process of managing resources can be troublesome and can lead to unintended consequences. While the
resources on any computer system are finite, programmers are being asked to create programs that can handle bigger problems,
run faster, and yield more accurate results.

Objectives
• To understand the concept associated with pointers.
• To understand more about dynamic memory allocation.
• To understand and practice writing C++ programs where memory allocation and deallocation are managed.

S. D. Rajan, 2002-24 8-205


P O I N T E R S

8.1 Memory Management


It is quite helpful to understand how memory is managed in a typical computer so that we can develop programs that manage
memory resources intelligently. In this section, we present a simplified but nevertheless, useful view.
In a typical desktop computer or a workstation, information containing both data and instructions, flows through a number of
hardware devices. Fig. 8.1.1 shows a schematic diagram. A computer program typically resides on a permanent storage device
such as a (hard) disk. Program execution starts when the information from the computer program is loaded into the RAM via
the disk I/O bus. From the RAM, information is passed onto the CPU for processing through the memory bus. Memory has
its own hierarchy. There are registers in the CPU that store information. These are located closest to the logic units within the
CPU where the information is processed. Cache memory forms the next level in the hierarchy. Physically, they are located either
on the CPU chip or on a separate board and have a direct link to the CPU. There are different levels of cache identified as L1,
L2 or even L3 cache. These are very high-speed memory (hence more expensive) but can store much less information than
RAM. L1 cache is on the same chip as the microprocessor whereas the other cache memories are on a separate chip. Typically,
the cache sizes vary from a few kilobytes (KB) to a few megabytes (MB) whereas RAM typically varies between a few MB to a
few gigabytes (GB).

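[Figure, panel (a): the CPU and its cache are connected to the RAM through the memory bus; the RAM is connected to the disk through the disk I/O bus.]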
Fig. 8.1.1 (a) Simplified Memory Hierarchy (b) Intel Core i7 Processor Layout
The contents of the RAM can be thought of as being divided into two distinct parts – the part used by the operating system and the
part used by one or more application programs (Fig. 8.1.2).
[Figure: RAM divided into an operating system part and an application program part.]

Fig. 8.1.2 Memory Usage


Let’s look at the simplest case where the amount of RAM is greater than the amount of memory required by the operating
system and a single active application program. The program instructions and data are transferred from the RAM to the CPU
on demand. What is the purpose of the cache? The CPU first looks to see whether the information it needs is in cache memory.
If it is (cache hit), it fetches the information for processing. This may take a few nanoseconds. If the information is not in cache
(cache miss), then the information must exist in the RAM and appropriate instructions are issued. This operation may take tens
to hundreds of nanoseconds. One can see from this simple example that, everything else being the same, a program will run
faster if there are more cache hits than misses. A program with more cache hits than another program is said to have more
locality of reference.
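The effect of locality of reference can be explored with a sketch such as the following, in which the same matrix entries are summed in two different orders. The functions SumRowWise and SumColumnWise are hypothetical, and the matrix is assumed to be stored row by row in one contiguous std::vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The matrix entry (i, j) is stored at location [i*nCols + j].

// Visits consecutive memory addresses: good locality of reference.
inline double SumRowWise (const std::vector<double>& A,
                          std::size_t nRows, std::size_t nCols)
{
    double dSum = 0.0;
    for (std::size_t i = 0; i < nRows; i++)
        for (std::size_t j = 0; j < nCols; j++)
            dSum += A[i*nCols + j];
    return dSum;
}

// Jumps nCols entries between consecutive accesses: poorer locality.
inline double SumColumnWise (const std::vector<double>& A,
                             std::size_t nRows, std::size_t nCols)
{
    double dSum = 0.0;
    for (std::size_t j = 0; j < nCols; j++)
        for (std::size_t i = 0; i < nRows; i++)
            dSum += A[i*nCols + j];
    return dSum;
}
```

Both functions compute the same sum. For matrices much larger than the cache, the row-wise version typically runs faster on common hardware, although the actual difference depends on the machine and the compiler and should be measured rather than assumed.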


[Figure: the RAM (X MB) holds the operating system and five application program pages, labeled 1 to 5; the hard disk (Z GB) holds the virtual memory paging file (Y GB) and other files; the application virtual memory page table lists pages 1 to 7.]
Fig. 8.1.3 A simplified virtual memory scenario
Let us look at the next case where the amount of RAM is less than the amount of memory required by the operating system
and a single active application program. This situation is depicted in Fig. 8.1.3. An operating system that handles such a scenario
is called a virtual memory operating system. Examples include Microsoft Windows, Linux, the different flavors of Unix, etc.
Conceptually, the memory is divided into pages. The size of a page (in bytes or KB or MB) is a function of the OS. In the figure,
let us assume that after the entire operating system is loaded into RAM, five pages can be loaded into RAM. We will label these
pages as 1, 2, …, 5. Let us now assume that we wish to execute an application that requires a memory equivalent of 7 pages.
Pages 1 and 2 are pages that contain program instructions, and the rest of the pages contain program data. In a virtual memory
OS, a special part of hard disk is set aside for virtual memory related operations. Typically, the paging file size is much larger
than the size of RAM. The purpose of the paging file is to maintain a copy of the program’s information. This information is
then transferred from the disk to the CPU on demand. This operation may take a few milliseconds to complete
– at least three orders of magnitude slower than a data transfer from the RAM to the CPU. In the example, pages 1,
2, 3, 5 and 7 are called swapped-in pages since they are currently in RAM, and pages 4 and 6 are called swapped-out pages since they
are currently not in RAM.
To maintain coherence between the different copies of program information, several strategies are implemented in a typical OS.
One of the techniques is to use a virtual memory page table. This page table contains the mapping between the pages in the
RAM and the location on the disk (part of the virtual memory paging file system) associated with the application. When an
application issues a request for information, the system computes the memory address as a page number and checks to see if
the page is in RAM. If it is (page hit), it fetches the information for processing. If the information is not in RAM (page miss),
then the information must exist on the hard disk and appropriate instructions are issued. One can see from this simple example
that everything else being the same, a program will run faster if there are more page hits than misses. A program with more page
hits than another program, is said to have more locality of reference.


Memory management becomes more complex as we begin to change the assumptions made with the previous examples. How
is memory to be allocated and deallocated if these operations take place during the execution of the program? Most operating systems use the heap
or free store to carry out dynamic memory management. This is the memory area that is affected by the use of the new and
delete operators. We have a memory leak if new is used without a corresponding delete. Similarly, we have an access violation if the
program tries to access memory that is not allocated or refers to a non-existent memory location or address.
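The pairing of new with delete can be sketched as follows. The function AverageOnFreeStore is hypothetical and serves only to show one allocation matched by one deallocation.

```cpp
#include <cassert>
#include <cstddef>

// Allocates an array whose size is known only at run time, uses it,
// and releases it before returning.
inline double AverageOnFreeStore (std::size_t nValues)
{
    assert (nValues > 0);
    double *pdValues = new double[nValues];    // allocated on the free store
    double dSum = 0.0;
    for (std::size_t i = 0; i < nValues; i++)
    {
        pdValues[i] = static_cast<double>(i + 1);
        dSum += pdValues[i];
    }
    delete [] pdValues;     // omit this statement and the memory is leaked
    pdValues = nullptr;     // avoids accidentally reusing the old address
    return dSum / static_cast<double>(nValues);
}
```

Accessing pdValues[i] after the delete statement, or with i outside the range 0 to nValues-1, is the kind of error that can lead to an access violation.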
There are several types of objects and non-objects that a typical C++ program controls in different memory areas – const data,
heap, free store, stack, and global and static memory areas.
const data: This is a special memory area that is protected and cannot be modified (read only). Only primitive data types whose
values are known at compile time can be stored here. Objects cannot be stored. The data here is available for the entire duration
of the program.
Stack: The stack is used to store automatic variables. The objects are created as soon as memory is allocated and destroyed
immediately before memory is deallocated.
Free store: This memory area is used for dynamic memory allocation and is affected by new and delete operators. Memory for
objects is allocated but this memory may not be immediately initialized. This memory may be accessed and manipulated outside
of the object’s lifetime but while the memory is still allocated.
Heap: The heap is the second memory area used for dynamic memory allocation and is affected by the malloc and free functions
more commonly associated with C.
Global/static: Storage allocation for global or static variables takes place at program startup and the initialization takes place
subsequently. For instance, a static variable in a function is initialized only the first time program execution passes through its
definition.
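The behavior of a static variable inside a function can be sketched as follows. The function NextID is hypothetical.

```cpp
#include <cassert>

// Returns 1, 2, 3, ... on successive calls.
inline int NextID ()
{
    static int nCounter = 0;    // initialized only the first time execution
                                // passes through this definition
    return ++nCounter;          // the value persists between calls
}
```

An automatic (stack) variable declared without the static keyword would instead be created and initialized afresh on every call.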
Finally, it is helpful to understand what happens when objects are repeatedly created and destroyed on the free store or heap.
Memory when used in this fashion, becomes fragmented. Unlike some other languages, C++ does not carry out garbage collection.
Garbage collection is the process of recycling the memory space when that memory space is no longer needed. Programmers
understanding and developing programs for numerical analysis must recognize that memory management is an important issue
as are speed of execution and accuracy of results.
In the rest of the chapter, we will start looking at how to use free store during dynamic memory allocation.

8.2 Pointers
As people can locate places and things based on their addresses, pointers in C++ can be used to manipulate information based
on memory addresses. Pointer data types represent a reference to an object, or a location and pointers may be specialized by
the type of the object referred to. In this chapter, we will see pointers represented by a memory address; however, they can be
more complicated as we will see in later chapters.
Here is a declaration of a pointer variable.
int *pnX;

The variable pnX can hold the memory address of a variable of type int. Note that the declaration requires an asterisk * in front
of the variable name. The above style is preferable to the following:
int* pnX;

The problem occurs when multiple variables are declared with the same statement. For example, what is implied by the following
statement?
int* pnX, pnY;

As it turns out, pnX is a pointer variable but pnY is a regular int. If the intent is to declare both of them as pointer variables, one
must use the following declaration.
int *pnX, *pnY;

So how does one use a pointer variable? Consider the task of making the pointer variable pnX point to an int variable nX. One
could write the following statements.
int *pnX, nX; // pnX is a pointer (variable) to an integer, nX is an integer variable
nX = 10; // nX now has a value of 10
pnX = &nX; // pnX now contains the memory address of nX

S. D. Rajan, 2002-24 8-208


P O I N T E R S

In the first statement, variable pnX is declared as a pointer variable and nX is declared as an int. In the second statement, the
value of nX is set as 10. In the third statement, the value (memory address) of pnX is set by using the address-of operator & (recall this
is the same symbol that is used when a function argument is declared as passed by reference) along with the variable that the
pointer variable is pointing to. Let us now look at an example to learn more about pointers.
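As a quick check of these three statements, here is a minimal sketch. The function name PointerTracksVariable is hypothetical; the variable names follow the statements above.

```cpp
#include <cassert>

// Returns true if a pointer set with & refers to the original variable.
bool PointerTracksVariable ()
{
    int *pnX, nX;      // pnX is a pointer to an int, nX is an int
    nX = 10;
    pnX = &nX;         // pnX now holds the memory address of nX

    bool bSameAddress = (pnX == &nX);   // the two addresses are identical
    bool bSameValue   = (*pnX == 10);   // *pnX reads the value stored in nX

    nX = 25;                            // change the variable directly ...
    bool bFollows = (*pnX == 25);       // ... and the pointer sees the change

    return bSameAddress && bSameValue && bFollows;
}
```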
Example Program 8.2.1 Example Showing Pointer Usage
In this example we will see the use of int variables and int pointers. The basic ideas can be extended to other data types. Simple
pointer arithmetic is also illustrated.
main.cpp

The program uses three variables that are declared in lines 22, 23 and 24. Two of these variables are pointer variables. The value
of the int variable, nV1, is assigned in line 26. The memory address of this variable is obtained in line 27 and assigned to the
pointer variable, pnV1. Then the function ShowValues is called to display three lines of output associated with the int variable
and the pointer variable associated with that int variable.
In line 12 we see how pointers can be used as function arguments. As a matter of style, we will use what is shown in line 3 rather
than
void ShowValues (int nV, int* pnV)

just to drive home the point that the second parameter is a pointer to an int. In line 17, we display the int value at the memory
address pointed to by pnV. When the usage (with a pointer variable) involves the asterisk symbol, *, the symbol is referred to as
the dereferencing operator. The pointer variable must be dereferenced when the value stored at the memory location needs
to be accessed. In other words, pnV refers to the memory address and *pnV refers to the value in memory pointed to by pnV.
Look at line 31. The value at the memory location pointed to by pnV1 is set to 100. In the subsequent display of the values (see
Fig. 8.2.1) we see that the value of nV1 is also changed because of this statement. The memory address is expressed in hex
(hexadecimal) and is likely to be different on different machines. In line 35, we see an assignment statement involving pointer
variables. Note that
pnV2 = pnV1; // address assignment
is not the same as
*pnV2 = *pnV1; // value assignment
In line 36, the value at the memory location pointed to by pnV2 is set to 400. In the subsequent display of the values (see Fig.
8.2.1) we see that the value nV1 and consequently, *pnV1, is also changed because of this statement.

Fig. 8.2.1 Sample output created by Example Program 8.2.1


As we can see from this simple example, the use of pointer variables can have unintended consequences if used incorrectly.
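The following is a minimal sketch in the spirit of Example Program 8.2.1. It is not the book's listing, so its line numbers do not correspond to those cited above, and the wrapper function ChangeThroughPointers is a hypothetical name. It shows how an address assignment makes two pointers refer to the same int, so that a later value assignment through either pointer changes nV1.

```cpp
#include <cassert>
#include <iostream>

// Displays an int value, the address held by a pointer, and the value there.
void ShowValues (int nV, int *pnV)
{
    std::cout << "Value of the int variable  : " << nV << '\n'
              << "Address held by the pointer: " << pnV << '\n'
              << "Value at that address      : " << *pnV << '\n';
}

// Returns the final value of nV1 after it is changed only through pointers.
int ChangeThroughPointers ()
{
    int nV1;
    int *pnV1, *pnV2;

    nV1 = 10;
    pnV1 = &nV1;            // pnV1 points to nV1
    ShowValues (nV1, pnV1);

    *pnV1 = 100;            // value assignment: nV1 becomes 100
    ShowValues (nV1, pnV1);

    pnV2 = pnV1;            // address assignment: both pointers refer to nV1
    *pnV2 = 400;            // nV1 and *pnV1 become 400 as well
    ShowValues (nV1, pnV2);

    assert (pnV1 == pnV2);
    return nV1;
}
```

The addresses printed by ShowValues will differ from machine to machine; only the values are predictable.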

8.3 Dynamic Memory Allocation


Memory can be allocated and deallocated (or released) at run-time in C++ using two operators – new and delete.

new operator
In its simplest form, the new operator has the following syntax.
new type-name;

where type-name is a valid type. The new operator can be used to allocate objects and arrays of objects. This allocation takes
place from the program memory area called the “free store” (often loosely referred to as the “heap”). We will see more about memory-related issues in the next
section. When new is used to allocate a single object, it yields a pointer to that object; the resultant type is type-name*. When new
is used to allocate a singly dimensioned array of objects, it yields a pointer to the first element of the array, and the resultant type
is type-name*. Here is an example of declarations and usage.
float *pfX, fY; // pfX is a pointer variable, fY is a float variable
pfX = new float; // allocates memory equal to storage of one float variable
*pfX = 43.5f; // use as any other float variable with dereferencing operator
fY = *pfX; // now fY has the same value as pointed to by pfX

delete operator
It is a good practice to release the memory occupied by an object when that object is no longer needed. This is accomplished
in C++ using the delete operator. In its simplest form, the delete operator has the following two forms.
delete pointer‐object;
delete [] pointer‐object;

where the first form is used for one object and the second form is for deallocating a number of objects. Here is an example
illustrating both the new and the delete operators.
float *pfX, *pfVY;
pfX = new float; // allocates memory equal to one float
pfVY = new float[40]; // allocates memory equal to 40 floats
….
delete pfX; // releases memory back to freestore
delete [] pfVY; // releases memory back to freestore

It is time now to look at a small but complete example.


Example Program 8.3.1 Using new and delete
We saw how classes were defined and used in Chapter 7. Recall the CPoint class from Section 7.2. To define and use this class
we used statements such as
CPoint Point12;

The compiler generates the appropriate statements to obtain the memory required to store the instructions associated with the
CPoint class and the memory locations to store the data associated with the object Point12. In this example, we will see another
approach in obtaining the resources (memory) dynamically.
main.cpp

In line 15, pPoint12 is declared as a pointer to a CPoint object. However, the memory to hold a CPoint object is allocated in line
19. The coordinate values are set in line 22. Note the usage of the member function SetValues. When an object is used to
invoke the member function, the member selection operator . is used. When a pointer to an object is used, the member selection
via pointer operator -> (or arrow operator) is used. C++ provides another way to write the statement in line 22 as
(*pPoint12).SetValues (1.2f, -17.65f);

Finally, note line 28. The memory allocated in line 19 using new must be deallocated using the delete operator.
We will see more useful usages of dynamic memory allocations in later examples.
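A minimal sketch of the same idea follows. The CPoint class here is a stripped-down stand-in for the class of Chapter 7: SetValues is named in the text, while the accessors GetX and GetY and the function SumOfCoordinates are assumptions made only so that the example is self-contained.

```cpp
#include <cassert>

// Minimal stand-in for the CPoint class of Chapter 7 (assumed interface).
class CPoint
{
    public:
        CPoint () : m_fXCoor(0.0f), m_fYCoor(0.0f) {}
        void SetValues (float fX, float fY) { m_fXCoor = fX; m_fYCoor = fY; }
        float GetX () const { return m_fXCoor; }   // hypothetical accessor
        float GetY () const { return m_fYCoor; }   // hypothetical accessor
    private:
        float m_fXCoor;
        float m_fYCoor;
};

// Allocates a CPoint on the free store, sets it, and returns x + y.
float SumOfCoordinates ()
{
    CPoint *pPoint12 = nullptr;        // pointer only; no CPoint exists yet
    pPoint12 = new CPoint;             // object created on the free store

    pPoint12->SetValues (1.5f, -17.0f);    // member selection via pointer
    (*pPoint12).SetValues (1.5f, -17.0f);  // equivalent: dereference, then .

    float fSum = pPoint12->GetX() + (*pPoint12).GetY();

    delete pPoint12;                   // release the memory obtained by new
    pPoint12 = nullptr;                // guard against accidental reuse
    return fSum;
}
```

Setting the pointer to nullptr after delete is a defensive habit rather than a requirement; it makes an accidental later use easier to detect.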

Functions with pointer parameters


We saw two different ways of passing parameters in Chapter 4 – call by value and call by reference. Now we will see examples
of call by pointers (or call by reference via pointer arguments). Here is a simple example that illustrates the important concepts.
Example Program 8.3.2 Function Calls and Pointer Arithmetic
In this example we will see two aspects dealing with pointers
(a) call using pointer arguments, and
(b) pointer arithmetic with vectors.
We will also see how to use the const qualifier to protect against inadvertent changes.
main.cpp

We define two int variables nA and nB in lines 21 and 22. The function CallViaPointers is invoked in line 27. As we can see
from the function definition in lines 12 through 16, the first argument is a constant int pointer and the second argument is a
pointer to an int. In the function, the value of the second argument is set as twice the value of the first parameter. Note the
expression in line 6 and the manner in which the dereferencing operator is used. While the following statements will work
*nB = 2**nA;
*nB = 2* *nA;

they are more difficult to read and understand.
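A minimal sketch of a call via pointer arguments, assuming a signature with a pointer to a constant int followed by a pointer to an int as described above. The parameter names pnA and pnB and the wrapper DoubleViaPointers are illustrative choices, not the book's listing.

```cpp
#include <cassert>

// pnA points to a value that may only be read; *pnB receives twice that value.
void CallViaPointers (const int *pnA, int *pnB)
{
    *pnB = 2 * (*pnA);     // parentheses make the dereference easy to spot
    // *pnA = 0;           // would not compile: *pnA is const here
}

// Returns the value written into nB by CallViaPointers.
int DoubleViaPointers (int nA)
{
    int nB = 0;
    CallViaPointers (&nA, &nB);   // pass the addresses of the two variables
    return nB;
}
```

The const qualifier on the first parameter is what protects the caller's value against inadvertent changes inside the function.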

In line 34, a double precision vector dVX of length 4 is declared and initialized. In line 35, a double precision pointer is declared,
and the address is set as the starting location of the dVX vector - when the vector index is not defined, the first location or [0],
is implied. Values in the vector can be accessed using pointers as in line 42. With this example, the four memory locations in
the dVX vector can be accessed as *(pdVX),*(pdVX+1),*(pdVX+2) and *(pdVX+3). In other words, by adding an integer to the
memory address stored in a pointer, we are able to access other memory locations as long as the data type does not change, and
the resulting memory address is a legal address.
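The pointer arithmetic described above can be sketched as follows. The vector values and the function names SumViaPointer and SumOfSampleVector are chosen only for illustration.

```cpp
#include <cassert>

// Sums the n values of a vector by stepping through it with a pointer.
double SumViaPointer (const double *pdV, int n)
{
    double dSum = 0.0;
    for (int i = 0; i < n; i++)
        dSum += *(pdV + i);     // same memory location as pdV[i]
    return dSum;
}

// Returns the sum of a four-element vector accessed only through a pointer.
double SumOfSampleVector ()
{
    double dVX[4] = {1.0, 2.5, 4.0, 8.5};
    double *pdVX = dVX;          // no index given: address of dVX[0]

    assert (pdVX == &dVX[0]);
    assert (*(pdVX + 3) == dVX[3]);   // last legal element
    // *(pdVX + 4) would read past the end of the vector and is not legal

    return SumViaPointer (pdVX, 4);
}
```

Adding 1 to a double pointer advances the address by the size of one double, not by one byte, which is why the data type must not change.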

nullptr pointer
C++ has a pointer literal of type std::nullptr_t called nullptr. When a pointer is assigned nullptr, it implies that the pointer
does not point to an object. For example, if memory allocation takes place via the nothrow form of new, that is new (std::nothrow),
and the memory allocation fails, the pointer variable holding the memory address is assigned a nullptr value. (The plain form of new
signals failure by throwing a std::bad_alloc exception instead.) The programmer must check to see if the memory allocation
was unsuccessful and take appropriate action.
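A minimal sketch of such a check follows. It assumes the nothrow form of new, which is the form that yields nullptr on failure; the plain form of new reports failure by throwing std::bad_alloc, which the second function handles. All three function names are hypothetical.

```cpp
#include <cassert>
#include <new>        // std::nothrow, std::bad_alloc

// Allocates nSize floats with the nothrow form of new.
// Returns nullptr (instead of throwing) if the request cannot be met.
float* AllocateVector (int nSize)
{
    if (nSize <= 0)
        return nullptr;                      // nothing sensible to allocate
    return new (std::nothrow) float[nSize];  // nullptr on failure
}

// Returns true if a vector of nSize floats could be allocated and used.
bool TryVector (int nSize)
{
    float *pfV = AllocateVector (nSize);
    if (pfV == nullptr)          // always check before dereferencing
        return false;

    pfV[0] = 1.0f;
    delete [] pfV;
    return true;
}

// The plain form of new reports failure by throwing std::bad_alloc.
bool TryVectorWithException (int nSize)
{
    if (nSize <= 0)
        return false;
    try
    {
        float *pfV = new float[nSize];
        pfV[0] = 1.0f;
        delete [] pfV;
        return true;
    }
    catch (const std::bad_alloc&)
    {
        return false;
    }
}
```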
Example Program 8.3.3 Vector Data Type
We can now finally show an example where the memory for a vector data type can be allocated and deallocated dynamically.
The example is a modification of the second part of the previous example.
main.cpp

Instead of a static definition of the vector as


const int n=4; // n must be declared as const int
float fVX[n];

we have
float *fVX;
int n = 4; // n need not be declared as const int
fVX = new float[n];

Note that the vector is dynamically allocated. We could have obtained, say from the user of the program, the size of the vector
to be used in the program as follows.
int nSize;
GetInteractive ("What is the size of the vector? ", nSize);
float *fVX;
fVX = new float[nSize];

Once past the initial declaration and allocation, we have exactly the same usage for fVX in the program. Access to the elements
of the vector is via the [] operator. In other words
*(fVX+i)

is equivalent to
fVX[i]

At the end of the program when fVX is no longer needed, memory is deallocated in line 43.
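A minimal sketch of a dynamically allocated vector whose size is known only at run time. The size is passed in as a function argument here rather than read with GetInteractive, and the function name SumOfDynamicVector is hypothetical.

```cpp
#include <cassert>

// Allocates a vector of nSize floats, fills it with 1, 2, ..., nSize,
// and returns the sum of the elements.
float SumOfDynamicVector (int nSize)
{
    assert (nSize > 0);
    float *fVX = new float[nSize];   // size known only at run time

    for (int i = 0; i < nSize; i++)
        fVX[i] = static_cast<float>(i + 1);   // access via the [] operator

    float fSum = 0.0f;
    for (int i = 0; i < nSize; i++)
        fSum += *(fVX + i);                   // equivalent pointer form

    delete [] fVX;   // array form of delete matches new float[nSize]
    return fSum;
}
```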
Tip: Vectors and pointers are intimately related. For example, if we have
float fVX[4];
float* pfVX = fVX;

then
pfVX = &(fVX[0]) is the same as pfVX = fVX
fVX[i] is the same as *(pfVX + i)
&(fVX[i]) is the same as (pfVX+i)
pfVX[i] is the same as fVX[i]

We will now take these ideas and develop a class to handle vector data type.

8.4 Case Study: A Simple Vector Class


We finish this chapter’s examples with a case study involving the development of our own vector class. As we have seen before,
the built-in C++ vector (array) is statically allocated. In other words, the size of the vector must be declared when writing the program and
cannot be changed.
Example Program 8.4.1 A Simple Vector Class
In the last section we saw how to dynamically allocate and deallocate storage space using new and delete operators. In this
section, we will show how we can start improving the features of dynamically controlled vector as follows.
(a) First, we will declare and define a vector class, CMyVector used to store floating point numbers. The memory allocation
and deallocation will take place within the class without the user having to explicitly allocate and deallocate. We will
finally show where the class destructor can be used. This will take care of one of the problems with dynamic memory
management – memory leaks, where memory is allocated but not deallocated.
(b) Second, we will control the access to the vector elements in two different ways. The first method will be via a public
member variable – the pointer containing the memory address of the vector. The second method will be via a member
function that will check to see if the vector index has a valid value. An assertion failure will take place if the index has
an invalid value.
(c) Third, we will show an example of a reference return type for a function.

(d) Finally, we will illustrate how one can start building vector operations that are a part of the class’s publicly available
member functions.

We will first look at the design of CMyVector class.


CMyVector.h

The default constructor is shown in line 11. The overloaded constructor in line 12 has a single argument – the number of rows
or elements in the vector. The copy constructor in line 13 also has a single argument. For the first time as we will see below, the
destructor shown in line 14, will not be empty. There are four helper functions. The GetSize function returns the number of
rows in the vector. The At function used to access the elements of the vector is overloaded. In line 20, we declare the version
used to obtain a floating-point value of the ith element. In line 21, this version of the At function returns a float reference to
the ith element of the vector. The Display function is designed to display the elements of the vector. Finally, we show the genesis
of a numerical-oriented vector operations library in the form of the DotProduct function. We will leave the development of
other useful vector-related functions as an exercise.
The attributes of the class, the pointer to contain the memory address and the number of rows in the vector, should be declared
as private. However, we will declare the pointer, m_pData as a public variable to illustrate how the vector elements can (not
should) be accessed.
Here is the implementation of the CMyVector class.
CMyVector.cpp
The default constructor is used for initializing the member variables – number of rows to zero and the pointer variable to
nullptr; no memory is allocated. However, the overloaded constructor uses the value of the argument (nRows) to dynamically
allocate the memory. In both the overloaded constructor and the copy constructor, the try-catch block is used to detect if
memory allocation did not take place. The destructor is defined in lines 52 through 59. Memory deallocation takes place here
automatically in the sense that the destructor is called automatically when the CMyVector object goes out of scope. We could
have also written the check (line 54) as
if (m_pData != nullptr)

In the design of the class, zero-sized vectors cannot be defined or copied.

The value-based vector access function is defined in lines 62 through 76. A check is made in line 67 to see if the vector index
has a valid value. The reference-based vector access function is defined in lines 79 through 93. In this function, the memory
reference is returned. What is the difference between lines 70 and 87 that appear to be identical? The use of the value-based
function can take place only when the vector element is not being modified. In other words, if we had only the value-based
function, the following statement
fVA.At(i) = 12.06f;

would not compile since a value returned from a function cannot appear as an l-value (left of the assignment operator). On the other hand,
dereferenced pointers and references can appear as both l-value and r-value. When a reference to a value is returned from a function, the address (or
reference) operator is not required. What is one of the advantages of the reference return type? With the reference version of
the At function, we can have cascading statements of the form
fVX.At(i) = fVX.At(i+1) = 3.1415926f;
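The difference between the two return types can be seen in a small sketch. CThreeFloats is a hypothetical fixed-size stand-in used only to contrast a value return with a reference return; index checking is left out for brevity.

```cpp
#include <cassert>

// Tiny fixed-size stand-in (hypothetical) used only to contrast return types.
class CThreeFloats
{
    public:
        CThreeFloats () : m_fData{0.0f, 0.0f, 0.0f} {}
        float  At (int i) const { return m_fData[i]; }  // returns a copy
        float& At (int i)       { return m_fData[i]; }  // returns an alias
    private:
        float m_fData[3];
};

// Returns the sum of the elements after a cascading assignment.
float CascadeAssign ()
{
    CThreeFloats fVX;
    fVX.At(0) = fVX.At(1) = 2.5f;    // non-const object: float& version used

    const CThreeFloats& fVRead = fVX;
    float fFirst = fVRead.At(0);     // const object: value version used
    // fVRead.At(0) = 1.0f;          // would not compile: not an l-value

    return fFirst + fVX.At(1) + fVX.At(2);
}
```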

Finally, a note about the development of vector-related functions. The DotProduct function is defined in lines 110 through 130.
Checks are made in the function to ensure that the operation is valid (see line 116).

The vector elements are accessed in line 121 and the At function is used as a safety precaution (exception handling). We could
have rewritten the statement as
fDP += m_pData[i] * fV.m_pData[i];

Finally, we will see how to use the CMyVector class in a program. In this program we will dynamically allocate and populate two
vectors and then compute their dot product.
main.cpp

The two vectors are declared in line 22, both of the same size. We show two different ways of populating the vectors in lines 28
and 31. The disadvantage with the usage on line 28 is that the vector index value is not checked for correctness (it can be if the
[] operator is overloaded, as we will see in Chapter 9) whereas the check is carried out in the At function. The dot product
function is called in line 35. One could also have written the statement as
float fDotP = fVY.DotProduct (fVX);

since $X \cdot Y = Y \cdot X$. The dot product result is displayed in lines 39 through 41 using the Display member function to display
the contents of the two vectors. The copy constructor usage is briefly illustrated in line 43.

There are two catch blocks – one for memory allocation error (new fails) and one for the errors emanating from the CMyVector
class.
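The ideas in this case study can be collected in a compact sketch. It is not the book's CMyVector listing and several details are assumptions: indices start at 0, an invalid index or size is reported with a standard exception (std::out_of_range or std::invalid_argument), m_pData is kept private, the copy assignment operator is disabled to avoid shallow copies, and the Display function is omitted. The functions DotProductDemo and IndexIsRejected are hypothetical.

```cpp
#include <cassert>
#include <stdexcept>   // std::out_of_range, std::invalid_argument

class CMyVector
{
    public:
        CMyVector () : m_nRows(0), m_pData(nullptr) {}
        explicit CMyVector (int nRows) : m_nRows(nRows), m_pData(nullptr)
        {
            if (nRows <= 0)
                throw std::invalid_argument ("CMyVector: invalid size");
            m_pData = new float[nRows]();   // () zero-initializes the values
        }
        CMyVector (const CMyVector& fV) : m_nRows(fV.m_nRows), m_pData(nullptr)
        {
            if (m_nRows <= 0)               // zero-sized vectors are not copied
                throw std::invalid_argument ("CMyVector: invalid size");
            m_pData = new float[m_nRows];
            for (int i = 0; i < m_nRows; i++)
                m_pData[i] = fV.m_pData[i]; // deep copy of the values
        }
        CMyVector& operator= (const CMyVector&) = delete; // assumption: disabled
        ~CMyVector () { delete [] m_pData; } // delete [] on nullptr is safe

        int    GetSize () const { return m_nRows; }
        float  At (int i) const { CheckIndex (i); return m_pData[i]; }
        float& At (int i)       { CheckIndex (i); return m_pData[i]; }

        float DotProduct (const CMyVector& fV) const
        {
            if (fV.m_nRows != m_nRows)
                throw std::invalid_argument ("CMyVector: size mismatch");
            float fDP = 0.0f;
            for (int i = 0; i < m_nRows; i++)
                fDP += At(i) * fV.At(i);
            return fDP;
        }
    private:
        void CheckIndex (int i) const
        {
            if (i < 0 || i >= m_nRows)
                throw std::out_of_range ("CMyVector: invalid index");
        }
        int    m_nRows;   // number of rows
        float *m_pData;   // address of the first element
};

// Builds two vectors of equal size and returns their dot product.
float DotProductDemo ()
{
    CMyVector fVX(3), fVY(3);
    for (int i = 0; i < fVX.GetSize(); i++)
    {
        fVX.At(i) = static_cast<float>(i + 1);   // 1, 2, 3
        fVY.At(i) = 2.0f;                        // 2, 2, 2
    }
    CMyVector fVZ(fVX);                 // copy constructor: deep copy
    fVZ.At(0) = 10.0f;                  // does not disturb fVX
    return fVX.DotProduct (fVY);        // 2 + 4 + 6
}   // the destructors release all three vectors here

// Returns true if an out-of-range index is reported as an exception.
bool IndexIsRejected ()
{
    CMyVector fV(2);
    try { fV.At(2) = 1.0f; }
    catch (const std::out_of_range&) { return true; }
    return false;
}
```

Because the destructor releases the memory, a function such as DotProductDemo never calls delete itself, which is how the class guards against memory leaks.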

A retrospective look at what we have done in this example is necessary to lay the foundation of what we can do to improve the
program, some of which we will see in the next chapter.
(1) There is a deficiency in the class definition. What if we declared a vector as
CMyVector fVA;
The default constructor would be called setting the number of rows in the vector to zero. How do we then set the
size of the vector later in the program? The solution to this problem is to define a member function, say void SetRows
(int nRows) that would behave like the overloaded constructor. However, it would first check to see if memory for
the vector has been allocated before. If memory has been allocated, it would then deallocate the memory and allocate
new memory as specified by the function parameter.
(2) As we mentioned in Chapter 4, functions can be made more useful if they can be converted to a function template.
With this example, we can declare and manipulate only vectors containing float values. We will learn how to declare
and define template classes in the next chapter.
(3) Ideally, the pointer variable m_pData should be a private member variable since it would be disastrous to change the
address either inadvertently or willfully. Furthermore, it is awkward to use the At function to access the elements of
the vector. Ideally one would like to access elements of the vector using () or [] such as
fVX(i) = static_cast<float>(log(i+1));
Both these issues can be addressed by using operator overloading as we will see in the next chapter.

Summary
In this chapter we saw more about memory management and especially dynamic memory management. We learnt about
pointers, pointer arithmetic, the new and delete operators, and the development of classes where memory allocation and
deallocation can take place in a consistent and safe manner. There are more advanced techniques in C++ for using pointers
safely, such as the smart pointers in the C++ standard library (e.g., unique_ptr, shared_ptr, and weak_ptr),
but that discussion is outside the scope of this book.

Where to go from here?


Broadly speaking, there are three important traits to numerical techniques – robustness, accuracy, and efficiency. Robustness
implies that the technique produces predictable results and is, more or less, immune to issues such as initial guesses. Accuracy
refers to the correctness of the final solution. Finally, efficiency refers to the computational effort needed to obtain the final
result. In this chapter, one of the foci was on efficiency – how the algorithm implemented in a particular hardware affects the
time it takes to carry out the computations. For those readers who are looking for additional challenges, search the internet for
articles that describe how to write more efficient algorithms and how machine architecture affects the efficiency. A very short
list is provided in the References section.

Exercises
Most of the problems below involve the development of one or more classes. In each case (a) develop a plan to test
the class(es), and (b) implement the plan in a main program.

Appetizers
Problem 8.1
What is the output that is generated by the following statements?
int *pnA;
int nA = 5;
pnA = &nA;
std::cout << "nA is " << nA << " and *pnA is " << *pnA << '\n';
*pnA = 10;
std::cout << "nA is " << nA << " and *pnA is " << *pnA << '\n';
Problem 8.2
What is the output that is generated by the following statements? Is there an error in this program segment?
int *pnA, *pnB;
pnA = new int;
pnB = new int;
*pnA = 100;
*pnB = 110;
std::cout << *pnA << " " << *pnB << '\n';
pnA = pnB;
std::cout << *pnA << " " << *pnB << '\n';
*pnA = 300;
std::cout << *pnA << " " << *pnB << '\n';
delete pnA;
delete pnB;
Problem 8.3
Will the following statements compile? If not, correct the statements. What is the output that is generated?
int *pnVA;
int nVA[4] = {0, 1, 2, 3};
pnVA = nVA;
std::cout << *pnVA << '\n';
std::cout << pnVA[2] << '\n';
++pnVA;
std::cout << *pnVA << '\n';
std::cout << pnVA[2] << '\n';

Main Course
Problem 8.4
Extend the capabilities of the CMyVector class by adding a member function
void SetRows (int nRows);
that would dynamically either set the size or reset the size of a vector.
Problem 8.5
Extend the capabilities of the CMyVector class by adding other vector operations as public member functions.
Function Prototype                          Remarks
float MaxNorm ();                           Computes the maximum absolute value, $\|x\|_\infty = \max_i |x_i|$.
float TwoNorm();                            Computes the length of the vector as $\|x\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2}$.
float MaxValue ();                          Computes the largest value.
float MinValue();                           Computes the smallest value.
void UnitVector (const CMyVector& fVB,      Computes the unit vector from current point to point fVB and
                 CMyVector& fVUnitVector);  stores the result in fVUnitVector.
void CrossProduct (const CMyVector& fVB,    Computes the cross product between the current vector and fVB
                   CMyVector& fVR);         and stores the result in fVR.

C++ Concepts
Problem 8.6
It is possible to have a pointer to a pointer. For example, the following
int **pMA;

defines a pointer pMA that points to a pointer. Consider the following code segment.
int *pVA[3];            // a vector of int pointers
int **pMA;              // pointer to an int pointer
int nV1[2] = {11, 12};
int nV2[2] = {21, 22};
int nV3[2] = {31, 32};

pVA[0] = nV1;           // stores the address of vector nV1
pVA[1] = nV2;           // stores the address of vector nV2
pVA[2] = nV3;           // stores the address of vector nV3

for (int i=0; i < 3; i++)
{
    pMA = &pVA[i];      // grab the address of pVA[i]
    for (int j=0; j < 2; j++)
    {
        std::cout << "[" << i << "," << j
                  << "] = " << pMA[0][j] << '\n';
        // notice how the dereferencing takes place above
        // using two indices
    }
}
There are two key statements in the above code.
pMA = &pVA[i]; // grab the address of pVA[i]
In the above statement the address of pVA[i], which itself holds the address of a vector, is assigned to pMA. The next important
statement is the statement in which the int value is accessed via
pMA[0][j]
In general, pMA[i][j] would imply the value stored at the memory location that is accessed as follows - Locate the memory
address stored in pMA[i], to that address add j and finally, get the value that is stored at that memory location!
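The general pMA[i][j] form can be checked with a short sketch in which pMA is set once to the start of the vector of pointers. The function name SumViaDoublePointer is hypothetical.

```cpp
#include <cassert>

// Returns the sum of all entries reached through a pointer to a pointer.
int SumViaDoublePointer ()
{
    int nV1[2] = {11, 12};
    int nV2[2] = {21, 22};
    int nV3[2] = {31, 32};
    int *pVA[3] = {nV1, nV2, nV3};   // a vector of int pointers

    int **pMA = pVA;                 // address of pVA[0], the first pointer

    int nSum = 0;
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 2; j++)
        {
            // pMA[i] is the address stored in pVA[i]; adding j and
            // dereferencing gives the int stored there
            assert (pMA[i][j] == *(*(pMA + i) + j));
            nSum += pMA[i][j];
        }
    return nSum;
}
```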
Use this idea to create a class for storing two-dimensional arrays.
[Hint: See the book, Press et al., Numerical Recipes in C, Cambridge University Press.]
Demonstrate your implementation using the following class definition.
class CMyMatrix
{
    public:
        CMyMatrix ();                      // default constructor
        CMyMatrix (int nRows, int nCols);  // constructor
        ~CMyMatrix ();                     // destructor

        // helper functions
        int GetRows();
        int GetColumns();
        float At (int, int) const;         // for use as rvalue
        float& At (int, int);              // for use as lvalue
        void Display (const std::string&) const;

    private:
        float **m_pData;  // where the matrix data are
        int m_nRows;      // # of rows
        int m_nColumns;   // # of columns
};

References
http://www.agner.org/optimize/
https://www.youtube.com/watch?v=L7zSU9HI-6I (a very long video)
https://lwn.net/Articles/250967/
https://www.youtube.com/watch?v=YQs6IC-vgmo

Chapter 9

Classes: Objects 202


“I'm not smart, but I like to observe. Millions saw the apple fall, but Newton was the one who asked why.” Bernard
Mannes Baruch

“Knowledge is not achieved until shared.” Anon

“If a little knowledge is dangerous, where is the man who has so much as to be out of danger?” Thomas H. Huxley

Ideas dealing with classes and associated objects were introduced in Chapter 7. In Chapter 8, we looked at pointers and resource
(memory) allocation issues. We will build on those ideas in this chapter by learning about operator overloading, static member
functions and variables, template classes, and arrays where memory is dynamically allocated. We will use exception handling to
ensure that potential problems are detected and handled appropriately.

Objectives
• To build on the earlier concepts associated with classes and objects.
• To understand the concepts associated with operator overloading, template classes, and dynamically managed arrays.
• To learn more about software engineering and the development of a matrix toolbox necessary for numerical analysis.
• To understand how to routinely use the C++ exception handling mechanism.

S. D. Rajan, 2000-24 9-225


9.1 Operator Overloading


We learnt how to define our own data types using classes. C++ extends this capability by allowing operators normally associated
with the standard data types to be also associated with user-defined classes. This is known as operator overloading. Let us look
at an example to understand what this means. The CPoint class was developed in Chapter 7. It would be nice to extend the
capabilities of the class to be able to carry out the following operations via these statements.
CPoint P1(10.2f, -4.9f), P2(-3.2f, 55.0f), P3;
P3 = P1 + P2; // overloaded operator +
std::cout << P3 << "\n"; // overloaded operator <<

Unless the operators + and << are overloaded, the above statements will not compile.
Before we learn the details of operator overloading, we need to understand (a) a special pointer called the this pointer, (b) what
is meant by friend classes and functions, (c) how the const qualifier should be used, and (d) the reference operator &.

this pointer
There is a special pointer called this that provides every object access to itself. This pointer can be used to refer to both the
member variables and member functions. Here is an example that uses this pointer in the (rewritten version of the) Display
member function in the CPoint class.
// helper function
void CPoint::Display (const std::string& strBanner)
{
// display the current coordinates
std::cout << strBanner
<< "[X,Y] Coordinates = ["
<< this->m_fXCoor << ","
<< (*this).m_fYCoor << "].\n";
}

In this function, the use of the this pointer is unnecessary since m_fXCoor is equivalent to this->m_fXCoor, etc.
Also note that this-> is equivalent to (*this). as we obtain the value of the two coordinates in two different ways. We will see
a much better example of the use of the this pointer with operator overloading.

friend classes and friend functions


The private member variables and functions as we saw in Chapter 7 cannot be accessed outside of the class. However, there
is an exception. A friend of a class also has access to the private members. One can declare a function or an entire class to be
a friend of a class. Here is an example of CTriangle class that is declared to be a friend of the CPoint class.
class CPoint
{
friend class CTriangle; // CTriangle class is a friend

public:
// constructor
CPoint (); // default
….
private:
float m_fXCoor;
float m_fYCoor;
};

The CTriangle class is declared to be a friend of the CPoint class in the CPoint class header file. The friend keyword is associated
with neither public nor private qualifiers. Hence it should be declared as shown in the above example. With this declaration, we
could have the following statements in a CTriangle class to access the two private member variables.
#include "point.h"
class CTriangle
{
    public:
        // constructor
        CTriangle (); // default
        ….
    private:
        CPoint m_Vertex[3];
};
…..

void CTriangle::ComputeSide ()
{
    float fSide1 = Distance (m_Vertex[0].m_fXCoor, m_Vertex[0].m_fYCoor,
                             m_Vertex[1].m_fXCoor, m_Vertex[1].m_fYCoor);
    ….
}

A few things to note about the “friend” concept. If CPoint declares CTriangle as a friend, then CPoint does not automatically
become a friend of CTriangle. That declaration must be explicitly made in the CTriangle class. In other words, a class must
be explicitly identified as a friend class; the reciprocity idea does not apply here. Also, if CTriangle is a friend of the CPoint class
and CPrism is a friend of the CTriangle class, then one cannot infer that CPrism is a friend of the CPoint class. Proper use of the friend
class concept can enhance the readability and performance of a program.
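A minimal runnable sketch of a friend class follows. It departs from the fragments above in several assumed details: CPoint has a two-argument constructor, CTriangle is constructed from three points, ComputeSide returns the length of one side instead of calling a Distance function, and SideOfSampleTriangle is a hypothetical helper.

```cpp
#include <cassert>
#include <cmath>     // std::sqrt

class CPoint
{
    friend class CTriangle;   // CTriangle may access the private members

    public:
        CPoint () : m_fXCoor(0.0f), m_fYCoor(0.0f) {}
        CPoint (float fX, float fY) : m_fXCoor(fX), m_fYCoor(fY) {}
    private:
        float m_fXCoor;
        float m_fYCoor;
};

class CTriangle
{
    public:
        CTriangle (const CPoint& P1, const CPoint& P2, const CPoint& P3)
        {
            m_Vertex[0] = P1; m_Vertex[1] = P2; m_Vertex[2] = P3;
        }
        // Length of the side joining vertex 0 and vertex 1.
        float ComputeSide () const
        {
            // direct access to CPoint's private data is legal for a friend
            float fDX = m_Vertex[1].m_fXCoor - m_Vertex[0].m_fXCoor;
            float fDY = m_Vertex[1].m_fYCoor - m_Vertex[0].m_fYCoor;
            return std::sqrt (fDX*fDX + fDY*fDY);
        }
    private:
        CPoint m_Vertex[3];   // CPoint gains no access to these in return
};

// Returns the length of the first side of a 3-4-5 right triangle.
float SideOfSampleTriangle ()
{
    CTriangle T (CPoint(0.0f, 0.0f), CPoint(3.0f, 4.0f), CPoint(0.0f, 4.0f));
    return T.ComputeSide ();
}
```

Removing the friend declaration from CPoint makes ComputeSide fail to compile, since m_fXCoor and m_fYCoor are private.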
Finally, one can also declare and define a friend function of a class. This is a function that is defined outside the class but has
access to all the members of the class. We will see examples of friend functions with the overloaded << and >> operators.

Proper use of const


Proper use of the const qualifier introduces a defensive programming mechanism within the source code. Let’s review what we
have learnt so far by looking at a few function prototypes.
int Add (int n1, int n2); // no const used here
int Add (const int n1, const int n2); // alternate version

Both these declarations will work fine since the arguments are passed by value. However, the second version adds an additional
check so that if an attempt is made in the Add function to change the values of either n1 or n2, the compiler will issue an error
message.

void DisplayMessage (std::string strMessage); // no const used here
void DisplayMessage (std::string& strMessage); // no const used here
void DisplayMessage (const std::string& strMessage); // const used here

The three versions are quite different. The first version has a pass-by-value argument. The copy constructor is called and a copy of
the variable strMessage is created and used in the function. If strMessage is modified in the function, the changes do not
propagate back to the calling function. However, additional instructions are executed due to the creation of a local copy of the
variable. The second version has a pass-by-reference argument, and no local copy of the variable is made. Even if the intent is to just
display the message in the function, it is possible to inadvertently modify the message. However, this cannot take place in the
third version due to the use of the const qualifier.
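The behavior of the three kinds of parameters can be compared in a short sketch. The three versions are given different (hypothetical) names here because overloads that differ only in this way would make the calls ambiguous.

```cpp
#include <cassert>
#include <cstddef>   // std::size_t
#include <string>

// Pass by value: works on a private copy made by the copy constructor.
void AppendByValue (std::string strMessage)
{
    strMessage += "!";            // changes only the local copy
}

// Pass by reference: no copy, and changes reach the caller.
void AppendByReference (std::string& strMessage)
{
    strMessage += "!";            // modifies the caller's string
}

// Pass by const reference: no copy, and the compiler forbids changes.
std::size_t LengthByConstReference (const std::string& strMessage)
{
    // strMessage += "!";         // would not compile
    return strMessage.length ();
}

// Returns the caller's string after all three calls.
std::string CallAllThree ()
{
    std::string strText = "Done";
    AppendByValue (strText);                 // strText is still "Done"
    assert (strText == "Done");
    AppendByReference (strText);             // strText is now "Done!"
    assert (LengthByConstReference (strText) == 5);
    return strText;
}
```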
Now let us look at the following case where another function is called from a function.
void DisplayMessage (std::string& strMessage); // no const used here
….
int ComputeAbsSum (const std::string& strMessage,
                   int nV[], int nValues)
{
    int nSum = 0;
    if (nValues <= 0)
        DisplayMessage ("Invalid number of values");
    else
    {
        for (int i=0; i < nValues; i++)
            nSum += abs(nV[i]);
        DisplayMessage (strMessage);
    }
    return nSum;
}

This code will not compile since strMessage is declared as a const variable within the ComputeAbsSum function and cannot be
passed to the DisplayMessage function where it is not a const string. In other words, if an argument is declared as a const in
one function then it must be declared as a const in all functions where it is used. We can also use the const qualifier with return
values from a function. For example, the following function prototype
const int HowMany ();
signifies a function that returns an integer that the caller is not meant to modify. For a built-in type such as int returned by value
the qualifier has no practical effect, since the caller receives its own copy; it becomes meaningful when a class object or a reference
is returned. Finally, we can use the const qualifier with calling objects. We can have a member function
designed so that it does not modify the value of the calling object.
class CPoint
{
    public:
        void Display () const;
        …..
    private:
        int m_nItems;
};

In this example, by placing const at the end of the Display function declaration, we tell the compiler not to allow the function
to change the value of the calling object. The following function body is incorrect if m_nItems is a member variable of the CPoint
class.
void CPoint::Display () const
{
++m_nItems; // NOT correct
std::cout << "Number of items is " << m_nItems << "\n";
}

The correct function body is as follows.


void CPoint::Display () const
{
std::cout << "Number of items is " << m_nItems << "\n";
}
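The three uses of const discussed above (value arguments, reference arguments and calling objects) can be pulled together in one short sketch. The class CCounter and the function Describe are hypothetical names introduced only for this illustration; they are not part of the CPoint class.

```cpp
#include <sstream>
#include <string>

// Hypothetical class used only to illustrate the const qualifier.
class CCounter
{
    public:
        explicit CCounter (int nItems) : m_nItems (nItems) {}

        // const member function: may read, but not modify, the calling object
        int GetItems () const { return m_nItems; }

        // non-const member function: allowed to modify the calling object
        void AddItem () { ++m_nItems; }

    private:
        int m_nItems;
};

// const references: no copies are made and neither argument can be
// modified here. Only const member functions can be called via counter.
std::string Describe (const std::string& strLabel, const CCounter& counter)
{
    std::ostringstream out;
    out << strLabel << counter.GetItems ();
    // counter.AddItem ();   // would not compile: counter is const
    return out.str ();
}
```

A call such as Describe ("Number of items is ", c) works with a string literal because a temporary std::string can bind to a const reference; it could not bind to a non-const std::string& parameter.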

References
We saw the reference operator & used and discussed in Chapters 4 and 7. To make operator overloading work, we need to
understand how functions return references. Returning a reference from a function is similar to returning an alias to a variable. If an object
returned by a function is to be used as an l-value then it must be returned by reference. Let T denote a class type. Consider the following
function prototypes.
(a) T function (); where a value is returned that cannot be used as an l-value. The returned value can be changed.
The copy constructor is used in this case.
(b) const T function (); is the same as (a) but the returned value cannot be changed.
(c) T& function (); where a reference is returned that can be used as an l-value. The returned value can be changed. The
copy constructor is not used in this case.
(d) const T& function (); where a const reference is returned that cannot be used as an l-value. The returned value
cannot be changed. The copy constructor is not used in this case.
The differences will become clear as we look at a specific example.
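Before moving on, here is a minimal sketch contrasting prototypes (a), (c) and (d) with a built-in element type. The class CTriple and the function ReferenceDemo are hypothetical and exist only to show which statements the compiler accepts.

```cpp
#include <cstddef>

// Hypothetical fixed-size container used only to contrast the return types.
class CTriple
{
    public:
        CTriple () : m_dV {0.0, 0.0, 0.0} {}

        // (a) returns a copy: the result cannot be used as an l-value
        double GetValue (std::size_t i) const { return m_dV[i]; }

        // (c) returns a reference: the result is an alias and an l-value
        double& At (std::size_t i) { return m_dV[i]; }

        // (d) returns a const reference: no copy is made, but it is read-only
        const double& At (std::size_t i) const { return m_dV[i]; }

    private:
        double m_dV[3];
};

double ReferenceDemo ()
{
    CTriple t;
    t.At (1) = 4.5;            // legal: At() returns double&
    // t.GetValue (1) = 4.5;   // would not compile: not an l-value
    const CTriple& ct = t;
    // ct.At (1) = 1.0;        // would not compile: const double& is read-only
    return ct.At (1) + t.GetValue (1);
}
```

The assignment through At() changes the value stored inside the object itself, which is exactly the behavior needed later for overloaded operators such as = and ().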

Operator overloading
As we mentioned at the beginning of this section, C++ allows operator overloading with user-defined data types, which makes
programs easier to read and maintain. The overloaded operators can be implemented either as global functions or as member
functions of a class. The syntax is as follows.


operatorx
where operator is a C++ reserved keyword and x is the operator. Here are the definitions for the addition operator and the
stream insertion operator.
operator+
operator<<

The general syntax, when used in the context of a function, is as follows.


returntype classname::operatorx ()
where x is the operator to be overloaded.

Operators that can be overloaded


The following operators can be overloaded.
Operator  Name                        Type     Operator  Name                             Type
,         Comma                       Binary   ++        Increment                        Unary
!         Logical NOT                 Unary    +=        Addition/assignment              Binary
!=        Inequality                  Binary   -         Subtraction                      Binary
%         Modulus                     Binary   -         Unary negation                   Unary
%=        Modulus/assignment          Binary   --        Decrement                        Unary
&         Bitwise AND                 Binary   -=        Subtraction/assignment           Binary
&         Address-of                  Unary    ->        Member selection                 Binary
&&        Logical AND                 Binary   ->*       Pointer-to-member selection      Binary
&=        Bitwise AND/assignment      Binary   /         Division                         Binary
( )       Function call               n/a      /=        Division/assignment              Binary
*         Multiplication              Binary   <         Less than                        Binary
*         Pointer dereference         Unary    <<        Left shift                       Binary
*=        Multiplication/assignment   Binary   <<=       Left shift/assignment            Binary
+         Addition                    Binary   <=        Less than or equal to            Binary
+         Unary Plus                  Unary    =         Assignment                       Binary
>=        Greater than or equal to    Binary   ==        Equality                         Binary
>>        Right shift                 Binary   >         Greater than                     Binary
>>=       Right shift/assignment      Binary   |=        Bitwise inclusive OR/assignment  Binary
[ ]       Array subscript             n/a      ||        Logical OR                       Binary
^         Exclusive OR                Binary   ~         One's complement                 Unary
^=        Exclusive OR/assignment     Binary   delete    delete                           n/a
|         Bitwise inclusive OR        Binary   new       new                              n/a

Operators that cannot be overloaded


The following operators cannot be overloaded: . (member selection), .* (pointer-to-member selection), :: (scope
resolution), ?: (conditional), # (preprocessor), and ## (preprocessor).
There are two operators that do not explicitly require operator overloading. The first is the assignment operator = and the second
is the address operator &. C++ automatically provides default behavior for both these operators.
We will now look at an example where we will overload operators to be used with the CPoint class including the = operator.


Example Program 9.1.1 Operator overloading


We will now develop the functionality within the now familiar CPoint class so that the following operations can be carried out
using CPoint objects, P1, P2 and P3.
P1 = P2; // overloaded operator =
P3 = P1 + P2; // overloaded operator +
P3 = P1 - P2; // overloaded operator -
if (P1 == P2) // overloaded operator ==
if (P1 != P2) // overloaded operator !=
cout << P3 << "\n"; // overloaded operator <<
cin >> P3 ; // overloaded operator >>

First, we define the header file. Look at lines 31 through 35 to see how operators are overloaded as public functions.
Point.h

We will pay particular attention to the new features shown in lines 13-14 (operator overloading via friend functions), and
operator overloading shown in lines 35-39, all of which will make writing the client code cleaner and easier to understand.
Now the rest of the server code, dealing only with the overloaded operator functions, is shown below.


point.cpp

Lines 76-87 show how the reference return type (CPoint&) is used. By having the return type as a reference (&) to a CPoint object,
it is possible to have statements such as the following that involve multiple assignments with CPoint objects as
P1 = P2 = P3;
In other words, the following prototype
void CPoint::operator= (const CPoint& PRight);
would permit only the following assignment
P1 = P2;
but not the chained assignment
P1 = P2 = P3;
Note that a declaration such as CPoint P1 = P2; is an initialization handled by the copy constructor, not by operator=.
Line 79 is a defensive programming statement to avoid problems with statements like
P1 = P1;
and shows a nice use for the this pointer. The addition overloaded operator + has a CPoint return type to support statements
such as
P1 = P2 + P3 + P4;
The function body clearly shows that a CPoint object has to be created in the function (line 97) to facilitate this addition. This
is the downside to overloaded operators. In the case of a CPoint object not much additional resource (storage spaces for x and
y coordinates) is going to be used temporarily. However, with larger objects, creating temporary objects will be resource
intensive.
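The points made above about return types can be seen in a compact sketch. This is a simplified stand-in for the CPoint class, not the listing from point.h and point.cpp: the member names follow the ones used elsewhere in this chapter, but the function bodies are assumptions written for illustration.

```cpp
#include <iostream>

// Simplified stand-in for the CPoint class; function bodies are assumptions.
class CPoint
{
    // non-member friend functions; returning the stream allows chaining
    friend std::ostream& operator<< (std::ostream& out, const CPoint& P)
    {
        out << "(" << P.m_fXCoor << ", " << P.m_fYCoor << ")";
        return out;
    }
    friend std::istream& operator>> (std::istream& in, CPoint& P)
    {
        in >> P.m_fXCoor >> P.m_fYCoor;
        return in;
    }

    public:
        CPoint () : m_fXCoor (0.0f), m_fYCoor (0.0f) {}
        CPoint (float fX, float fY) : m_fXCoor (fX), m_fYCoor (fY) {}
        CPoint (const CPoint&) = default;

        // returns a reference to the calling object so P1 = P2 = P3 works
        CPoint& operator= (const CPoint& PRight)
        {
            if (this != &PRight)   // guard against P1 = P1
            {
                m_fXCoor = PRight.m_fXCoor;
                m_fYCoor = PRight.m_fYCoor;
            }
            return *this;
        }

        // return a new object by value so P2 + P3 + P4 works
        CPoint operator+ (const CPoint& PRight) const
        {
            return CPoint (m_fXCoor + PRight.m_fXCoor,
                           m_fYCoor + PRight.m_fYCoor);
        }
        CPoint operator- (const CPoint& PRight) const
        {
            return CPoint (m_fXCoor - PRight.m_fXCoor,
                           m_fYCoor - PRight.m_fYCoor);
        }

        bool operator== (const CPoint& PRight) const
        {
            return m_fXCoor == PRight.m_fXCoor &&
                   m_fYCoor == PRight.m_fYCoor;
        }
        bool operator!= (const CPoint& PRight) const
        {
            return !(*this == PRight);
        }

    private:
        float m_fXCoor;
        float m_fYCoor;
};
```

With these definitions a statement such as P4 = P3 = P1 + P2; first builds a temporary holding the sum, assigns it to P3, and then assigns P3 to P4.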


The boolean operators (== and !=) check if two CPoint objects are equal or not as we will see in the example. The overloaded
operators << and >> are implemented as non-member friend functions. Once again, by having the return type as a reference type, it
is possible to output multiple CPoint objects using a single statement or read multiple points using a single statement.
And finally, here is an example program (client code) where the use of the overloaded operators is shown using the CPoint class.
main.cpp


The use of the copy constructor is illustrated in line 22. The usage of overloaded operators is illustrated in several statements: line
26 for the addition operator, line 30 for one of the boolean operators, lines 36-37 for the stream extraction operator, line 40 for
the subtraction operator, and finally line 43 for the stream insertion operator.

9.2 More about Classes


In this section we will see a couple more class-associated features.
static members
We first looked at the static keyword in Chapters 4 and 7. Here we will reinforce those concepts.
There are times when a member variable defined in a class needs to be shared by all the objects belonging to that class. Such
variables are called static variables. This variable acts like a global variable for class objects. First, note that the static member
variable is declared in the class declaration.
Here is an example to illustrate its usage. The static variable m_nObjectsDefined is designed to track how many CPoint objects have been
defined during the program execution.
class CPoint
{
public:
// constructor
CPoint (); // default
….
private:
float m_fXCoor;
float m_fYCoor;
static int m_nObjectsDefined; // static member variable
};

Second, the static variable is initialized outside of the class. Usually this is done at the beginning of the file containing the class
definitions. Third, in this example, it is used in the class constructor in a manner similar to any class variable.
int CPoint::m_nObjectsDefined = 0;
CPoint::CPoint ()
{
    ++m_nObjectsDefined;
}
…. // rest of the class definitions follow

In a similar manner, static member functions are member functions that do not access an object’s data. In other words, if a
member function does not access the member non-static variables, then it is possible to declare that member function as static.
Continuing with the previous example, let us assume that we need a public function to obtain the current number of CPoint
objects defined and that function is NumObjects(). The modified header file is as follows.
class CPoint
{
public:
// constructor
CPoint (); // default
static int NumObjects (); // static member function
….
private:
float m_fXCoor;
float m_fYCoor;
static int m_nObjectsDefined;
};

The static keyword is used to declare the member function but is not used in the member function definition itself. In other
words, the static member function is defined as follows.
int CPoint::NumObjects ()
{
return m_nObjectsDefined;
}

The static function can access m_nObjectsDefined since that variable is a static variable.
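A self-contained sketch of the same idea follows. The class name CTracked is hypothetical, and the copy constructor and destructor bookkeeping are additions made here (they are not shown in the CPoint fragments above) so that the counter reflects the objects currently alive.

```cpp
// Hypothetical class; the copy constructor and destructor bookkeeping are
// assumptions added so that the counter tracks objects currently alive.
class CTracked
{
    public:
        CTracked ()                { ++m_nObjectsDefined; }
        CTracked (const CTracked&) { ++m_nObjectsDefined; }
        ~CTracked ()               { --m_nObjectsDefined; }

        static int NumObjects ();   // static keyword appears only here

    private:
        static int m_nObjectsDefined;   // one copy shared by all objects
};

// definition and initialization outside of the class
int CTracked::m_nObjectsDefined = 0;

// no static keyword in the definition
int CTracked::NumObjects ()
{
    return m_nObjectsDefined;
}

// helper used to show the count while three objects are in scope
int CountInsideScope ()
{
    CTracked a, b;
    CTracked c (a);
    return CTracked::NumObjects ();   // called without any object
}
```

Because NumObjects is static, it is invoked through the class name and the scope resolution operator; no CTracked object is required to make the call.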

forward class definitions


When one class is a friend of another, it is common for both classes to refer to the other class in the class definitions. This
requires the use of forward declaration. Consider class A that uses class B as a friend class.
class B; // forward declaration
class A
{
    friend class B;
    ….
};

The syntax is friend followed by the keyword class followed by the class name and a terminating semicolon.

Example Program 9.2.1 Using Static Member Functions and Variables


In this example we will develop functions to add, subtract, multiply and divide two real numbers that are used as double precision
variables. We will also track how many times each of these functions is called (as our own form of operations counter). These
functionalities will be implemented in a class called CMath.
We first present the entire server code – header file and functions of the CMath class.
math.h
The important thing to note is that the class has no non-static data. There are five public static member functions in the class –
Add, Subtract, Multiply, Divide and DisplayCounters. Each one of them is declared static. Recall that “static member
functions are member functions that do not access an object’s data”. In the context of this example, these five functions simply
are designed to carry out the four math operations and display the number of times each operation has been carried out in the
program. The variables that store these counters are also declared as static. In other words, there are only four variables that
store these values irrespective of the number of these operations carried out by all the objects that belong to the CMath class.


While this class shows all member variables and functions to be static, we can design classes where only one or more functions
and variables are declared to be static in a similar fashion.
math.cpp

Statements 9-12 initialize the values of the static variables outside the body of the member functions.

Note that the static keyword is not used with the function definition. The static keyword is used only in the header file with
the function declarations.


Defensive programming would dictate that we check for math errors; for example, we should check to see if d2 is zero before
dividing (line 35). Finally, we present the client code.
main.cpp

There is no explicit variable (object) associated with the CMath class! The scope resolution operator :: is used to invoke the static
member functions, which are treated differently than non-static member functions. No CMath object needs to be created in order to
call them.
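The structure described in this example can be sketched as follows. This is not the math.h and math.cpp listing: only two of the four operations are shown, the counter and accessor names are assumptions, and throwing an exception is just one possible way to implement the defensive check for division by zero.

```cpp
#include <stdexcept>

// Sketch of an all-static utility class; counter names are assumptions.
class CMath
{
    public:
        static double Add (double d1, double d2);
        static double Divide (double d1, double d2);
        static int NumAdditions () { return m_nAdd; }
        static int NumDivisions () { return m_nDivide; }

    private:
        static int m_nAdd;      // number of calls to Add
        static int m_nDivide;   // number of successful calls to Divide
};

// static variables are initialized outside the class
int CMath::m_nAdd = 0;
int CMath::m_nDivide = 0;

// static keyword is not repeated in the definitions
double CMath::Add (double d1, double d2)
{
    ++m_nAdd;
    return d1 + d2;
}

double CMath::Divide (double d1, double d2)
{
    if (d2 == 0.0)   // defensive check before dividing
        throw std::invalid_argument ("CMath::Divide: division by zero");
    ++m_nDivide;
    return d1 / d2;
}
```

Client code calls these functions as CMath::Add (1.5, 2.5) or CMath::Divide (9.0, 3.0) without declaring any CMath variable.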

9.3 Template Classes


In a fashion similar to template functions that we saw in Chapter 4, template classes can be defined. Let us go back to the CPoint
class. Let us assume that the point coordinates can be of any of the following data types – integer, float or double. We can define
a template CPoint class as follows (contained in point.h).
template <class T>
class CPoint
{
    // friend overloaded operator functions
    friend std::istream &operator>> (std::istream&, CPoint&);
    friend std::ostream &operator<< (std::ostream&, const CPoint&);

    public:
        // constructor
        CPoint ();               // default
        CPoint (const CPoint&);  // copy constructor
        CPoint (T, T);           // overloaded

        // helper function
        void Display (const std::string&) const;
        // modifier function
        void SetCoordinates (T, T);
        void SetCoordinates (const CPoint&);
        // accessor function
        void GetCoordinates (T&, T&) const;
        void GetCoordinates (CPoint&) const;

        // overloaded operators
        const CPoint &operator= (const CPoint&);
        CPoint operator+ (const CPoint&) const;
        CPoint operator- (const CPoint&) const;
        bool operator!= (const CPoint&) const;
        bool operator== (const CPoint&) const;

    private:
        T m_XCoor; // stores x-coordinate
        T m_YCoor; // stores y-coordinate
};

There are just two differences when this version is compared to the float-based CPoint version shown in the last section. The
first line reads
template <class T>

signifying that we have one template argument for the class that is identified as T. The symbol T is merely a placeholder. It can
be any other symbol. Second, wherever we had specified float as the data type in the non-template version of CPoint, we now
specify the corresponding data type as T. The template function body must follow a slightly different syntax. For example, the
SetCoordinates function would be defined as follows.
template <class T>
void CPoint<T>::SetCoordinates (T fX, T fY)
{

}

And the copy constructor would be defined as follows.


template <class T>
CPoint<T>::CPoint (const CPoint<T>& P)
{

}

In addition to the first line containing the template keyword, the <T> needs to be appended to the class name before the function
name and everywhere else it is used (e.g., as a function parameter).
To use the CPoint template class in a program we would do the following.
#include "point.h"
….
CPoint<int> nPA, nPB; // integer-valued points
CPoint<float> fPA, fPB; // float-valued points

instead of the usual


CPoint fPA, fPB; // float-valued points
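The template syntax described above can be tried out with a cut-down version of the class. Only a few members are kept, and the function bodies are assumptions written for illustration rather than the contents of point.h.

```cpp
// Cut-down template sketch; only a few members of the class are shown
// and the function bodies are assumptions.
template <class T>
class CPoint
{
    public:
        CPoint ();
        CPoint (T, T);
        void SetCoordinates (T, T);
        void GetCoordinates (T&, T&) const;
        CPoint operator+ (const CPoint&) const;

    private:
        T m_XCoor;   // stores x-coordinate
        T m_YCoor;   // stores y-coordinate
};

// every member function definition starts with the template line and
// uses CPoint<T> as the class name
template <class T>
CPoint<T>::CPoint () : m_XCoor (T ()), m_YCoor (T ())
{
}

template <class T>
CPoint<T>::CPoint (T X, T Y) : m_XCoor (X), m_YCoor (Y)
{
}

template <class T>
void CPoint<T>::SetCoordinates (T X, T Y)
{
    m_XCoor = X;
    m_YCoor = Y;
}

template <class T>
void CPoint<T>::GetCoordinates (T& X, T& Y) const
{
    X = m_XCoor;
    Y = m_YCoor;
}

template <class T>
CPoint<T> CPoint<T>::operator+ (const CPoint<T>& PRight) const
{
    return CPoint<T> (m_XCoor + PRight.m_XCoor, m_YCoor + PRight.m_YCoor);
}
```

The compiler generates a separate class for each data type that is actually used, e.g. CPoint<int> and CPoint<double>, which is why the definitions must be visible in the header file.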

Example Program 9.3.1 Rewriting the CPoint Class as a Template Class


We will rewrite the CPoint class as a template class supporting operator overloading. We will illustrate the usage with a simple
program.
Note that the entire class needs to be defined in a header file.


point.h

A few new concepts are shown here. The stream extraction (>>) and insertion (<<) operators are overloaded (lines 14-15) so
that point data can be read and displayed. Next, the operation to scale the (x,y) coordinates by a constant c is shown via the *
operator. The result of this operation is a (new) CPoint object that is created (line 21) and returned (line 26). Here is an example
of its usage
CPoint<int> P (2, 3), Pc;
Pc = 10*P;


Only a select few member functions are shown and discussed here.

Note how the overloaded constructor is declared in lines 84 and 85 with the use of the T symbol. This syntax will be used in all
the member functions. Next, the assignment operator = and the addition operator + overloaded functions are shown.

Overloaded operators that return a bool value work similarly as shown with the == operator.

Finally, the multiplication overloaded operator * is shown next. This function will be called for the following statement
Pc = P*10.0;


The main program to illustrate the usage of the templated CPoint class is listed below.

One way of learning how these overloaded operators and copy constructors and destructors work, is to use the debugger and
step through one statement at a time. For example, how are the statements in line 29 executed? The overloaded operator=
function is called first with P1 as PRight (line 143 in point.h). The P6 values are set to those of P1. Again, the overloaded
operator= function is called and the P5 values are set. In other words, the evaluation is from right to left.
Similarly, let's examine how line 29 is executed. Overloaded operator- function is called first with P2 as PRight. A local CPoint
object is created that contains P1-P2. This local CPoint object is then used with the copy constructor to create a copy of this
local object (say PCopy1) for later use, and the local CPoint object is destroyed. This saved copy is then used with the overloaded
operator+ function with P3 as PRight. Once again, a local CPoint object (say PCopy2) is created to store the results from the next
operation, PCopy1+P3. The copy constructor and destructor are called to save PCopy2. Finally, the overloaded operator= is used
with PCopy2 as PRight to transfer the values from PCopy2 to P4. The destructor is called to release the memory associated with
PCopy1 and PCopy2. Most of these operations take place behind the scenes without any user-supplied code.
Some of the remaining overloaded operator functions are shown next. Of these, the two that require additional explanations
are lines 58 and 62. For line 58, P5=2.3*P1;, the overloaded friend function is called.


For line 62, P5=P1*0.5;, the overloaded * function is called
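The reason two different functions are involved can be seen in a minimal sketch. The class CScaledPoint is hypothetical and much smaller than the templated CPoint class; it only shows why c*P needs a non-member (friend) function while P*c can be a member function.

```cpp
// Hypothetical minimal class used to show both operand orders.
template <class T>
class CScaledPoint
{
    public:
        CScaledPoint (T X, T Y) : m_XCoor (X), m_YCoor (Y) {}

        // handles P * c: the calling object is the left operand
        CScaledPoint operator* (T c) const
        {
            return CScaledPoint (m_XCoor * c, m_YCoor * c);
        }

        // handles c * P: the left operand is not a class object, so a
        // non-member (friend) function is required
        friend CScaledPoint operator* (T c, const CScaledPoint& P)
        {
            return CScaledPoint (c * P.m_XCoor, c * P.m_YCoor);
        }

        T X () const { return m_XCoor; }
        T Y () const { return m_YCoor; }

    private:
        T m_XCoor;
        T m_YCoor;
};
```

With a CScaledPoint<double> object P1, the expression 2.3*P1 resolves to the friend function and P1*0.5 resolves to the member function; both return a new object by value.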

The rest of the statements in the main program are shown below.

9.4 Arrays
One of the most important objects or data structures that we will deal with in the area of numerical analysis is an array – vectors
and two-dimensional matrices. Almost always, it can be assumed that the algorithm is able to ascertain a priori, the size (number
of rows and/or columns) of the array. With this scenario, arrays provide the most convenient data structure to store and
manipulate engineering data.
We will use the template approach in defining vector and matrix classes.
9.4.1 Vector Container Class1
In Section 8.3, we defined and used the CMyVector class. In this section, we will use a similar but improved version of that class
called CVector.
We will now define the attributes and behavior of arrays starting with the vector class. Recall that a vector either is a row vector
or a column vector. The CVector class is general enough to store any data type. It has the following properties.
(a) Both row and column vectors will be stored as a vector with n storage locations.
(b) The indexing will start at 1. In other words, the indexing will be between 1 and n (both inclusive). The () operator will be
overloaded and will be used to access the elements of the vector, e.g. nV(j) will point to the jth element of the vector.
(c) Exception handling will occur if the vector index does not have a legal value. This check will be carried out only for the
DEBUG version of a program.

1 Strictly speaking, containers have several properties such as iterators, overloaded operators etc. that we do not support with the CVector and
CMatrix classes. However, see Section 10.10.


(d) Member functions will be provided to dynamically allocate as well as change the size of the vector.
(e) The = operator will be overloaded.
Our template-based vector class is defined below and is followed by a sample program that uses the CVector class. Some of the
member functions are discussed below.
Function Name   Remarks
SetSize         The initial size of the vector is determined by the constructor used. The default constructor sets
                the size as zero. This public function can be used to set or reset the size. If the size is reset, the
                original contents are destroyed.
GetSize         This public function returns the current size of the vector.
Set             This public function is used to set the specified value for all the elements of the vector.
ErrorMessage    This public function is used to display the error message in the catch block if an error is thrown.
Release         This protected function (see footnote 2) is used to release the memory allocated to store the
                elements of the vector.
We first present select portions of source code for the CVector template class that is contained entirely in a header file.
vectortemplate.h

There are seven constructors to ease the way CVector objects can be created – size is used to signify the number of nonzero
elements in the vector, iv is the initial value for all the elements in the vector, name is the name of the vector and is used only
for identification purposes. Instructions for debugging this class when used with Microsoft Visual Studio are shown at the top
of the file.
Several helper functions and overloaded operator functions provide added benefits. These are listed in lines 48-60.

2 We will see protected functions in greater detail in Chapter 13. However, keep in mind that access to member functions and variables is
restricted as follows.
                                              public   protected   private
Members of the same class or friend classes   Yes      Yes         Yes
Members of the derived class                  Yes      Yes         No
Others                                        Yes      No          No


The listing of the member functions is not shown, and the reader is strongly urged to go through and understand the
implementation. Adding additional functionalities such as overloading the << and >> operators, overloading other operators,
using member initializer list, support for vector operations such as dot product, cross product, etc. are left as an exercise.
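As a starting point for such exercises, properties (a) through (e) listed earlier can be sketched in a small class. This is not the CVector source: the class name CSimpleVector, the member names and the exception types are assumptions, and the index check is always active here rather than being limited to the DEBUG version.

```cpp
#include <stdexcept>

// Hypothetical class illustrating properties (a)-(e); not the CVector source.
template <class T>
class CSimpleVector
{
    public:
        CSimpleVector () : m_nSize (0), m_pCells (nullptr) {}
        explicit CSimpleVector (int nSize) : m_nSize (0), m_pCells (nullptr)
        {
            SetSize (nSize);
        }
        CSimpleVector (const CSimpleVector& V) : m_nSize (0), m_pCells (nullptr)
        {
            *this = V;
        }
        ~CSimpleVector () { Release (); }

        // (d) allocate or change the size; previous contents are destroyed
        void SetSize (int nSize)
        {
            if (nSize <= 0)
                throw std::invalid_argument ("vector size must be positive");
            Release ();
            m_pCells = new T[nSize] ();   // (a) n value-initialized locations
            m_nSize = nSize;
        }
        int GetSize () const { return m_nSize; }

        // (b), (c) 1-based access through operator() with an index check
        T& operator() (int i)
        {
            CheckIndex (i);
            return m_pCells[i - 1];
        }
        const T& operator() (int i) const
        {
            CheckIndex (i);
            return m_pCells[i - 1];
        }

        // (e) overloaded assignment operator
        CSimpleVector& operator= (const CSimpleVector& VRight)
        {
            if (this != &VRight)
            {
                Release ();
                if (VRight.m_nSize > 0)
                {
                    SetSize (VRight.m_nSize);
                    for (int i = 0; i < m_nSize; i++)
                        m_pCells[i] = VRight.m_pCells[i];
                }
            }
            return *this;
        }

    private:
        void CheckIndex (int i) const
        {
            if (i < 1 || i > m_nSize)
                throw std::out_of_range ("invalid vector index");
        }
        void Release ()
        {
            delete [] m_pCells;
            m_pCells = nullptr;
            m_nSize = 0;
        }

        int m_nSize;    // number of elements
        T*  m_pCells;   // dynamically allocated storage
};
```

Because operator() returns a reference, a statement such as fVA(2) = 1.5f; stores a value in the vector, while fVA(4) on a vector of size 3 raises an exception instead of silently corrupting memory.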
Example Program 9.4.1 Using the CVector class
In this example, we write a program to define two float vectors A and B, add the contents of the two vectors, store the result
in a third vector, C, and display the contents of C. We will store and manipulate these vectors using the CVector class. This
example can then be generalized for other types of applications.
main.cpp

The client code (in main.cpp) enhances the capabilities by adding two non-member functions – Display and AddVectors. The
reader is encouraged to compare how incorporating the two functionalities as member functions of the class would affect ease
of use and execution of the program versus this version. The main program defines the values in three float vectors A, B and
C in lines 40-32. The initial values are set to zero for vectors A and B. The actual values of these two vectors are defined in
lines 47-51 and displayed via calls to the Display function.


The interesting part of the program is in lines 59-68 where three incorrect uses of the CVector class are illustrated – the reader
should uncomment only one of these at a time and comment the other two. Line 61 shows use of an invalid index – size of C
is three, but the 4th location is being accessed. Lines 64-65 show the same vector addition operation but with the result to be
stored in vector CC whose length is only 2. And finally, in line 68, a vector of zero length is being created.

Note that we could have asked the user to specify the size of the A and B vectors at run time and then dynamically allocated
memory for the three vectors as follows.
int nVecSize;
std::cout << "What is the size of the vectors? ";
std::cin >> nVecSize;
CVector<float> fVA(nVecSize),fVB(nVecSize),fVC(nVecSize);
Or
fVA.SetSize(nVecSize), fVB.SetSize(nVecSize), fVC.SetSize(nVecSize);
Tip: Note how line 61 fails because of the illegal (vector) index value. This is the most common programming error with the
usage of vectors and matrices. Placing a breakpoint in the ErrorHandler member function in the CVector class helps in detecting
the offending statement! The standard usage of the subscript operator [] with built-in C++ arrays and std::vector does not detect
such errors (the std::vector member function at() does check the index).
float fVX[10];
std::vector<float> fVXX(10);
….
fVX[10] = 23.5; // illegal access
fVXX[10] = 23.5; // illegal access

9.4.2 Matrix Container Class


In a manner similar to the CVector class, we will now define a CMatrix class. The CMatrix class is general enough to store any
data type. It has the following properties.
(a) The matrix has n rows and m columns.
(b) The indexing will start at 1. In other words, the indexing will be between 1 and n (both inclusive) for the row number and
1 and m (both inclusive) for the column number. The () operator will be overloaded and will be used to access the elements
of the matrix, e.g. nMC(n,3).
(c) Exception handling will occur if either the row or the column index does not have a legal value. This check will be carried
out only for the DEBUG version of a program.
(d) Member functions will be provided to dynamically allocate and change the size of the matrix.
(e) The = operator will be overloaded.
To understand the manner in which the CMatrix class is implemented requires us to understand what is meant by “pointer to a
pointer”. For example, the following
int **pMA;
defines an int pointer pMA that points to an int pointer. Recall that a pointer variable is designed to hold a memory address of
the variable that it is pointing to. Consider the following code segment and Fig. 9.4.2.1.
int* pVA[3]; // a vector of int pointers
int** pMA; // pointer to an int pointer
int nV1[2] = {11, 12};
int nV2[2] = {21, 22};
int nV3[2] = {31, 32};

pVA[0] = nV1; // stores the address of vector nV1


pVA[1] = nV2; // stores the address of vector nV2
pVA[2] = nV3; // stores the address of vector nV3

for (int i=0; i < 3; i++)
{
    pMA = &pVA[i]; // grab the address of pVA[i]
    for (int j=0; j < 2; j++)
    {
        std::cout << "[" << i << "," << j
                  << "] = " << pMA[0][j] << '\n';
        // notice how the dereferencing takes place above
        // using two indices
    }
}
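[Figure: pMA points to an element of pVA; pVA[0], pVA[1] and pVA[2] point to nV1 = {11, 12}, nV2 = {21, 22} and nV3 = {31, 32}, respectively.]
Fig. 9.4.2.1 Memory map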


The memory map is shown in Fig. 9.4.2.1. Arrows emanate from pMA and pVA since the pointer variables point to memory
locations. There are two key statements in the above code.
pMA = &pVA[i]; // grab the address of pVA[i]
In the above statement, the address of pVA[i] (the location that holds the starting address of the ith vector) is assigned to pMA.
These refer to the three dotted lines in Fig. 9.4.2.1. The next important statement is the statement in which the int value is accessed via
pMA[0][j]
In general, pMA[i][j] would imply the value stored at the memory location that is accessed as follows. Locate the memory
address stored in pMA[i], to that address add j and finally, get the value that is stored at that memory location!
We can improve on this implementation by combining what the variables pVA and pMA do (that is to store memory addresses).
A refined memory map is shown in Fig. 9.4.2.2. If we store the starting address of each row of the matrix in pMA, then pMA[i]
would hold the starting address of row i (counting from zero). Since values in each row are stored in adjacent memory locations,
pMA[i][j] would be the value stored at row i and column j!
[Figure: pMA holds the starting address of each row: first row (11 12 ... 1m), second row (21 22 ... 2m), ..., last row (n1 n2 ... nm).]
Fig. 9.4.2.2 Refined memory map (n: rows, m: columns)
The actual implementation in the CMatrix class is slightly different. nR is the number of rows and nC the number of columns in
the matrix. The first two statements of interest are as follows.
T** cells;
cells = new T *[nR + 1];
The first statement declares a pointer to a pointer of type T (e.g., int). The second statement allocates memory for (nR+1)
locations to store the starting address of each row. The plus 1 is to facilitate counting from 1 rather than zero; hence we have
one wasted memory space! Next, we compute the total number of memory locations needed to store the entire matrix.
int size = nR*nC + 1;
Once again, + 1 is to facilitate our indexing scheme for rows and columns that starts at 1. Now we allocate the memory space
for the entire matrix.
cells[0] = new T[size];
Note cells[0] contains the starting memory address for the entire matrix or in other words, the first row. Since our indexing
scheme starts at 1 not zero, we need to adjust the addressing scheme starting with
cells[1] = cells[0];

so that when we use cells[1] it does point to the starting address of the first row. This is followed by setting the starting address
for each row as follows.
for (i=2; i <= nR; i++)
    cells[i] = cells[i-1] + nC;


The above statement simply computes the starting address of rows 2 through the last row, as the starting address of the previous
row plus the number of columns in that row (or the matrix) as shown in Fig. 9.4.2.3.
[Figure: cells[1] points to cells[1][1], the start of the 1st row; the 2nd through last rows follow in adjacent memory locations.]
Fig. 9.4.2.3 Memory mapping implementation diagram for CMatrix class (wasted space not shown)
The above implementation is for a row-oriented matrix. However, a similar implementation can be devised for a column-oriented
matrix.
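The allocation steps described above can be collected into a small, stand-alone sketch. The function names AllocateCells, ReleaseCells and DemoSum are hypothetical (the CMatrix class wraps this logic in member functions), and the sketch assumes that nR and nC are both at least 1.

```cpp
// Sketch of the row-pointer scheme with 1-based indexing.
// Function names are hypothetical; assumes nR >= 1 and nC >= 1.
template <class T>
T** AllocateCells (int nR, int nC)
{
    T** cells = new T*[nR + 1];         // row addresses; cells[0] owns the block
    cells[0] = new T[nR * nC + 1] ();   // one contiguous, zero-initialized block
    cells[1] = cells[0];                // row 1 starts at the block start
    for (int i = 2; i <= nR; i++)
        cells[i] = cells[i - 1] + nC;   // each row starts nC locations later
    return cells;
}

template <class T>
void ReleaseCells (T** cells)
{
    delete [] cells[0];   // the contiguous block of values
    delete [] cells;      // the vector of row addresses
}

// fills a 3 x 2 matrix with 11 12 / 21 22 / 31 32 and adds two corners
int DemoSum ()
{
    const int nR = 3, nC = 2;
    int** cells = AllocateCells<int> (nR, nC);
    for (int i = 1; i <= nR; i++)
        for (int j = 1; j <= nC; j++)
            cells[i][j] = 10 * i + j;
    int nSum = cells[1][1] + cells[3][2];
    ReleaseCells (cells);
    return nSum;
}
```

Note that cells[i][j] with j between 1 and nC uses offsets 1 through nR*nC within the block, which is why nR*nC + 1 locations are allocated and the location at offset zero is never used.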
Our template-based matrix class is defined below and is followed by a sample program that uses the CMatrix class. We first
present select portions of source code for the CMatrix template class that is contained entirely in a header file.
matrixtemplate.h

There are seven constructors to ease the way CMatrix objects can be created – rows and cols are used to signify the number of
nonzero rows and columns in the matrix, iv is the initial value for all the elements in the matrix, name is the name of the matrix
and is used only for identification purposes. Instructions for debugging this class when used with Microsoft Visual Studio are
shown at the top of the file. Several helper functions and overloaded operator functions provide added benefits. These are listed
in lines 47-59.
The listing of the member functions is not shown, and the reader is strongly urged to go through and understand the
implementation. Adding additional functionalities such as overloading the << and >> operators, overloading other operators,
using member initializer list, support for matrix operations such as addition, transpose, inverse, determinant, etc. are left as an
exercise.


Example Program 9.4.2 Using the CMatrix class


In this example, we write a program to define two float matrices A and B, add the contents of the two matrices, store the result
in a third matrix, C, and display the contents of C. We will store and manipulate these matrices using the CMatrix class. The
program structure is very similar to Example Program 9.4.1.
main.cpp

The client code enhances the capabilities by adding two non-member functions – Display and AddMatrices. The main program
defines the values in two matrices A and B and then stores the sum of those two matrices in another matrix, C.

The three matrices are defined as having 3 rows and 2 columns with the initial values in A and B being zero.

Note that we could have asked the user to specify the size of these matrices at run time and then dynamically allocated memory
for the three matrices as follows.
int nR, nC;
std::cout << "How many rows and columns? ";
std::cin >> nR >> nC;
CMatrix<float> fMA(nR, nC), fMB(nR, nC), fMC(nR, nC);
Or, if the three matrices have already been declared,
fMA.SetSize(nR, nC); fMB.SetSize(nR, nC); fMC.SetSize(nR, nC);

Tip: The overloading of the operators () makes it possible to detect indexing errors with the CMatrix class just as we saw the
CVector class handle this problem.
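The index checking mentioned in the tip can be sketched as follows. SmallMatrix is a hypothetical stand-in for CMatrix, not the class in matrixtemplate.h. It is assumed here that indices start at 1 and that a bad index is reported by throwing std::out_of_range; the CMatrix class may report the error differently. AddMatrices mirrors the non-member function described earlier, with all three matrices passed by reference.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for CMatrix<T>; only the indexing idea is shown.
template <class T>
class SmallMatrix
{
    public:
        SmallMatrix (int nRows, int nCols, T iv = T())
            : m_nRows (nRows), m_nCols (nCols),
              m_values (static_cast<std::size_t>(nRows) *
                        static_cast<std::size_t>(nCols), iv) {}

        int GetRows () const    { return m_nRows; }
        int GetColumns () const { return m_nCols; }

        // 1-based element access; every access goes through Index()
        T& operator() (int i, int j)             { return m_values[Index (i, j)]; }
        const T& operator() (int i, int j) const { return m_values[Index (i, j)]; }

    private:
        // converts (i,j) to a storage offset after validating both indices
        std::size_t Index (int i, int j) const
        {
            if (i < 1 || i > m_nRows || j < 1 || j > m_nCols)
                throw std::out_of_range ("Invalid index (" + std::to_string (i) +
                                         "," + std::to_string (j) + ")");
            return static_cast<std::size_t>(i - 1) * m_nCols + (j - 1);
        }

        int m_nRows;
        int m_nCols;
        std::vector<T> m_values;   // row-oriented storage
};

// C = A + B; A and B are read-only, C is modified in place
template <class T>
void AddMatrices (const SmallMatrix<T>& A, const SmallMatrix<T>& B,
                  SmallMatrix<T>& C)
{
    if (A.GetRows() != B.GetRows() || A.GetColumns() != B.GetColumns() ||
        A.GetRows() != C.GetRows() || A.GetColumns() != C.GetColumns())
        throw std::invalid_argument ("AddMatrices: incompatible matrix sizes");

    for (int i = 1; i <= A.GetRows(); i++)
        for (int j = 1; j <= A.GetColumns(); j++)
            C(i, j) = A(i, j) + B(i, j);
}
```

Because both versions of operator() call the same checking function, an expression such as fMC(4,1) on a 3 x 2 matrix is caught whether it appears on the left or the right of an assignment.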

9.4.3 Move Semantics


C++ is a value-based language: by default, objects are stored and passed as values rather than as references. This means that when
you pass an object by value to a function, the function receives its own copy of the object. The copy protects the caller's object from
modification, but copying a large object costs both time and memory. Move semantics, introduced in C++11, avoids that cost when
the source object is no longer needed. With added functionalities come more responsibilities. The examples in this section will use
the CMatrix objects to illustrate the concepts, though the observations and recommendations are valid for other types of objects
handling resources.
Let’s review what the copy constructor does. With the following statements
CMatrix<double> AA = { 2,3, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0 };
auto BB = AA;

the following operations take place: (1) AA is constructed with storage allocated to store 6 double precision values, (2) the copy
constructor is then called, a nullptr is assigned to BB, then 6 storage locations are allocated for BB updating the resource pointer,
and (3) values from AA are copied into the locations allocated for BB. Storage allocation and copying values are slow processes.
Sometimes, the source object (i.e., AA) isn't needed anymore. With move semantics, the compiler detects cases where the old
object is not tied to a variable, and instead performs a move operation. For example, since C++11, if the intent is to copy the
contents of AA into BB and not have to use AA anymore, one could write
auto BB = std::move (AA); // requires #include <utility>
and the move constructor would be used, in which case the following operations take place: (1) AA is constructed with storage
allocated to store 6 double precision values, (2) the move constructor is then called and the resource pointer for BB is set to point to
the same location as that for AA, and (3) the resource pointer for AA is set to nullptr. No additional resource allocation or
deallocation takes place. These actions must be coded by the person writing the class, as the C++ compiler does not fully
understand how the resource is used in the code.
Moving objects only makes sense if the object type owns a resource of some sort, e.g., the CMatrix class allocates memory in
freestore and then deallocates when no longer required. If all data is contained within the object, the most efficient way to move
an object is to just copy it.
Let’s see how this takes place in the CMatrix class. Here are some key statements taken from matrixcontainerEXH.h file.
CMatrix (CMatrix<T>&&); // move constructor
CMatrix<T>& operator= (CMatrix<T>&&); // overloaded move = operator
The first statement shows the move constructor syntax; its parameter is an rvalue reference (CMatrix<T>&&). The compiler selects
the move constructor automatically when the source object is an rvalue, for example a temporary object returned by a function. A
named object such as AA is an lvalue, so the programmer must request the move explicitly with std::move, which casts its
argument to an rvalue reference. The second statement shows that the CMatrix class provides its own overloaded move assignment
operator.

Note how the move constructor works. Lines 229-232 simply copy the essential member variables from A (the source) to the
current CMatrix object (the target). The state of A is altered in lines 236-238 so that while object (A) exists, it cannot be used to
carry out any useful matrix operations.
The overloaded move assignment operator function works in three stages. In the first stage, error checks are carried out in lines
700-705. The current matrix and A must be compatible. Second, if the current matrix is a matrix holding data, the memory is
released by calling the Release() function (line 707). And finally, lines 710-713 show the resource pointer for the current matrix
being set to those of A, the properties of A being copied, and lines 717-718 show the resource pointer for A being set to nullptr
with the number of rows and columns reset to zero (line 718).
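The stages described for the move constructor and the move assignment operator can be sketched with a much smaller class. Buffer is a hypothetical class invented for this illustration and is not taken from the CMatrix source; it owns a single array of doubles, and the size compatibility checks that CMatrix performs are omitted.

```cpp
#include <cstddef>
#include <utility>

// Hypothetical resource-owning class used only to illustrate move semantics
class Buffer
{
    public:
        explicit Buffer (std::size_t n)
            : m_pData (new double[n]()), m_nSize (n) {}

        ~Buffer () { delete [] m_pData; }

        // copy constructor: allocate new storage, then copy the values
        Buffer (const Buffer& B)
            : m_pData (new double[B.m_nSize]), m_nSize (B.m_nSize)
        {
            for (std::size_t i = 0; i < m_nSize; i++)
                m_pData[i] = B.m_pData[i];
        }

        // copy assignment: build the copy first, then release the old storage
        Buffer& operator= (const Buffer& B)
        {
            if (this != &B)
            {
                double* pNew = new double[B.m_nSize];
                for (std::size_t i = 0; i < B.m_nSize; i++)
                    pNew[i] = B.m_pData[i];
                delete [] m_pData;
                m_pData = pNew;
                m_nSize = B.m_nSize;
            }
            return *this;
        }

        // move constructor: take over the storage and leave B empty
        Buffer (Buffer&& B) noexcept
            : m_pData (B.m_pData), m_nSize (B.m_nSize)
        {
            B.m_pData = nullptr;
            B.m_nSize = 0;
        }

        // move assignment: release own storage, take over B's, leave B empty
        Buffer& operator= (Buffer&& B) noexcept
        {
            if (this != &B)
            {
                delete [] m_pData;
                m_pData = B.m_pData;
                m_nSize = B.m_nSize;
                B.m_pData = nullptr;
                B.m_nSize = 0;
            }
            return *this;
        }

        std::size_t Size () const   { return m_nSize; }
        const double* Data () const { return m_pData; }

    private:
        double*     m_pData;   // resource pointer
        std::size_t m_nSize;   // number of values stored
};
```

With these definitions, Buffer b = std::move(a); leaves b pointing at the storage that a originally allocated, while a is left empty but still safe to destroy.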

Example Program 9.4.3 Using the CMatrix class with and without move semantics
In this example, we will show how the move semantics works in the CMatrix class and why it is a useful feature in C++. To
make the comparison between using and not using move semantics, the preprocessor variable __NOMOVECTOR__ needs to be
defined when the move semantics should not be used. The variable can be defined and removed using the Project-
>Example9_4_3 properties menu option, selecting C++, then Preprocessor item, and adding or removing __NOMOVECTOR__
from the Preprocessor Definitions field.
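Given three rectangular matrices, $\mathbf{A}_{m \times n}$ with $A_{ij} = 1.0$, $\mathbf{B}_{m \times n}$ with $B_{ij} = 2.0$, and $\mathbf{C}_{m \times n}$ with $C_{ij} = 3.0$, a fourth matrix is defined as
$\mathbf{D}_{m \times n}$ with $D_{ij} = 2A_{ij} - 3B_{ij} + 4C_{ij}$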
A very easy to read statement would be to use overloaded operators and write the C++ statement as
D = 2.0*A - B*3.0 + 4.0*C;
Handling an expression like this is where the move semantics shows its efficiency.
The storage requirement $n_B$ (in bytes) for a typical matrix $\mathbf{A}_{m \times n}$ is
\[ n_B = (m + 1) \cdot \mathrm{sizeof}(T^{*}) + (m \times n + 1) \cdot \mathrm{sizeof}(T) \]

If T is double and a pointer occupies 8 bytes (as on a typical 64-bit system), then $n_B = 8\,(m + mn + 2)$.
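The storage formula can be evaluated for any element type with a small helper. MatrixStorageBytes is a hypothetical name introduced here for illustration; it simply transcribes the expression for the storage requirement and is not part of the CMatrix class.

```cpp
#include <cstddef>

// Bytes needed by an m x n matrix of T stored with (m+1) row pointers
// and (m*n+1) values, as in the storage formula for the CMatrix layout
template <class T>
constexpr std::size_t MatrixStorageBytes (std::size_t m, std::size_t n)
{
    return (m + 1) * sizeof (T*) + (m * n + 1) * sizeof (T);
}
```

For a 3 x 2 matrix of double on a system with 8-byte pointers, the expression gives 8(3 + 6 + 2) = 88 bytes.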

main.cpp

We will find the execution time for this program using an unsophisticated but reasonably accurate technique of getting the
system time at two points in the program (line 20 and line 86) and taking the difference between the two. The four matrices are
declared in lines 27-30. Memory allocations take place, but the matrices are not initialized with any value. The initial values are
set in lines 38-40 for matrices A, B, and C.

If the program is set to use overloaded operators (see line 22 where the variable can be set as true or false), lines 41-47 show
how the result is computed. In the displayed version of the program, the move semantics is deactivated, and the overloaded
operators are used.

Here is the sequence of operations if the move semantics are not turned on: compute TM1=4.0*C and store the result in TM1,
a temporary matrix; compute and store TM2=B*3.0; compute and store TM3=2.0*A; compute and store TM4=TM3-TM2, then using
the copy constructor transfer the result to TM5; compute TM6=TM4+TM1, then using the copy constructor transfer the result to TM7;
finally, assign the contents of TM7 to D. A total of 7 temporary matrices are used.
Here is the sequence of operations if the move semantics are turned on: compute TM1=4.0*C and store the result in TM1, a
temporary matrix; compute and store TM2=B*3.0; compute and store TM3=2.0*A; compute and store TM4=TM3-TM2, then use the
move constructor; compute TM5=TM4+TM1, then use the move constructor; finally, assign the contents of TM5 to D. A total of 5
temporary matrices are used, two fewer than when the move semantics are not used. The alternate approach is to carry out the
evaluation as shown in lines 51-57 where the operator overloading is not used.

Additional code is used to ensure that the calculations are correct as shown in lines 60-67.

The remaining statements are shown below. Lines 84 and 86-87 are used to show memory allocation/deallocation and
the execution time taken, respectively.

Table 9.4.1 shows the performance of the code when executed using different combinations of move semantics and
overloaded operators.
Table 9.4.1 Resource Usage Comparison
Move semantics                                         Yes               No                No
Overloaded operators                                   Yes               No                Yes
Freestore Memory Allocation and Deallocation (bytes)   36 003 600 000    16 001 600 000    44 004 400 000
Wall Clock Time (sec)                                  11                6                 15
Code Readability                                       Excellent         Good              Excellent

9.5 Command Line Arguments


So far, we have assumed that a typical main program is

int main () // or int main (void)
{
…. // one or more return statements returning an integer value
}
C/C++ standards specify that console programs containing the main program can have arguments passed to it. A typical main
program has the following structure.
int main (int argc, char *argv[])
{
…. // one or more return statements returning an integer value
}

where argc is the number of command line arguments and argv is an array of pointers to null-terminated character strings, one for each command line argument.
Consider the following scenario - you have created a console application search and you wish to specify the file from which to
search for a specific string. Now assume that you launch the program from command line as follows

search address.dat Iowa

With this example, argc is 3 since there are 3 command line arguments. The first argument (contained in argv[0]) is search, the second
argument (contained in argv[1]) is address.dat and the last argument (contained in argv[2]) is Iowa.

Example Program 9.5.1 Handling Command Line Arguments


We will create a sample program called Example9_5_1 and pass it two additional arguments.
main.cpp

The program is simple and self-explanatory. It should be clear that the minimum value of argc is 1 so that at least the program
name is available in argv[0]. One or more blank spaces separate one argument from the next. The sample output is shown in
Fig. 9.5.1.

Fig. 9.5.1 Sample output from the program

9.6 Exception Handling


We first saw exception handling in Chapter 4. Let us try to understand the advantages of using exception handling by examining
what one would do in a program without exception handling. The usual approach is for a function to return some form of an
error code and expect the client code to take appropriate action. In Example 6.4.1, the function to compute the root could have
been declared as
bool NewtonRaphson (double& dRoot, const int nMaxIterations,
const double dConvTol,
void(*userfunc)(double dX, double& dFX, double& dDX));
The function would return false if a root cannot be found, and the client code would call the function as follows:
if (NewtonRaphson (dRoot, MAXITERATIONS, CONVERGENGETOL, MyFunction))
std::cout << "Root is : " << dRoot << '\n';
else
std::cout << "Cannot find the root\n";

In more complex scenarios, this manner of handling errors can result in code that is complicated and difficult to read and
maintain. Imagine what would need to happen if function A calls function B, which in turn calls functions C, D and E,
with each one capable of generating an error! The reader is encouraged to read the contents of this webpage:
https://isocpp.org/wiki/faq/exceptions.
Furthermore, there are C++ constructs that do not permit error handling the usual way – constructor, destructor, etc. So what
is the solution? To find one, we first start by studying exception handling features provided by C++. Fig. 9.6.1 shows the
standard exceptions.
std::exception
    std::logic_error
        std::domain_error
        std::invalid_argument
        std::length_error
        std::out_of_range
    std::runtime_error
        std::overflow_error
        std::range_error
        std::system_error
            std::ios_base::failure
        std::underflow_error
    std::bad_alloc
    std::bad_cast
    std::bad_exception
    std::bad_function_call
    std::bad_typeid
    std::bad_weak_ptr

Fig. 9.6.1 C++ standard exceptions

Example Program 9.6.1 Handling system-generated errors


We will look at handling a few of the errors shown in Fig. 9.6.1: a logic error (std::out_of_range), a bad allocation error
(std::bad_array_new_length), a logic error (std::length_error) and a logic error (std::invalid_argument). First, note that
#include <stdexcept> is needed. The first example (Fig. E9.6.1(a)) deals with an error in the use of the std::string function
append. The second argument is the position of the first character in the first argument that is copied, with the first character
denoted as 0, not 1. Since there are 6 characters in oosoft, positions 0 through 5 refer to its characters; a position of 6 is accepted
but appends nothing, and any larger value generates a std::out_of_range exception.
main.cpp

The first example shows an exception generated by the std::string class. In line 34, the intent is for the string in rstr to be
appended to the string in MS. The append function extends the string by adding characters at the end and has three
arguments: the string to append (in this case rstr), the position of the first character to be copied from that string (in this case 7,
which is invalid since rstr has only 6 characters and the position cannot exceed 6), and the length of
the substring to be copied (in this case std::string::npos indicates all characters until the end of rstr).
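The behavior of append for different starting positions can be tried with the short sketch below. AppendFrom is a hypothetical helper written for this illustration and is not part of the example program; it returns the resulting string, or the text out_of_range when the exception is generated.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Appends the tail of "oosoft", starting at nPos, to "Micro"
std::string AppendFrom (std::size_t nPos)
{
    std::string MS ("Micro");
    const std::string rstr ("oosoft");
    try
    {
        MS.append (rstr, nPos, std::string::npos);
    }
    catch (const std::out_of_range&)
    {
        return "out_of_range";   // nPos is larger than rstr.size()
    }
    return MS;
}
```

AppendFrom (2) yields Microsoft, AppendFrom (6) leaves Micro unchanged, and AppendFrom (7) reports the exception.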

The second example shows an error that occurs when an invalid value is used in the new statement. In both the second and
third examples, the error is due to the fact that the value used in resource allocation becomes negative.

The third example shows a similar type of error, but one that arises when using a std::string class object.

Finally, the last example shows how to throw and catch an error from a user-defined function. Rect_Area is created to compute
the area of a rectangle. The two arguments to the function are the values of the length and the width of the rectangle. Three
error conditions are flagged: the length is less than or equal to zero, the width is less than or equal to zero, and the width is
greater than the length.
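One way such a function could be written is sketched below. The signature, the error messages and the choice of std::invalid_argument for all three conditions are assumptions made for this illustration; the listing in the example program may differ.

```cpp
#include <stdexcept>

// Returns the area of a rectangle; throws if the dimensions are not acceptable
double Rect_Area (double dLength, double dWidth)
{
    if (dLength <= 0.0)
        throw std::invalid_argument ("Length must be greater than zero.");
    if (dWidth <= 0.0)
        throw std::invalid_argument ("Width must be greater than zero.");
    if (dWidth > dLength)
        throw std::invalid_argument ("Width cannot be greater than length.");

    return dLength * dWidth;
}
```

The client code would place the call inside a try block and catch std::invalid_argument (or std::exception) by reference, displaying the message returned by what().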

The program output is shown in Fig. 9.6.2.

Fig. 9.6.2. Output from Example Program 9.6.1


Tip: So, what happens if a throw has no catch associated with it? For example, what happens by commenting lines 69, 76-80?
The answer is that the program will terminate abnormally.

Tip 1: Make catch block robust


Always include a default catch block. For example, if the catch block includes handling system-generated exceptions, user-
generated exceptions, then add a default catch block at the end as shown below.
catch (std::exception &err)
{
std::cout << "Caught: " << typeid(err).name()
<< " : " << err.what() << "\n";
}

catch (...)
{
std::cout << "Sorry, could not catch the error whatever it is.\n";
}
Note that catching by value may result in an invalid object being created (e.g. slicing problem). Hence the exception should be
caught by reference.
Tip 2: Use noexcept carefully
C++ permits tagging the keyword noexcept to function declarations to indicate that the function does not throw exceptions
and is not designed to handle exceptional situations. Here is an example.
#include <exception>
#include <iostream>
#include <string>

int bar () noexcept // promises not to throw
{
std::string MS ("Micro");
std::string rstr ("oosoft");
MS.append (rstr, 7, std::string::npos); // generates an exception

return 1;
}

int main ()
{
try
{
int x = bar ();
}
catch (std::exception& err) // never reached
{
std::cout << "Caught: " << err.what() << "\n";
}
std::cout << "Reached here ...\n";

return 0;
}
In this example, the program will terminate abnormally. The exception generated inside bar cannot leave a function declared noexcept, so std::terminate is called and the catch block in main is never reached.

Summary
The second, more advanced look at classes and objects begins to show the strength of OOP and C++. We examined ideas associated
with the this pointer, proper use of the const qualifier, friend classes, use of the reference operator with functions, operator
overloading, and template classes. The development of two classes to manage vectors and two-dimensional matrices was
studied. These classes will prove to be indispensable in handling numerical analysis algorithms. Both classes assume that the
elements of the vectors and matrices are stored at contiguous locations.
Programming Style Tip 9.1: Pass objects including arrays by reference
To avoid making a copy of the object being passed, especially if the object is complex and resource hungry, pass objects by
reference. For example, the AddMatrices function uses pass-by-reference technique. This process is preferable since a local copy
is not created if passed by reference. Making a local copy can be time-consuming if the size of the matrix is large. If the matrix
should not be modified, use the const qualifier.
Programming Style Tip 9.2: Ask yourself if overloaded operators are really necessary?
There is no doubt that the use of overloaded operators in client code makes the code easier to read and hence maintain.
However, when resource allocation is an issue, one should be careful in implementing and using overloaded operators. As we
saw, sometimes a copy of the object is made during the execution. If this execution is going to consume additional scarce
resources, then it may not be a good idea to implement overloaded operators with that class.
Programming Style Tip 9.3: Need for the Big Five
C++ programmers recognize the need for the Big Five – copy constructor, overloaded copy assignment operator, move
constructor, overloaded move assignment operator, and destructor, when designing and implementing a class. In other words,
if you write your own copy constructor then you should write your own assignment operators and the destructor. The default
functionality provided by the C++ compiler (in the absence of your version) may not provide the exact functionality that is
necessary for a robust, efficient and correct code.

Where to go from here?


An excellent way to understand and use more advanced C++ concepts is to solve the problems in the exercises that follow. In
addition, the reader is encouraged to look at newer C++ functionalities, some of which are discussed in Chapter 13.

Exercises
Most of the problems below involve the development of one or more classes. In each case (a) develop a plan to test
the classes(s), and (b) implement the plan in a main program.

Appetizers
Problem 9.1
Problem 7.4 dealt with the development and implementation of the CFraction class. Now enhance the capabilities of the class
by overloading the following operators. F1, F2 and F3 are CFraction objects.
F1 = F2; // overloaded operator =
F3 = F1 + F2; // overloaded operator +
F3 = F1 - F2; // overloaded operator -
F3 = F1 * F2; // overloaded operator *
F3 = F1 / F2; // overloaded operator /
if (F1 == F2) // overloaded operator ==
if (F1 != F2) // overloaded operator !=
cout << F3 << "\n"; // overloaded operator <<
cin >> F3 ; // overloaded operator >>

Problem 9.2
Convert the statistical functions discussed in Example 4.4.1 into member functions of a CStatistics class. However, use the
CVector class to handle vectors instead of C++ arrays. This way you will be able to deal with any number of data values.
Write your main program in such a way that you have a mini-statistical package to support the functionalities discussed in the
example. In other words, the program you develop should ask the user for the number of data values, allocate the memory
dynamically to store those values, and then call the CStatistics member functions to compute all the statistical values and
display them on the screen. Set up the program for one time execution only.
Problem 9.3
Change the CStatistics class developed in Problem 9.2 to a template class.

Main Course
Problem 9.4
The differential equation for a transverse deflection of a beam of length $L$ subjected to an arbitrary loading, $w(x)$, is given by
\[ \frac{d^4 y}{dx^4} = \frac{w(x)}{EI} \]
Take $w(x)$ as a uniform load, $w$.
(a) The solution for a simply-supported beam is given as
\[ y(x) = -\frac{wx}{24EI}\left( x^3 - 2Lx^2 + L^3 \right) \]
(b) The solution for a fixed-fixed beam is given as
\[ y(x) = -\frac{wx^2}{24EI}\left( x^2 - 2Lx + L^2 \right) \]
(c) The solution for a cantilever beam is given as
\[ y(x) = -\frac{wx^2}{24EI}\left( x^2 - 4Lx + 6L^2 \right) \]
Obtain the values of L , E, I , w and the units for force and length from the user. Divide the length of the beam into 20 equally
spaced points. Display a table on the screen. The table should have four columns – location and corresponding displacement
for the three beam types. Store the table data in a matrix (use the CMatrix class).

C++ Concepts
Problem 9.5
Complex numbers are used in a variety of engineering and scientific calculations. A complex number $z$ is written as
\[ z = x + iy \]
The operations between two complex numbers $z_1$ and $z_2$ can be expressed as
\[ z_1 + z_2 = (x_1 + x_2) + i\,(y_1 + y_2) \]
\[ z_1 - z_2 = (x_1 - x_2) + i\,(y_1 - y_2) \]
\[ z_1 z_2 = (x_1 x_2 - y_1 y_2) + i\,(x_1 y_2 + x_2 y_1) \]
\[ \frac{z_1}{z_2} = \left( \frac{x_1 x_2 + y_1 y_2}{x_2^2 + y_2^2} \right) + i \left( \frac{x_2 y_1 - x_1 y_2}{x_2^2 + y_2^2} \right) \]
\[ |z| = |x + iy| = \sqrt{x^2 + y^2} \]
Implement the CComplex class whose class definition is shown below.
#pragma once
#include <iostream>

class CComplex
{
public:
CComplex ();
CComplex (double dR, double dI);
CComplex (const CComplex&);
~CComplex ();
friend std::ostream &operator<< (std::ostream&, const CComplex&);
friend double fabs (const CComplex&);

// accessor functions
double Real ()const;
double Imaginary ()const;
void GetValues (double&, double&) const;
double Modulus ()const;

// modifier functions
void Real (double);
void Imaginary (double);
void SetValues (double, double);

// overloaded operators
CComplex operator+ (const CComplex& dR) const;
CComplex& operator+= (const CComplex& dR);
CComplex operator- (const CComplex& dR) const;
CComplex& operator-= (const CComplex& dR);
CComplex operator* (const CComplex& dR) const;
CComplex& operator*= (const CComplex& dR);
CComplex operator/ (const CComplex& dR) const;
CComplex& operator/= (const CComplex& dR);
CComplex& operator= (const CComplex& dR);
CComplex& operator= (const double);
CComplex operator- () const;

private:
double m_dReal;
double m_dImaginary;
};

Problem 9.6
Add exception handling to all the problems in this chapter using the style from Example 9.6.2.
Problem 9.7
Both the CVector and the CMatrix implementations waste storage locations in order to start indexing the contents with 1, i.e.
fVA(1) is the first element in float vector A, and fMA(1,1) is the first element in float matrix A. One storage location is wasted
for every CVector object, and two storage locations for every CMatrix object. However, noting that addresses can be manipulated
by very simple math operations, modify the source code in vectortemplate.h and matrixtemplate.h so that there are no
wasted storage spaces.

References
http://www.stroustrup.com/C++.html
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4296.pdf
https://isocpp.org/std/status
https://isocpp.org/wiki/faq/exceptions
https://www.acodersjourney.com/top-15-c-exception-handling-mistakes-avoid/
http://www.gotw.ca/publications/mill22.htm
http://stdcxx.apache.org/doc/stdlibref/2-3.html
https://en.cppreference.com/w/cpp/error/nested_exception
https://en.cppreference.com/w/cpp/error/throw_with_nested
https://dzone.com/articles/some-useful-facts-to-know-when-using-c-exceptions


Chapter 10

Matrix Algebra
“To me, mathematics, computer science, and the arts are insanely related. They’re all creative expressions.” Sebastian
Thrun

“There is geometry in the humming of the strings, there is music in the spacing of spheres.” Pythagoras

“The art of doing mathematics consists in finding that special case which contains all the germs of generality.” David
Hilbert

In Chapter 6, we saw the solution of an equation in a single unknown. The challenge was to find the roots of a nonlinear
equation. More often than not, engineering problems are described by several (hundreds, perhaps millions of) equations. These
equations are usually linear in nature or can be approximated as such. In this chapter, we will start by first reviewing the basics
of matrix algebra – vector and matrix operations. With that background, we will start to look at different methods to obtain the
solution of linear algebraic equations. Finally, we will be in a position to develop an object-oriented matrix toolbox based on the
CVector and CMatrix classes developed in Chapter 9. This toolbox will be used in almost all the later chapters dealing with
numerical analysis.

Objectives
• To understand matrix algebra.
• To understand the role of matrix algebra in engineering and scientific analysis.
• To understand and implement the steps to solve a system of linear algebraic equations.
• To understand and implement a template matrix toolbox.

10.1 Fundamentals of Matrix Algebra


The solution methodology to be discussed in the chapter is numerical in nature. The initial, intermediate and final steps usually
involve matrices. While knowledge of linear algebra is essential in understanding the material in this chapter, we will focus on a
narrower topic – matrix algebra.
10.1.1 Definitions
Matrix: A two-dimensional matrix is a rectangular array of numbers. Each number or element of the matrix is identified by its
location – a row number and a column number. Consider the following example.
\[
\mathbf{A}_{m \times n} =
\begin{bmatrix}
A_{11} & A_{12} & \cdots & \cdots & A_{1n} \\
A_{21} & A_{22} & \cdots & \cdots & A_{2n} \\
\cdots & \cdots & \cdots & \cdots & \cdots \\
\cdots & \cdots & \cdots & \cdots & \cdots \\
A_{m1} & A_{m2} & \cdots & \cdots & A_{mn}
\end{bmatrix}
\tag{10.1.1-1}
\]
A typical element of the matrix $\mathbf{A}$ is designated $A_{ij}$ where $i$ is the row number and $j$ is the column number. We will usually,
but not always, denote matrices with an upper case letter.
Vector: A vector is a special instance of a matrix. It has either one row or one column. We will usually, but not always, denote a
vector with a lower case letter.
Row Vector: A vector with one row is called a row vector. Consider the following example.
\[ \mathbf{a}_{1 \times n} = \begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix} \tag{10.1.1-2} \]
Column Vector: A vector with one column is called a column vector. Consider the following example.
\[ \mathbf{a}_{m \times 1} = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{bmatrix} \tag{10.1.1-3} \]
Null Vector: A null vector is such that all the elements of the vector are zero. For example
\[ \mathbf{a}_{1 \times n} = \begin{bmatrix} 0 & 0 & \cdots & 0 \end{bmatrix} \tag{10.1.1-4} \]
Square Matrix: A square matrix has the same number of rows and columns. For example,
\[ \mathbf{B}_{3 \times 3} = \begin{bmatrix} 12 & 3 & 1 \\ 5 & 8 & 0 \\ 55 & 1 & 22 \end{bmatrix} \]
is a square matrix with integer elements.
Symmetric Matrix: A square matrix such that $A_{ij} = A_{ji}$ for any $i, j$ is a symmetric matrix. For example,
\[ \mathbf{B}_{3 \times 3} = \begin{bmatrix} 12 & -3 & 1 \\ -3 & 8 & 0 \\ 1 & 0 & 22 \end{bmatrix} \]
is a symmetric matrix.
Diagonal Matrix: A square matrix such that $A_{ij} = 0$ if $i \ne j$ is a diagonal matrix. For example,
\[ \mathbf{B}_{3 \times 3} = \begin{bmatrix} 12 & 0 & 0 \\ 0 & 8 & 0 \\ 0 & 0 & 22 \end{bmatrix} \]
is a diagonal matrix.
Identity Matrix: A diagonal matrix such that $A_{ii} = 1$, $A_{ij} = 0$, $i \ne j$ is an identity matrix and is denoted $\mathbf{I}_{n \times n}$. For example, the
following is an identity (or, unit) matrix of order or size 3.
\[ \mathbf{I}_{3 \times 3} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]
Upper Triangular Matrix: A square matrix such that $A_{ij} = 0$ if $i > j$ is an upper triangular matrix. For example,
\[ \mathbf{B}_{3 \times 3} = \begin{bmatrix} 12 & 55 & 0 \\ 0 & 8 & 10 \\ 0 & 0 & 22 \end{bmatrix} \]
is an upper triangular matrix.
Lower Triangular Matrix: A square matrix such that $A_{ij} = 0$ if $i < j$ is a lower triangular matrix. For example,
\[ \mathbf{B}_{3 \times 3} = \begin{bmatrix} 12 & 0 & 0 \\ 55 & 8 & 0 \\ 0 & 10 & 22 \end{bmatrix} \]
is a lower triangular matrix.
Positive Definite Matrix: A symmetric matrix such that all its eigenvalues are positive. We will look at eigenvalues in Chapter 16.
Orthogonal Matrix: A square matrix such that its transpose is equal to its inverse. In other words, $\mathbf{A}^T \mathbf{A} = \mathbf{A} \mathbf{A}^T = \mathbf{I}$.
Hermitian or Self-Adjoint Matrix: A square matrix that is equal to the complex conjugate of its transpose. In other words,
$\mathbf{A} = \mathbf{A}^{\dagger}$. For a real matrix, Hermitian means the same as symmetric.

10.1.2 Operations
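Addition and Subtraction: Two matrices of the same size can be added or subtracted from one another. For example, if
\[ \mathbf{A}_{m \times n} = \mathbf{B}_{m \times n} + \mathbf{C}_{m \times n} \quad \text{then} \quad A_{ij} = B_{ij} + C_{ij} \tag{10.1.2-1} \]
and, if
\[ \mathbf{A}_{m \times n} = \mathbf{B}_{m \times n} - \mathbf{C}_{m \times n} \quad \text{then} \quad A_{ij} = B_{ij} - C_{ij} \tag{10.1.2-2} \]
Consider the following example. Let
\[ \mathbf{B}_{3 \times 3} = \begin{bmatrix} 12 & -3 & 1 \\ -3 & 8 & 0 \\ 1 & 0 & 22 \end{bmatrix} \quad \text{and} \quad \mathbf{C}_{3 \times 3} = \begin{bmatrix} 0 & 12 & -1 \\ 15 & 8 & 1 \\ 11 & 0 & 7 \end{bmatrix} \]
Then
\[ \mathbf{A} = \mathbf{B} + \mathbf{C} = \begin{bmatrix} 12 & 9 & 0 \\ 12 & 16 & 1 \\ 12 & 0 & 29 \end{bmatrix} \quad \text{and} \quad \mathbf{A} = \mathbf{B} - \mathbf{C} = \begin{bmatrix} 12 & -15 & 2 \\ -18 & 0 & -1 \\ -10 & 0 & 15 \end{bmatrix} \]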
Multiplication: Two matrices can be multiplied as follows
\[ \mathbf{A}_{m \times n} = \mathbf{B}_{m \times o}\, \mathbf{C}_{o \times n} \tag{10.1.2-3} \]
provided the number of columns in $\mathbf{B}$ is equal to the number of rows in $\mathbf{C}$. This condition makes the two matrices
conformable. The resulting matrix $\mathbf{A}$ has its number of rows equal to the number of rows in $\mathbf{B}$ and number of columns equal
to the number of columns in $\mathbf{C}$. To generate the elements of the resulting matrix $\mathbf{A}$ we need
\[ A_{ij} = \sum_{k=1}^{o} B_{ik} C_{kj} \tag{10.1.2-4} \]
In other words, the sum of the products of the corresponding elements from row $i$ of $\mathbf{B}$ and column $j$ of $\mathbf{C}$ yields
$A_{ij}$. This operation is similar to computing the dot product.
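Eqn. (10.1.2-4) maps directly onto three nested loops, as the sketch below suggests. To keep the example self-contained, a std::vector of std::vector<double> is used as a stand-in for a matrix class (an assumption for this illustration; the CMatrix class would be the natural choice in the matrix toolbox), indices start at 0, every row is assumed to have the same length, and the function name Multiply is hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Returns A = B*C where B is m x o and C is o x n
Matrix Multiply (const Matrix& B, const Matrix& C)
{
    const std::size_t m = B.size();
    const std::size_t o = C.size();
    if (m == 0 || o == 0 || B[0].size() != o)
        throw std::invalid_argument ("Matrices are not conformable.");

    const std::size_t n = C[0].size();
    Matrix A (m, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < m; i++)
        for (std::size_t j = 0; j < n; j++)
            for (std::size_t k = 0; k < o; k++)
                A[i][j] += B[i][k] * C[k][j];   // row i of B times column j of C

    return A;
}
```

The innermost loop accumulates one dot product, so the total work is proportional to the product of m, n and o.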
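For example, let
\[ \mathbf{B}_{3 \times 3} = \begin{bmatrix} 12 & -3 & 1 \\ -3 & 8 & 0 \\ 1 & 0 & 22 \end{bmatrix} \quad \text{and} \quad \mathbf{C}_{3 \times 2} = \begin{bmatrix} 0 & 12 \\ 15 & 8 \\ 11 & 0 \end{bmatrix} \]
Then $\mathbf{A}_{3 \times 2} = \mathbf{B}_{3 \times 3} \mathbf{C}_{3 \times 2}$ can be computed by writing the three matrices as follows.
\[ \begin{bmatrix} 12 & -3 & 1 \\ -3 & 8 & 0 \\ 1 & 0 & 22 \end{bmatrix} \begin{bmatrix} 0 & 12 \\ 15 & 8 \\ 11 & 0 \end{bmatrix} = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \\ A_{31} & A_{32} \end{bmatrix} \]
where $A_{11}$ = the product of the first row of $\mathbf{B}$ times the first column of $\mathbf{C}$
\[ = (12)(0) + (-3)(15) + (1)(11) = -34 \]
$A_{12}$ = the product of the first row of $\mathbf{B}$ times the second column of $\mathbf{C}$
\[ = (12)(12) + (-3)(8) + (1)(0) = 120 \]
$A_{21}$ = the product of the second row of $\mathbf{B}$ times the first column of $\mathbf{C}$
\[ = (-3)(0) + (8)(15) + (0)(11) = 120 \]
$A_{22}$ = the product of the second row of $\mathbf{B}$ times the second column of $\mathbf{C}$
\[ = (-3)(12) + (8)(8) + (0)(0) = 28 \]
$A_{31}$ = the product of the third row of $\mathbf{B}$ times the first column of $\mathbf{C}$
\[ = (1)(0) + (0)(15) + (22)(11) = 242 \]
$A_{32}$ = the product of the third row of $\mathbf{B}$ times the second column of $\mathbf{C}$
\[ = (1)(12) + (0)(8) + (22)(0) = 12 \]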
Transpose: The transpose of matrix $\mathbf{A}_{m \times n}$ is denoted $\mathbf{A}^T_{n \times m}$. The transpose matrix is constructed such that
\[ A^T_{ij} = A_{ji} \tag{10.1.2-5} \]
As can be seen from Eqn. (10.1.2-5), the transpose matrix is obtained by interchanging the rows and columns of the original
matrix. Let
\[ \mathbf{C}_{3 \times 2} = \begin{bmatrix} 0 & 12 \\ 15 & 8 \\ 11 & 0 \end{bmatrix}. \quad \text{Then} \quad \mathbf{C}^T_{2 \times 3} = \begin{bmatrix} 0 & 15 & 11 \\ 12 & 8 & 0 \end{bmatrix} \]

Determinant: The determinant of a square matrix $\mathbf{A}_{n \times n}$ is denoted $\det(\mathbf{A})$ and is given by
\[ \det(\mathbf{A}) = \sum_{j=1}^{n} a_{ij} A_{ij} \quad \text{for any } i = 1, 2, \ldots, n \tag{10.1.2-6a} \]
or,
\[ \det(\mathbf{A}) = \sum_{i=1}^{n} a_{ij} A_{ij} \quad \text{for any } j = 1, 2, \ldots, n \tag{10.1.2-6b} \]
where $A_{ij}$ is an element of the matrix, minor $M_{ij}$ is the determinant of the $(n-1) \times (n-1)$ submatrix obtained by deleting the ith row and jth column, and
cofactor $a_{ij}$ associated with $M_{ij}$ is defined to be $a_{ij} = (-1)^{i+j} M_{ij}$. While it will not be necessary for us to compute the
determinant of a matrix, we still need to understand the concept. Let
\[ \mathbf{A}_{2 \times 2} = \begin{bmatrix} 4 & -3 \\ 1 & 6 \end{bmatrix} \quad \text{and} \quad \mathbf{B}_{2 \times 2} = \begin{bmatrix} 8 & 3 \\ 16 & 6 \end{bmatrix} \]
Then using Eqn. (10.1.2-6a) with $i = 1$,
\[ \det(\mathbf{A}) = \sum_{j=1}^{n} a_{1j} A_{1j} = a_{11} A_{11} + a_{12} A_{12} = 4a_{11} - 3a_{12} \]
\[ a_{11} = (-1)^{1+1} \det[6] = (1)(6) = 6 \qquad a_{12} = (-1)^{1+2} \det[1] = (-1)(1) = -1 \]
\[ \det(\mathbf{A}) = 4a_{11} - 3a_{12} = 4(6) - 3(-1) = 27 \]
and,
\[ \det(\mathbf{B}) = \sum_{j=1}^{n} b_{1j} B_{1j} = b_{11} B_{11} + b_{12} B_{12} = 8b_{11} + 3b_{12} \]
\[ b_{11} = (-1)^{1+1} \det[6] = (1)(6) = 6 \qquad b_{12} = (-1)^{1+2} \det[16] = (-1)(16) = -16 \]
\[ \det(\mathbf{B}) = 8b_{11} + 3b_{12} = 8(6) + 3(-16) = 0 \]
Since the determinant of $\mathbf{B}$ is zero, $\mathbf{B}$ is known as a singular matrix. In the next section we will see a more efficient way to
compute the determinant of a matrix.

10.2 Solving Linear Algebraic Equations


Several engineering and scientific problems require the solution of linear algebraic equations of the form
$$A_{m \times n}\, x_{n \times 1} = b_{m \times 1}$$ (10.2.1)
where $A_{m \times n}$ is the coefficient matrix with constant coefficients, $b_{m \times 1}$ is the right-hand side (RHS) vector and $x_{n \times 1}$ is the vector of unknowns. A unique, nontrivial solution $x_{n \times 1}$ can be obtained if and only if
(a) the number of unknowns, $n$, is equal to the number of equations, $m$, such that $A$ is a square matrix,
(b) $A_{n \times n}$ is nonsingular, and
(c) $b_{m \times 1}$ is not a null vector (there is at least one non-zero term in $b$).

If $m > n$, we have an overdetermined system. On the other hand, if $m < n$, the system is underdetermined and it is possible that there are no unique solutions.
Equation solvers can be categorized as being either direct or iterative. Direct solvers are those that solve Eqn. (10.2.1) in a non-iterative fashion. The procedure typically involves one pass through the $n$ equations. On the other hand, iterative solvers transform the problem into an equivalent problem and the solution is improved iteratively.


There are at least four major issues in solving Eqn. (10.2.1).


(a) How much storage space will be used, especially by the coefficient matrix?
(b) How can numerically accurate solutions be obtained?
(c) How much time will be taken to obtain the solution?
(d) How much additional effort is needed if a solution is to be generated for a new right-hand side vector?

There are other issues such as parallelizing and vectorizing the solution procedure on specialized hardware and software systems,
but they are beyond the scope of the discussions here.
10.2.1 Storage Scheme
The coefficient matrix $A_{n \times n}$ can take on many different forms depending on the application. The matrix can be full, meaning that all the elements in the matrix are nonzero. At the other end of the spectrum the matrix can be sparse. For example, solution of some of the popular partial differential equations involves solving a system of equations where the nonzero entries make up a few percent (1-10%) of the entire matrix. The coefficient matrix can have other forms and properties – symmetric, banded, anti-symmetric, skyline, positive definite, indefinite, etc. The implementation efficiency of an algorithm can be tied to its storage scheme.
Full: This is the simplest storage form where the matrix can be stored either rowwise or columnwise. If $A_{n \times n}$ is stored rowwise then the elements of the matrix are stored as
$$\{A_{11}, A_{12}, \ldots, A_{1n}, A_{21}, A_{22}, \ldots, A_{2n}, \ldots, A_{n1}, A_{n2}, \ldots, A_{nn}\}$$
On the other hand, if $A_{n \times n}$ is stored columnwise then the elements of the matrix are stored as
$$\{A_{11}, A_{21}, \ldots, A_{n1}, A_{12}, A_{22}, \ldots, A_{n2}, \ldots, A_{1n}, A_{2n}, \ldots, A_{nn}\}$$
In FORTRAN, matrix elements are stored columnwise. In statically allocated C++ matrices, matrix elements are stored rowwise. The CMatrix class that we developed in Chapter 9 is designed to store the elements rowwise and with a little tweaking can also be designed to store the elements columnwise. Either way the total storage requirement is $n^2$ locations.
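The two orderings differ only in how the pair $(i, j)$ is turned into an offset in the one-dimensional storage. The following is a minimal sketch, not the CMatrix implementation; the function names `RowwiseIndex` and `ColumnwiseIndex` are hypothetical, and the indices $i$, $j$ are taken as 1-based to match the text while the returned offset is 0-based.

```cpp
#include <cassert>
#include <cstddef>

// Offset of A(i,j) in a vector that stores an n x n matrix rowwise.
// i and j are 1-based (as in the text); the returned offset is 0-based.
inline std::size_t RowwiseIndex(std::size_t i, std::size_t j, std::size_t n)
{
    assert(i >= 1 && i <= n && j >= 1 && j <= n);
    return (i - 1) * n + (j - 1);
}

// Offset of A(i,j) in a vector that stores an n x n matrix columnwise.
inline std::size_t ColumnwiseIndex(std::size_t i, std::size_t j, std::size_t n)
{
    assert(i >= 1 && i <= n && j >= 1 && j <= n);
    return (j - 1) * n + (i - 1);
}
```

With either function the matrix occupies the same $n^2$ locations; only the traversal order that is friendly to the memory layout changes.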
Banded and Skyline: In the context of finite element analysis, $A_{n \times n}$ is mostly symmetric and positive definite. Under certain scenarios, $A$ has a special form that can be exploited from a storage perspective. Consider the symmetric matrix shown in Fig. 10.2.1. The figure shows only the nonzero upper triangular components.
$$\begin{bmatrix}
A_{11} & A_{12} & & & A_{15} & A_{16} & & & & \\
 & A_{22} & & & A_{25} & A_{26} & & & & \\
 & & A_{33} & A_{34} & A_{35} & A_{36} & & & & \\
 & & & A_{44} & A_{45} & A_{46} & & & & \\
 & & & & A_{55} & A_{56} & A_{57} & A_{58} & A_{59} & A_{5,10} \\
 & & & & & A_{66} & A_{67} & A_{68} & A_{69} & A_{6,10} \\
 & & & & & & A_{77} & A_{78} & & \\
 & & & & & & & A_{88} & & \\
 & & & & & & & & A_{99} & A_{9,10} \\
 & & & & & & & & & A_{10,10}
\end{bmatrix}$$
Fig. 10.2.1 Upper triangular portion of the system stiffness matrix
We will draw a special box that encompasses all the nonzero elements in the upper triangular portion of the matrix. As can be seen from Fig. 10.2.2, the nonzero elements are all contained within a band. The width of this band is known as the half-band width (HBW) of the matrix. To find the HBW, we scan each row of the matrix starting with the diagonal element and look for the last nonzero entry in that row. The maximum distance from the last nonzero element to the diagonal element (in a particular row) for all the rows gives the HBW. The formula for HBW of row $i$ is given as
$$(HBW)_i = c - i + 1$$ (10.2.2a)
where $c$ is the column number of the last nonzero element in row $i$ (note, the +1 is used so that the distance includes both the last nonzero element and the diagonal element) and
$$HBW = \max_i \, (HBW)_i$$ (10.2.2b)
For example, in row 1, since the last nonzero element is $A_{16}$, this distance is $6 - 1 + 1 = 6$. As can be deduced from Fig. 10.2.2, the half-band width (HBW) of the given matrix is 6.
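The row scan described above can be sketched as follows. This is an illustrative fragment only: it assumes the matrix is available in full storage as a vector of rows, and the name `HalfBandWidth` and the `Matrix` alias are hypothetical. Indices in the code are 0-based, which leaves the quantity $c - i + 1$ unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Half-band width of a square matrix following Eqns. (10.2.2a) and (10.2.2b).
// Only the diagonal and the entries to its right are scanned in each row.
std::size_t HalfBandWidth(const Matrix& A)
{
    const std::size_t n = A.size();
    std::size_t hbw = 0;
    for (std::size_t i = 0; i < n; ++i)
    {
        assert(A[i].size() == n);
        std::size_t last = i;                  // column of the last nonzero in row i
        for (std::size_t j = i; j < n; ++j)
            if (A[i][j] != 0.0) last = j;
        const std::size_t hbwRow = last - i + 1;   // Eqn. (10.2.2a)
        if (hbwRow > hbw) hbw = hbwRow;            // Eqn. (10.2.2b)
    }
    return hbw;
}
```

In a finite element program the HBW is normally obtained from the element connectivity before the matrix is assembled; the scan here is only meant to make the definition concrete.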

Fig. 10.2.2 Banded profile for the nonzero elements


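Instead of storing the entire matrix (full storage), we can store this symmetric, banded matrix as a rectangular matrix as follows.
$$A^{banded}_{10 \times 6} = \begin{bmatrix}
A_{11} & A_{12} & 0 & 0 & A_{15} & A_{16} \\
A_{22} & 0 & 0 & A_{25} & A_{26} & 0 \\
A_{33} & A_{34} & A_{35} & A_{36} & 0 & 0 \\
A_{44} & A_{45} & A_{46} & 0 & 0 & 0 \\
A_{55} & A_{56} & A_{57} & A_{58} & A_{59} & A_{5,10} \\
A_{66} & A_{67} & A_{68} & A_{69} & A_{6,10} & 0 \\
A_{77} & A_{78} & 0 & 0 & 0 & 0 \\
A_{88} & 0 & 0 & 0 & 0 & 0 \\
A_{99} & A_{9,10} & 0 & 0 & 0 & 0 \\
A_{10,10} & 0 & 0 & 0 & 0 & 0
\end{bmatrix}$$
Fig. 10.2.3 Symmetric, banded matrix stored as a rectangular matrix
The relationship (mapping) between the elements in the original $A$ in Fig. 10.2.1 and the banded form $A^{banded}$ in Fig. 10.2.3 can be derived simply as follows. Here $A$ refers to the upper triangular portion shown in Fig. 10.2.1; entries below the diagonal follow from symmetry.
$$A_{i,j} = 0 \quad \text{if } (j - i + 1) > HBW$$ (10.2.3)
$$A_{i,j} = 0 \quad \text{if } j < i$$ (10.2.4)
$$\text{else} \quad A_{i,j} = A^{banded}_{i,\, j-i+1}$$ (10.2.5)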
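The mapping in Eqns. (10.2.3)-(10.2.5) can be wrapped in a small accessor class. The sketch below is illustrative and is not part of the book's library: the class name `CBandedSymMatrix` and its member functions are hypothetical, the rectangular array is assumed to be stored rowwise, and a request with $j < i$ is answered through symmetry ($A_{ij} = A_{ji}$) rather than returning zero.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Symmetric banded matrix stored as an n x hbw rectangular array (rowwise).
class CBandedSymMatrix
{
public:
    CBandedSymMatrix(std::size_t n, std::size_t hbw)
        : m_n(n), m_hbw(hbw), m_data(n * hbw, 0.0) {}

    // Value of A(i,j); i and j are 1-based as in the text.
    double Get(std::size_t i, std::size_t j) const
    {
        assert(i >= 1 && i <= m_n && j >= 1 && j <= m_n);
        if (j < i) std::swap(i, j);              // use symmetry
        if (j - i + 1 > m_hbw) return 0.0;       // Eqn. (10.2.3): outside the band
        return m_data[(i - 1) * m_hbw + (j - i)];    // Eqn. (10.2.5)
    }

    // Stores A(i,j); the entry must lie inside the band.
    void Set(std::size_t i, std::size_t j, double value)
    {
        assert(i >= 1 && i <= m_n && j >= 1 && j <= m_n);
        if (j < i) std::swap(i, j);
        assert(j - i + 1 <= m_hbw);
        m_data[(i - 1) * m_hbw + (j - i)] = value;
    }

private:
    std::size_t m_n;             // number of rows
    std::size_t m_hbw;           // half-band width
    std::vector<double> m_data;  // n*hbw values, rowwise
};
```

For the example matrix one would construct `CBandedSymMatrix A(10, 6)`, which allocates the 60 locations of Fig. 10.2.3.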


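As we can see in Fig. 10.2.2, a good number of the elements within the band are zero! We can capitalize even more on this characteristic by storing only the elements within the skyline profile that is shown in Fig. 10.2.4. Entries shown as 0 lie inside the skyline and are therefore stored.
$$\begin{bmatrix}
A_{11} & A_{12} & & & A_{15} & A_{16} & & & & \\
 & A_{22} & & & A_{25} & A_{26} & & & & \\
 & & A_{33} & A_{34} & A_{35} & A_{36} & & & & \\
 & & & A_{44} & A_{45} & A_{46} & & & & \\
 & & & & A_{55} & A_{56} & A_{57} & A_{58} & A_{59} & A_{5,10} \\
 & & & & & A_{66} & A_{67} & A_{68} & A_{69} & A_{6,10} \\
 & & & & & & A_{77} & A_{78} & 0 & 0 \\
 & & & & & & & A_{88} & 0 & 0 \\
 & & & & & & & & A_{99} & A_{9,10} \\
 & & & & & & & & & A_{10,10}
\end{bmatrix}$$
Fig. 10.2.4 The “skyline” profile
The original matrix, in fact, can be stored in a vector as follows.
$$A^{skyline}_{35 \times 1} = \{A_{11}, A_{22}, A_{12}, A_{33}, A_{44}, A_{34}, \ldots, A_{10,10}, A_{9,10}, A_{8,10}, A_{7,10}, A_{6,10}, A_{5,10}\}$$ (10.2.6)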
Note that each column is stored starting with the diagonal element of that column followed by all the other elements in that column until the last nonzero entry in that column. To facilitate mapping the original elements of the matrix, an additional indexing vector, $D^{loc}_{(n+1) \times 1}$, is created that has $(n+1)$ elements. These elements store the location of the diagonal element (of each column) with the last element storing the (last element+1) in the stiffness matrix. In other words, the last element contains one more than the total number of entries in the skyline profile. Going back to the current example, we have the following $D^{loc}$ vector.
$$D^{loc}_{11 \times 1} = \{1, 2, 4, 5, 7, 12, 18, 21, 25, 30, 36\}$$ (10.2.7)
The relationship (mapping) between the elements in the original $A$ in Fig. 10.2.2 and the skyline form $A^{skyline}$ in Fig. 10.2.4 can be derived as follows.
$$A_{i,j} = 0 \quad \text{if } (j - i) \ge \left( D^{loc}_{j+1} - D^{loc}_{j} \right)$$ (10.2.8)
$$A_{i,j} = 0 \quad \text{if } j < i$$ (10.2.9)
$$\text{else} \quad A_{i,j} = A^{skyline}_{l}, \quad l = D^{loc}_{j} + j - i$$ (10.2.10)
For example, to locate $A_{68}$ we note that (a) $i = 6$, $j = 8$, (b) $D^{loc}_{8} = 21$, (c) $l = 21 + 8 - 6 = 23$. Hence, $A_{68} = A^{skyline}_{23}$.
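Eqns. (10.2.8) and (10.2.10) translate directly into a lookup function. The following is a sketch under stated assumptions: the function name `SkylineLocation` is hypothetical, the values held in `Dloc` are the 1-based locations of Eqn. (10.2.7) kept in a 0-based C++ container, and a return value of 0 is used here to signal an entry outside the skyline.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the 1-based location l of A(i,j), j >= i, in the skyline vector
// (Eqn. 10.2.10), or 0 if the entry lies outside the skyline (Eqn. 10.2.8).
// Dloc holds the n+1 values of the diagonal-location vector.
std::size_t SkylineLocation(const std::vector<std::size_t>& Dloc,
                            std::size_t i, std::size_t j)
{
    assert(Dloc.size() >= 2);
    const std::size_t n = Dloc.size() - 1;
    assert(i >= 1 && j >= i && j <= n);
    const std::size_t height = Dloc[j] - Dloc[j - 1];   // entries stored in column j
    if (j - i >= height) return 0;                      // above the skyline
    return Dloc[j - 1] + (j - i);                       // l = Dloc_j + j - i
}
```

With the vector of Eqn. (10.2.7), `SkylineLocation(Dloc, 6, 8)` gives 23, in agreement with the hand calculation for $A_{68}$.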

Finally, let us compare the storage requirements of the three schemes – full, banded, and skyline, assuming that the stiffness matrix is stored in the double precision format, and that two integer words make up a single double precision word ($q = HBW$, $m = D^{loc}_{n+1} - 1$).

Storage Scheme     What is to be stored?                                   Equivalent integer words
Full               $A_{n \times n}$                                        $2n^2$
Banded             $A^{banded}_{n \times hbw}$                             $2nq$
Skyline            $A^{skyline}_{m \times 1}$, $D^{loc}_{(n+1) \times 1}$  $2m + n + 1$

Using our example ($n = 10$, $q = 6$, $m = 35$), we have the three values in the last column as 200, 120 and 81 integer words – significant savings with increasing sophistication of the storage scheme.


Sparse: It is not evident with our simple example that the matrix is sparse. The percent sparsity of the matrix is $\frac{35}{100} \times 100 = 35\%$. For some engineering problems, as the size of the problem increases, the sparsity of the matrix increases (that is, the fraction of nonzero terms decreases). In one of the sparse storage schemes, three vectors are used to track the locations of the nonzero entries. The matrix is stored in a vector rowwise with only the non-zero entries being stored1 in $A^{sparse}_{m \times 1}$. Using our previous example, we have the following.
$$A^{sparse}_{31 \times 1} = \{A_{11}, A_{12}, A_{15}, A_{16}, A_{22}, A_{25}, A_{26}, \ldots, A_{88}, A_{99}, A_{9,10}, A_{10,10}\}$$ (10.2.11)
To access the entries in the matrix, two additional (indexing) vectors are needed. The first, $C_{m \times 1}$, is used to store the column numbers of the nonzero entries. Again, we have with our example
$$C_{31 \times 1} = \{1, 2, 5, 6, 2, 5, 6, 3, 4, 5, 6, \ldots, 8, 9, 10, 10\}$$ (10.2.12)
The second, $R_{(n+1) \times 1}$, stores the starting location of each row. Again, we have with our example
$$R_{11 \times 1} = \{1, 5, 8, 12, 15, 21, 26, 28, 29, 31, 32\}$$ (10.2.13)
Discussion of sparse equation solvers is outside the scope of this book.
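Although the solvers are not covered, retrieving a single entry from the three vectors is straightforward and helps clarify what $C$ and $R$ hold. The function below is an illustrative sketch; the name `SparseGet` is hypothetical, and the values in `C` and `R` are assumed to be 1-based as in Eqns. (10.2.12) and (10.2.13) while the containers themselves are 0-based.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Value of A(i,j) (1-based i and j) from the rowwise sparse storage.
// The stored entries of row i occupy locations R[i] to R[i+1]-1 (1-based).
double SparseGet(const std::vector<double>& Asparse,
                 const std::vector<std::size_t>& C,
                 const std::vector<std::size_t>& R,
                 std::size_t i, std::size_t j)
{
    assert(i >= 1 && i + 1 <= R.size());
    assert(C.size() == Asparse.size());
    for (std::size_t l = R[i - 1]; l < R[i]; ++l)
        if (C[l - 1] == j) return Asparse[l - 1];
    return 0.0;   // entry is not stored, hence zero
}
```

The search within a row is linear here; since the column numbers of each row are stored in increasing order, a binary search could be substituted for long rows.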
Offline: There are special situations when the solution procedure operates on matrices that are written on and retrieved from
computer hard disk. This is typically done when the size of the stiffness matrix and other associated vectors/matrices in any
storage format is several times the size of the computer’s random-access memory (RAM). Discussion of offline equation solvers
is outside the scope of this book.
10.2.1 Direct Solvers
In this section we will see several solution methods starting with the Gaussian Elimination technique.

Gaussian Elimination
There are three operations that can be used on a system of equations to obtain another equivalent system.
(1) We can interchange the order of two equations.
(2) Both sides of an equation may be multiplied by a nonzero constant.
(3) A multiple of one equation can be added to another equation.
Consider the following set of equations.
$$\begin{bmatrix} 8 & -1 \\ 4 & 7 \end{bmatrix} \begin{Bmatrix} x_1 \\ x_2 \end{Bmatrix} = \begin{Bmatrix} 6 \\ 18 \end{Bmatrix} \;\Rightarrow\; \begin{Bmatrix} x_1 \\ x_2 \end{Bmatrix} = \begin{Bmatrix} 1 \\ 2 \end{Bmatrix}$$
(1) Interchanging the two equations, we have the following set of equations.
$$\begin{bmatrix} 4 & 7 \\ 8 & -1 \end{bmatrix} \begin{Bmatrix} x_1 \\ x_2 \end{Bmatrix} = \begin{Bmatrix} 18 \\ 6 \end{Bmatrix} \;\Rightarrow\; \begin{Bmatrix} x_1 \\ x_2 \end{Bmatrix} = \begin{Bmatrix} 1 \\ 2 \end{Bmatrix}$$
Note that row interchanges do not require changing the order of the unknowns.
(2) We could multiply the second equation by a constant (say 1.5) to obtain the following set of equations.
$$\begin{bmatrix} 8 & -1 \\ 6 & 10.5 \end{bmatrix} \begin{Bmatrix} x_1 \\ x_2 \end{Bmatrix} = \begin{Bmatrix} 6 \\ 27 \end{Bmatrix} \;\Rightarrow\; \begin{Bmatrix} x_1 \\ x_2 \end{Bmatrix} = \begin{Bmatrix} 1 \\ 2 \end{Bmatrix}$$
(3) If we multiply the first equation by $\left( -\frac{4}{8} \right) = -0.5$ and add to the second equation, we obtain the following set of equations.
$$\begin{bmatrix} 8 & -1 \\ 0 & 7.5 \end{bmatrix} \begin{Bmatrix} x_1 \\ x_2 \end{Bmatrix} = \begin{Bmatrix} 6 \\ 15 \end{Bmatrix} \;\Rightarrow\; \begin{Bmatrix} x_1 \\ x_2 \end{Bmatrix} = \begin{Bmatrix} 1 \\ 2 \end{Bmatrix}$$

1 It should be noted that zero entries within the skyline profile may become nonzero during the solution phase.
The central idea in the Gaussian Elimination technique is based on generating an equivalent set of equations using row
operations (3) discussed earlier.
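There are two phases to the solution – forward elimination and backward substitution. In the forward elimination phase, the basic idea is to take the set of equations from the original form
$$\begin{bmatrix}
A_{11} & A_{12} & A_{13} & \cdots & A_{1i} & \cdots & A_{1n} \\
A_{21} & A_{22} & A_{23} & \cdots & A_{2i} & \cdots & A_{2n} \\
A_{31} & A_{32} & A_{33} & \cdots & A_{3i} & \cdots & A_{3n} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots \\
A_{i1} & A_{i2} & A_{i3} & \cdots & A_{ii} & \cdots & A_{in} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots \\
A_{n1} & A_{n2} & A_{n3} & \cdots & A_{ni} & \cdots & A_{nn}
\end{bmatrix}
\begin{Bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_i \\ \vdots \\ x_n \end{Bmatrix} =
\begin{Bmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_i \\ \vdots \\ b_n \end{Bmatrix}$$ (10.2.14)
to an upper triangular matrix of the form
$$\begin{bmatrix}
A_{11} & A_{12} & A_{13} & \cdots & A_{1i} & \cdots & A_{1n} \\
0 & A^{(1)}_{22} & A^{(1)}_{23} & \cdots & A^{(1)}_{2i} & \cdots & A^{(1)}_{2n} \\
0 & 0 & A^{(2)}_{33} & \cdots & A^{(2)}_{3i} & \cdots & A^{(2)}_{3n} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & A^{(i-1)}_{ii} & \cdots & A^{(i-1)}_{in} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & 0 & \cdots & A^{(n-1)}_{nn}
\end{bmatrix}
\begin{Bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_i \\ \vdots \\ x_n \end{Bmatrix} =
\begin{Bmatrix} b_1 \\ b^{(1)}_2 \\ b^{(2)}_3 \\ \vdots \\ b^{(i-1)}_i \\ \vdots \\ b^{(n-1)}_n \end{Bmatrix}$$ (10.2.15)
With each step through the equations, one unknown is eliminated until only $x_n$ remains as the unknown in the last equation. In step $k$ ($k = 1, 2, \ldots, n-1$)
$$A^{(k)}_{ij} = A^{(k-1)}_{ij} - \frac{A^{(k-1)}_{ik}}{A^{(k-1)}_{kk}} A^{(k-1)}_{kj} \qquad i, j = k+1, \ldots, n$$ (10.2.16)
$$b^{(k)}_{i} = b^{(k-1)}_{i} - \frac{A^{(k-1)}_{ik}}{A^{(k-1)}_{kk}} b^{(k-1)}_{k} \qquad i = k+1, \ldots, n$$ (10.2.17)
If $\left| A^{(k-1)}_{kk} \right| \le \varepsilon$, where $\varepsilon$ is a small positive constant, then the system of equations is linearly dependent.

In the backward substitution phase (dropping the superscript), we first compute the value of the last unknown ($x_n$) followed by the other unknowns as shown below.
$$x_n = \frac{b_n}{A_{nn}}$$ (10.2.18)
and
$$x_i = \frac{b_i - \sum_{j=i+1}^{n} A_{ij} x_j}{A_{ii}} \qquad i = n-1, n-2, \ldots, 1$$ (10.2.19)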


Algorithm
Step 1: Forward Elimination. Loop through rows, $k = 1, \ldots, n-1$.
Step 2: Check if $|A_{kk}| \le \varepsilon$. If yes, stop. The equations are linearly dependent (or $A$ is singular).
Step 3: Loop through rows, $i = k+1, \ldots, n$.
Step 4: Compute constant, $c = \dfrac{A_{ik}}{A_{kk}}$.
Step 5: Loop through columns $j = k+1, \ldots, n$.
Step 6: Set $A_{ij} = A_{ij} - c A_{kj}$.
Step 7: End loop $j$.
Step 8: Set $b_i = b_i - c\, b_k$.
Step 9: End loop $i$.
Step 10: End loop $k$.
Step 11: Backward substitution. Set $x_n = b_n / A_{nn}$.
Step 12: Loop through all rows, $i = n-1, \ldots, 1$.
Step 13: Compute $sum = \sum_{j=i+1}^{n} A_{ij} x_j$.
Step 14: Compute $x_i = \dfrac{b_i - sum}{A_{ii}}$.
Step 15: End loop $i$.
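The fifteen steps above map almost line for line onto code. The function below is a minimal sketch rather than the book's implementation: the name `GaussElimination` and the `Matrix`/`Vector` aliases are hypothetical, indices are 0-based, `A` and `b` are passed by value because the algorithm overwrites them, and one extra check on $A_{nn}$ (not listed in the steps) is added before the division in Step 11.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;
using Vector = std::vector<double>;

// Solves A x = b by Gaussian elimination without pivoting.
// Returns false if a pivot magnitude is not larger than eps.
bool GaussElimination(Matrix A, Vector b, Vector& x, double eps = 1.0e-12)
{
    const std::size_t n = A.size();
    assert(n > 0 && b.size() == n);

    // Steps 1-10: forward elimination
    for (std::size_t k = 0; k + 1 < n; ++k)
    {
        if (std::fabs(A[k][k]) <= eps) return false;       // Step 2
        for (std::size_t i = k + 1; i < n; ++i)
        {
            const double c = A[i][k] / A[k][k];            // Step 4
            for (std::size_t j = k + 1; j < n; ++j)
                A[i][j] -= c * A[k][j];                    // Step 6
            b[i] -= c * b[k];                              // Step 8
        }
    }
    if (std::fabs(A[n - 1][n - 1]) <= eps) return false;   // added check

    // Steps 11-15: backward substitution (last row first)
    x.assign(n, 0.0);
    for (std::size_t i = n; i-- > 0; )
    {
        double sum = 0.0;
        for (std::size_t j = i + 1; j < n; ++j)
            sum += A[i][j] * x[j];                         // Step 13
        x[i] = (b[i] - sum) / A[i][i];                     // Step 14
    }
    return true;
}
```

The entries below the diagonal are never referenced after elimination, so the code does not bother to set them to zero.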
Example 10.2.1
Solve the following set of equations using Gaussian Elimination method.
$$\begin{bmatrix} 10 & -5 & 2 \\ 3 & 20 & 5 \\ -2 & 7 & 15 \end{bmatrix} \begin{Bmatrix} x_1 \\ x_2 \\ x_3 \end{Bmatrix} = \begin{Bmatrix} 6 \\ 58 \\ 57 \end{Bmatrix}$$
Solution
Forward Elimination
Noting that $n = 3$, the successive snapshots as we go through the algorithm are as follows.
$$k = 1: \quad \begin{bmatrix} 10 & -5 & 2 \\ & 21.5 & 4.4 \\ & 6 & 15.4 \end{bmatrix} \begin{Bmatrix} x_1 \\ x_2 \\ x_3 \end{Bmatrix} = \begin{Bmatrix} 6 \\ 56.2 \\ 58.2 \end{Bmatrix}$$
$$k = 2: \quad \begin{bmatrix} 10 & -5 & 2 \\ & 21.5 & 4.4 \\ & & 14.172 \end{bmatrix} \begin{Bmatrix} x_1 \\ x_2 \\ x_3 \end{Bmatrix} = \begin{Bmatrix} 6 \\ 56.2 \\ 42.5163 \end{Bmatrix}$$
Backward Substitution
$$x_3 = \frac{42.5163}{14.172} = 3$$
$$i = 2: \quad x_2 = \frac{56.2 - (4.4)(3)}{21.5} = 2$$
$$i = 1: \quad x_1 = \frac{6 - (-4)}{10} = 1$$
where the sum for $i = 1$ is $(-5)(2) + (2)(3) = -4$. Hence the solution is $x_{3 \times 1} = \begin{Bmatrix} 1 \\ 2 \\ 3 \end{Bmatrix}$.
Pivoting and Scaling: Numerical problems in the form of truncation and round-off errors can be problematic if the coefficient matrix $A$ is not well-conditioned. A small change in $b$ resulting in a large change in $x$ is symptomatic of this ill-conditioning. The culprit in the standard implementation of the Gaussian Elimination technique is $A_{kk}$ that is used in Step 4. With rounding errors, this number could turn out to be very small and using this value could result in gross errors. Pivoting can be used to improve the computations and reduce the effects of the numerical errors.
Partial pivoting: For $1 \le k \le n-1$, at the $k$th stage, let
$$c_k = \max_{k \le i \le n} \left| A^{(k)}_{ik} \right|$$ (10.2.20a)
Let $i$ be the row index such that $i \ge k$ for which we obtain $c_k$. If $i > k$, we switch rows $k$ and $i$ in both $A$ and $b$. By using the largest remaining element in column $k$ as the pivot, we prevent the creation of elements in $A^{(k)}$ of greatly varying size that leads to numerical errors.
Total pivoting: For $1 \le k \le n-1$, at the $k$th stage, let
$$c_k = \max_{k \le i, j \le n} \left| A^{(k)}_{ij} \right|$$ (10.2.20b)
Let $i, j$ be the row and column indices such that $i, j \ge k$ for which we obtain $c_k$. If $i > k$, we switch rows $k$ and $i$ in both $A$ and $b$. If $j > k$, we switch columns $k$ and $j$ in $A$ and the order of the unknowns in $x$. At the end of the solution, we need to switch the order of the unknowns back to the original form.
Numerical studies have shown that total pivoting prevents the catastrophic accumulation of roundoff errors. However, total pivoting is an expensive process. Partial pivoting provides adequate relief for most practical problems.
Scaling: Round-off errors are likely to dominate if the elements of the coefficient matrix $A$ vary greatly in value. One solution to this problem is to scale $A$ so that this variation in value is less, and this can be achieved by multiplying the rows and columns by an appropriate constant. In practice, it is necessary to scale the rows so that they are approximately equal in magnitude. Similarly, $x$ should be scaled so that all the unknowns are approximately equal.
Let $S_1$ and $S_2$ be diagonal scaling matrices so that
$$\tilde{A} = S_1 A S_2$$ (10.2.21a)
Hence the solution to the original equations can be obtained by
$$\tilde{A} y = S_1 b \quad \text{and} \quad x = S_2 y$$ (10.2.21b)
To select the appropriate scaling constant, it is desirable that
$$\max_{1 \le j \le n} \left| \tilde{A}_{ij} \right| \approx 1 \qquad i = 1, 2, \ldots, n$$ (10.2.21c)
The selection of the pivot elements for step $k$ is
$$c_k = \max_{k \le i \le n} \frac{\left| A^{(k)}_{ik} \right|}{s_i}, \qquad s_i = \max_{1 \le j \le n} \left| A_{ij} \right| \quad i = 1, 2, \ldots, n$$ (10.2.21d)
which is a modification of the condition used in partial pivoting (Eqn. (10.2.20a)). The modified Gaussian Elimination algorithm is presented below. There is an additional vector $piv$ that is needed to store the row interchange information. In other words, if $piv(k) = k$, then no row interchange took place. Otherwise $piv(k) = i$ indicates that rows $i$ and $k$ were interchanged at step $k$.

Algorithm
Step 1: Forward Elimination. Compute $s_i = \max_{1 \le j \le n} |A_{ij}|$, $i = 1, 2, \ldots, n$.
Step 2: Loop through rows, $k = 1, \ldots, n-1$.
Step 3: Compute constant, $c_k = \max_{k \le i \le n} \dfrac{|A_{ik}|}{s_i}$.
Step 4: Let $i_0$ be the smallest index $i \ge k$ for which the maximum in Step 3 is attained. Store $piv(k) = i_0$.
Step 5: Check if $c_k \le \varepsilon$. If yes, stop. The equations are linearly dependent (or $A$ is singular).
Step 6: If $i_0 \ne k$, interchange $A_{kj}$ and $A_{i_0 j}$, $j = k, \ldots, n$, and interchange $s_k$ and $s_{i_0}$ so that each scale factor stays with its row.
Step 7: Loop through rows, $i = k+1, \ldots, n$.
Step 8: Set $A_{ik} = s = A_{ik} / A_{kk}$.
Step 9: Set $A_{ij} = A_{ij} - s A_{kj}$, $j = k+1, \ldots, n$.
Step 10: End loop $i$.
Step 11: End loop $k$.
Step 12: Forward substitution. Loop through $k = 1, 2, \ldots, n-1$.
Step 13: If $j = piv(k) \ne k$, then interchange $b_j$ and $b_k$.
Step 14: Set $b_i = b_i - A_{ik} b_k$, $i = k+1, \ldots, n$.
Step 15: End loop $k$.
Step 16: Backward substitution. Set $x_n = b_n / A_{nn}$.
Step 17: Loop through all rows, $i = n-1, \ldots, 1$.
Step 18: Compute $sum = \sum_{j=i+1}^{n} A_{ij} x_j$.
Step 19: Compute $x_i = \dfrac{b_i - sum}{A_{ii}}$.
Step 20: End loop $i$.
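A compact sketch of Gaussian elimination with scaled partial pivoting is given below. It is an illustration under stated assumptions, not the book's code: the name `SolveScaledPivoting` and the type aliases are hypothetical, indices are 0-based, the multipliers are stored in the lower part of the working copy of `A`, the scale factors are interchanged together with the rows, and a final check on the last pivot is added.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;
using Vector = std::vector<double>;

// Solves A x = b using Gaussian elimination with scaled partial pivoting.
// Returns false if the equations appear to be linearly dependent.
bool SolveScaledPivoting(Matrix A, Vector b, Vector& x, double eps = 1.0e-12)
{
    const std::size_t n = A.size();
    assert(n > 0 && b.size() == n);
    Vector s(n, 0.0);                      // row scale factors
    std::vector<std::size_t> piv(n, 0);    // row interchange record
    for (std::size_t i = 0; i < n; ++i)
    {
        assert(A[i].size() == n);
        for (std::size_t j = 0; j < n; ++j)
            s[i] = std::max(s[i], std::fabs(A[i][j]));
        if (s[i] == 0.0) return false;     // a row of zeros
    }
    for (std::size_t k = 0; k + 1 < n; ++k)
    {
        std::size_t i0 = k;                // smallest row index attaining the maximum
        double ck = std::fabs(A[k][k]) / s[k];
        for (std::size_t i = k + 1; i < n; ++i)
        {
            const double r = std::fabs(A[i][k]) / s[i];
            if (r > ck) { ck = r; i0 = i; }
        }
        piv[k] = i0;
        if (ck <= eps) return false;
        if (i0 != k)
        {
            for (std::size_t j = k; j < n; ++j) std::swap(A[k][j], A[i0][j]);
            std::swap(s[k], s[i0]);
        }
        for (std::size_t i = k + 1; i < n; ++i)
        {
            const double m = A[i][k] / A[k][k];
            A[i][k] = m;                   // store the multiplier
            for (std::size_t j = k + 1; j < n; ++j) A[i][j] -= m * A[k][j];
        }
    }
    if (std::fabs(A[n - 1][n - 1]) / s[n - 1] <= eps) return false;

    // Forward substitution on b, replaying the interchanges in order
    for (std::size_t k = 0; k + 1 < n; ++k)
    {
        if (piv[k] != k) std::swap(b[piv[k]], b[k]);
        for (std::size_t i = k + 1; i < n; ++i) b[i] -= A[i][k] * b[k];
    }
    // Backward substitution
    x.assign(n, 0.0);
    for (std::size_t i = n; i-- > 0; )
    {
        double sum = 0.0;
        for (std::size_t j = i + 1; j < n; ++j) sum += A[i][j] * x[j];
        x[i] = (b[i] - sum) / A[i][i];
    }
    return true;
}
```

Because the interchange at step $k$ touches only columns $k$ to $n$, the multipliers already stored in earlier columns stay where they were created, which is why the forward substitution must replay the interchanges one step at a time.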

Error Analysis
It is practical and necessary at the end of the solution process to examine the quality of the solution. If $x$ is indeed the solution, then the residual vector $r = Ax - b$ should be zero. Numerically, the residual vector is not zero and one must ascertain the magnitude of the residual terms. One can define the absolute error and relative errors
$$\varepsilon_{abs} = \| r \|$$ (10.2.22)
$$\varepsilon_{rel} = \frac{\| r \|}{\| b \|}$$ (10.2.23)
Provided both these measures are small, the solution $x$ is acceptable.
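The two error measures can be computed with a few lines of code. The sketch below assumes the Euclidean (2-) norm, which the text does not specify, and assumes $b$ is not a null vector; the names `Norm2` and `ResidualErrors` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;
using Vector = std::vector<double>;

// Euclidean norm of a vector (an assumed choice of norm).
double Norm2(const Vector& v)
{
    double sum = 0.0;
    for (double vi : v) sum += vi * vi;
    return std::sqrt(sum);
}

// Computes the absolute and relative errors of Eqns. (10.2.22) and (10.2.23)
// from the residual r = A x - b. The vector b must not be a null vector.
void ResidualErrors(const Matrix& A, const Vector& x, const Vector& b,
                    double& absError, double& relError)
{
    const std::size_t n = A.size();
    assert(x.size() == n && b.size() == n);
    Vector r(n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
    {
        assert(A[i].size() == n);
        for (std::size_t j = 0; j < n; ++j) r[i] += A[i][j] * x[j];
        r[i] -= b[i];
    }
    const double normB = Norm2(b);
    assert(normB > 0.0);
    absError = Norm2(r);
    relError = absError / normB;
}
```

Note that the original (unfactored) coefficient matrix and right-hand side are needed here, so a copy must be kept if the solver overwrites them.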


Example 10.2.2
Solve the following set of equations using Gaussian Elimination method with and without partial pivoting. Compute the residual vector in each case.
$$\begin{bmatrix} 0.7 & 0.8 & 0.9 \\ 1.0000001 & 1.0 & 1.0 \\ 1.3 & 1.2 & 1.1 \end{bmatrix} \begin{Bmatrix} x_1 \\ x_2 \\ x_3 \end{Bmatrix} = \begin{Bmatrix} 0.7 \\ 0.8 \\ 1.0 \end{Bmatrix}$$
Solution
Numerically the coefficient matrix is nearly singular.
The solution and residual vector using pivoting is as follows.
499999.9994  1.16415(10 10 )
   
x   1000000.649  r   5.82076(10 11 ) 
499999.7994   0 
   
The solution and residual vector without pivoting is as follows.
 0.7  0.280000188 
   
x  0.2000001 r   0.20000015 
0.09999988  0.220000252 
   

LU Factorization
The matrix $A$ in $Ax = b$ can be factored2 as $A = LU$ where $L$ is a lower triangular matrix with nonzero values on the diagonal and below, and $U$ is an upper triangular matrix with nonzero values on the diagonal and above. For example, we could describe the situation symbolically as follows.
$$\begin{bmatrix} A_{11} & A_{12} & A_{13} & A_{14} \\ A_{21} & A_{22} & A_{23} & A_{24} \\ A_{31} & A_{32} & A_{33} & A_{34} \\ A_{41} & A_{42} & A_{43} & A_{44} \end{bmatrix} = \begin{bmatrix} L_{11} & & & \\ L_{21} & L_{22} & & \\ L_{31} & L_{32} & L_{33} & \\ L_{41} & L_{42} & L_{43} & L_{44} \end{bmatrix} \begin{bmatrix} U_{11} & U_{12} & U_{13} & U_{14} \\ & U_{22} & U_{23} & U_{24} \\ & & U_{33} & U_{34} \\ & & & U_{44} \end{bmatrix}$$ (10.2.24)
The attractive aspect of this procedure is that we can solve the original equations (10.2.1) as follows.
$$A_{n \times n}\, x_{n \times 1} = b_{n \times 1}$$ (10.2.25a)
or
$$L_{n \times n} U_{n \times n}\, x_{n \times 1} = b_{n \times 1}$$ (10.2.25b)
Let
$$L_{n \times n}\, y_{n \times 1} = b_{n \times 1}$$ (10.2.25c)
and solve for $y$. Then solve
$$U_{n \times n}\, x_{n \times 1} = y_{n \times 1}$$ (10.2.25d)
for $x$. The implication is that once $L$ and $U$ have been obtained, a new RHS vector requires just forward and backward substitutions in Eqns. (10.2.25c-d). In other words, the forward substitution involves computing
$$y_1 = \frac{b_1}{L_{11}}$$ (10.2.26a)
$$y_i = \frac{b_i - \sum_{j=1}^{i-1} L_{ij} y_j}{L_{ii}} \qquad i = 2, 3, \ldots, n$$ (10.2.26b)
Similarly, the backward substitution involves computing
$$x_n = \frac{y_n}{U_{nn}}$$ (10.2.27a)
$$x_i = \frac{y_i - \sum_{j=i+1}^{n} U_{ij} x_j}{U_{ii}} \qquad i = n-1, n-2, \ldots, 1$$ (10.2.27b)

2 We do not present a formal proof here.
The major question that remains is how do we obtain the $L$ and $U$ matrices? If we multiply the $L$ and $U$ matrices, we obtain for a typical term
$$A_{ij} = L_{i1} U_{1j} + L_{i2} U_{2j} + \ldots$$ (10.2.28)
However, we should note that not all elements of the $L$ and $U$ matrices are nonzero. Hence,
$$i < j: \quad A_{ij} = L_{i1} U_{1j} + L_{i2} U_{2j} + \ldots + L_{ii} U_{ij} \quad \text{(upper triangular elements)}$$ (10.2.29a)
$$i = j: \quad A_{ij} = L_{i1} U_{1j} + L_{i2} U_{2j} + \ldots + L_{ii} U_{jj} \quad \text{(diagonal elements)}$$ (10.2.29b)
$$i > j: \quad A_{ij} = L_{i1} U_{1j} + L_{i2} U_{2j} + \ldots + L_{ij} U_{jj} \quad \text{(lower triangular elements)}$$ (10.2.29c)
These three sets of equations indicate that we have $n^2$ equations for $(n^2 + n)$ unknowns – the elements in $L$ and $U$ matrices. Crout’s Algorithm simply is to set the diagonal entries in $L$ as unity, i.e. $L_{ii} = 1$. Now we can solve $n^2$ equations in $n^2$ unknowns. An examination of the three equations will show that we can start with column 1 and compute $U_{11}$. Once we know $U_{11}$, we can compute all the elements in the first column of $L$, i.e. $L_{i1}$, $i = 2, \ldots, n$. Next, we compute the elements in the second column of $U$, i.e. $U_{12}$, $U_{22}$. After that we can compute the elements in the second column of $L$, i.e. $L_{i2}$, $i = 3, \ldots, n$. We repeat this process until the elements in all the columns of $L$ and $U$ have been computed.
Algorithm
Phase 1: Factorization
Step 1: Loop through all diagonal entries in $L$ and set them to unity, $L_{ii} = 1$, $i = 1, 2, \ldots, n$.
Step 2: Loop through $j = 1, 2, 3, \ldots, n$.
Step 3: Loop through $i = 1, 2, \ldots, j$. Set $U_{ij} = A_{ij} - \sum_{k=1}^{i-1} L_{ik} U_{kj}$.
Step 4: Check if $|U_{jj}| \le \varepsilon$. If yes, stop. The equations are linearly dependent (or $A$ is singular).
Step 5: Loop through $i = j+1, j+2, \ldots, n$. Set $L_{ij} = \dfrac{A_{ij} - \sum_{k=1}^{j-1} L_{ik} U_{kj}}{U_{jj}}$.
Step 6: End loop through $j$.
Phase 2: Forward and backward substitution
Step 1: Solve Eqn. (10.2.25c): Solve for $y_1$ (Eqn. 10.2.26a). Loop through $i = 2, 3, \ldots, n$ and solve for $y_i$ using Eqn. (10.2.26b).
Step 2: Solve Eqn. (10.2.25d): Solve for $x_n$ (Eqn. 10.2.27a). Loop through $i = n-1, n-2, \ldots, 1$ and solve for $x_i$ using Eqn. (10.2.27b).
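The two phases can be sketched as separate functions so that the factorization is done once and the substitutions are repeated for each new right-hand side. The code below is illustrative only: the names `LUFactor`, `LUSolve` and `LUDeterminant` are hypothetical, indices are 0-based, no pivoting is performed, and the factors overwrite the coefficient matrix with the unit diagonal of $L$ implied rather than stored.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;
using Vector = std::vector<double>;

// Phase 1: overwrites A with U (diagonal and above) and L (below the diagonal).
// Returns false if a diagonal entry of U is too small in magnitude.
bool LUFactor(Matrix& A, double eps = 1.0e-12)
{
    const std::size_t n = A.size();
    for (std::size_t j = 0; j < n; ++j)
    {
        assert(A[j].size() == n);
        for (std::size_t i = 0; i <= j; ++i)          // column j of U
        {
            double sum = A[i][j];
            for (std::size_t k = 0; k < i; ++k) sum -= A[i][k] * A[k][j];
            A[i][j] = sum;
        }
        if (std::fabs(A[j][j]) <= eps) return false;
        for (std::size_t i = j + 1; i < n; ++i)       // column j of L
        {
            double sum = A[i][j];
            for (std::size_t k = 0; k < j; ++k) sum -= A[i][k] * A[k][j];
            A[i][j] = sum / A[j][j];
        }
    }
    return true;
}

// Phase 2: on entry x holds b; on exit x holds the solution.
void LUSolve(const Matrix& LU, Vector& x)
{
    const std::size_t n = LU.size();
    assert(x.size() == n);
    for (std::size_t i = 1; i < n; ++i)               // L y = b, with L_ii = 1
        for (std::size_t j = 0; j < i; ++j) x[i] -= LU[i][j] * x[j];
    for (std::size_t i = n; i-- > 0; )                // U x = y
    {
        for (std::size_t j = i + 1; j < n; ++j) x[i] -= LU[i][j] * x[j];
        x[i] /= LU[i][i];
    }
}

// Product of the diagonal entries of U, i.e. det(A) when det(L) = 1.
double LUDeterminant(const Matrix& LU)
{
    double det = 1.0;
    for (std::size_t i = 0; i < LU.size(); ++i) det *= LU[i][i];
    return det;
}
```

A typical use is to call `LUFactor` once, then copy each right-hand side into a vector and pass it to `LUSolve`.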


The attractive aspect of this algorithm is that we really do not need (additional) storage space for the $L$ and $U$ matrices. The original $A$ matrix can be overwritten with these matrices such that
$$\begin{bmatrix} A_{11} & A_{12} & A_{13} & A_{14} \\ A_{21} & A_{22} & A_{23} & A_{24} \\ A_{31} & A_{32} & A_{33} & A_{34} \\ A_{41} & A_{42} & A_{43} & A_{44} \end{bmatrix} \rightarrow \begin{bmatrix} U_{11} & U_{12} & U_{13} & U_{14} \\ L_{21} & U_{22} & U_{23} & U_{24} \\ L_{31} & L_{32} & U_{33} & U_{34} \\ L_{41} & L_{42} & L_{43} & U_{44} \end{bmatrix}$$
Furthermore, we do not need space for $y$. We first implement Step 1 in Phase 2 using $x$ to store the $y$ values. Then Step 2 can be computed with every successive $y_i$ value replaced with the $x_i$ value, starting with $y_n \rightarrow x_n$.
The LU Factorization provides a convenient and efficient way to compute the determinant of a matrix. Since $A = LU$, we have $\det(A) = \det(L)\det(U)$. Since $\det(L) = 1$, the determinant of the original matrix is simply $\det(A) = \det(U) = U_{11} U_{22} \cdots U_{nn}$.
Example 10.2.3
Solve the following set of equations using LU Factorization and compute the determinant of the coefficient matrix.
$$\begin{bmatrix} 10 & -5 & 2 \\ 3 & 20 & 5 \\ -2 & 7 & 15 \end{bmatrix} \begin{Bmatrix} x_1 \\ x_2 \\ x_3 \end{Bmatrix} = \begin{Bmatrix} 6 \\ 58 \\ 57 \end{Bmatrix}$$
Solution
Factorization ($n = 3$)
$j = 1, i = 1$: $U_{11} = A_{11} = 10$
$j = 1, i = 2$: $L_{21} = \dfrac{A_{21}}{U_{11}} = \dfrac{3}{10} = 0.3$
$j = 1, i = 3$: $L_{31} = \dfrac{A_{31}}{U_{11}} = \dfrac{-2}{10} = -0.2$
$j = 2, i = 1$: $U_{12} = A_{12} = -5$
$j = 2, i = 2$: $U_{22} = A_{22} - L_{21} U_{12} = 20 - (0.3)(-5) = 21.5$
$j = 2, i = 3$: $L_{32} = \dfrac{A_{32} - L_{31} U_{12}}{U_{22}} = \dfrac{7 - (-0.2)(-5)}{21.5} = 0.27907$
$j = 3, i = 1$: $U_{13} = A_{13} = 2$
$j = 3, i = 2$: $U_{23} = A_{23} - L_{21} U_{13} = 5 - (0.3)(2) = 4.4$
$j = 3, i = 3$: $U_{33} = A_{33} - L_{31} U_{13} - L_{32} U_{23} = 15 - (-0.2)(2) - (0.27907)(4.4) = 14.1721$
Hence the new coefficient matrix replaced by the elements of $L$ and $U$ matrices is as follows.
$$A = \begin{bmatrix} 10 & -5 & 2 \\ 0.3 & 21.5 & 4.4 \\ -0.2 & 0.27907 & 14.1721 \end{bmatrix}$$
Forward Substitution
$i = 1$: $y_1 = \dfrac{b_1}{L_{11}} = \dfrac{6}{1} = 6$
$i = 2$: $y_2 = \dfrac{b_2 - L_{21} y_1}{L_{22}} = \dfrac{58 - (0.3)(6)}{1} = 56.2$
$i = 3$: $y_3 = \dfrac{b_3 - L_{31} y_1 - L_{32} y_2}{L_{33}} = \dfrac{57 - (-0.2)(6) - (0.27907)(56.2)}{1} = 42.5163$
Backward Substitution
$i = 3$: $x_3 = \dfrac{y_3}{U_{33}} = \dfrac{42.5163}{14.1721} = 3$
$i = 2$: $x_2 = \dfrac{y_2 - U_{23} x_3}{U_{22}} = \dfrac{56.2 - (4.4)(3)}{21.5} = 2$
$i = 1$: $x_1 = \dfrac{y_1 - U_{12} x_2 - U_{13} x_3}{U_{11}} = \dfrac{6 - (-5)(2) - (2)(3)}{10} = 1$
Determinant of $A$
$\det(A) = U_{11} U_{22} U_{33} = (10)(21.5)(14.1721) = 3047$

Cholesky Decomposition or $LDL^T$ Factorization (or, Decomposition)

When $A$ is symmetric and non-singular, the matrix can be factored as $A = LDL^T$ where $L$ is a lower triangular matrix with 1’s on the diagonals and $D$ is a diagonal matrix whose entries are positive when $A$ is positive definite. The $LDL^T$ decomposition is a variation of Cholesky Decomposition3, and provides a very effective solution to the system equations.
The solution proceeds as follows. We first factor $A = LDL^T$ by finding the $L$ and $D$ matrices.
$$\begin{bmatrix} A_{11} & A_{12} & \cdots & A_{1n} \\ A_{12} & A_{22} & \cdots & A_{2n} \\ \vdots & \vdots & & \vdots \\ A_{1n} & A_{2n} & \cdots & A_{nn} \end{bmatrix} = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ L_{21} & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ L_{n1} & L_{n2} & \cdots & 1 \end{bmatrix} \begin{bmatrix} D_1 & 0 & \cdots & 0 \\ 0 & D_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & D_n \end{bmatrix} \begin{bmatrix} 1 & L_{21} & \cdots & L_{n1} \\ 0 & 1 & \cdots & L_{n2} \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}$$
$$= \begin{bmatrix} D_1 & D_1 L_{21} & D_1 L_{31} & \cdots & D_1 L_{n1} \\ & D_1 L_{21}^2 + D_2 & D_1 L_{21} L_{31} + D_2 L_{32} & \cdots & D_1 L_{21} L_{n1} + D_2 L_{n2} \\ & & D_1 L_{31}^2 + D_2 L_{32}^2 + D_3 & \cdots & D_1 L_{31} L_{n1} + D_2 L_{32} L_{n2} + D_3 L_{n3} \\ & & & \ddots & \vdots \\ \text{sym} & & & & D_1 L_{n1}^2 + D_2 L_{n2}^2 + \ldots + D_n \end{bmatrix}$$ (10.2.30)
By comparing the RHS of Eqn. (10.2.30) with the LHS we have the mechanism to compute the $L$ and $D$ matrices given the $A$ matrix.
To obtain the solution once the decomposition or factorization is completed, we have the following steps.
$$LDL^T x = b$$ (10.2.31)
Let
$$Ly = b$$ (10.2.32)
Then
$$DL^T x = y$$ (10.2.33)
Note that $DL^T$ is of the upper triangular form.
$$DL^T = \begin{bmatrix} D_1 & D_1 L_{21} & \cdots & D_1 L_{n1} \\ & D_2 & \cdots & D_2 L_{n2} \\ & & \ddots & \vdots \\ 0 & & & D_n \end{bmatrix}$$ (10.2.34)
We can solve Eqns. (10.2.32) for $y$ through the forward substitution procedure that we have seen before. Once $y$ has been computed, we can solve Eqns. (10.2.33) through the backward substitution process.

3 Strictly speaking, Cholesky Decomposition $A = U^T U$ can only be used with a symmetric positive definite $A$.

Algorithm
Factorization Phase
Step 1: Loop through rows, $i = 1, \ldots, n$.
Step 2: Set $D_i = A_{ii} - \sum_{j=1}^{i-1} L_{ij}^2 D_j$. If $D_i \le \varepsilon$, stop. The matrix is not positive definite.
Step 3: For $j = i+1, \ldots, n$, set $L_{ji} = \dfrac{A_{ji} - \sum_{k=1}^{i-1} L_{jk} D_k L_{ik}}{D_i}$.
Step 4: End loop $i$.
Forward and Backward Substitutions
Step 5: Forward Substitution. Set $y_1 = b_1$.
Step 6: For $i = 2, \ldots, n$, set $y_i = b_i - \sum_{j=1}^{i-1} L_{ij} y_j$. This ends the Forward Substitution phase.
Step 7: Backward Substitution. Set $x_n = \dfrac{y_n}{D_n}$.
Step 8: For $i = n-1, \ldots, 1$, set $x_i = \dfrac{y_i}{D_i} - \sum_{j=i+1}^{n} L_{ji} x_j$. This ends the Backward Substitution phase.

A careful examination of the steps will show that no extra storage is required. The storage locations in A can be used to store
both D and L . Similarly, the storage locations in x can be used to store the elements of y .
Similar to LU Decomposition, Cholesky Decomposition requires no special effort to solve additional RHS vectors since the
factorization and the forward/backward substitutions steps are separate. Once the factorization step is completed (once), the
forward/backward substitutions steps can be repeated as many times as required.
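The eight steps can be sketched as two functions that mirror the separation between factorization and substitution. This is an illustration under stated assumptions, not the book's implementation: the names `LDLTFactor` and `LDLTSolve` are hypothetical, indices are 0-based, full square storage is used for simplicity, and only the lower triangle of the matrix is referenced and overwritten ($D$ on the diagonal, $L$ below it).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;
using Vector = std::vector<double>;

// Steps 1-4: overwrites the lower triangle of A with L and the diagonal with D.
// Returns false if some D_i <= eps (matrix not positive definite).
bool LDLTFactor(Matrix& A, double eps = 1.0e-12)
{
    const std::size_t n = A.size();
    for (std::size_t i = 0; i < n; ++i)
    {
        assert(A[i].size() == n);
        double d = A[i][i];
        for (std::size_t j = 0; j < i; ++j) d -= A[i][j] * A[i][j] * A[j][j];
        if (d <= eps) return false;
        A[i][i] = d;                                   // D_i
        for (std::size_t j = i + 1; j < n; ++j)
        {
            double sum = A[j][i];
            for (std::size_t k = 0; k < i; ++k) sum -= A[j][k] * A[k][k] * A[i][k];
            A[j][i] = sum / d;                         // L_ji
        }
    }
    return true;
}

// Steps 5-8: on entry x holds b; on exit x holds the solution.
void LDLTSolve(const Matrix& F, Vector& x)
{
    const std::size_t n = F.size();
    assert(x.size() == n);
    for (std::size_t i = 1; i < n; ++i)                // L y = b
        for (std::size_t j = 0; j < i; ++j) x[i] -= F[i][j] * x[j];
    for (std::size_t i = n; i-- > 0; )                 // D L^T x = y
    {
        x[i] /= F[i][i];
        for (std::size_t j = i + 1; j < n; ++j) x[i] -= F[j][i] * x[j];
    }
}
```

Since only the lower triangle is touched, the same logic carries over to banded or skyline storage once the index mapping of Section 10.2.1 replaces the direct `A[i][j]` access.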
Example 10.2.4
Solve the following set of equations using Cholesky Decomposition method.
 3.5120 0.7679 0 0 0   x1   0 
 0.7679 3.1520 0 2 0  x 2   0 
     
 0 0 3.5120 0.7679 0.7679   x 3    0 
 
 0 2 0.7679 3.1520 1.1520  x 4  0.04 
   
 0 0 0.7679 1.1520 3.1520   x 5   0 
Solution
Factorization  n  5 
i  1  D1  A11  3.5120 .


A21 0.7679
j  2  L 21   =0.21865 . Also, L 31  L 41  L51  0 .
D1 3.5120

i  2  D2  A22  L221D1  3.1520   0.21865   3.5120   2.9841 .


2

A32  L 31D1 L 21
j  3  L 32  0.
D2
A42  L 41 D1 L 21 2  0
j  4  L 42    0.670219 . Also, L52  0 .
D2 2.9841
i  3  D3  A33  L231 D1  L232 D2  3.5120  0  0  3.5120 .
A43  L 41D1 L 31  L 42 D2 L 32 0.7679  0  0
j  4  L 43    0.21865 .
D3 3.5120
A53  L 51D1 L 31  L 52 D2 L 32 0.7679  0  0
j  5  L 53    0.21865 .
D3 3.5120
i  4  D4  A44  L241 D1  L242 D2  L243 D3

 3.1520  0  ( 0.670219)2  2.9841   0.21865   3.5120   1.64366


2

A54  L 51 D1 L 41  L 52 D2 L 42  L 53 D3 L 43
j  5  L 54 
D4
1.1520  0  0   0.21865  3.5120  0.21865 
  -0.598724
1.64366
i  5  D5  A55  L251 D1  L252 D2  L253 D3  L254 D4

 3.1520  0  0  (0.21865)2  3.5120    0.598724  1.64366   2.3949


2

Forward Substitution  Ly = b 

 1 0 0 0 0   y1   0 
 0.21865 1 0 0 0  y   0 
   2   
 0 0 1 0 0   y3    0 
    
 0 0.670219 0.21865 1 0   y 4  0.04 
 0 0 0.21865 0.598724 1   y 5   0 

i  1  y1  b1  0
i  2  y 2  b2  L 21 y1  0
i  3  y 3  b3  L 31 y1  L 32 y 2  0
i  4  y 4  b4  L 41 y1  L 42 y 2  L 43 y 3  0.04
i  5  y5  b5  L51 y1  L 52 y 2  L53 y 3  L 54 y 4  0   0.598724  0.04   0.023949


Backward Substitution  DLT x  y 

 3.5120  (1)  3.5120  0.21865  0 0 0   x1   0 


    
 0  2.9841 (1) 0  2.9841 0.670219  0  x 2   0

 0 0  3.5120 1  3.5120  0.21865   3.5120  0.21865   x 3    0 
 
 0 0 0 1.64366 1 1.64366  0.598724   x 4   0.04 
  2.3949  (1) 
 0 0 0 0   x 5  0.023949 
0.023949
i  5  x5   0.01
2.3949
y4 0.04
i  4  x4   L 45 x 5    0.598724  0.01  0.0303232
D4 1.64366
y3
i  3  x3   L 34 x 4  L 35 x 5
D3
0
   0.21865  0.0303232    0.21865  0.01  0.00444367
3.5120
y2
i  2  x2   L 23 x 3  L 24 x 4  L 25 x 5
D2
0
  0   0.670219  0.0303232   0  0.0203232
2.9841
y1
i  1  x1   L12 y 2  L13 y 3  L14 y 4  L15 y 5
D1
0
   0.21865  0.0203232   0  0  0  0.00444367
3.5120

10.2.2 Special Cases


Sometimes the solution to $Ax = b$ is required with additional conditions of the following forms.
(a) $x_i = c$
(b) $c_i x_i + c_j x_j = c$
We will handle each form separately only because of efficiency and accuracy considerations.
Case (a)
Consider the set of three equations
$$\begin{bmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{bmatrix} \begin{Bmatrix} x_1 \\ x_2 \\ x_3 \end{Bmatrix} = \begin{Bmatrix} b_1 \\ b_2 \\ b_3 \end{Bmatrix}$$ (10.2.35a)
with the additional condition as
$$x_2 = c$$ (10.2.35b)
where $c$ is a constant.
We will rewrite Eqn. (10.2.35a) as three separate equations
$$A_{11} x_1 + A_{12} x_2 + A_{13} x_3 = b_1$$ (10.2.36a)
$$A_{21} x_1 + A_{22} x_2 + A_{23} x_3 = b_2$$ (10.2.36b)
$$A_{31} x_1 + A_{32} x_2 + A_{33} x_3 = b_3$$ (10.2.36c)
Utilizing the condition $x_2 = c$ in the above equations, we have
$$A_{11} x_1 + (0) x_2 + A_{13} x_3 = b_1 - A_{12} c$$ (10.2.37a)
$$(0) x_1 + (1) x_2 + (0) x_3 = c$$ (10.2.37b)
$$A_{31} x_1 + (0) x_2 + A_{33} x_3 = b_3 - A_{32} c$$ (10.2.37c)
Note carefully that Eqns. (10.2.36a) and (10.2.37a) are the same if $x_2 = c$. We have merely taken the $x_2$ term to the right-hand side where it belongs since $x_2$ is strictly no longer an unknown. Similar comments are valid for Eqns. (10.2.36c) and (10.2.37c). In order that (a) we do not change the number of equations, and (b) recognize that $x_2 = c$, suitable changes have been made to Eqn. (10.2.36b) and Eqn. (10.2.37b) is merely $x_2 = c$. We can rewrite Eqns. (10.2.37) as
$$\begin{bmatrix} A_{11} & 0 & A_{13} \\ 0 & 1 & 0 \\ A_{31} & 0 & A_{33} \end{bmatrix} \begin{Bmatrix} x_1 \\ x_2 \\ x_3 \end{Bmatrix} = \begin{Bmatrix} b_1 - A_{12} c \\ c \\ b_3 - A_{32} c \end{Bmatrix}$$ (10.2.38)
This approach is known as the Elimination Approach. The advantage of the equations in the above form is that the number of equations remains the same and that the coefficient matrix is still square (and symmetric if the original $A$ was symmetric). From a viewpoint of implementing the solution procedure in the form of a computer program, these are desirable properties.
General Procedure: If the condition $x_j = c$ is to be imposed, implement the following three steps.
Step 1: Modify the right-hand side vector as $b_i = b_i - A_{ij}c$, $i = 1, \ldots, n$.
Step 2: Modify the coefficient matrix as
$A_{ij} = 0$, $i = 1, \ldots, n$
$A_{ji} = 0$, $i = 1, \ldots, n$
Step 3: Set $A_{jj} = 1$ and $b_j = c$, so that equation $j$ reads $x_j = c$ as in Eqn. (10.2.38).
This three-step process must be applied for each condition $x_j = c$.
Case (b)
We will solve this case using the Penalty Approach. We will restrict our attention to the case where $\mathbf{A}$ is symmetric and positive definite. From calculus, the vector $\mathbf{x}$ that minimizes
$$\Pi(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T\mathbf{A}\mathbf{x} - \mathbf{x}^T\mathbf{b} + \frac{1}{2}C\left(c_i x_i + c_j x_j - c\right)^2 \tag{10.2.39}$$
where $C$ is a large number, is the solution to the original constrained problem. Note that $\Pi$ takes on a minimum value when $c_i x_i + c_j x_j - c$ is zero (or, numerically, very small). Minimizing $\Pi$, i.e. setting $\partial\Pi/\partial\mathbf{x} = \mathbf{0}$, yields the following equations
$$\begin{bmatrix}
A_{11} & \cdots & A_{1i} & \cdots & A_{1j} & \cdots & A_{1n} \\
\vdots & & \vdots & & \vdots & & \vdots \\
A_{i1} & \cdots & A_{ii} + Cc_i^2 & \cdots & A_{ij} + Cc_ic_j & \cdots & A_{in} \\
\vdots & & \vdots & & \vdots & & \vdots \\
A_{j1} & \cdots & A_{ji} + Cc_ic_j & \cdots & A_{jj} + Cc_j^2 & \cdots & A_{jn} \\
\vdots & & \vdots & & \vdots & & \vdots \\
A_{n1} & \cdots & A_{ni} & \cdots & A_{nj} & \cdots & A_{nn}
\end{bmatrix}
\begin{Bmatrix} x_1 \\ \vdots \\ x_i \\ \vdots \\ x_j \\ \vdots \\ x_n \end{Bmatrix} =
\begin{Bmatrix} b_1 \\ \vdots \\ b_i + Ccc_i \\ \vdots \\ b_j + Ccc_j \\ \vdots \\ b_n \end{Bmatrix} \tag{10.2.40}$$
Note that the modified equations $\mathbf{A}\mathbf{x} = \mathbf{b}$ still have a symmetric and positive definite coefficient matrix. The only question left to answer is “What is a suitable value for the large number $C$?” A popular choice that seems to work effectively is to make the constant a function of the largest element in the coefficient matrix.
$$C = 10^4 \max \left| A_{pq} \right|, \quad 1 \le p, q \le n \tag{10.2.41}$$
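The modifications in Eqns. (10.2.40) and (10.2.41) can be sketched in C++ as follows. This is a minimal sketch under stated assumptions: dense std::vector storage instead of CMatrix, zero-based indices i and j, and a hypothetical function name ImposePenalty. Only the four matrix entries and two right-hand side entries shown in Eqn. (10.2.40) are changed.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical helper: imposes ci*x_i + cj*x_j = c on A x = b
// (A symmetric positive definite) using the Penalty Approach.
void ImposePenalty(Matrix& A, std::vector<double>& b,
                   std::size_t i, double ci,
                   std::size_t j, double cj, double c)
{
    // Penalty constant from Eqn. (10.2.41): C = 1e4 * max |A_pq|
    double maxAbs = 0.0;
    for (const auto& row : A)
        for (double v : row)
            maxAbs = std::max(maxAbs, std::fabs(v));
    const double C = 1.0e4 * maxAbs;

    // Entries modified in Eqn. (10.2.40)
    A[i][i] += C * ci * ci;
    A[i][j] += C * ci * cj;
    A[j][i] += C * ci * cj;
    A[j][j] += C * cj * cj;
    b[i] += C * c * ci;
    b[j] += C * c * cj;
}
```

Since $C$ is computed from the matrix as it stands at the time of the call, imposing several conditions in sequence would use a different $C$ each time; computing $C$ once from the original matrix and passing it in is an alternative design.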
Example 10.2.5
Modify the following equations
$$\begin{bmatrix} 10 & -5 & 2 \\ 3 & 20 & 5 \\ 2 & 7 & 15 \end{bmatrix}
\begin{Bmatrix} x_1 \\ x_2 \\ x_3 \end{Bmatrix} =
\begin{Bmatrix} 6 \\ 58 \\ 57 \end{Bmatrix}$$
so that the condition $x_2 = 3$ can be imposed.
Solution
Using the general procedure listed above, we have the following modified equations.
$$\begin{bmatrix} 10 & 0 & 2 \\ 0 & 1 & 0 \\ 2 & 0 & 15 \end{bmatrix}
\begin{Bmatrix} x_1 \\ x_2 \\ x_3 \end{Bmatrix} =
\begin{Bmatrix} 6 + 5(3) \\ 3 \\ 57 - 7(3) \end{Bmatrix} =
\begin{Bmatrix} 21 \\ 3 \\ 36 \end{Bmatrix}$$
These equations can now be solved using LU Decomposition.
Example 10.2.6
Modify the following equations
$$\begin{bmatrix} 10 & 5 & 2 \\ 5 & 20 & 5 \\ 2 & 5 & 15 \end{bmatrix}
\begin{Bmatrix} x_1 \\ x_2 \\ x_3 \end{Bmatrix} =
\begin{Bmatrix} 6 \\ 58 \\ 57 \end{Bmatrix}$$
so that the condition $2x_1 + x_3 = 3$ can be imposed.
Solution
Using the procedure listed above, with $C = 10^4(20) = 20 \times 10^4$, we have the following modified equations.
$$\begin{bmatrix}
10 + (20 \times 10^4)(2)^2 & 5 & 2 + (20 \times 10^4)(2)(1) \\
5 & 20 & 5 \\
2 + (20 \times 10^4)(2)(1) & 5 & 15 + (20 \times 10^4)(1)^2
\end{bmatrix}
\begin{Bmatrix} x_1 \\ x_2 \\ x_3 \end{Bmatrix} =
\begin{Bmatrix} 6 + (20 \times 10^4)(3)(2) \\ 58 \\ 57 + (20 \times 10^4)(3)(1) \end{Bmatrix}$$
Or, simplifying
$$\begin{bmatrix} 800010 & 5 & 400002 \\ 5 & 20 & 5 \\ 400002 & 5 & 200015 \end{bmatrix}
\begin{Bmatrix} x_1 \\ x_2 \\ x_3 \end{Bmatrix} =
\begin{Bmatrix} 1200006 \\ 58 \\ 600057 \end{Bmatrix}$$
These equations can now be solved using $\mathbf{L}\mathbf{D}\mathbf{L}^T$ Decomposition.

10.3 Case Study: A Matrix ToolBox


We saw the development of the CVector and CMatrix classes in Chapter 9. In this section, we will see how to leverage that
development by building a matrix toolbox that will be useful in obtaining numerical solutions to engineering problems.
Example Program 10.3.1 Matrix Toolbox
We will now develop the function prototypes for the functions in the matrix toolbox. The template class is called CMatToolBox.
In the table below, we present the details of these template functions that use CVector and CMatrix classes.


| Function Prototype | Remarks |
|---|---|
| `void Add (const CVector<T>& A, const CVector<T>& B, CVector<T>& C);` | Computes $\mathbf{c} = \mathbf{a} + \mathbf{b}$ |
| `void Subtract (const CVector<T>& A, const CVector<T>& B, CVector<T>& C);` | Computes $\mathbf{c} = \mathbf{a} - \mathbf{b}$ |
| `T DotProduct (const CVector<T>& A, const CVector<T>& B);` | Computes $c = \mathbf{a} \cdot \mathbf{b}$ |
| `void Normalize (CVector<T>& A);` | Computes $\mathbf{a} / \lVert \mathbf{a} \rVert$ |
| `void Scale (CVector<T>& A, const T c);` | Computes $\mathbf{a} = c\,\mathbf{a}$ where $c$ is a constant. |
| `T MaxValue (const CVector<T>& A) const;` | Computes $\max_i (a_i)$ |
| `T MinValue (const CVector<T>& A) const;` | Computes $\min_i (a_i)$ |
| `T TwoNorm (const CVector<T>& A);` | Computes $\lVert \mathbf{a} \rVert_2 = \sqrt{\sum_{i=1}^{n} a_i^2}$ |
| `T MaxNorm (const CVector<T>& A);` | Computes $\lVert \mathbf{a} \rVert_\infty = \max_i \lvert a_i \rvert$ |
| `void CrossProduct (const CVector<T>& A, const CVector<T>& B, CVector<T>& C);` | Computes $\mathbf{c} = \mathbf{a} \times \mathbf{b}$ |
| `void Add (const CMatrix<T>& A, const CMatrix<T>& B, CMatrix<T>& C);` | Computes $\mathbf{C} = \mathbf{A} + \mathbf{B}$ |
| `void Subtract (const CMatrix<T>& A, const CMatrix<T>& B, CMatrix<T>& C);` | Computes $\mathbf{C} = \mathbf{A} - \mathbf{B}$ |
| `void Multiply (const CMatrix<T>& A, const CMatrix<T>& B, CMatrix<T>& C);` | Computes $\mathbf{C} = \mathbf{A}\mathbf{B}$ |
| `void Determinant (const CMatrix<T>& A, T& c);` | Computes $c = \det(\mathbf{A})$ |
| `void Scale (CMatrix<T>& A, const T c);` | Computes $\mathbf{A} = c\,\mathbf{A}$ where $c$ is a constant. |
| `T MaxNorm (const CMatrix<T>& A) const;` | Computes $\lVert \mathbf{A} \rVert_\infty = \max_{i,j} \lvert A_{ij} \rvert$ |
| `void Transpose (const CMatrix<T>& A, CMatrix<T>& B);` | Computes $\mathbf{B} = \mathbf{A}^T$ |
| `void MatMultVec (const CMatrix<T>& A, const CVector<T>& x, CVector<T>& b);` | Computes $\mathbf{b} = \mathbf{A}\mathbf{x}$ |
| `void LUFactorization (CMatrix<T>& A, T TOL);` | Computes $\mathbf{A} = \mathbf{L}\mathbf{U}$. Overwrites A with L and U. |
| `void LUSolve (const CMatrix<T>& A, CVector<T>& x, const CVector<T>& b);` | Solves $\mathbf{L}\mathbf{U}\mathbf{x} = \mathbf{b}$. A contains L and U. |
| `void AxEqb (CMatrix<T>& A, CVector<T>& x, CVector<T>& b, T TOL);` | Solves $\mathbf{A}\mathbf{x} = \mathbf{b}$ using Gaussian Elimination. |
| `void LDLTFactorization (CMatrix<T>& A, T TOL);` | Computes $\mathbf{A} = \mathbf{L}\mathbf{D}\mathbf{L}^T$. Overwrites A with L and D. A is a symmetric matrix. |
| `void LDLTSolve (const CMatrix<T>& A, CVector<T>& x, const CVector<T>& b);` | Solves $\mathbf{L}\mathbf{D}\mathbf{L}^T\mathbf{x} = \mathbf{b}$. A contains L and D. |

A sample client code that uses three of the functions from the matrix toolbox is shown below. Development of the remaining
functionalities is left as an exercise – See Problem 10.5.
Two versions of the matrix toolbox – float and double (defined in lines 15-16), are used in the test program. The float version
is used in testing vector addition and matrix-matrix multiplication. The double version is used in testing the solution to
simultaneous linear algebraic equations.


main.cpp

Only the vector addition test is shown here. The loop that contains the three tests starts in line 18. The try block is wrapped
around all the tests so that if an exception is thrown it can be caught and handled before the next test is executed. The vector
addition takes place on line 34 using float vectors of size 3.

The catch block follows the try block.

The program terminates with the floating-point operation count statistics from the double precision version of the toolbox.


Summary
Matrix algebra forms the foundation of most of numerical engineering analysis. We saw several matrix operations and solution
techniques in this chapter. We will see more solution techniques such as the numerical solution to eigenproblems, solving
ordinary and partial differential equations, numerical optimization and computer graphics in later chapters.

Where to go from here?


Problem 10.5 gives the reader an opportunity to develop a template matrix toolbox that uses the CVector and CMatrix template
classes. It is a worthwhile exercise to spend time making the toolbox robust and efficient. The efficiency aspect can be examined
by instrumenting the toolbox. The simplest approach is to capture the wall clock time for various steps of the algorithm using the CClock
class provided in the library directory. More sophisticated tools are available with various IDEs including Microsoft Visual
Studio - https://docs.microsoft.com/en-us/visualstudio/profiling/getting-started-with-performance-tools.


Exercises
Appetizers
Problem 10.1
Solve the following set of equations by hand using LU Factorization. Also compute the residual vector and the absolute and
relative errors.
6 2 2   x 1   2 
 2 2 3 1 3 x    1 
  2  
 1 2 1   x 3   0 

Main Course
Problem 10.2
Solve the following set of equations by hand using Cholesky Decomposition. Also compute the residual vector and the absolute
and relative errors.
$$\begin{bmatrix} 2.25 & 3.0 & 4.5 \\ 3.0 & 5.0 & 10.0 \\ 4.5 & 10.0 & 34.0 \end{bmatrix}
\begin{Bmatrix} x_1 \\ x_2 \\ x_3 \end{Bmatrix} =
\begin{Bmatrix} 1000.0 \\ 0 \\ 500.0 \end{Bmatrix}$$

Problem 10.3
Write two functions that help solve Ax = b where A is a symmetric, positive definite matrix stored in the rectangular format.
The function prototypes are given below.
void CholeskyFactorizationBanded (CMatrix<T>& A, const T TOL);
void CholeskySolveBanded (const CMatrix<T>& A, CVector<T>& x);

Problem 10.4
Write two functions that help solve Ax = b where A is a symmetric, positive definite matrix stored in the skyline format. The
function prototypes are given below.
void LDLTFactorizationSkyline (CVector<T>& A, CVector<int>& DLoc,
const int n, const T TOL);
void LDLTSolveSkyline (const CVector<T>& A, const CVector<int>&
DLoc, CVector<T>& x, const int n);

Numerical Analysis Concepts


Problem 10.5
Complete the implementation of the matrix toolbox discussed in Example Program 10.3.1. Write a client code that will properly
test the functions in the toolbox. Here are the programming details to note and follow.
(1) Functions should carry out the basic checks to see if the arrays are of the proper size. Functions should throw an
exception if they detect an input error.
(2) Do not use std::cout or std::cin statements in these functions unless specifically required. For example, the Display
function should use std::cout.
(3) Do not declare temporary or local arrays in any of these functions except the function that computes the determinant.
(4) For both LU and LDLT factorization approaches, assume that the LUFactorization and the LDLTFactorization
functions are called first (just once) and that the LUSolve and LDLTSolve functions can then be called with the
factorized A matrix as many times as needed.


References
Gilberg and Forouzan, Data Structures: A Pseudocode Approach with C++, Brooks/Cole, 2001.
Kruse and Ryba, Data Structures and Program Design in C++, Prentice Hall, 1999.
Clifford Shaffer, Data Structures and Algorithm Analysis, Prentice-Hall, 1997.



Chapter 11

Regression Analysis
“If your experiment needs statistics, you ought to have done a better experiment.” Ernest Rutherford

“There is great correlation between music and images.” Graham Nash

“Getting information off the Internet is like taking a drink from a fire hose.” Mitchell Kapor

Engineers and scientists deal with a large amount of data sometimes involving several variables. The data is used to understand
the relationship between these variables. Often, a model is built that facilitates this understanding and can be used as a predictive
tool. In later chapters, we will see some of these models in use with sophisticated numerical techniques.
In this chapter, the model will be described by simple yet powerful functions. To understand how well the model captures the
data, a function is created and used. This function, called a figure-of-merit or merit function, measures the agreement
between the data and the fitting model for a particular choice of the parameters¹. The agreement is good when the value of the
merit function is small. In the process known as regression, the parameters are adjusted based on the value of the merit function
until the smallest value is obtained. The result is a best fit, and the parameters that give the smallest value of the
merit function are known as the best-fit parameters (Press et al. 1992, p. 498).

Objectives
• To understand what regression analysis is.
• To understand how to fit data to a model.
• To understand some of the basics of distinguishing good fits and poor fits.

1 http://mathworld.wolfram.com/MeritFunction.html


11.1 Building a Model


Often engineers and scientists conduct experiments to understand a system or process. For example, a structural engineer can
build a cantilever beam with a known material and known cross-sectional shape and dimensions. The engineer can then load
the beam by placing weights at the tip of the beam $(P)$ and measuring the vertical displacement at the tip $(\Delta)$. An experimental
setup (without the weight at the tip) is shown in Fig. 11.1.1.

[Figure: panel (a) experimental setup; panel (b) model schematic of beam AB of length L, with axes x and y, v.]

Fig. 11.1.1 (a) Experimental setup to measure the tip displacement of a cantilever beam and (b) Model schematics
Placing a few different known weights at the tip and measuring the tip displacement (deflection) generates the data set. A sample
data set may look like (0.22 lb, 0.022 in), (0.44 lb, 0.045 in) and (1.10 lb, 0.11 in). Using the data, one can postulate a reasonable
model of the form
$$\hat{y}(x) \equiv \Delta(P) = a_0 + a_1 P \tag{11.1.1}$$
since the plot of the load-deflection data (Fig. 11.1.2) suggests a linear relationship.

Fig. 11.1.2 Load-deflection plot of the experimental data


The model at this stage is a single-independent-variable model, i.e. the response (tip displacement) is a function of a single
variable (tip load). Additional experiments may involve changing (a) the length of the beam, (b) the material of the beam, and
(c) the cross-sectional shape and dimensions of the beam. When these effects are factored into the experiment, building the
model becomes more complex since each variable independently influences the tip displacement. If all the parameters are
included in the model, the model-building problem becomes a four-independent-variable problem. Plotting the response (tip
displacement) is then not possible since the model lives in a five-dimensional space. Response Surface Methodology is one
methodology that can be used to help build complex models.
In the rest of this chapter, we will deal with single-independent-variable models.


11.2 Least Squares Fit


Assume that you have $n$ data points in the form $(x_i, y_i)$, $i = 1, 2, \ldots, n$ that you wish to fit to a model, $\hat{y}(x)$, that is described
by $(k+1)$ parameters, i.e. $\hat{y}(x) = \hat{y}(a_0, a_1, \ldots, a_k, x)$. One way of fitting the model would be to minimize the sum of the squares
of the differences between the data point $y_i$ and the model value at the same location, i.e. $\hat{y}(x_i)$. You may have seen this
problem described as a least-squares problem. Formally, the problem can be expressed as
$$\text{Find } \{a_0, a_1, \ldots, a_k\} \tag{11.2.1}$$
$$\text{to minimize } f(\mathbf{a}; x) = \sum_{i=1}^{n} \left[ \hat{y}(a_0, a_1, \ldots, a_k, x_i) - y_i \right]^2 \tag{11.2.2}$$
We will look at one of the simplest examples dealing with the problem.
Example 11.2.1
Consider the data shown below.
x y
1 3
2 4
3 5
4 6
5 6.2188
Let us assume that we wish to fit a linear polynomial $\left(\hat{y}(x) = a_0 + a_1 x\right)$ using the data. Then the problem can be expressed as
$$\text{Find } \{a_0, a_1\} \tag{11.2.3}$$
$$\text{to minimize } f(\mathbf{a}; x) = \sum_{i=1}^{5} \left[ (a_0 + a_1 x_i) - y_i \right]^2 \tag{11.2.4}$$
The solution to this problem is $\hat{y}(x) = 2.3125 + 0.8438x$ with $f(\mathbf{a}) = 0.244109$. The result is shown in Fig. 11.2.1.
[Figure: Linear Polynomial Fit. Plot of y(x) versus x showing the data and the fitted line y = 0.8438x + 2.3125, R² = 0.9668.]

Fig. 11.2.1 Linear polynomial fit for the least-squares problem


In general, let’s consider a polynomial written as
$$y = a_0 + a_1 x + \cdots + a_k x^k \tag{11.2.5}$$
The solution to the problem described by Eqns. (11.2.1)-(11.2.2), when the fit involves a general polynomial expressed as Eqn.
(11.2.5), can be obtained by simply taking the derivative of $f(\mathbf{a})$ with respect to $\mathbf{a}$, which yields

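$$\begin{aligned}
\frac{\partial f}{\partial a_0} &= -2 \sum_{i=1}^{n} \left[ y_i - \left( a_0 + a_1 x_i + \cdots + a_k x_i^k \right) \right] = 0 \\
\frac{\partial f}{\partial a_1} &= -2 \sum_{i=1}^{n} \left[ y_i - \left( a_0 + a_1 x_i + \cdots + a_k x_i^k \right) \right] x_i = 0 \\
&\;\;\vdots \\
\frac{\partial f}{\partial a_k} &= -2 \sum_{i=1}^{n} \left[ y_i - \left( a_0 + a_1 x_i + \cdots + a_k x_i^k \right) \right] x_i^k = 0
\end{aligned} \tag{11.2.6}$$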

These equations can be rewritten as
$$\begin{bmatrix}
n & \sum_{i=1}^{n} x_i & \cdots & \sum_{i=1}^{n} x_i^k \\
\sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 & \cdots & \sum_{i=1}^{n} x_i^{k+1} \\
\vdots & \vdots & \ddots & \vdots \\
\sum_{i=1}^{n} x_i^k & \sum_{i=1}^{n} x_i^{k+1} & \cdots & \sum_{i=1}^{n} x_i^{2k}
\end{bmatrix}
\begin{Bmatrix} a_0 \\ a_1 \\ \vdots \\ a_k \end{Bmatrix} =
\begin{Bmatrix} \sum_{i=1}^{n} y_i \\ \sum_{i=1}^{n} x_i y_i \\ \vdots \\ \sum_{i=1}^{n} x_i^k y_i \end{Bmatrix} \tag{11.2.7}$$
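Eqn. (11.2.7) can be expressed differently in terms of the Vandermonde matrix $\mathbf{X}$ as
$$\begin{bmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^k \\
1 & x_2 & x_2^2 & \cdots & x_2^k \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_n & x_n^2 & \cdots & x_n^k
\end{bmatrix}
\begin{Bmatrix} a_0 \\ a_1 \\ \vdots \\ a_k \end{Bmatrix} =
\begin{Bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{Bmatrix} \tag{11.2.8}$$
$$\text{Or, } \mathbf{y} = \mathbf{X}\mathbf{a} \tag{11.2.9}$$
Premultiplying both sides by $\mathbf{X}^T$ recovers Eqn. (11.2.7), whose coefficient matrix is $\mathbf{X}^T\mathbf{X}$. Hence the solution to Eqn. (11.2.7) or (11.2.8) is simply
$$\mathbf{a} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y} \tag{11.2.10}$$
In practice, the inverse in Eqn. (11.2.10) is not formed; instead, the system $(\mathbf{X}^T\mathbf{X})\mathbf{a} = \mathbf{X}^T\mathbf{y}$ is solved.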

Algorithm
Step 1: Obtain the input data and the degree of polynomial, k, to fit the data.
Step 2: Form the coefficient matrix X and the RHS vector y shown in Eqn. (11.2.8).
Step 3: Solve the system of equations $(\mathbf{X}^T\mathbf{X})\mathbf{a} = \mathbf{X}^T\mathbf{y}$ to obtain $\mathbf{a}$.
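The algorithm can be sketched in C++ as shown below. This is a minimal, self-contained sketch and not the CLSF class: it uses std::vector rather than CVector and CMatrix, accumulates $\mathbf{X}^T\mathbf{X}$ and $\mathbf{X}^T\mathbf{y}$ directly without storing $\mathbf{X}$, and solves the small system with Gaussian elimination with partial pivoting. The function name PolyFit and the tolerance 1.0e-12 are assumptions made for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical helper: fits y = a0 + a1 x + ... + ak x^k
// by solving (X^T X) a = X^T y. Returns {a0, ..., ak}.
std::vector<double> PolyFit(const std::vector<double>& x,
                            const std::vector<double>& y, std::size_t k)
{
    const std::size_t n = x.size(), m = k + 1;
    if (n != y.size() || n < m)
        throw std::invalid_argument("PolyFit: inconsistent input");

    // Step 2: accumulate N = X^T X and r = X^T y
    std::vector<std::vector<double>> N(m, std::vector<double>(m, 0.0));
    std::vector<double> r(m, 0.0);
    for (std::size_t i = 0; i < n; ++i)
    {
        std::vector<double> p(m, 1.0);               // p[j] = x_i^j
        for (std::size_t j = 1; j < m; ++j) p[j] = p[j - 1] * x[i];
        for (std::size_t j = 0; j < m; ++j)
        {
            r[j] += p[j] * y[i];
            for (std::size_t l = 0; l < m; ++l) N[j][l] += p[j] * p[l];
        }
    }

    // Step 3: forward elimination with partial pivoting
    for (std::size_t c = 0; c < m; ++c)
    {
        std::size_t piv = c;
        for (std::size_t i = c + 1; i < m; ++i)
            if (std::fabs(N[i][c]) > std::fabs(N[piv][c])) piv = i;
        if (std::fabs(N[piv][c]) < 1.0e-12)
            throw std::runtime_error("PolyFit: singular normal equations");
        std::swap(N[c], N[piv]);
        std::swap(r[c], r[piv]);
        for (std::size_t i = c + 1; i < m; ++i)
        {
            const double f = N[i][c] / N[c][c];
            for (std::size_t l = c; l < m; ++l) N[i][l] -= f * N[c][l];
            r[i] -= f * r[c];
        }
    }

    // Back substitution
    std::vector<double> a(m);
    for (std::size_t i = m; i-- > 0; )
    {
        double s = r[i];
        for (std::size_t l = i + 1; l < m; ++l) s -= N[i][l] * a[l];
        a[i] = s / N[i][i];
    }
    return a;
}
```

For high polynomial degrees the normal equations tend to be ill-conditioned, so a sketch like this is best suited to low-degree fits.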
Example Program 11.2.1 Least Squares Polynomial Fit Program
We will develop a program to carry out a least squares polynomial fit given a set of data points. The theory and algorithm are
encapsulated in the CLSF class that has a public member function Fit to carry out the least-squares fit.


main.cpp

The client code for solving data from Example 11.2.1 is shown below. We will leverage the tools developed earlier – the CVector,
CMatrix and CMatToolBox classes. CLSF’s default constructor is used in line 29 followed by the call to the Fit function in line
32.

An error message is shown if a fit is not possible. Otherwise, the output contains details of the fit (see next section for details
of the residual and $R^2$ values). The output from the program is shown in Fig. 11.2.1.


Fig. 11.2.1 Output from Example Program 11.2.1

11.3 General Least Squares Fit


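The least squares approach shown in the previous section can be generalized to all types of functions, not just polynomials.
$$\text{Find } \{a_0, a_1, \ldots, a_k\} \tag{11.2.11}$$
$$\text{to minimize } f(\mathbf{a}; x) = \sum_{i=1}^{n} \left[ y_i - \sum_{j=0}^{k} a_j F_j(x_i) \right]^2 \tag{11.2.12}$$
As we have done before, we can take the derivative of $f(\mathbf{a})$ with respect to $\mathbf{a}$, which yields
$$\begin{aligned}
\frac{\partial f}{\partial a_0} &= -2 \sum_{i=1}^{n} \left[ y_i - \sum_{j=0}^{k} a_j F_j(x_i) \right] F_0(x_i) = 0 \\
\frac{\partial f}{\partial a_1} &= -2 \sum_{i=1}^{n} \left[ y_i - \sum_{j=0}^{k} a_j F_j(x_i) \right] F_1(x_i) = 0 \\
&\;\;\vdots \\
\frac{\partial f}{\partial a_k} &= -2 \sum_{i=1}^{n} \left[ y_i - \sum_{j=0}^{k} a_j F_j(x_i) \right] F_k(x_i) = 0
\end{aligned} \tag{11.2.13}$$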
Consider, for example, an exponential function that can be expressed as
$$y = Ae^{Bx} \tag{11.2.14}$$
In order to find the values of $A$ and $B$, the formula can be linearized so that the linear least squares method can be used:
$$\ln y = \ln\left(Ae^{Bx}\right) = \ln A + \ln e^{Bx} = \ln A + Bx \tag{11.2.15}$$
and, similar to Eqn. (11.2.8), we have
$$\begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}
\begin{Bmatrix} a_0 \\ a_1 \end{Bmatrix} =
\begin{Bmatrix} \ln y_1 \\ \ln y_2 \\ \vdots \\ \ln y_n \end{Bmatrix} \tag{11.2.16}$$
After solving for $\{a_0, a_1\}$, note that $A = \exp(a_0)$ and $B = a_1$.
Example 11.3.1
Use the data from Example 11.2.1 to fit an exponential curve.
Using the data we obtain the following.
$$\mathbf{X} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \\ 1 & 5 \end{bmatrix}, \quad
\mathbf{y} = \begin{Bmatrix} 1.09861 \\ 1.38629 \\ 1.60944 \\ 1.79176 \\ 1.82759 \end{Bmatrix}$$
Setting up and solving $(\mathbf{X}^T\mathbf{X})\mathbf{a} = \mathbf{X}^T\mathbf{y}$ yields the following solution, $\mathbf{a} = \begin{Bmatrix} 0.983709 \\ 0.186343 \end{Bmatrix}$. Hence
$$y = 2.67436\,e^{0.186343x}$$
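The linearized exponential fit can be sketched in C++ as follows. This is a minimal sketch, not part of the CLSF class: because only two parameters are involved, it uses the closed-form solution of the 2×2 system $(\mathbf{X}^T\mathbf{X})\mathbf{a} = \mathbf{X}^T\mathbf{y}$ with $\ln y_i$ in place of $y_i$. The function name ExpFit is hypothetical, and all $y_i$ are assumed to be positive so that the logarithm exists.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical helper: fits y = A exp(B x) by a straight-line
// least squares fit of ln(y) against x. Returns {A, B}.
std::pair<double, double> ExpFit(const std::vector<double>& x,
                                 const std::vector<double>& y)
{
    const std::size_t n = x.size();
    if (n != y.size() || n < 2)
        throw std::invalid_argument("ExpFit: inconsistent input");

    double sx = 0.0, sxx = 0.0, sz = 0.0, sxz = 0.0;
    for (std::size_t i = 0; i < n; ++i)
    {
        if (y[i] <= 0.0)
            throw std::invalid_argument("ExpFit: y must be positive");
        const double z = std::log(y[i]);   // linearized response
        sx += x[i];
        sxx += x[i] * x[i];
        sz += z;
        sxz += x[i] * z;
    }

    // Closed-form solution of the 2x2 normal equations
    const double det = static_cast<double>(n) * sxx - sx * sx;
    if (det == 0.0)
        throw std::runtime_error("ExpFit: all x values are identical");
    const double a1 = (static_cast<double>(n) * sxz - sx * sz) / det;
    const double a0 = (sz - a1 * sx) / static_cast<double>(n);

    return {std::exp(a0), a1};   // A = exp(a0), B = a1
}
```

Note that the fit minimizes the squared error in $\ln y$, not in $y$ itself, so the result generally differs from a direct nonlinear least squares fit of Eqn. (11.2.14).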
Similarly, a logarithmic function can be written as
$$y = A + B \ln x \tag{11.2.17}$$
and the problem can be reduced to
$$\begin{bmatrix} 1 & \ln x_1 \\ 1 & \ln x_2 \\ \vdots & \vdots \\ 1 & \ln x_n \end{bmatrix}
\begin{Bmatrix} A \\ B \end{Bmatrix} =
\begin{Bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{Bmatrix} \tag{11.2.18}$$
Example 11.3.2
Use the data from Example 11.2.1 to fit a logarithmic curve.
Using the data we obtain the following.
$$\mathbf{X} = \begin{bmatrix} 1 & 0 \\ 1 & 0.693147 \\ 1 & 1.098612 \\ 1 & 1.386294 \\ 1 & 1.609438 \end{bmatrix}, \quad
\mathbf{y} = \begin{Bmatrix} 3 \\ 4 \\ 5 \\ 6 \\ 6.218876 \end{Bmatrix}$$
Setting up and solving $(\mathbf{X}^T\mathbf{X})\mathbf{a} = \mathbf{X}^T\mathbf{y}$ yields the following solution, $\mathbf{a} = \begin{Bmatrix} 2.827 \\ 2.1063 \end{Bmatrix}$. Hence
$$y = 2.827 + 2.1063 \ln x$$
Once the fit is completed, the obvious question to ask is “How good or bad is the fit?” The coefficient of determination, $R^2$
(the square of the correlation coefficient for a straight-line fit), is a value that describes the quality of a fit for a set of data. In
the least squares method, the value of $R^2$ is a fraction between 0 and 1 and is unitless. When $R^2$ equals 0, the curve does not fit
the data better than a horizontal line going through the mean of the data would. When $R^2$ equals 1, all the points lie on the
curve and there is no scatter.
$R^2$ is calculated from the sum of the squares of the vertical distances of the points from the best-fit curve, $SS_{reg}$, and the sum of
the squares of the vertical distances of the points from a horizontal line through the mean of all $y$ values, $SS_{tot}$. The equations
for $SS_{reg}$, $SS_{tot}$ and $R^2$ are as follows:
$$SS_{reg} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \tag{11.2.19}$$
$$SS_{tot} = \sum_{i=1}^{n} (y_i - \bar{y})^2 \tag{11.2.20}$$
$$R^2 = 1 - \frac{SS_{reg}}{SS_{tot}} \tag{11.2.21}$$
where $\hat{y}_i$ are the values on the best-fit curve corresponding to $y_i$ and $\bar{y}$ is the mean of the given set of data.

Example 11.3.3
Consider the data analyzed in Example 11.2.1. The solution to the problem was $\hat{y}(x) = 2.3125 + 0.8438x$. Using the solution
we can construct a new table shown below.
x y ŷ
1 3 3.1563
2 4 4.0001
3 5 4.8439
4 6 5.6877
5 6.2188 6.5315
The $R^2$ value can be computed as follows.
$$\bar{y} = \frac{3 + 4 + 5 + 6 + 6.218876}{5} = 4.8438$$
$$SS_{tot} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = (3 - 4.8438)^2 + (4 - 4.8438)^2 + (5 - 4.8438)^2 + (6 - 4.8438)^2 + (6.218876 - 4.8438)^2 = 7.36$$
$$SS_{reg} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = (3 - 3.1563)^2 + (4 - 4.0001)^2 + (5 - 4.8439)^2 + (6 - 5.6877)^2 + (6.2188 - 6.5315)^2 = 0.244$$
$$R^2 = 1 - \frac{SS_{reg}}{SS_{tot}} = 1 - \frac{0.244}{7.36} = 0.9669$$


Summary
We looked at the basics of data modeling in this chapter. There is much more to this topic than what has been discussed in this
chapter. Other closely associated topics include statistical analysis and design of experiments. Engineering and the sciences move
forward when experiments are performed to understand a topic that is then usually followed by construction of one or more
models that can then be used in lieu of experiments, as a predictive analysis or design tool.

Where to go from here?


It is important that the reader revisit this chapter after learning how to write C++ programs using more advanced object-
oriented ideas. The ideal time would be after completing Chapter 13.


Exercises
For each of the following problems use the CLSF class used in Example Program 11.2.1. It may be useful to modify the class
to accept a pointer to the function(s) that computes the

Appetizers
Problem 11.1
Consider the following data points (2.08, 1.45), (2.30, 2.85), (3.01, 2.15), (4.71, 4.74) and (5.50, 7.73). (a) Fit a polynomial function
and compute the $R^2$ value. (b) Fit an exponential function and compute the $R^2$ value.
Problem 11.2
Consider the following data points (0.5, 4.2), (1.0, 4.4), (3.5, 5.8), (5.5, 7.0), (7.5, 8.6), (8.5, 9.5), (10.0, 10.5) and (12.8, 15.0). (a)
Fit a polynomial function and compute the $R^2$ value. (b) Fit an exponential function and compute the $R^2$ value.
Problem 11.3
Consider the following data points (0.5, 9.7), (1.0, 10.05), (3.5, 10.6), (5.5, 10.8), (7.5, 10.9), (8.5, 11.1), (10.0, 11.15) and (12.8,
11.2). (a) Fit a polynomial function and compute the $R^2$ value. (b) Fit a logarithmic function and compute the $R^2$ value.

Main Course
Problem 11.4
Derive the solution to the least squares fit using the function $y(x) = a_1 x^{a_2}$ by linearizing the function.
Problem 11.5
Using the following data (1, 2.45), (2, 3.08), (3, 3.47), (4, 3.79), (5, 4.05), (6, 4.28), (7, 4.48) and (8, 4.67) fit the function
$y(x) = a_1 x^{a_2}$. How good is the fit?

Numerical Analysis Concepts


Problem 11.6
Derive the solution to the least squares fit for the following data (1, 6.2), (2, 6.8), (3, 7.3), (4, 7.6), (5, 8.2), (6, 8.5), (7, 8.9), and (8,
9.2) using the function $y(x) = a_0 + a_1 x^{a_2}$.
Problem 11.7
In Section 11.3 we saw how to linearize a nonlinear function into a linear form, $\hat{y} = ax + b$. For example, $y = Ae^{Bx}$ can be
linearized as $\ln y = \ln A + Bx$. Show how you would linearize the following functions.
(a) $y = \dfrac{a_0 x}{a_1 + x}$ (b) $y = \dfrac{a_0}{a_1 + x}$ (c) $y = a_0 x_1^{a_1} x_2^{a_2} \cdots x_k^{a_k}$
Problem 11.8
Rewrite portions of the Fit function in Example Program 11.2 using all your developed functionalities in the matrix toolbox.


References
Press, Flannery, Teukolsky and Vetterling, Numerical Recipes in C, Cambridge Press, 1988.
Kottegoda and Rosso, Applied Statistics for Civil and Environmental Engineers, Blackwell Publishing, 2008.
Walpole, Myers, Myers and Ye, Probability & Statistics for Engineers & Scientists, 2002.
Montgomery, Design and Analysis of Experiments, John Wiley & Sons, 2013.



Chapter 12

File Handling
“Torture the data, and it will confess to anything.” Ronald Coase

“You must be the change you wish to see in the world.” Mahatma Gandhi

Almost all computer programs require some input from the user and generate some output. In programs with a graphical user
interface (GUI), the input comes from the keyboard, the mouse or even an external file. The output is shown graphically or as a
report on the screen, in printed form, or stored in an external file so that it can be viewed later. There are
innumerable occasions when data need to be transported from one program to another, or when data created by a program need to
be retained even after the program has finished execution. Typically, the data in these scenarios are stored in an external file
on a computer’s hard disk. If the contents of the file can be viewed meaningfully in a text editor¹, then the file holds text data.
Otherwise, the file holds binary data. In this chapter, we will see how to handle files and manipulate the data in them
using C++.
Some of the C++ file handling concepts require an understanding of advanced object-oriented concepts such as inheritance.
However, the value of file handling takes precedence at this stage. We will see the advanced OO concepts, e.g. inheritance, in
Chapter 13 when we will be in a better position to understand all the nuances of the C++ file handling classes.

Objectives
• To understand what external files are.
• To understand how to read the data from an input file.
• To understand how to write data to an output file.

1 Wordpad© or Notepad© are examples of Windows text editors. vi or emacs are examples of Linux text editors.


12.1 File Streams


Input and output in C++ are byte-oriented. In other words, a sequence of bytes is read from or written to streams. Fig. 12.1.1
shows the C++ class hierarchy. One can look at class hierarchy as building blocks. The base class is at the foundation level and
other classes (derived classes) have functionalities that are built on top of the base class functionalities. We already have seen the
basic IO class – iostream. This is the class that has been used in most of the prior examples to ‘read the input’ via cin and
‘write the output’ via cout.
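[Figure: ios is the base class of istream and ostream; ifstream is derived from istream, ofstream from ostream, and iostream from both istream and ostream; fstream is derived from iostream.]

Fig. 12.1.1 Class hierarchy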
In this section, we will see how to read data from an external file and how to output information into an external file. Note that
files are either text (meaning that they can be viewed in a text editor such as Windows Notepad etc.) or binary (meaning that
the information is stored as 0’s and 1’s whose meaning can be interpreted by the program reading and writing the data). Text
files are usually accessed sequentially. Random access can take place with binary files.

12.2 File Input and Output


12.2.1 Opening a text file for reading
A file must be opened first before data can be read from it (or written to it). To open a file, one must know the name of the file.
In other words, a file can be opened for reading provided it is an existing file. Typically, one creates this file using a text editor.
To open a file for reading, we need to include the appropriate C++ header file. The header file and the accompanying statements
are as follows.
#include <fstream>
using std::ifstream;
using std::ios;      // needed for ios::in used below

First, we need to create an object associated with the ifstream class. The general syntax is the usual syntax associated with the
declaration of a class-related object.
ifstream object_name;
For example,
ifstream FileForInput;
FileForInput is the variable or object. Once the object is declared, it can then be used to open the file using the open member
function. For example,
FileForInput.open ("DataSet1.dat", ios::in);
opens a file called DataSet1.dat for reading only. The member function open has two parameters. The first is the character
string that contains the name of the file (e.g., DataSet1.dat) and the second defines the openmode value. The openmode value in
this case is ios::in that indicates that the file exists and is being opened for reading only. Once the file is opened, the references
to the file do not take place with respect to the file name. Instead, the object name is used for the rest of the program as an alias
for the file name.


When the file is no longer needed, it must be closed before the program terminates using the close member function. For
example,
FileForInput.close ();
closes the file that was opened earlier.
To read the data from the file, we can use the >> operator. For example, if we wish to read an integer followed by a float from
the file, using the example discussed above, we would have the following statement.
FileForInput >> nIntV >> fFloatV;
where nIntV is an int variable and fFloatV is a float variable.
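The open, read and close steps described above can be put together as in the following sketch. The function name ReadPair and its bool return convention are assumptions made for illustration; the calls open, is_open, operator>> and close are standard members of std::ifstream.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: reads an int followed by a float from the
// beginning of a text file. Returns false if the file cannot be
// opened or if either value cannot be read.
bool ReadPair(const std::string& fileName, int& nIntV, float& fFloatV)
{
    std::ifstream FileForInput;
    FileForInput.open(fileName, std::ios::in);   // reading only
    if (!FileForInput.is_open())
        return false;                            // e.g. file does not exist

    // The stream converts to false if either extraction fails
    const bool bRead = static_cast<bool>(FileForInput >> nIntV >> fFloatV);

    FileForInput.close();
    return bRead;
}
```

Checking the result of open before reading is a simple safeguard against a missing or misnamed input file.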
12.2.2 Opening a text file for writing
A file must be opened first before data can be written to it. To open a file, one must know the name of the file. A file can be
opened for writing so that (a) it appends data to an existing file, (b) it overwrites an existing file, or (c) it creates a new file. The
header file and the accompanying statements are as follows.
#include <fstream>
using std::ofstream;
using std::ios;      // needed for ios::out and ios::app used below
First, we need to create an object associated with the ofstream class. For example,
ofstream FileForOutput;

declares FileForOutput as an ofstream variable or object. Once the object is declared, it can then be used to open the file. Two
alternative examples are shown below.
FileForOutput.open ("DataSet1.out", ios::out);
FileForOutput.open ("DataSet1.out", ios::out | ios::app);

In the above examples, ios::out indicates that the file is created if it does not exist or if it exists, the contents are destroyed.
Similarly, ios::out | ios::app indicates that if the file exists, then new information must be appended to the end of the file.
When the file is no longer needed, it must be closed before the program terminates using the close member function. For
example,
FileForOutput.close ();

closes the file. It is possible that you may not see the expected contents in a file if you do not close the file. This is because the
contents are buffered. The contents are put in a special location and are written to the file only if the buffer is full or the close
member function is invoked. The member function flush can be explicitly called to ‘flush the buffer’; flush is automatically
called by the close function.
To write data to a file, we can use the << operator. For example, if we wish to write an integer followed by a float on a single
line, using the example discussed above, we would have the following statement.
FileForOutput << nIntV << " " << fFloatV << "\n";
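The two ways of opening a file for writing can be tried out with a few lines of code. The following is a minimal sketch and is not taken from the book's listings; the helper name WriteIntAndFloat and its bAppend parameter are assumptions made for illustration. It opens the file either with ios::out (existing contents are destroyed) or with ios::out | ios::app (new data is appended), writes one line, and reports failure through its return value.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical helper (not from the book): writes "nIntV fFloatV" on one line.
//   bAppend == false : ios::out, existing contents are destroyed
//   bAppend == true  : ios::out | ios::app, the new line is added at the end
bool WriteIntAndFloat (const std::string& strFileName, int nIntV,
                       float fFloatV, bool bAppend)
{
    assert (!strFileName.empty());

    std::ios::openmode Mode = std::ios::out;
    if (bAppend)
        Mode = Mode | std::ios::app;

    std::ofstream FileForOutput;
    FileForOutput.open (strFileName.c_str(), Mode);
    if (FileForOutput.fail())
        return false;               // the file could not be opened

    FileForOutput << nIntV << " " << fFloatV << "\n";
    FileForOutput.close ();         // close also flushes the buffer
    return !FileForOutput.fail();
}
```

Calling the function twice with the same file name, first with bAppend set to false and then with bAppend set to true, should leave a file with two lines.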

The following table summarizes the file openmode flags.


Flag      Remarks
in        Opens the file for reading (default action for ifstream)
out       Opens a file for writing (default action for ofstream)
app       Appends (at the end) when writing
ate       Positions at the end of the file after opening. ate is the short form for at end.
trunc     Truncates the contents of the file by removing the current contents
binary    For binary (non-text) files.

Tip: Text files are created and accessed sequentially. If a file is being created, the contents are first written to line 1, followed by
line 2, and so on. A new line is created only if the newline character is output to the file. One cannot go back and rewrite a
specific line in a text file. Similarly, if a file is being read, the contents are read sequentially starting at the beginning of the file.
To read a specific line that is not the first line in the file, one must skip the previous lines by reading the contents and discarding
them, before reading the required line.


12.2.3 Stream States


Streams have an associated state that identifies if the IO was successful or not, and possibly the reason for the failure. One must
safeguard against error conditions when writing a program. C++ provides several member functions that can be used for error
handling by accessing the state-bit value.
Constant  Remarks
badbit    Fatal error with undefined state.
eofbit    End-of-file was encountered.
failbit   An IO operation was unsuccessful.
goodbit   Everything is OK. Other bits are not set.

fail: This member function is used to test if the stream operation has failed or not by accessing the failbit or badbit value.
The fail function returns true if the last stream operation was unsuccessful. For example, to see if the open function worked
or not, we could have the following code.
FileForInput.open (DataSet1.dat, ios::in);
if (FileForInput.fail())
{
std::cout << “Could not open specified output file.\n”;
// take appropriate action

}
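eof: This member function is used to test and see if the end-of-file condition has been reached by accessing the eofbit value.
A file cannot be read beyond its end. Note that eofbit is set only after a read attempts to go past the end of the file, so the
result of each read should be tested before the value is used. Here is an example code that repeatedly reads an integer from a
file until there is no more input.
int nV;
while (FileForInput >> nV)
{
    // do what's appropriate with the input
}
if (!FileForInput.eof())
{
    // reading stopped because of invalid input, not the end of the file
}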
Note that most modern file systems do not store a special end-of-file character in the file. The file system keeps track of the
size of the file, and the stream sets eofbit when a read operation attempts to go past the last byte.
clear: This member function clears all of the standard input status flags. For example, to be able to read a file once again after
the end-of-file is reached, one can use the code shown below.
FileForInput.clear();

The overloaded form of this function is clear (state) that clears all and sets the state flag.
good: This member function checks to see if the stream is OK by accessing the goodbit value.
bad: This member function checks to see if a fatal error has occurred by accessing the badbit value.
rdstate: This member function returns the currently set flags. The following example checks to see if the failbit is set and
clears it if necessary.
if (FileForInput.rdstate() & std::ios::failbit)
{
    FileForInput.clear (FileForInput.rdstate() & ~std::ios::failbit);
}
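The state functions can be combined into a small input loop. The sketch below is not from the book's listings; the function name ReadAllInts and the policy of skipping tokens that are not integers are assumptions made for illustration. It works on any std::istream, so it can be tried with an in-memory std::istringstream as well as with an ifstream object such as FileForInput.

```cpp
#include <cassert>
#include <istream>
#include <sstream>   // std::istringstream, convenient for trying the function without a file
#include <string>
#include <vector>

// Hypothetical helper: reads every integer found in Input and returns them.
std::vector<int> ReadAllInts (std::istream& Input)
{
    assert (Input.good());      // the caller passes a stream that opened successfully

    std::vector<int> nVValues;
    int nV = 0;
    for (;;)
    {
        if (Input >> nV)
        {
            nVValues.push_back (nV);    // the read succeeded
        }
        else if (Input.bad() || Input.eof())
        {
            break;                      // fatal error or no more input
        }
        else
        {
            // failbit only: the input at this position is not a valid integer.
            // Clear failbit and discard the next whitespace-delimited token.
            Input.clear (Input.rdstate() & ~std::ios::failbit);
            std::string strSkipped;
            Input >> strSkipped;
        }
    }
    return nVValues;
}
```

With the input text 1 2 abc 3, the function is expected to return the three integers and to skip abc.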

Example Program 12.2.1 File Handling


Next, we will see a sample program that illustrates how to read from an input file and how to write a tabular output to an output
file.
Problem Statement: Write a program to read an input file. Assume that the file has an unknown number of lines of input. However,
each line has an integer and 3 real numbers. The integer represents the point number while the real numbers represent the (x,
y) coordinates of and the temperature at that point. While reading the file, create a properly formatted file that has the same
information in an easy-to-read tabular form. Use the sample data file given below to test the program.


1 12.0 -45.6 33.3
2 16.0 45.6 88.6
3 -3.3 -4.4 1.1
The source code for the developed program is given below.
main.cpp

In line 13 the required header file <fstream> is included. For opening both the input and output files, a do loop is used that
repeatedly executes until the file is opened successfully. Note the way the input file name obtained from the user is passed to
the open function (lines 37 and 59). Before C++11, the open function did not accept a std::string object; instead it expected a
char string. The std::string member function c_str() provides the address of the location where the string is stored as a
null-terminated char string. Since C++11, the open function also accepts a std::string object directly. Also note that the clear
member function is called if for some reason the file cannot be opened.


The reading of the input file starts with the for loop in line 82. Each input line is read in two parts. First, the point number is
read (line 84) followed by the statement to read the coordinates and the temperature on line 86. After both reads, the end-of-
file condition is checked using the eof member function.

The data is immediately written to the output file (line 89). The formatting statements (using setiosflags and std::setw
functions) are similar to those that we saw before. They ensure that the column headings and the column data are properly
aligned. Finally, the close member functions are called to close both the input and output files (lines 99-100).
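The read-and-tabulate idea described above can be condensed into one function. This is a minimal sketch under stated assumptions and is not the book's main.cpp: the function name TabulatePoints, the column widths, and the two-decimal format are all chosen here for illustration. The streams are passed as std::istream and std::ostream references so that the same code works with opened files or with in-memory streams.

```cpp
#include <cassert>
#include <iomanip>
#include <istream>
#include <ostream>
#include <sstream>   // lets the function be tried without external files

// Hypothetical helper: copies "point x y temperature" records from Input to
// Output as an aligned table and returns the number of rows written.
// Side effect: Output is left in fixed notation with 2 decimal places.
int TabulatePoints (std::istream& Input, std::ostream& Output)
{
    assert (Input.good() && Output.good());

    // column headings, right-justified in the same widths as the data
    Output << std::setw(8)  << "Point"
           << std::setw(12) << "X"
           << std::setw(12) << "Y"
           << std::setw(14) << "Temperature" << "\n";
    Output << std::fixed << std::setprecision(2);

    int nPoint = 0, nRows = 0;
    double dX = 0.0, dY = 0.0, dTemp = 0.0;
    // the loop ends at end-of-file or at the first incomplete or invalid record
    while (Input >> nPoint >> dX >> dY >> dTemp)
    {
        Output << std::setw(8)  << nPoint
               << std::setw(12) << dX
               << std::setw(12) << dY
               << std::setw(14) << dTemp << "\n";
        ++nRows;
    }
    return nRows;
}
```

With files, the function could be called as TabulatePoints (FileForInput, FileForOutput) once both files have been opened successfully.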
Tip: The >> operator skips leading whitespace (blanks, tabs, and newline characters) to get to the next input. In other
words, one can reformat the sample data file for the program as follows.
1
12.0
-45.6
33.3
2 16.0 45.6 88.6
3 -3.3 -4.4 1.1

The error-checking mechanism in this program is not very robust. For example, omitting even a single value will lead to erroneous
results. We will develop a much more robust and versatile way of reading a text input file as we progress through this
chapter.

12.3 Advanced Usage


C++ provides a very rich set of functionalities for file manipulation. We discuss some of the more useful features here.

Reading a single character


A single character can be read by using the get member function whose prototype is shown below.
istream& istream::get (char& c);

This function assigns the next character to the passed argument and returns the stream. The state of the stream indicates if the
read was successful. Here is an example.
char c;
FileForInput.get(c);

Reading an entire line


The contents of an entire line can be read using the getline member functions.
istream& istream::getline (char* str, streamsize count);
istream& istream::getline (char* str, streamsize count, char delim);

This function reads up to count-1 characters into str. However, the reading is terminated earlier if the next character is the
delimiter, which is the newline character (\n) by default or the delim character if one is specified. The delimiter is extracted
from the stream but is not stored in str. The state of the stream indicates if the read was successful. Here is an example.
const int MAXCHARS = 256;
char szUserInput[MAXCHARS];
FileForInput.getline(szUserInput, MAXCHARS);

Passing a fstream object to a function


A file, once opened, can be used in any part of the program where the file stream object is visible. Sometimes, it may be desirable
to pass the file stream object to a function. Here is an example function prototype.
void StoreMessage (fstream& OutputFile, const std::string& strMessage);

As we have seen before with other objects, stream objects should be passed as references.
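A possible body for such a function is sketched below. One deliberate change from the prototype above should be noted: the parameter type is std::ostream& rather than fstream&. This is an assumption made here so that the same function can write to a file stream, to std::cout, or to an in-memory stream; it is not the book's implementation.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>   // std::ostringstream, useful for checking the output in memory
#include <string>

// Sketch only: the stream is received by reference, never by value, because
// stream objects cannot be copied.
void StoreMessage (std::ostream& OutputFile, const std::string& strMessage)
{
    assert (OutputFile.good());     // the caller opened the stream successfully
    OutputFile << strMessage << "\n";
}
```

An ofstream or fstream object can be passed directly, since both are derived from std::ostream.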

Example Program 12.3.1 Reading Text from an External File


In this example we will see how to parse and read an input file.
Problem Statement: Write a program to read an input file called dbfile.dat that has travel-related data for a trucking company. A
sample file is shown below (note two blank lines – lines 8 and 9; city names are one word with blank spaces replaced with _).
Rem: File generated on 15-Jan-2005
Phoenix 12-Jan-2005 13500
San_Diego 12-Jan-2005 13945
Sacramento 14-Jan-2005 15456
Rem: File generated on 18-Jan-2005
Portland 16-Jan-2005 17800
Seattle 17-Jan-2005 18400


Rem: File generated on 25-Jan-2005
Eugene 20-Jan-2005 19500
San_Jose 21-Jan-2005 20600
Las_Vegas 23-Jan-2005 22300
Phoenix 24-Jan-2005 23100

A typical line in the file is either a line that starts with Rem: (a remark or comment that should be ignored) or is a blank line (that
should be ignored) or is a data line. The data line contains three fields – the city name, the date of travel and the current odometer
reading.
main.cpp

The fundamental idea in this approach is to read one entire line at a time and then parse the input based on what we expect
to see on a typical line. Reading one line at a time is done in the function ReadNextLine. The input is read into character string
strInputString using the getline function (line 19). Error checks are carried out immediately (lines 22-28). If an end-of-file
condition is reached, a true value is returned, and reading of the file is terminated. If a fatal error is encountered, the function
throws an exception. Otherwise, a false value is returned to indicate that the end-of-file condition was not detected.

The main program starts by opening the database file, dbfile.txt, a text file for reading. Five variables are used to facilitate
reading the file: the variables declared in lines 47-48 store the data from each line, the contents of the current line are stored
in strInputLine, and nLine is used to track the current line number.


The ReadNextLine function is called in the for loop (line 51) until a true value is returned because the end-of-file condition is
detected. It is assumed that each line will not have more than 255 characters. Once the contents of a line are captured in the
strInputLine, we determine if the input line is a remark line or a blank line by checking to see if the first four characters are
Rem: or if the length of the input string is zero, respectively. If neither condition is satisfied, then we can assume that
the input line contains data. The istringstream object is then used to read (or parse) the three input fields. First, in line 61, the
contents are copied into the istringstream object, strFormatString. Next, in line 62, the actual reading is done using the now
familiar >> operator. The read contents are displayed on the screen. Finally, if the end-of-file condition was detected in line 53,
the break statement is used to exit the for loop. The error states detected and thrown in the main program and the functions
are caught and the error message displayed in lines 73-89. Since a character string is thrown with each error, the first catch block
is used to display those errors.
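Since the listing is not reproduced here, the following is a minimal sketch of the same line-by-line approach. It is written under stated assumptions and is not the book's main.cpp: the TravelRecord struct, the ParseTravelData function, and the error messages are hypothetical, and ReadNextLine only follows the behavior described in the text (true at end-of-file, a thrown character string on error, at most 255 characters per line).

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type holding the three fields of one data line.
struct TravelRecord
{
    std::string strCity;
    std::string strDate;
    int         nOdometer;
};

// Reads the next line (at most 255 characters) into strInputLine.
// Returns true if the end of the input was reached and nothing was read.
bool ReadNextLine (std::istream& Input, std::string& strInputLine)
{
    const int MAXCHARS = 256;
    char szLine[MAXCHARS];
    Input.getline (szLine, MAXCHARS);
    if (Input.bad())
        throw "Fatal error while reading the input.";
    if (Input.fail())
    {
        if (Input.eof())
            return true;                    // nothing left to read
        throw "Input line is too long.";    // more than MAXCHARS-1 characters
    }
    strInputLine = szLine;
    return false;
}

// Skips remark (Rem:) and empty lines and parses every other line.
// A line that contains only blanks is treated as an invalid data line.
std::vector<TravelRecord> ParseTravelData (std::istream& Input)
{
    assert (Input.good());
    std::vector<TravelRecord> Records;
    std::string strInputLine;
    while (!ReadNextLine (Input, strInputLine))
    {
        if (strInputLine.empty() || strInputLine.compare (0, 4, "Rem:") == 0)
            continue;
        std::istringstream strFormatString (strInputLine);
        TravelRecord Rec;
        if (!(strFormatString >> Rec.strCity >> Rec.strDate >> Rec.nOdometer))
            throw "Invalid data line.";
        Records.push_back (Rec);
    }
    return Records;
}
```

A caller would wrap ParseTravelData in a try block and catch const char* to display the message, in the same spirit as the catch blocks described above.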

Binary File Access


As we saw with text files, access to the contents of the file is sequential in nature. However, there are applications where very
fast access to the data is needed and the data need not exist in the text form. C++ provides functions that can be used to read
and write data at specific locations in a file with the read and write operations taking place in a random order. We will now see
how this can be carried out.
First, we need to open the file (a) for both reading and writing, and (b) in a binary mode. This can be done using the open
member function that we saw before. However, we need to specify additional parameters. For example,
BinaryFile.open (szFileName, std::ios::binary | std::ios::out |
std::ios::in);

where BinaryFile is an fstream object. This form of the open statement opens the file for both reading and writing in a random
manner. A few things to note about random access files:
(1) Data cannot be read unless data is written first to the file.
(2) The file needs to be positioned first before reading from or writing to that location. It is possible to overwrite the
contents at the specified location.
(3) The IO operations take place in bytes using char data type and type casting is done during the function calls.
The following functions are used for reading and writing. The letter g is used to signify get and the letter p is for put.


tellg: This function returns the read position.


seekg(pos): This function sets the read position as an absolute value.
seekg(offset,rpos): This function sets the read position as a relative value.
tellp: This function returns the write position.
seekp(pos): This function sets the write position as an absolute value.
seekp(offset,rpos): This function sets the write position as a relative value.
One can use an integral type to represent the file position. The constants std::ios::beg, std::ios::cur, and std::ios::end
signify the beginning, current and end position of the file.
write(char*,n): This function can be used to write a scalar variable or a vector variable of size n bytes. The variable
must be type cast as a char * before it can be written.
read(char*, n): This function can be used to read a scalar variable or a vector variable of size n bytes. The variable must
be type cast as a char* before it can be read.
The example below shows how the type casting can be done.
Rewinding a file
A file can be rewound so that reading the file can once again start from the beginning of the file.
FileForInput.clear (std::ios_base::goodbit);
FileForInput.seekg (0L, std::ios::beg);
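The two rewinding statements can be wrapped in a small helper. The sketch below is an illustration only; the function name CountLinesAndRewind is hypothetical. It makes a first pass over the input to count the lines and then rewinds the stream so that the caller can read the data from the beginning.

```cpp
#include <cassert>
#include <istream>
#include <sstream>   // std::istringstream, for trying the function without a file
#include <string>

// Hypothetical helper: counts the lines in Input, then rewinds the stream.
int CountLinesAndRewind (std::istream& Input)
{
    assert (Input.good());

    int nLines = 0;
    std::string strLine;
    while (std::getline (Input, strLine))
        ++nLines;

    // The loop ends with eofbit and failbit set. They must be cleared
    // before the stream can be repositioned and read again.
    Input.clear (std::ios_base::goodbit);
    Input.seekg (0L, std::ios::beg);
    return nLines;
}
```

A typical use is a two-pass read: the first pass finds out how much storage is needed, and the second pass reads the values.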

Example Program 12.3.2 Working with Binary Files


In this example we will see how binary files can be used in a program.
Problem Statement: Write a program to read point-related data interactively using keyboard input, store (or write) the data in a
binary file, read all the data back and display them on the screen.
We first slightly modify the CPoint class used in Example 9.1.1 by adding one more private member variable, m_strTag, that
will store the identification tag associated with a typical point. This variable is stored as a fixed-length char string (not as a
std::string, for the reason given in the Tip at the end of this example) and by design will hold a maximum of 10 characters.
In this example, we will not check if the identification tag is unique or not. The modified header and source files are not shown
here.
The binary file is implemented as a fixed-length record file. In other words, it is assumed that each point record has a constant
size. If the size of one record is n bytes, then the first record starts at location 0, the second record at location n, the third
record at location 2n, and so on. The sizeof operator can be used to obtain the size of a data type or a variable. For example,
int nintSize = sizeof(int);
would return the size of an int for a particular computer system (typically 4 bytes). Similarly,
int nCPSize = sizeof(CPoint);
would return the size of a typical CPoint object. A diagram to help visualize a fixed length record file is shown in Fig. 12.3.1.
Fig. 12.3.1 Schematic diagram of a fixed length record file (record length is n, and there are m records currently in
the file)
main.cpp


The appropriate header files are shown in lines 10-12. Files that are used for reading and writing must be associated with the
fstream class. This declaration takes place on line 18 and the file is opened on line 19. Note the multiple openmode flags that are
used. The trunc flag is needed if the file does not exist or if the file exists and the contents can be overwritten. The trunc flag
should be omitted if the file exists and the contents need to be reused.

The loop starting at line 29 is used to interactively obtain the point data from the user. The nRecords variable keeps track of
how many point records have been created. It is then used to compute the location on the file where the reading or writing of
the point data is to take place. The corresponding computations are on lines 37 and 47. In line 37, the seekp function is used
to position the file before writing. Next the write function is used to write the data. There are two arguments to the function –
the first argument is const char* type and the second argument is std::streamsize (effectively an integer). The
reinterpret_cast<const char*> type casting is used to convert the data into const char* type before writing.

Similarly, before reading the data back, the seekg function is used to position the file. The read member function is called to
read the data. Once again type casting needs to take place and is done using the reinterpret_cast<char*> construct. Finally,
the file is closed in line 54. One should note that C++ generates instructions to store the data associated with any object in
contiguous memory locations. That is the reason why writing or reading sizeof(CPoint) bytes works properly and one does
not have to write or read the tag, the x coordinate and the y coordinate values individually.
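The positioning and type casting steps can be sketched as follows. This is a minimal illustration and is not the book's listing: the PointRecord struct is a hypothetical stand-in for CPoint that holds only plain data (a fixed-size char tag and two doubles), and records are numbered from 1 so that record k starts at byte (k-1)*sizeof(PointRecord). The functions take std::ostream and std::istream references, so they work with an fstream opened with std::ios::binary | std::ios::in | std::ios::out as well as with an in-memory std::stringstream.

```cpp
#include <cassert>
#include <cstring>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical fixed-size record: no pointers and no std::string members.
struct PointRecord
{
    char   szTag[11];   // up to 10 characters plus the terminating '\0'
    double dX;
    double dY;
};

PointRecord MakePointRecord (const std::string& strTag, double dX, double dY)
{
    assert (strTag.length() <= 10);
    PointRecord Rec;
    std::memset (&Rec, 0, sizeof(Rec));     // also zeroes any padding bytes
    std::strncpy (Rec.szTag, strTag.c_str(), sizeof(Rec.szTag) - 1);
    Rec.dX = dX;
    Rec.dY = dY;
    return Rec;
}

// Writes Rec as record number nRecord (1-based), overwriting what is there.
bool WriteRecord (std::ostream& BinaryFile, int nRecord, const PointRecord& Rec)
{
    assert (nRecord >= 1);
    const std::streamoff nOffset = static_cast<std::streamoff>(nRecord - 1)
                                 * static_cast<std::streamoff>(sizeof(PointRecord));
    BinaryFile.seekp (nOffset, std::ios::beg);
    BinaryFile.write (reinterpret_cast<const char*>(&Rec), sizeof(PointRecord));
    return !BinaryFile.fail();
}

// Reads record number nRecord (1-based) into Rec.
bool ReadRecord (std::istream& BinaryFile, int nRecord, PointRecord& Rec)
{
    assert (nRecord >= 1);
    const std::streamoff nOffset = static_cast<std::streamoff>(nRecord - 1)
                                 * static_cast<std::streamoff>(sizeof(PointRecord));
    BinaryFile.seekg (nOffset, std::ios::beg);
    BinaryFile.read (reinterpret_cast<char*>(&Rec), sizeof(PointRecord));
    return !BinaryFile.fail();
}
```

Because every record has the same length, a record can be overwritten in place without disturbing its neighbors, which is not possible with a line in a text file.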


The overloaded friend function in the CPoint class is shown below. Line 160 prevents invalid identification tags, including
those longer than 10 characters (CPoint::MAXIDTAG), from being stored.

A sample program execution is shown in Fig. 12.3.2.

Fig. 12.3.2 Output from Example Program 12.3.2


Tip: The above scheme needs to be used with care when writing/reading objects that have a pointer embedded in them, especially
if the pointer is used to carry out resource management. In such cases, sizeof does not include the memory that the pointer
refers to, so not all the values are written to the disk. That is the reason why in this example, the point identification tag is
stored as a char string and not as a std::string. Different std::string objects may have different lengths. Snippets of the code
from point.h and point.cpp files are shown below.

The strcpy_s function is used to copy from a std::string object to a char string variable and back.


Example Program 12.3.3 Parsing a Free Format Text File


In the final example in this chapter, we will look at a problem frequently encountered in engineering and science – how to take
data often stored in text files, read them, and carry out selective tasks.
Problem Statement: Write a program to read the data connected with discretization of a planar surface (grey) with triangles as
shown in Fig. 12.3.3, and use the data to compute the surface area (grey).

Fig. 12.3.3. (a) Planar surface (b) Discretized by triangles (dimension labels in the figure: 10.0, 2.0, 8.0, 4.0; points are
numbered 1 to 18 and triangles 1 to 16)


Fig. 12.3.4 shows the class diagram used to generate the solution. The workhorse is the C2DSurface class. Using the idea of class
composition, the planar surface data representation is via (a) CPoint objects that store the (x, y) coordinates of a triangle’s vertices
(1, 2, … 18), and (b) CTriangle objects that store the three vertices of each triangle (1, 2, …, 16). We will use CPoint source
code developed from Example Program 7.7.2. The CTriangle is a new class that we will examine briefly here.

Fig. 12.3.4. Class diagram showing solution used in Example Program 12.3.3 (classes shown: C2DSurface, CFileIO, CParser,
CPoint, CTriangle, CVector)
The primary focus will be on the CParser class. Features of this class will be used to read the input file and provide the parsed
data to the C2DSurface class for data validation and storage. The point and triangle data will be stored as CVector objects. We
will start by looking at the main program.
main.cpp

The one and only C2DSurface object is declared in line 17 and the program execution is a simple two-step process. In the first
step (line 21), the input file is read. Once it has been successfully read, the next step is to compute the properties of the discretized
planar surface, which in this example is just the total area computed as the sum of the areas of all the triangles. If an error is
thrown in either step, it is caught, and a generic error message is displayed. It is expected that a more detailed error message
would be displayed closer to where the error is first detected, e.g., in the C2DSurface class.

The entire input file containing 42 lines is shown in Fig. 12.3.5.

Fig. 12.3.5 Input file test1.dat


A few things to note. First, the entire file is divided into three parts – the first part contains the model size (18 points and 16
triangles), the second part contains the point data, and the last part contains the triangle data. A dollar sign ($) in the first column
indicates a comment line that is ignored. Command lines start with * character, e.g., *model, *point, *polygon, and *end. A
comma (,) is the field delimiter though the program also supports a blank space as a field delimiter. Each point is defined by a
point number, x-coordinate, and y-coordinate. The point data need not be in an ascending order, but the point number needs
to be between 1 and the number of points specified in the *model data. While this version of the program supports only triangles,
one can visualize enhancing the capabilities by supporting other polygons such as quadrilaterals, pentagons, etc. Similar to point
data, each triangle is identified by a triangle number and the point numbers of three points that define the triangle. In the input
file, these points are defined in a counterclockwise sense, though this is not a requirement. The last line in the input file is *end.
The order of the input data (the three parts) cannot be changed.
The header file for the C2DSurface is shown next. Line 16 shows the list of commands including NONE that is used to initialize
the Commands variable. The two public member functions that are called from the main program are declared in lines 18-19. The
rest of the file contains private member variables. Note how all the point data are stored in a CVector object (m_PointData)
where every element of the vector is a CPoint data, e.g., point #6 is at m_PointData(6). Similarly, all the triangle data are stored
in a CVector object (m_TriangleData) where every element of the vector is a CTriangle data, e.g., triangle #4 is at
m_TriangleData(4).
The CFileIO class is used to help with opening files for reading and writing, and if needed, for rewinding a file.


2DSurface.h

Next, we will look at the ReadData member function in the C2DSurface class. The variable CurCommand is used to hold the current
command handled by the program. Every line in the input file is broken into tokens using the delimiter character and stored as
character strings in strVTokens. For example, if the input line is triangle, 4, 18, 4, 17, then the parser detects 5 tokens
(stored in nTokens) and the tokens (triangle, 4, 18, 4, 17) are stored as strings in strVTokens[0]…strVTokens[4]. The CParser
object Parse is declared in line 31 with the first argument as the delimiter character(s), the second argument as the character(s)
that start a comment line, and the third argument to specify if all the input is converted to lowercase characters. The variable
bEOF is used to store whether the end-of-file has been reached when the parser is called. The CFileIO object is used in lines
35-36 to open the input file. In other words, the user is prompted repeatedly for the input file until the OpenInputFilebyName
member function is able to open the file.
Lines 39-41 are local variables that are used to store various pieces of input data – point number, triangle number, x and y
coordinates, and the list of point numbers that define a triangle.

The reading of the input file starts in line 44 and ends in line 125. The parser's member function GetTokens is called in line 47.
The input argument is IFile (see line 34), and the function returns the current line number, the tokens in the current line,
the number of tokens in the line, and whether the end-of-file was reached in the current line. If the current line is empty and
the end-of-file has been reached (line 48), the program exits (line 49) the for loop. The next seven sections of the code
(if …else if) deal with how to handle each command - *model, *point, *polygon, and *end.

As soon as the *end line is read, the break statement (line 68) is used to exit the for loop. Each model data contains the number
of points and number of triangles (lines 72-83). Hence, the number of tokens (nTokens) needs to be two with the number of
points stored at m_strVTokens[0] and the number of triangles in m_strVTokens[1]. To help get the integer values from these
strings, the parser has member function GetIntValue (with return type bool) that can be used. An error is thrown if the return
value is false (lines 76, 79). If the values are legal, memory for the two vectors, m_PointData and m_TriangleData, is
allocated (lines 84-85) by calling the SetSize member function.

Each line containing point data has three pieces of input (lines 89-98) – the point #, x-coordinate and y-coordinate. Hence, the
number of tokens (nTokens) needs to be three with the point # stored at m_strVTokens[0], x-coordinate in m_strVTokens[1]
and the y-coordinate in m_strVTokens[2]. To help get the integer and float values from these strings, the parser has member
functions GetIntValue and GetFloatValue (with return type bool) that can be used. An error is thrown if the return value is
false. If not, the values are stored in the m_PointData vector (line 99).
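The tokenizing and conversion steps described above can be sketched with standard library calls only. The code below is an illustration under stated assumptions and is not the CParser implementation: the names SplitLine, TokenToInt, and TokenToDouble are hypothetical, consecutive delimiters are treated as one, and the comment character is recognized only in the first column. Conversion to lowercase is left out.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Splits strLine into tokens at any character found in strDelimiters.
// Returns an empty list for blank lines and for comment lines.
std::vector<std::string> SplitLine (const std::string& strLine,
                                    const std::string& strDelimiters,
                                    char cComment)
{
    assert (!strDelimiters.empty());
    std::vector<std::string> strVTokens;
    if (!strLine.empty() && strLine[0] == cComment)
        return strVTokens;                          // comment line

    std::string::size_type nStart = strLine.find_first_not_of (strDelimiters);
    while (nStart != std::string::npos)
    {
        const std::string::size_type nEnd =
            strLine.find_first_of (strDelimiters, nStart);
        // when nEnd is npos, substr returns the rest of the line
        strVTokens.push_back (strLine.substr (nStart, nEnd - nStart));
        nStart = strLine.find_first_not_of (strDelimiters, nEnd);
    }
    return strVTokens;
}

// Converts a token to a value of type T. The conversion succeeds only if
// the entire token is consumed, so "18x" and "1.5" are not valid integers.
template <class T>
bool TokenToValue (const std::string& strToken, T& Value)
{
    std::istringstream Stream (strToken);
    Stream >> Value;
    return !Stream.fail() && Stream.eof();
}

bool TokenToInt (const std::string& strToken, int& nValue)
{
    return TokenToValue (strToken, nValue);
}

bool TokenToDouble (const std::string& strToken, double& dValue)
{
    return TokenToValue (strToken, dValue);
}
```

As in the text, the bool return value lets the caller throw an error as soon as a token cannot be converted, for example when a point number is not an integer.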


Reading of the triangle data is done very similarly (lines 103-118). If there are no errors, the values are stored in the
m_TriangleData vector (lines 119-121).

An input line that does not conform to the input file format is detected and dealt with in line 124. Upon successfully reading
the input file, the summary statistics (lines 128-130) are shown on the screen.

Finally, the function that computes the total area is shown in lines 153-160.
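One way to carry out this computation is sketched below. It is not the book's listing: the Point2D and Triangle structs are hypothetical stand-ins for CPoint and CTriangle, std::vector is used in place of CVector, and the area of each triangle is obtained from the cross product of two edge vectors. The absolute value makes the result independent of whether the vertices are given in a clockwise or counterclockwise sense.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point2D  { double dX; double dY; };
struct Triangle { int nP1; int nP2; int nP3; };   // 1-based point numbers

// Area of the triangle ABC = 0.5 * |(B - A) x (C - A)|
double TriangleArea (const Point2D& A, const Point2D& B, const Point2D& C)
{
    return 0.5 * std::fabs ((B.dX - A.dX) * (C.dY - A.dY)
                          - (C.dX - A.dX) * (B.dY - A.dY));
}

// Total area as the sum of the areas of all the triangles.
double TotalArea (const std::vector<Point2D>& Points,
                  const std::vector<Triangle>& Triangles)
{
    const int nPoints = static_cast<int>(Points.size());
    double dArea = 0.0;
    for (std::size_t i = 0; i < Triangles.size(); ++i)
    {
        const Triangle& T = Triangles[i];
        // every point number must refer to a defined point
        assert (T.nP1 >= 1 && T.nP1 <= nPoints);
        assert (T.nP2 >= 1 && T.nP2 <= nPoints);
        assert (T.nP3 >= 1 && T.nP3 <= nPoints);
        dArea += TriangleArea (Points[T.nP1 - 1], Points[T.nP2 - 1],
                               Points[T.nP3 - 1]);
    }
    return dArea;
}
```

A quick check is a unit square split into two triangles, for which the total area should be 1.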

Fig. 12.3.6 Program output from Example Program 12.3.3


The reader is encouraged to edit the input file, create potentially erroneous input, and see how the program detects and handles
the error.
Finally, it would be helpful to make a list of potential improvements that can be made to the program in terms of program
architecture as well as programmatically.
(1) Fig. 12.3.4 is key to understanding how the key components store and access the data. Duplicate data is stored as the
same CPoint data is also stored in the CTriangle class. While some duplication is healthy, storing several copies of a
piece of data can lead to excessive storage requirements.
(2) This program creates the data but does not edit the data. For example, what would happen to the triangle data if a
point is moved? Deleted?
(3) How easily can the capabilities of the program be enhanced if the surface is discretized with other planar shapes such
as quadrilaterals?
(4) Error checking of the model data can be vastly improved, e.g., have all the points and triangles been defined, are there
any duplicates? are the three vertices of the triangle distinct?
(5) A careful examination of the given data (Fig. 12.3.3(a)) would show that the surface area is
$(10)(14) - \frac{\pi (4)^2}{2} = 114.867$ square units. Comparing this value with the one shown in Fig. 12.3.6 shows that the
discretization error is about 2.5%. Can the reader spot the discretization error in Fig. 12.3.3(b)? How can the
discretization error be decreased?


Summary
The ability to manipulate information stored in external files opens a lot of doors for us. Rarely do we have a single piece of
software taking care of all our needs. Data needs to be taken as input, manipulated using a numerically-based algorithm, and
finally exported for use by other programs.
In this chapter we saw how files can be opened and closed, how text and binary data can be read from and written to files in a
dynamic manner and how C++ provides functions to ensure that these are taking place in a safe and efficient manner. In later
chapters, we will have several opportunities to use these concepts for solving practical problems.

Where to go from here?


An emerging area that affects both engineering and the sciences is big data. Big data is a term for data sets that are so large or
complex that traditional data processing application software is inadequate to deal with them. Challenges include capture,
storage, analysis, data curation, search, sharing, transfer, visualization, querying, updating and information privacy². Almost
always the data is available electronically. Fig. 12.1 shows the enormous increase in the global information storage capacity over
the last three decades³. Engineers and scientists should be ready to handle and manipulate this vast amount of data.

Fig. 12.1 Growth of and digitization of global information-storage capacity

2 https://en.wikipedia.org/wiki/Big_data
3 By Myworkforwiki - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=29452425


Exercises
For the following problems, where appropriate use the CVector and CMatrix template classes to store the data.
Design and implement appropriate classes to store and manipulate the data. In other words, you need to implement
an object-oriented solution to each problem.

Appetizers
Problem 12.1
The program is required to create data to plot a cubic polynomial $y(x) = a + bx + cx^2 + dx^3$. The user interactively supplies
the values of the 4 coefficients, the starting value of x, the ending value of x and the increment to be used. Write a program
that will create an input file (say, comma separated file) for Microsoft Excel so that the graphing features in MS Excel can be
used. Name this file P12-1.csv.

Main Course
Problem 12.2
Look at the statistical functions defined in Problem 4.11. Create a template class, CStatPak, that has the functionalities defined in
the problem. Note that the class definition shown below is incomplete.
template <class T>
class CStatPak
{
public:
T StatMean (const CVector<T>& fV);
T StatMedian (const CVector<T>& fV);
T StatStandardDeviation (const CVector<T>& fV);
T StatVariance (const CVector<T>& fV);
T StatCoVariance (const CVector<T>& fVA,
const CVector<T>& fVB);
private:

};

Now write a non-member function (that is called from the main ()) whose prototype is as follows.
int ReadInput (std::ifstream& InpFile, CVector<double>& dVStats);
The function should read the data from the file associated with InpFile and compute the statistical values using the CStatPak
class. The results from the statistical analysis should be stored in the dVStats vector with the mean stored in the first location,
followed by the median, standard deviation and finally variance. The return value is 0 if no error was encountered and 1 if an
error was encountered. The input file has the following format.
Line 1: Number of data values, n
Line 2: First value
Line 3: Second Value

Line n+1: Last value
Carry out sufficient error checks to make this function robust. Create the template statistical functions in file statpak.h and the
non-member functions in file readinput.cpp.
Problem 12.3
Write a program that will analyze experimental data and compare the results with an analytical model. The program will read
data from an input file. It will then filter the data as per the specifications. It will compare the input to the theoretical value and
then create the output file properly formatted. The sequence of events in the program is as follows.
1. Ask the user for the input file name. Open the input file.
2. Ask the user for the output file name. Open the output file.
3. Read the input file one line at a time.


4. Filter the data.


5. Create the output file.
6. Terminate the program.
Input File Format: The input file contains the following data in every line of the file obtained from an experiment.
The displacement value (at least one blank space) The load value
Both the displacement and load values are real numbers. An example input file (not real data):
0 10
0.0001 50
-0.1 200
0.001 1225
0.002 2500
0.004 5700
0.005 6400
0.007 8900

Filtering the Data: The displacement values (in inches) are expected to be between 0 and 10, and the load values (in pounds)
are expected to be between 0 and 10000.
Output File Format: Create an output file that displays the data in tabular form as follows. Using the filtered data, also compute
the area under the load-deflection curve.
Name:
Date:

Filtered Data
Data Point    Load (lb)    Experimental Displacement (in)    Theoretical Displacement (in)    Percent Difference

Bad Data
Data Point    Load (lb)    Displacement (in)

Area under the load‐deflection curve = xxxx lb‐in (Load on the y‐axis and deflection on the x‐axis).

Analytical load-deflection function

The analytical function is as follows
$\delta = 10^{-10} (0.09P^3 + 3.9P^2 + 5.5P)$

where $P$ is the load in lb and $\delta$ is the deflection in in.

Percent Difference
$\%\,\text{Difference} = \frac{\text{Analytical} - \text{Experimental}}{\text{Analytical}} \times 100$
Problem 12.4
It is required to write a computer program to compute the amount of earthwork or excavation necessary at a site. The site (in
plan view or x-y plane, Fig. P12.4) is divided into a regular grid with the grid spacing at 3 feet in both directions. The contractor
using surveying techniques has created an input file that contains the (x,y,z) values of these grid points.


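[Figure: plan-view grid with the grid points numbered row by row; 1 to 14 along the x axis in the first row, 15 to 28 in the second row, and 113 to 126 in the last row]
Fig. P12.4 Sample grid (all grid points are not labeled)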
The program flow is expected to be as follows (you must use dynamically defined matrices to store the data).
1. Ask the user for the input file name. Open the input file.
2. Ask the user for the output file name. Open the output file.
3. Read the input file one line at a time and store the data.
4. Interpolate to fill missing values resulting from bad data (zero elevation). Interpolation is to seek the average value of all adjacent
(good) grid points. Adjacent points are the immediate next points along the row and column. Write a function to implement this task.
5. Compute the average elevation and excavation/earthwork quantity. The volume of a grid cell with corner heights $h_1, h_2, h_3, h_4$
can be taken as $(\text{plan area})(h_{avg} - h_{ex})$ where $h_{avg}$ is the average of the four values. Write a function to implement this task.
6. Create the output file. Write a function to implement this task.
7. Terminate the program.
Input File Format: The input file contains the following data on the first line.
Number of grid points in the x direction(blank)Number of grid points in the y direction
The second line contains the excavation elevation value.
Z coordinate representing the excavation elevation (this is $h_{ex}$)
Then it contains the (x,y,z) coordinates of the grid points rowwise.
X coordinate(blank)Y Coordinate(blank)Z Coordinate
X coordinate(blank)Y Coordinate(blank)Z Coordinate
Coordinate values are in feet.
An example of the input file:
5 4
8.1
0.0 0.0 12.1
3.0 0.0 12.5
6.0 0.0 11.9
9.0 0.0 11.5
12.0 0.0 11.4
0.0 3.0 12.6
3.0 3.0 12.4
6.0 3.0 12.3


9.0 3.0 11.9


12.0 3.0 0.0
0.0 6.0 12.3
3.0 6.0 12.1
6.0 6.0 12.0
9.0 6.0 0.0
12.0 6.0 12.0
0.0 9.0 12.2
3.0 9.0 12.3
6.0 9.0 12.1
9.0 9.0 11.9
12.0 9.0 11.8

The missing or bad data are the entries with a zero Z coordinate.


Output File Format: Create the output file that displays the data as follows.

Name:
Date:
INPUT DATA
Grid Point X Coordinate Y Coordinate Z Coordinate Status
1 Good
2 Interpolated
3 Good

Average elevation of the site: xxx ft


Excavation elevation: xxx ft

EXCAVATION DATA
Grid Points Excavation (ft^3)
1‐2‐6‐7

Sum xxx ft^3

C++ Concepts
Problem 12.5
Write a computer program to handle a cross-sections database. The database supports circular, hollow circular, rectangular and
hollow rectangular cross-sections. Devise a scheme for identifying each cross-section in the database with a unique identification
tag. The computed properties to be stored in the database include cross-sectional area and the two principal moments of inertia.
Internally, the program should be able to store the data in a pre-defined set of units. However, it should allow the user to select
the units at run time. The computer program should support the following top-level commands.
units: Used to specify the length units.
create: Used to create information in the database.
edit: Used to edit information already in the database.
find: Used to find the cross-sectional properties of a cross-section already in the database.
search: To find the closest cross-section that meets a specified search criterion.
delete: Delete the cross-section from the database.
Create and store the database file as a binary file whose life extends beyond the single execution of the program.
Problem 12.6
Before C++17 introduced std::filesystem::exists, C++ did not have a standard function to check if a file exists. Develop a function with the following prototype.
bool FileExists (const char* filename);
that will return true if the file exists, false otherwise. Will this function work on all operating systems?



C L A S S E S : O B J E C T S 3 0 3

Chapter 13

Classes: Objects 303


“Education is a progressive discovery of our own ignorance.” Will Durant

“When asked how much educated men were superior to those uneducated,” Aristotle answered, “As much as the living
are to the dead.” Diogenes Laertius

We were introduced to fundamental object-oriented concepts in Chapter 7. We learnt about classes and objects. We learnt how
to develop and write server code. We also learnt how to develop and write client code where objects would be declared and
manipulated. As we saw in Chapters 8 and 9, object-oriented concepts help us define and use classes in a powerful yet safe
manner. In this chapter, we will first see the process that leads to the development of a computer program. Second, we will see
the concepts associated with inheritance such as polymorphism. What this means is that through the process of data abstraction, we
will be able to enhance already available capabilities from existing classes by building specialized classes on top of them. Finally,
we will see the concepts associated with functors and suggest how they can be used in writing user-defined functions commonly
used in numerical analysis.

Objectives
• To understand the basics of software engineering.
• To understand how to leverage already developed classes and build additional functionality.
• To understand the concepts of polymorphism.
• To understand the concepts associated with functors and use them as a tool in writing user-defined functions for numerical
analysis.

S. D. Rajan, 2002-24 13-329



13.1 Software Engineering


So far, the programs that we have developed have been comparatively small and easy to manage. The process followed in the
development of the programs was more intuitive than formal. In this section, we will formalize the process - the development
of a complete program should be a well-thought-out process.
Software engineering is the development and study of the principles that enable any organization to predict and control quality,
schedule, cost, cycle time, and productivity when acquiring, building, or enhancing software systems1. Why is software
engineering important? It is important because computer software, like any other product, is useful and expensive to develop.
Hence, it must be designed and developed correctly and efficiently.
One of the most popular object-oriented analysis and design methodologies is the Object Modeling Technique (OMT)
developed slightly differently and independently by Jim Rumbaugh and his associates at General Electric R&D Center, and by
Grady Booch at Rational Inc., CA. The two joined forces at Rational and developed the methodology (or notation) that today
goes by the name Unified Modeling Language (UML). Coverage of UML is outside the scope of this book.
We will next look at some of the major components of software engineering including the role of OMT.
Product Specifications: This is and should be one of the very first steps. The document that contains the specifications must be as
complete as possible since the entire software development effort is based on what is contained in this document. It is rarely
possible to arrive at a document that remains unchanged over the course of the project. When changes are made to the
specification document, the reasons for the change should also be recorded. This can be very useful when the software
undergoes periodic maintenance. In the context of software for engineering applications, it is important to include the details
of the engineering theory that the software is supposed to incorporate.
Sometimes, it is as important to state in the specifications what the software will not do as it is to spell out
unambiguously what the software will do. Here are some suggested steps in prescribing the product specifications.
(1) Create a diagram that will show the overall flow of information through the program. Identify the input to the system and
the output created by the system.
(2) Now zoom in on the components from Step (1) and for each component identify the input and output, data storage
schemes, theoretical details, etc.
(3) Specify the system requirements in terms of software and hardware. Decide what the limitations and
assumptions are, and how exceptions are to be detected and handled.
Scheduling: Most software projects are team driven. To lay the groundwork for meeting the project deadlines, an appropriate
procedure is used to (a) identify the different programming tasks, (b) estimate the amount of time required, and (c) determine the number
of personnel involved directly or indirectly with the software development task. A software process model is typically used to
control the activities. The scheduling is based on such methods as Program Evaluation and Review Technique (PERT) and
Critical Path Method (CPM). Details of these techniques can be found in a book on software engineering.
Object-Oriented Modeling (OOM) and Design (OOD): Before we build an object-oriented system, we have to identify the classes
associated with the problem. In addition, we also have to define how the different classes interact with each other. One
commonly used approach is to describe the use-cases associated with the problem. A use case is a collection of possible sequences
of interactions between the system under discussion and its external actors, related to a particular goal [Alistair Cockburn, 2000].
In other words, we need to describe how the actors (people, machines etc.) interact with the product to be built. The Class-
Responsibility-Collaborator (CRC) modeling translates the information contained in use-cases into a representation of classes and
their collaborations with other classes [Pressman, 2000]. What does this mean? We will use an example to answer this question
and detail the OOM process.
We will start the process by examining the problem (or project) statement. Objects are identified by listing all the nouns or noun
clauses. If the object is required to implement a solution, then it is part of the solution space.

1 Adapted from Software Engineering Institute (SEI), CMU, Pittsburgh, PA.


Case Study: Steel Beam Cross-Section Selection Program


Problem Statement: You are required to design and code an “AISC2 Steel Beam Cross-Section Selection” program. The beam cross-
section is a doubly symmetric I-section (Fig. 13.1.1). The user of the program enters (a) the largest bending moment, (b) the
largest tensile force, and (c) the largest compressive force that the beam is subjected to. The program finds the lightest cross-section
from the available cross-sections database that will meet the strength (stress) requirements. The selection is based on satisfying
the following requirements for maximum compressive and tensile stresses.
$\sigma_{max}^{c} = \frac{N_c}{A} + \frac{M}{S_{yy}} \le \sigma_a^c$ psi (13.1.1)
$\sigma_{max}^{t} = \frac{N_t}{A} + \frac{M}{S_{yy}} \le \sigma_a^t$ psi (13.1.2)
where $N_c$ is the largest compressive force, $N_t$ is the largest tensile force, $M$ is the largest bending moment, $A$ is the cross-
sectional area of the cross-section, and $S_{yy} = I_{yy}/z_{max}$ is the section modulus ($I_{yy}$ is the moment of inertia). An example I-section
(also called wide-flange section denoted with the letter W) is shown in Fig. 13.1.1. There are hundreds of differently dimensioned
I-sections that are available for purchase by structural engineers.
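As a minimal sketch of how Eqns. (13.1.1) and (13.1.2) might be evaluated in code, the two functions below compute the combined axial and bending stress and compare it against a limit. The function names, the argument order, and the reading of the right-hand sides as allowable stresses are assumptions made for illustration; they are not taken from the book's listings. Consistent units (lb, lb-in, in^2, in^3) are assumed so that the result is in psi.

```cpp
#include <cassert>

// Hypothetical helper: largest stress from combined axial force and bending,
// sigma = N/A + M/Syy (units assumed: lb, lb-in, in^2, in^3, giving psi).
double MaxStress (double dAxialForce, double dMoment, double dArea, double dSyy)
{
    assert (dArea > 0.0 && dSyy > 0.0);   // properties must be positive
    return dAxialForce/dArea + dMoment/dSyy;
}

// Hypothetical helper: true if both Eqn. (13.1.1) and Eqn. (13.1.2) hold.
// dAllowC and dAllowT are the assumed compressive and tensile stress limits.
bool MeetsStrength (double dNc, double dNt, double dM,
                    double dArea, double dSyy,
                    double dAllowC, double dAllowT)
{
    return MaxStress (dNc, dM, dArea, dSyy) <= dAllowC &&
           MaxStress (dNt, dM, dArea, dSyy) <= dAllowT;
}
```

For instance, with an axial force of 1000 lb, a moment of 2000 lb-in, an area of 10 in^2 and a section modulus of 100 in^3, the computed stress is 1000/10 + 2000/100 = 120 psi.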

(a) W36 x 16.5 x 925 properties (https://www.meeverusa.com/)

(b) These W36x925s are the heaviest hot-rolled wide-flange sections in the world. They are over 43 in. deep with 4.5 in.-thick
flanges and a 3-in. web. This 60-ft-long beam is roughly the size of a humpback whale, weighing approximately
27.75 tons. (https://www.aisc.org/modernsteel/news/2015/may/steel-shots-mega-steel/)
Fig. 13.1.1 Wide-flange beams used in steel structures

2 American Institute of Steel Construction, http://aisc.org


This design problem is typical of engineering design problems – an off-the-shelf (OTS) or custom solution exists that can
be used in a (numerical) model to check if the performance requirements are met (see Fig. 13.1.2). Among the very
many tasks in obtaining a realistic solution is one where the challenge is to find a “best” solution.

[Diagram blocks: Model; Performance Requirements; OTS or Custom Solution]
Fig. 13.1.2. Typical solution blocks for an engineering design problem
With reference to the given problem, the OTS solution consists of commercially available wide-flange beams, the performance
requirements are given in Eqns. (13.1.1) and (13.1.2), and the model used to obtain the largest bending moment, the largest
tensile force, and the largest compressive force that the beam is subjected to, is not discussed here (see the author’s book on
structural analysis and design). There are several similar examples – design of an air conditioning system for a room in a building
using off-the-shelf air conditioners, design of a piping network for a small factory using commercially available pipes, etc.
We will see models for solving engineering problems described via partial differential equations in Chapter 15.
Solution: Analyzing the problem statement, we can identify the following nouns or noun clauses.
Beam; I-Section; User Input
Largest Moment; Tensile Force; Compressive Force
Lightest Cross-section; Cross-sections Database
The next step in the process is to identify the different entities that describe the system. A closer examination of these nouns
will show the following. The I-Section is in fact the beam cross-section that is the end product of the design process. All the
different cross-sections are in a Cross-sections Database. The User Input contains the Largest Moment, Tensile Force and
Compressive Force. Once we recognize how to use the user input, we can generate the criterion to locate the Lightest Cross-
section.
The distinct entities can now be captured in classes. We could use a class CISection to store the properties of a typical I-section
such as the cross-sectional area, moment of inertia, etc. We could store the properties of all the available I-sections using a class
CXSDatabase. The class would allow access to all the individual I-sections, provide the mechanism to add new sections, or to
delete sections that are no longer manufactured. Finally, to allow the selection of the lightest cross-section from the I-section
database, we could define and use a CXSSelector class.
Let us review our plan of action to assure ourselves that these classes are the major classes to achieve our objectives. The
program would use the CXSDatabase class to load all the information about existing I-sections. The program would then ask the
user for the input and based on the values compute the ‘smallest’ properties of the I-section necessary to meet the strength
requirements. Then the CXSSelector class would be used to find the lightest I-section (we will assume that the I-section database
is arranged in order of increasing weights). Finally, the answer (the properties of the I-section) will be communicated to the user
using the CISection class. The responsibility of each class is to define what each class is capable of doing – identify the attributes
and behavior through member functions and variables. Sometimes, achieving these objectives requires the help of other classes
– the helper classes. The Class Responsibility Collaborator (CRC) cards describing the classes are shown in Fig. 13.1.3.

(a) CISection
Responsibilities: know properties; allow access to properties
Helpers: std::string

(b) CXSDatabase
Responsibilities: know all I-sections; allow access to individual I-section; add new sections; remove existing sections
Helpers: CISection, std::vector, std::ifstream

(c) CXSSelector
Responsibilities: get an I-section
Helpers: CISection, CXSDatabase

Fig. 13.1.3 Program classes described using CRC cards
Using the CRC cards as a guide, we can start writing the code by defining the class members and variables. The CISection class
is described first.
isection.h

As usual we define three constructors (lines 13-16), destructor, accessor, modifier, and helper functions. There are five member
variables declared in lines 32 through 36. The identification tag for the section shown in Fig. 13.1.1 is W36x925. While there are
many more properties that are available from the manufacturers and used by design engineers, we store the values of cross-
sectional area, the section moduli about the y and z axes, and the weight per unit length.
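The class described above can be sketched as follows. This is not the isection.h listing itself, only a minimal illustration built from the description: the member names m_fSyy, m_fSzz and m_fWeight, all the function signatures, and the use of float are assumptions (m_strID, m_fArea, Display, GetProperties and SetProperties are names that appear in the text).

```cpp
#include <cassert>
#include <iostream>
#include <string>

class CISection
{
    public:
        // default, overloaded and copy constructors, and the destructor
        CISection () : m_strID ("None"), m_fArea (0.0f), m_fSyy (0.0f),
                       m_fSzz (0.0f), m_fWeight (0.0f) {}
        CISection (const std::string& strID, float fArea, float fSyy,
                   float fSzz, float fWeight)
        {
            SetProperties (strID, fArea, fSyy, fSzz, fWeight);
        }
        CISection (const CISection&) = default;
        ~CISection () = default;

        // accessor (signature assumed)
        void GetProperties (std::string& strID, float& fArea, float& fSyy,
                            float& fSzz, float& fWeight) const
        {
            strID = m_strID; fArea = m_fArea; fSyy = m_fSyy;
            fSzz = m_fSzz;   fWeight = m_fWeight;
        }

        // modifier (signature assumed); property values cannot be negative
        void SetProperties (const std::string& strID, float fArea, float fSyy,
                            float fSzz, float fWeight)
        {
            assert (fArea >= 0.0f && fSyy >= 0.0f &&
                    fSzz >= 0.0f && fWeight >= 0.0f);
            m_strID = strID; m_fArea = fArea; m_fSyy = fSyy;
            m_fSzz = fSzz;   m_fWeight = fWeight;
        }

        // helper: write the stored values to a stream
        void Display (std::ostream& os = std::cout) const
        {
            os << m_strID << " " << m_fArea << " " << m_fSyy << " "
               << m_fSzz << " " << m_fWeight << "\n";
        }

    private:
        std::string m_strID;    // identification tag, e.g., W36x925
        float       m_fArea;    // cross-sectional area
        float       m_fSyy;     // section modulus about the y axis
        float       m_fSzz;     // section modulus about the z axis
        float       m_fWeight;  // weight per unit length
};
```

The five private variables mirror the five properties listed in the paragraph above; keeping them private forces all access through the accessor and modifier functions.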
Next, we will look at the class that stores the cross-section data – the CXSDatabase class. The database is stored in a std::vector
object m_listofISections declared in line 26. I-sections can be added and deleted one at a time. As the names suggest, the Add
and the Remove modifier functions add one I-section to the cross-section database and remove one section. The member variable
m_nSize stores the current number of I-sections in the database that is accessed via the std::ifstream object, m_IFile. The


function GetOne obtains the CISection object that is at the ith location in the database, i.e., is at the ith location in
m_listofISections.
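A minimal sketch of the storage side of such a database class is given below. It is not the xsdatabase.h listing: the CISection struct is a pared-down stand-in, the signatures of Add, Remove and GetOne are assumed, removal by identification tag and 0-based indexing are assumptions, and the file-handling members (m_IFile, m_nSize) are left out, with the size taken from the vector instead.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Pared-down stand-in for the CISection class (ID and weight only).
struct CISection
{
    std::string strID;
    float       fWeight;
};

class CXSDatabase
{
    public:
        // modifier: append one I-section to the database
        void Add (const CISection& IS) { m_listofISections.push_back (IS); }

        // modifier: remove the section with the given tag;
        // returns false if no such section is stored
        bool Remove (const std::string& strID)
        {
            for (auto it = m_listofISections.begin ();
                 it != m_listofISections.end (); ++it)
            {
                if (it->strID == strID)
                {
                    m_listofISections.erase (it);
                    return true;
                }
            }
            return false;
        }

        // accessor: section at location i (0-based in this sketch)
        const CISection& GetOne (std::size_t i) const
        {
            assert (i < m_listofISections.size ());
            return m_listofISections[i];
        }

        // accessor: current number of sections
        std::size_t GetSize () const { return m_listofISections.size (); }

    private:
        std::vector<CISection> m_listofISections;
};
```

Returning a const reference from GetOne avoids copying the section while still preventing the client code from modifying the stored data.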
xsdatabase.h

The next class to develop is the CXSSelector class that is used to implement Eqns. (13.1.1)-(13.1.2) using the user-supplied data
and select the lightest I-beam that meets the performance requirements via the public member function GetXSection.
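The selection logic can be sketched as a linear scan, assuming (as stated earlier) that the sections are stored in order of increasing weight, so that the first section satisfying both stress requirements is the lightest one. The struct Section, the function FindLightest, its signature, and the stress-limit arguments are hypothetical names for illustration and are not the book's GetXSection function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record holding the properties used by the strength check.
struct Section
{
    std::string strID;
    double      dArea;    // cross-sectional area (in^2)
    double      dSyy;     // section modulus (in^3)
    double      dWeight;  // weight per unit length
};

// Returns the index of the first (lightest) section that satisfies
// Eqns. (13.1.1) and (13.1.2), or -1 if none does. The vector is assumed
// to be sorted by increasing weight.
int FindLightest (const std::vector<Section>& sections,
                  double dNc, double dNt, double dM,
                  double dAllowC, double dAllowT)
{
    for (std::size_t i = 0; i < sections.size (); i++)
    {
        const Section& s = sections[i];
        assert (s.dArea > 0.0 && s.dSyy > 0.0);
        const double dSigC = dNc/s.dArea + dM/s.dSyy;  // max compressive
        const double dSigT = dNt/s.dArea + dM/s.dSyy;  // max tensile
        if (dSigC <= dAllowC && dSigT <= dAllowT)
            return static_cast<int>(i);
    }
    return -1;
}
```

A linear scan is adequate for a few hundred sections; it also makes the dependence on the sort order explicit, since an unsorted database would return the first acceptable section rather than the lightest.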
xsselector.h

We seem to have defined all the major classes. But have we? How do we obtain the user input? How do we process the user
input? One way is to define a new class CWizard that would solicit the required input from the user, process the input, and
display the results. This new class is shown in Fig. 13.1.4.


CWizard
Responsibilities: know user input; process user input; display the results
Helpers: CISection, CXSDatabase, CXSSelector

Fig. 13.1.4 The CWizard class description

Our initial design appears to be complete. Once all the classes are identified and defined, it is necessary to build a blueprint or a
software architecture. The blueprint can then be used by a software developer to “put the pieces together” so as to build and
test the software system.
Tip: Objects such as the different I-sections need identification. Sometimes the identification needs to be unique. For example,
every person in the US has a unique identification that is the Social Security Number. The attributes that are associated with the
identification constitute the key. In other words, the key is the identification tag. The data type associated with the key plays an
important role in the manner in which the object is stored.
In the context of this exercise, we should recognize that there may be tens or hundreds of I-sections. How do we differentiate
one I-section from the next? This is usually done through the section identifier, e.g., W36x300. In the context of the class
definition, the variable m_strID is the key. We will see more about keys and classes later.
Object-Oriented Testing: As we can see even with the simple example discussed in this section, there are plenty of opportunities for making
modeling, design and coding errors.
Once the initial blueprint is completed and the coding phase has started, testing should begin immediately. First, the classes can
be tested individually. Then collaborating classes can be tested. In keeping with this trend, subsystems formed by several
collaborating classes can be tested. If it is found that the current state of the model and design is incomplete, incorrect, or
extraneous, the modeling and design may have to be redone.
Example Program 13.1.1 Testing the CISection class
We will develop the first program to test the CISection class.
The listing of the main program (client code that carries out unit testing) is shown below. The include file is used in line 7. Three
distinct I-sections are created as three objects – First, Second and Third by declaring them in lines 11, 14, and 17. The
overloaded constructor is used for the First and Second objects while the modifier function is used for the Third object. To
check if the values in the object are correct, the Display function is used. Finally, the Fourth object is created in line 21 using
the copy constructor. The testing program can be improved since all the functionalities are not being tested – the GetProperties,
the overloaded SetProperties functions are skipped.


main.cpp

Example Program 13.1.2 Testing the CXSDatabase class


Our original class definition needs to undergo a few changes in thinking. Populating the database can be a tedious task since the
database can potentially contain a large number of I-sections. Why not use an input file that has the values for the different I-
sections and use the input file to populate the CXSDatabase object when program execution starts? To read the input from a
file, we define a private variable m_IFile.
std::ifstream m_IFile; // (file) source of database
The constructor is modified to read the data from a predefined file, XSdatabase.dat.
We first look at the default constructor in the CXSDatabase class.


A sample database text file (xsdatabase.dat) containing data for just three I-sections is shown below.
W21X201 59.2 461.0 86.1 201.0
W36X300 88.3 1110.0 156.0 300.0
W44X335 98.3 1410.0 150.0 335.0

In the constructor, the database file is opened, and the data is read one line at a time (line 28). A CISection object is modified
in line 31 with the read data and a new section is added to the database in line 32. We are now ready to test the class.
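Reading the database one line at a time can be sketched as shown below. The record layout (ID, area, two section moduli, weight) follows the sample file; the struct SectionRecord and the functions ParseLine, ReadSections and DemoReadSections are hypothetical names, and the function reads from any std::istream so that it can be exercised without a file on disk. In this sketch, a line that cannot be parsed is skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record matching one line of the sample database file.
struct SectionRecord
{
    std::string strID;
    float       fArea, fSyy, fSzz, fWeight;
};

// Parse one line; returns false if any of the five fields is missing.
bool ParseLine (const std::string& strLine, SectionRecord& rec)
{
    std::istringstream iss (strLine);
    return static_cast<bool>(iss >> rec.strID >> rec.fArea >> rec.fSyy
                                 >> rec.fSzz >> rec.fWeight);
}

// Read all well-formed records from a stream; returns the number stored.
std::size_t ReadSections (std::istream& in, std::vector<SectionRecord>& list)
{
    std::string strLine;
    while (std::getline (in, strLine))
    {
        SectionRecord rec;
        if (ParseLine (strLine, rec))
            list.push_back (rec);
    }
    return list.size ();
}

// Usage sketch with the three sample lines shown above.
void DemoReadSections ()
{
    std::istringstream in ("W21X201 59.2 461.0 86.1 201.0\n"
                           "W36X300 88.3 1110.0 156.0 300.0\n"
                           "W44X335 98.3 1410.0 150.0 335.0\n");
    std::vector<SectionRecord> list;
    const std::size_t nRead = ReadSections (in, list);
    assert (nRead == 3);
    assert (list[1].strID == "W36X300");
}
```

Passing a std::ifstream opened on the database file to ReadSections works the same way, since std::ifstream is derived from std::istream.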
main.cpp

Note how the CISection class previously developed is included in this test program in line 8. When line 14 is executed the cross-
section database with the 3 cross-sections is loaded into the DBISection object. In line 17, a fourth cross-section is added. In
line 19 the first cross-section is obtained, and the properties displayed in line 20, whereas in line 22, the last cross-section is
obtained, and the properties displayed in line 23. Sample execution details are shown in Fig. 13.1.4.

Fig. 13.1.4. Output from Example Program 13.1.2


Finally, we are ready to test the entire program. We will test the CXSSelector class and the CWizard class via the final program.


Example Program 13.1.3 Testing the Wizard


The CWizard class holds all the data associated with the program. The variable m_found is used to store the status of the search
– whether a section has been found that satisfies all the strength requirements or not.
The main program is a very short one. The program is in a continuous loop until the user decides that there is no further input
to the program. The GetandProcessUserInput function calls the GetXSection function in the CXSSelector class. It returns true
if a section is found that satisfies the strength requirements and false otherwise.
main.cpp

A sample program execution is shown in Fig. 13.1.4.

Fig. 13.1.4 Sample session from Example Program 13.1.3


Observation: Software development is an evolutionary process. The specifications, classes, and program architecture are likely
to change over time. It is important to integrate the testing and prototyping process early in the software development cycle.

13.2 Inheritance
Object-oriented methodology provides building blocks. One such building block process is known as inheritance in which a
new class called the derived (or child) class is created by building that class on top of an existing class called the base (or parent)
class. The derived class then has all the member variables and ordinary member functions from the base class and can, in
addition, define more variables and functions. Let us understand this concept.
In the earlier section, we saw the use of beams with an I-shaped cross-section. Clearly humans have invented other shapes that
can be used. Fig. 13.2.1 shows five common cross-sectional shapes that are used as structural members. All five cross-
sectional shapes have the same set of (derived) attributes – cross-sectional area, moments of inertia, section modulus, and other
similar properties. However, they are described differently. For example, a circular cross-section (Fig. 13.2.1(b)) is defined in
terms of a single attribute – radius, $r$, whereas a rectangular hollow section (Fig. 13.2.1(c)) is described in terms of four attributes
$h$, $b$, $t_h$, $t_b$.
[Figure: five cross-sections drawn in the y-z plane; dimension labels: (a) h, w; (c) h, b, th, tb; (d) ri, t; (e) wh, wt, fw, ft]
Fig. 13.2.1 Suite of cross-section types (a) Rectangular solid (b) Circular solid (c) Rectangular hollow (d) Circular
hollow (e) Symmetric I-section


Inheritance provides the means of enhancing the capabilities of existing classes. If a new class is to be defined and an existing
class is available, the new class can inherit the properties from the existing class. For example, we can define a base class, CXSType,
for all cross-sectional shapes. This class would store and make available such things as the cross-section identification and the
sectional properties. We could then define derived classes for different shapes such as CISection for I-sections and CCircSolid for
circular solid cross-sections. If later, we need to add a new shape – rectangular solid, we could simply define a new derived class
inheriting the properties of the base class, CXSType. The inheritance diagram for the cross-sections is shown in Fig. 13.2.2.

[Diagram: CXSType at the top level; CRectSolid, CISection and CCircSolid at the second level; CRectHollow and CCircHollow at the third level]
Fig. 13.2.2 Inheritance diagram
In this example, the base class contains what is generic to all cross-sectional shapes. Base classes are sometimes called abstract
classes since base class objects are usually not constructed. CXSType is an abstract class since in a program we will not define an
object directly tied to this class. However, we will define and use the constructor to store pertinent data associated with the class
– the cross-section identification, the cross-sectional properties such as area and so on, and the cross-sectional dimensions (for
example, height and width for a rectangular section). The base class also provides the generic accessor and modifier functions
for both the cross-sectional properties and dimensions. What the base class cannot provide are the functions for computing
the cross-sectional properties using the cross-sectional dimensions, and displaying the cross-sectional dimensions. This is
because these are dependent on the cross-sectional shape that the base class does not know. Each cross-section has the following
attributes and properties.
(1) An identification tag, e.g., W36x300.
(2) The cross-section area.
(3) The section modulus about the (local) y and z axes.
(4) The number of dimensions associated with the cross-section and the dimension values stored in a vector.
This information can be stored for every cross-section in the base class. However, the base class has no way of knowing how
to compute the cross-sectional properties that must be computed separately for each cross-section type. For the sake of
simplicity, we will ignore the issue of units and assume that the values are stored in consistent units (e.g., inches). The base class,
CXSType, is shown below.
xstype.h


Note the use of the protected qualifier. Earlier we saw two other qualifiers – public and private. The protected qualifier allows
the derived classes, but not the client code, to access these member variables and functions of the base class. Using private would preclude that
possibility. Using this base class definition, we can define the derived class. The derived class, CISection, is shown below.
isection.h

With this definition, the CISection class inherits all the non-specialized public member functions in the CXSType class such as
DisplayProperties(), and GetDimensions(), and the member variables m_strID, m_fArea, etc. The specialized member
functions of a class such as the constructor, the destructor, the copy constructor and the assignment operator = are not inherited.
A derived class can also have a member function with the same name and argument types as the base class. This is called
redefining3 the inherited member function. In this example, the DisplayDimensions() function is redefined. As we will see
later, the context in which the use (in the client code) takes place determines which function (base or derived) gets called.
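The effect of redefining a member function can be sketched with two small hypothetical classes (CBaseShape and CCircle are illustrative names, not the book's classes, and the functions return strings instead of printing so that the result is easy to inspect). Because the functions here are ordinary (non-virtual) member functions, the type through which the object is accessed decides which version is called.

```cpp
#include <cassert>
#include <string>

class CBaseShape   // hypothetical base class
{
    public:
        std::string DisplayDimensions () const { return "generic dimensions"; }
        std::string DisplayProperties () const { return "area and section moduli"; }
};

class CCircle : public CBaseShape   // hypothetical derived class
{
    public:
        // same name and argument types as in the base class: redefined
        std::string DisplayDimensions () const { return "radius"; }
};

// Usage sketch: which version is called depends on the context of the call.
void DemoRedefinition ()
{
    CCircle circ;
    const CBaseShape& ref = circ;   // the same object seen as a base class object

    assert (circ.DisplayDimensions () == "radius");              // derived version
    assert (ref.DisplayDimensions () == "generic dimensions");   // base version
    assert (circ.CBaseShape::DisplayDimensions () == "generic dimensions");
    assert (circ.DisplayProperties () == "area and section moduli");  // inherited
}
```

The last assertion shows that a function that is not redefined is simply inherited, while the scope resolution operator gives explicit access to the base class version of a redefined function.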
Note how the derived class is defined. There is a colon after the derived class name followed by the keyword public and the
base class name. As mentioned earlier, the constructor in the base class is not inherited. When the derived class object is
instantiated, one should invoke the appropriate base class constructor explicitly. If a base class constructor is not explicitly invoked, then the default base class
constructor is called, and the code cannot be compiled if the base class does not have a default constructor. Here is the code from the
derived class, CISection, that shows the instantiation of the derived class using an overloaded constructor of the base class. The
statement
CISection::CISection (const string& strID,
const CVector<float>& fV) : CXSType (numISDimensions)

shows how the base class is instantiated in the initializer list of the overloaded CISection constructor. The base class overloaded constructor is
called first followed by the body of the derived class constructor.
The design of this class is such that the cross-sectional properties are computed as soon as the cross-section is defined in terms
of its ID and its dimensions. This design construct is the reason why the default constructors are not provided in the derived

3 There is a difference between redefining and overloading. If there are two or more functions with the same name but different parameter lists in a derived class, then the
functions are overloaded. Similarly, if there are two or more functions with the same name but different parameter lists in a base class, then the functions are overloaded.
If a function in the derived class has the same name as a function in the base class, then the base class function is redefined (hidden) in the derived class.


classes since there is no way to call the derived class’s function to compute the cross-sectional properties. In the next section,
we will see how this potential problem can be avoided.
One should remember that if class B is derived from class A, then instantiating a class B object would involve class A constructor
being invoked first followed by class B constructor. In the same vein, if object B goes out of scope, the destructor for class B is
called first followed by the destructor for class A.
In the following example we show how the inheritance concept is used with cross-sections in a client code.
Example Program 13.2.1 Using inheritance with cross-sectional shapes
Develop a program that has a base class for storing data connected with various cross-sectional shapes. Build two derived classes
to specifically store data connected with the I-section and the rectangular solid section.
We will first look at the base class functions.
xstype.cpp

Note how the base class provides several functionalities that are common for all possible cross-sectional shapes. There are two
constructors that initialize the values of the cross-sectional dimensions and the ID. The overloaded constructor also allocates
memory (line 18) to store the values of the cross-sectional dimensions.

The DisplayProperties() member function displays the values of the cross-sectional properties that are common to all cross-
sections. However, the DisplayDimensions() function displays an error message since the base class does not know the meaning
of these dimensions. It would be inappropriate to call this function. The derived classes’ DisplayDimensions() function should
be called. The GetProperties and GetDimensions functions are not shown. They provide access to the stored property and dimension values.
Next, we show the sample client code.


main.cpp

The program illustrates the use of public inheritance with the base class CXSType and the derived classes, CISection and
CRectSolid. The base class is used purely as an abstraction – there is no object in the program that is associated with the base class alone! When the
object ISection1 is declared (line 23), it inherits all the data and the attributes of the CXSType class in addition to the additional
data and attributes, if any, of the CISection class. The DisplayProperties() functionality is provided by the base class while the
DisplayDimensions() functionality is provided by each derived class.
Access to Base Class’s Functions
In lines 24 and 32, the derived class’s version of DisplayDimensions is called. What if we wish to call the base class version for
some reason? We can if we use the scope operator ::. For example, replacing line 24 with
ISection1.CXSType::DisplayDimensions();
would call the base class’s (CXSType) version of the DisplayDimensions function.
Other Types of Inheritance
Protected and Private: In the previous example, we saw the derived classes CISection and CRectSolid publicly derived from the
CXSType base class.
class CISection: public CXSType
{

};

If on the other hand, we have the following declaration


class CISection: protected CXSType


{

};

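then the public and protected members of the base class (CXSType) become protected members of the derived class. If the inheritance is private, then the
public and protected members of the base class become private members of the derived class: they can still be used inside the derived class
but not by the client code or by classes derived further. In every inheritance mode, the private members of the base class are
inaccessible in the derived class. The protected and private inheritances are seldom used in
practice.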
Multiple: A class may be derived from more than one base class. We saw an example of this in Chapter 12 when we looked at
C++ file stream classes. For example, the iostream class is derived from two classes – istream and ostream. Here are the
(simplified) class definitions.
class istream // handles input
{

};

class ostream // handles output
{

};

// iostream handles both input and output
class iostream: public istream, public ostream
{
….
};

The general syntax is as follows.


class classname: i_mode base_classname1, i_mode base_classname2, …
In the above syntax i_mode is the inheritance mode – public, private or protected. As usual, if i_mode is not specified, the
inheritance is assumed to be private. There is no limit to the number of base classes from which a class can be inherited. The
same inheritance rules as we saw for a single inheritance apply to multiple inheritances. Multiple inheritances can lead to
confusion over member function and variable names from different base classes. Almost always, multiple inheritances can be
avoided through other means.
So how has inheritance helped? It has helped in several ways. First, it has forced us to think differently. We do not think of an
I-section as being completely different from a rectangular solid section. We recognize that there are features that are common to
all different sections and there are features that are unique to each section. Second, it helps us build a hierarchy starting with the
definition of the base class that provides the gateway to all the features that are common to all the different cross-sectional
shapes. This avoids unnecessary code duplication and maintenance. Third, it provides a mechanism to define, store and
manipulate information that is unique to each cross-section.
Finally, let us examine what is necessary to define a new derived class from CXSType.
• The number of distinct cross-sectional dimensions.
• The formulae to use these cross-sectional dimensions to compute the cross-sectional properties.
• Constructor to allocate memory to store the dimension values.
• A function to display these cross-sectional dimensions.
What happens if the derived class does not provide some of these member functions? Can we streamline the inheritance
mechanism to make the class more useful and avoid some of the pitfalls? We will see how this is done in the next section using
the concept of polymorphism.

13.3 Polymorphism
In the previous section, we saw an example of inheritance with cross-sectional shapes. We asked the question as to what would
happen if (a) the client code does not know ahead of time what specific cross-sectional shapes are going to be used during a


program run, (b) new cross-sectional shapes are to be added, and (c) the process must be robust such that undefined situations
are detected? The answer lies in the idea associated with polymorphism.
Polymorphism is the ability of different objects to respond in their own way to the same message by means of virtual functions
(a mechanism also known as late binding or dynamic binding). In the context of the cross-sectional shape example, we want the program to automatically
call the function that would compute the cross-sectional properties for any cross-sectional shape. In other words, we want the
appropriate ComputeProperties() to be called when that function is invoked. We can achieve this objective by declaring the
ComputeProperties() function as a virtual function. When the C++ compiler encounters a virtual function, it generates
instructions to call the appropriate version of the function based on the instance of the object that is encountered during
program execution. For example, if a CRectSolid object is created during program execution and a call is made to
ComputeProperties, then CRectSolid’s version of ComputeProperties is called.
Example Program 13.3.1 Using inheritance with virtual member functions
Rewrite Example Program 13.2.1 using CXSType as an abstract base class but with DisplayDimensions and ComputeProperties
as virtual functions.
As we will see, the changes that we need to make to the program from the preceding section are small. We first start making
changes to the base class. Only the changes in the base class are shown.
xstype.h

There are two changes. We declare the DisplayDimensions and the ComputeProperties functions as virtual functions. The
virtual keyword precedes the function declaration. If a function is declared to be virtual and the function is redefined in the
derived class, then for any object associated with the derived class the derived class version of the virtual function is used not
the base class version. Both these functions are declared as public functions so that they can be called directly by the client code.
Since in this example, the use of the base class functions is not desirable, we have the following error messages embedded in
those two functions.
xstype.cpp

We will illustrate the changes that we need to make in the derived class by looking at the CISection class definition.
isection.h

While not required by the C++ standard, we have declared the two functions as virtual functions in the derived class just to remind ourselves that
these are virtual functions. C++ requires that the virtual keyword be used only in the base class. Note that both these functions
are declared as public functions so that they can be called directly by the client code. The virtual keyword should not be used
when defining the body of the member function outside the class definition.

Finally, we do not have to make any changes to the client code (main.cpp) at all. In fact, this version of the program is more
robust!

Pure Virtual Functions


In the previous example, the base class, CXSType, defined two virtual member functions - DisplayDimensions and
ComputeProperties. However, the program could work meaningfully only because the derived classes provided the required
functionalities; the base class knows neither the details about the cross-sectional dimensions nor how the cross-sectional
properties are computed from the cross-sectional dimensions. These base class functions merely print error messages. These


base class functions are called only if the derived class fails to provide its own version of these functions. C++ provides a
mechanism to force the derived class to provide its copy of these functions. If the base class declares these functions to be pure
virtual functions, the derived class then must define its version of these functions. To define a member function as a pure virtual
function in the base class, one must define the function as follows.
virtual return_type functionname (….) = 0;
A base class containing one or more pure virtual functions is called an abstract class; an object of an abstract class cannot be created
since the class is incomplete. Similarly, a derived class is also abstract if it does not define all the pure virtual functions that it inherits. We will now
illustrate the definition and use of pure virtual functions using the same cross-sections example.

Example Program 13.3.2 Using inheritance with pure virtual member functions
Rewrite Example Program 13.3.1 using CXSType as an abstract base class but with DisplayDimensions and ComputeProperties
as pure virtual functions.
Once again, the changes are small. First, the changes in the base class header file are shown.
xstype.h

Next, we delete the bodies of the two member functions (DisplayDimensions and ComputeProperties) from the base
class (xstype.cpp). No changes need to take place in the code associated with the derived classes. We finally have a program
that is just about right!

Pointers and Virtual Functions


C++ provides more powerful features with inheritance and virtual functions. Consider, for example, the problem of storing
and manipulating beam cross-sectional data. Let us assume that (a) the I-sections are made of steel and the Rectangular Solid
sections are made of wood, and (b) an additional member variable is necessary to define a rectangular solid section – moisture
content (% units). This is the maximum permissible moisture content in the wood for use as a structural member.
class CRectSolid: public CXSType
{
    static const int numRectDimensions = 2;
    public:
        CRectSolid (const std::string& id, const CVector<float>& fV,
                    const float fMC);
        CRectSolid (const CRectSolid&);
        ~CRectSolid ();
        // helper functions
        virtual void DisplayDimensions ();
        virtual void ComputeProperties ();
        float GetMC ();
        void SetMC (const float moisturecontent);

    private:
        float m_fMC; // moisture content
};

We have seen that every I-section or Rectangular Solid section is a cross-section derived from the base class. If CXSType is not an abstract class (as in
Example Program 13.3.1, where the virtual functions are not pure), a program containing the following statements compiles fine.
CVector<float> fVDim(2); fVDim(1) = 4.0f; fVDim(2) = 2.0f;
CXSType xsection;
CRectSolid rsolid ("R2x4", fVDim, 10.5f);
xsection = rsolid;

It is perfectly valid to assign a derived class object to a base class object4. However, there is a slicing problem. The information that

4 It should also be noted that a base class object cannot be assigned to a derived class object. For example, the following statement will not
compile.
rsolid = xsection;


is available only in the derived class cannot be stored and made available in the base class object. In other words, the moisture content
value is not available (it is sliced off the derived class data). A statement such as
std::cout << "Moisture content is " << xsection.GetMC() << "\n";
will not compile since GetMC() is not a member of the base class.
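C++ provides a solution to this problem – manipulating derived class information via a base class pointer. A base class pointer can
hold the address of a derived class object without any slicing, and a virtual function called through the pointer resolves to the
derived class version. A member that exists only in the derived class, such as GetMC(), is still not visible through the base class
pointer; the pointer must first be cast back to the derived type. For example, if we
change the above code as follows then the statements compile and execute correctly.
CVector<float> fVDim(2); fVDim(1) = 4.0f; fVDim(2) = 2.0f;
CXSType *pXsection;
CRectSolid *pRsolid = new CRectSolid("R2x4", fVDim, 10.5f);
pXsection = pRsolid;   // no slicing: both pointers refer to the same object

pXsection->DisplayDimensions();   // virtual: CRectSolid's version is called
// GetMC() is not a member of CXSType: cast back to the derived type first
std::cout << "Moisture content is "
          << static_cast<CRectSolid*>(pXsection)->GetMC() << "\n";
….
delete pRsolid;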

Finally, a point about destructors. When inheritance is used, it is a good idea to define virtual destructors. In the above code,
when the statement
delete pRsolid;
is executed, the destructor of class CRectSolid is called. If on the other hand the code had been written as follows
CVector<float> fVDim(2); fVDim(1) = 4.0f; fVDim(2) = 2.0f;
CXSType *pXsection = new CRectSolid("R2x4", fVDim, 10.5f);

// GetMC() is not a member of CXSType: cast back to the derived type first
std::cout << "Moisture content is "
          << static_cast<CRectSolid*>(pXsection)->GetMC() << "\n";
….
delete pXsection;

then, if the destructor of CXSType is not virtual, the last statement would call only the destructor of class CXSType (formally, the
behavior is undefined). Since the destructor of class CRectSolid is not called, this
process can be dangerous especially if resource allocation and deallocation takes place in the CRectSolid class. All the derived
class destructors can be automatically called if the destructor for the base class is declared as a virtual destructor. In other
words, if a base class destructor is tagged virtual, then the derived class destructor is called first followed by the base class
destructor. It should also be noted that if the destructor for the base class is tagged virtual, then automatically destructors for
the derived classes are tagged virtual. This scheme provides a very nice approach to information management as we will later
see in the example in this section.
Example Program 13.3.3 Using inheritance with pure virtual member functions and virtual destructors
Develop a program that demonstrates how to dynamically generate, store and manipulate information dealing with cross-sections.
The type of the cross-section and its associated data are known only at run time. The design should be such that adding
new types of cross-sections should have a well-defined, easy and robust path.
The solution is built on all the discussions we have had in preceding sections. We explain the solution by showing the developed
source code.
xstype.h

Compared to the earlier uses of the CXSType class, there are no changes to the definition of the base class except to declare the
destructor as a virtual destructor. This has major implications as we discussed earlier.
xstype.cpp
Compared to the earlier uses of the CXSType class, there are no changes to the implementation of the base class.


rectsolid.h

Compared to the earlier uses of the CRectSolid class, there are a few changes. We have added a private member variable, m_fMC
to store the maximum moisture content. To handle this extra member variable, changes are made to the overloaded constructor
(lines 14-15) and two new member functions are declared (lines 20 and 21).
rectsolid.cpp

Compared to the earlier uses of the CRectSolid class, there are a few changes. These relate to the new member variable, m_fMC
that is handled by the overloaded constructor and two new member functions.


main.cpp

This client code merits a close look. We use the CVector class as a container. The basic idea is to use pointers to store the address
of each cross-section as they are defined during execution in a vector (line 17). As we saw earlier in this section, base class
pointers can be used to manipulate derived class objects. In this sample program we define only two cross-sections. There is no
reason why the size of the vector cannot be set dynamically as we have done before in earlier chapters using the SetSize member
function. Line 26 shows how memory is dynamically allocated to store one object of type CISection. After that point, the
program is similar to the previous version. Line 34 shows another dynamic memory allocation to store another cross-section
object of type CRectSolid. Finally, note that memory deallocation must take place (to avoid a memory leak); the statements in line 46
achieve this objective. Lines 41 and 43 are explained next.

Using Run-Time Typing Information


In the example we saw how a vector of base class pointers can be used to hold the address of derived class objects created at
run time. What can we do if at run-time we need to know the type of the object so that an appropriate action can be taken? For
example, in line 42 a message is displayed stating that a rectangular solid section is being deleted. C++ provides a casting operator
known as dynamic_cast that can be used as follows to handle the scenario.
The template argument is a type, and the function argument is a pointer or a reference. In the above example, the type is CRectSolid* (a
pointer to CRectSolid) and the argument is the base class pointer (CXSType*). If the object that the pointer refers to is of the type named in the
template argument (or of a type derived from it), the converted pointer is returned; otherwise, a nullptr is returned. In the above example, if
pVXSections(i) does not hold the address of a rectangular solid section (CRectSolid*), then the returned value is nullptr and
the if statement evaluates as false. This type of type casting is known as downcasting since the conversion moves down the class
hierarchy (from the base class to a derived class). It should be noted that if a reference is used (instead of a pointer), the casting
failure results in a bad_cast exception rather than a nullptr pointer.
C++ also provides a typeid operator that can be used to obtain the type information. This operator takes as the
argument either an expression or a type name and returns a reference to an object of type type_info. Here are a couple of examples to
illustrate the usage.
#include<typeinfo>

CRectSolid* pR1 = new CRectSolid ("R20x10", fV, 10.5f);


std::cout << "Typeid information: " << typeid(pR1).name() << "\n";
std::cout << "Typeid information: " << typeid(*pR1).name() << "\n";

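The program displays the following output (the exact text returned by name() is implementation-defined; the form below is typical of Microsoft Visual C++).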


Typeid information: class CRectSolid *
Typeid information: class CRectSolid

The typeid operator can also be used to test or compare the type_info value.
#include <typeinfo>
CRectSolid* pR1 = new CRectSolid ("R20x10", fV, 10.5f);
if (typeid(pR1) == typeid(CRectSolid *))
std::cout << "Cross section is a rectangular solid.\n";

Note that
if (typeid(pR1) == typeid(CRectSolid *))
is not the same as
if (typeid(pR1) == typeid(CRectSolid))
One should be cautious in defining virtual member functions. If design requires the use of virtual functions, then go ahead and
use them. Otherwise, do not. There is an overhead associated with the usage of virtual functions. This overhead can sometimes
slow down the execution of a program substantially. C++ compilers create a virtual function table for any class that uses virtual
functions. The address of each of these virtual functions is stored in the table. This table management can be quite involved if
multiple inheritances are involved, and member functions are overridden.
Nested Classes
A class can be defined inside another class. For example, in the following class definition the nested class CC refers to the
nested class CB that is private. Under the original C++98 rules this definition was not valid; since C++11, a nested class is a
member of the enclosing class and has the same access rights as any other member, so the definition is valid.
class CA
{
    float m_fX;        // by default, private
    class CB { };      // by default, private
    class CC           // by default, private
    {
        CB m_b;
    };
};

However, the member variable m_fX cannot be used directly inside class CB or class CC since m_fX is a non-static member variable
that exists only as a part of a CA object.
In the above example, class CA is the enclosing class and classes CB and CC are nested classes. Nested classes are independent
classes. An object of the nested type does not have members defined by the enclosing class. Conversely, an object of the
enclosing class does not have members defined by the nested class. Here is another example that is syntactically correct.
class A // enclosing or outer class
{
private:
float m_fX;
class B { }; // private nested or inner class
public:
class C // public nested or inner class
{
void XYZ (int n); // private by default


…..
public:
void ABC (int n);
};
};
A nested class can be either public or private. If it is public, then it can be used outside the enclosing class. Using the above
example, one can declare a variable as
A::C aCobject;
Nested classes are typically used to avoid naming conflicts – to place the name of a class inside the scope of another class so
that two classes can contain another class inside them with the same name and having different functionalities.
Example Program 13.3.4 Nested Classes
Develop a program to store basic material properties and allow for conversion of values from one set of units to another (SI
and US Customary units). Assume that the SI-related units are kg, m, s, and the USC-related units are slug, ft, s.
We will define a base class CMaterial designed to have member functions and variables to store the material data. We will derive
two classes CMaterialSI and CMaterialUSC to define objects to store material properties in SI and US Customary units
respectively. In each of these classes we will define a nested class CConvert that will convert the material values from SI to USC
units and back.
material.h

Lines 11 through 39 define the base class. There are four member variables – name of the material, mass density, yield strength
and modulus of elasticity.


The derived classes have a nested class in them – CConvert that has a member function GetValue. The GetValue function is
designed to convert the value from one set of units to another. For example, the GetValue function defined within the
CMaterialSI::CConvert class is designed to convert the values from SI to USC units. The static member variable m_dVConvert
stores the conversion factor from SI to USC units for each one of the material properties. A member variable with the same
name is also defined in the CMaterialUSC::CConvert.

Since the member variable is static, the vector is initialized at the top of material.cpp file. Select functions are shown below.
material.cpp

main.cpp

A CMaterialSI object (for widely used steel material) is defined in line 23. The current values are displayed in line 24. An object
to facilitate the conversion of units is defined in line 19. Note how the scope operator :: is used as well as the fact that the


definition is possible since the nested class is declared public. Another object storing values for steel in USC units is defined in
lines 27-29. Note how the object to convert the units is used in conjunction with the GetValue member function. The reverse
process is then defined in lines 33 through 35 to now define a clone of the original (steel) object. We check our program by
making sure that execution of the Display function in lines 30 and 37 displays the same values.

13.4 Function Pointers and Functors


Function pointers are variables that store the address of a function. In other words, they are like other pointer variables
except that they hold the memory address where the code of a function is stored. We first looked at function pointers in
Chapter 6 (Section 6.4). Here are two examples that show how function pointers are defined and declared.
void (*ptQuadraticRoots) (float a, float b, float c, float& r1, float& r2);
int (CMyMath::*ptLinearInterp) (float m, float c, float x, float& fValue);

Note that the names of the variables in the two examples are ptQuadraticRoots and ptLinearInterp. Once the function pointer
variables are declared, they can be used just like any other variable.
Example Program 13.4.1
In this example we will illustrate how pointers to non-member functions can be used.
main.cpp

In lines 10 and 11 we define two simple functions completely. Both these functions return a float value and accept two float
variables as function arguments. However, their names and functionalities are completely different. In lines 13 through 27 we
define a function, Use_In_A_Function, that has a single argument – a pointer to a function that returns a float value and accepts
two float variables as the function arguments.


We start illustrating the usage of the function pointer variables in the main program.

In line 33 we define a variable called ptAddFunction that is a pointer to a function that returns a float value and accepts two float
variables as function arguments. The variable is initialized to nullptr. In line 36, the variable is given a value, the address of the
function AddSimpleTwo that was declared and defined in line 10. The function is invoked in line 39 with the arguments as 2.3
and 3.4.
How the variable can be used as an argument in a function call is shown in line 42 where the function Use_In_A_Function is
called. More function pointer usage is shown in that function. In line 16, the == equality operator is used to check if the variable
is initialized with an appropriate address. In lines 13 and 15 the same == operator is used to check what function is passed as the
argument. Finally, line 26 shows how the variable is used to invoke the function similar to the usage in line 39.

Example Program 13.4.2 Function Pointers


In this example we will illustrate how class member pointer variables can be used using essentially the same example as in 13.4.1.
Three source code files are used. The class containing the two add functions is called CMyAddLibrary. They are shown first
followed by the main program.
myaddlibrary.h


The major difference in the declaration of the member function Use_In_A_Function is the use of the scope operator :: to
indicate that the member pointer variable is associated with the CMyAddLibrary class (line 12).
myaddlibrary.cpp

Once again we point out the major differences in the definition and the usage. Lines 29 and 31 must now contain the class name
(CMyAddLibrary::) so that the scope of the usage is clear. In line 35, the appropriate member function from the class can be
invoked only by using (*this.*ptrFunc) qualifier.
main.cpp

There are several changes to bring in the class (object-oriented) associated usage. As before, we declare a pointer variable to a
function in line 14; however we need to add the class qualifier (CMyAddLibrary::). In addition, we need an object associated
with the class – this is specified in line 15. Note that lines 12, 15 and 18 are essentially the same as before except that we now
need the class qualifier. In lines 21 and 22 we show how the variable value can be reassigned and how the variable can be reused.


Functors
Function pointers can be used in the context of callback functions. A callback function is a function that is called using a function
pointer. We have already seen callback function usage not only in this section but also in Chapter 6. A Function Object, or
Functor is simply any object that can be called as if it is a function. Functors can encapsulate function-pointers in C and C++
using templates and polymorphism. Thus, you can build up a list of pointers to member-functions of arbitrary classes and call
them all through the same interface without bothering about their class or the need of a pointer to an instance5. However, all
the functions must have the same return-type and calling parameters. Sometimes Functors are also known as Closures. Functors
can also be used to implement callbacks.
We start the implementation by first defining a base class FunctorABC that provides a virtually overloaded operator () with
which one will be able to call the required member function. From the base class we derive a template class that is initialized
with a pointer to an object and a pointer to a member function in its constructor. The derived class overrides the operator ()
of the base class. In the overridden versions it calls the member function using the stored pointers to the object and to the
member function. Here is an example that illustrates the ideas.

Example Program 13.4.3


We will create our own math library that has two functions. Each function accepts two numbers as input (say, a and b) and returns a
value. The function Simple2 computes (a + b), and the
function Square2 computes (a² + b²).
We first start by defining an abstract base class FunctorABC that provides a virtually overloaded operator () with which one will
be able to call the required member function.
functorABC.h

Note that the class is a template class with Type being int, float, double, etc. The () operator is overloaded and is a pure
virtual function. Next, we define our math library.

5 Lars Haendel, “Introduction to C/C++ Function-Pointers, Callbacks and Functors”, www.function-pointer.org.


math2lib.h

Our math library is a template class derived from the abstract base class, FunctorABC. The class object is instantiated (see line 7) by
specifying one of the math functions in the library and the data type associated with the computations. The constructor
(lines 12 and 13) takes a pointer to an object associated with one of the math functions and a pointer to the member function
that actually carries out the computation. For example
CMath2Lib<MathFunction, DataType> object(&MathFunctionObject, &MemberFunction);

or, assuming that AddSimple2 is a previously declared object of type CAddSimple2<int>,
CMath2Lib<CAddSimple2<int>, int> AS2(&AddSimple2, &CAddSimple2<int>::AddThem);


The two functions in the library are defined in lines 25 through 37. Each one has a public member function AddThem that
implements the actual computation associated with the function. These member functions are never called directly. They are
invoked via the overloaded operator (lines 16 and 17).
main.cpp


Using the functors is a four-step process: the first three steps set up the usage and the fourth step is the actual use. In the first
step, we instantiate objects associated with the two functions (lines 7 and 8) in the math library. In the second step, we
instantiate the functors that will be used to invoke the two functions. To make this invocation simple, we need the third step,
where we define an array of pointers to the abstract base class; each element of the array contains the address of one of the
functor objects and can be used to invoke the corresponding function in the library. From this point onwards, the math
functions can be invoked simply by using the elements of this array, as seen in lines 25 and 27.
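
Since the listings are not reproduced here, the following is a minimal sketch of the four steps described above. It is not the book's listing: the class name CAddSquare2, the helper function CallLibrary, and the assumption that Simple2 adds the two numbers while Square2 adds their squares are illustrative choices.

```cpp
#include <cstddef>

// Abstract base class: callers only see the overloaded operator ().
template <class Type>
class FunctorABC {
public:
    virtual ~FunctorABC() = default;
    virtual Type operator()(Type a, Type b) = 0;
};

// Two "library" functions, each with a public member function AddThem.
template <class Type>
class CAddSimple2 {
public:
    Type AddThem(Type a, Type b) { return a + b; }
};

template <class Type>
class CAddSquare2 {   // hypothetical name for the Square2 function
public:
    Type AddThem(Type a, Type b) { return a * a + b * b; }
};

// Derived functor: stores a pointer to an object and a pointer to a
// member function, and calls the member function in operator ().
template <class MathFunction, class Type>
class CMath2Lib : public FunctorABC<Type> {
public:
    using MemberFunction = Type (MathFunction::*)(Type, Type);
    CMath2Lib(MathFunction* pObject, MemberFunction pFunction)
        : m_pObject(pObject), m_pFunction(pFunction) {}
    Type operator()(Type a, Type b) override {
        return (m_pObject->*m_pFunction)(a, b);
    }
private:
    MathFunction*  m_pObject;
    MemberFunction m_pFunction;
};

// which must be 0 (Simple2) or 1 (Square2).
int CallLibrary(std::size_t which, int a, int b) {
    // Step 1: objects associated with the two library functions
    CAddSimple2<int> simple;
    CAddSquare2<int> square;
    // Step 2: functors bound to an object and its member function
    CMath2Lib<CAddSimple2<int>, int> AS2(&simple, &CAddSimple2<int>::AddThem);
    CMath2Lib<CAddSquare2<int>, int> AQ2(&square, &CAddSquare2<int>::AddThem);
    // Step 3: array of pointers to the abstract base class
    FunctorABC<int>* table[] = {&AS2, &AQ2};
    // Step 4: invoke the selected function through the array
    return (*table[which])(a, b);
}
```

With these definitions, CallLibrary(0, 3, 4) goes through the Simple2 functor and CallLibrary(1, 3, 4) through the Square2 functor, without the caller ever naming AddThem.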

It may be worthwhile to look at the Example13_4_4 project, which illustrates how user-defined functions can be integrated into
programs that use numerical solution techniques, with the Newton-Raphson root-finding technique as an example. The example
is not discussed here since the next section presents a more comprehensive example: the development of a numerical toolbox with exception handling.

13.5 Numerical Toolbox with Exception Handling


Error handling in any program that performs useful tasks requires careful planning. As we saw in Chapter 9, C++ divides errors
into two categories: logic errors and runtime errors. In addition, these errors can arise in three different places: in the C++
language facilities and libraries, in third-party code (if used), and in the user-developed code. As a code developer, one
would like to know the exact source (location) of the error. While a good debugger is immensely useful in detecting this source, in
a complex program the distance between the code where the error occurs and the code where the error should be handled
can be large; the call stack that one can see in the debugger shows how many function calls are active at the
instant the error occurs. Modern programs rely on C++ exceptions for a variety of reasons6.
(1) A try-catch block is necessary to catch and handle the error. If an exception is thrown in the try block, it is caught by
the nearest enclosing catch block with the corresponding type. As we saw in Chapter 9, if no corresponding catch block
is found, std::terminate is invoked, resulting in abnormal program termination.
(2) When an exception is thrown, execution is transferred directly to the code that is designed to handle the error. In programs
where error codes are used, control is transferred one function at a time as each function returns its own error code
to the calling function.
(3) If the code is well designed, no resource leak takes place as the “exception stack-unwinding mechanism destroys all
objects in scope according to well-defined rules after an exception is thrown.”
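
The three points above can be observed in a small sketch. The names Tracker, Innermost, Middle and RunDemo are hypothetical and are not part of the example programs; the log simply records the order in which objects are constructed and destroyed and which handler is selected.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Records construction and destruction so that stack unwinding is visible.
struct Tracker {
    std::vector<std::string>& log;
    std::string name;
    Tracker(std::vector<std::string>& l, const std::string& n) : log(l), name(n) {
        log.push_back("ctor " + name);
    }
    ~Tracker() { log.push_back("dtor " + name); }
};

void Innermost(std::vector<std::string>& log) {
    Tracker t(log, "inner");
    throw std::runtime_error("failure in Innermost");
}

void Middle(std::vector<std::string>& log) {
    Tracker t(log, "middle");
    Innermost(log);                  // no try block here: the exception passes through
    log.push_back("never reached");  // skipped because of the throw
}

std::vector<std::string> RunDemo() {
    std::vector<std::string> log;
    try {
        Middle(log);
    } catch (const std::logic_error&) {      // type does not match: skipped
        log.push_back("caught logic_error");
    } catch (const std::runtime_error& e) {  // first handler with a matching type
        log.push_back(std::string("caught: ") + e.what());
    }
    return log;
}
```

Control moves from Innermost directly to the handler in RunDemo, and both Tracker objects are destroyed during unwinding before the handler body runs; no error code has to be passed back through Middle.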

Some of these ideas are illustrated in the next example program. Since this is the last example in the last chapter that deals primarily with
modern C++, most previous examples are updated and brought under a common framework, with the intent that the reader
will spend time enhancing the features so that the developed code will be useful for engineering and scientific applications.

Example Program 13.5.1


We will create and test a numerical functions library with two functionalities: using the Newton-Raphson method to compute one
of the roots of a polynomial, and computing a numerical derivative using Ridders' method. The program architecture is shown
in Fig. E13.5.1(a). Two features will be the primary driving forces: error handling via C++'s exception handling mechanisms
and easy extensibility of the numerical analysis capabilities.

6 https://docs.microsoft.com/en-us/cpp/cpp/errors-and-exception-handling-modern-cpp?view=vs-2017


[Figure: block diagram with boxes for Unit Testing, std::exception, CNFToolBox (Newton-Raphson, Numerical Differentiation, ...), CArrayBase, CVector and CMatrix.]

Fig. E13.5.1(a) Program architecture


We will divide the discussions into three parts.
Array Handlers: The CVector and the CMatrix classes are derived from CArrayBase class that not only stores the common
variables associated with the two classes but handles all errors thrown by the two classes (Fig. E13.5.1(b)).

Fig. E13.5.1(b) CArrayBase class


Seven matrix-related errors and four vector-related errors are handled. These errors are identified in lines 18-29. Some of the
variables and functions are declared as static and are designed to be shared by all the CArrayBase objects. The variables m_ValueR
and m_ValueC are used to display specific error messages in the ErrorHandler function that is shown in Fig. E13.5.1(c).


Fig. E13.5.1(c) ErrorHandler function in CArrayBase class (partial listing)


In addition to helping handle errors and displaying the appropriate error message, the base class also is used to track the amount
of memory allocated and deallocated by the two classes. The ShowStatistics function is used to display memory allocation and
deallocation and the Reset function is used to reset these values back to zero.

Each error in the list of errors needs to be caught in the CVector or CMatrix class and sent to its own ErrorHandler function.
Two examples are shown below. The first involves a system error, std::bad_alloc that is handled via the
VECTOR_ALLOCATION_ERROR as shown in Fig. E13.5.1(d).

Fig. E13.5.1(d) Handling vector allocation error in the CVector class


The CVector’s ErrorHandler function is very simple, and its main purpose is to help in debugging when the program is executed
using the debugger (set a breakpoint at line 759 and look at the Call Stack window to see more precisely which CVector member
function the error is coming from) as shown in Fig. E13.5.1(e). It has one line of code where the error is thrown with the hope
that it will be caught at the appropriate location and handled.


Fig. E13.5.1(e) The ErrorHandler function


The second is an error that is unique to the functionality of the CVector class: VECTOR_INDEX_OUT_OF_BOUNDS. The relevant code
is shown in Fig. E13.5.1(f). Note that the bounds check is carried out only in the debug version of the program (recognized in line
447) so that calculations in the release version are not slowed down.
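
A debug-only bounds check of this kind might look like the sketch below. This is not the CVector code: the class SmallVector, the enum VectorError, the 1-based indexing and the use of the standard NDEBUG macro to recognize a release build are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

enum class VectorError { INDEX_OUT_OF_BOUNDS };   // hypothetical error code

class SmallVector {                               // hypothetical stand-in for CVector
public:
    explicit SmallVector(std::size_t n) : m_data(n, 0.0) {}

    // 1-based element access (an assumption). The check is compiled in
    // only when NDEBUG is not defined, i.e. in a debug build.
    double& operator()(std::size_t i) {
#ifndef NDEBUG
        if (i < 1 || i > m_data.size())
            ErrorHandler(VectorError::INDEX_OUT_OF_BOUNDS);
#endif
        return m_data[i - 1];
    }

private:
    // One line of code: throw the error and let the caller catch it.
    // A breakpoint here shows the offending call in the call stack.
    [[noreturn]] static void ErrorHandler(VectorError e) { throw e; }

    std::vector<double> m_data;
};
```

In a release build the check disappears entirely, so an out-of-range index is no longer detected; that is the price paid for the faster element access.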

Fig. E13.5.1(f) Handling vector index out of bounds error


Additional Unexamined Capabilities in the CVector and CMatrix classes: The reader is strongly urged to examine, learn
and use the new features in both classes such as extensive operator overloading including << operator, initializer list, move
constructor, support for a few operations such as computing two norm, max norm, solving linear algebraic equations, etc.
Careful use of these features would make it easier to write and maintain computer programs. Here are a few examples.
CVector<double> X(n, 1.0); // vector of length n for double values with an initial value of 1.0
CVector<int> nVA = {-10, 0, 20}; // vector of length 3 for integer values with initial values of -10, 0, 20
dVA = dMB*dVC + dVX - 3.0*dVY; // vector-matrix multiplication, addition, subtraction
dVx = dMA/dVb; // solving A x = b

Numerical Functions Toolbox: The second part of the program is the numerical functions toolbox. This class is derived
from the std::exception class so as to enable exception handling functionalities. The public portion of the class is shown in
Fig. E13.5.1(g). Four errors are identified in lines 19 and 20. Three are associated with the Newton-Raphson Method: the
derivative used to compute the next estimate of the root is close to zero, the root cannot be found within the specified number
of iterations and tolerance, and a floating-point exception takes place. The fourth is associated with Ridders' Method: an invalid
input is specified by the user for the step size.
Three error handler functions are declared: in line 34 for the errors listed in lines 19-20, in line 35 for handling errors from the
CVector and CMatrix classes, and in line 36 for handling system-generated exceptions. In lines 37 and 39 we see functions that
can be called to check whether overflow or underflow has taken place. The two solution methods are available via their compute
functions (NRCompute and RMCompute) shown in lines 22-31, where an argument is used to point to the user-supplied function for
computing the required function and derivative values.
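
One portable way to check for overflow and underflow is through the floating-point status flags in the standard header <cfenv>. The sketch below is an assumption about how such checks could be written; the toolbox's own functions may rely on different, platform-specific system calls, and the names FloatStatus and CheckedProduct are hypothetical.

```cpp
#include <cfenv>

struct FloatStatus {
    bool overflow;
    bool underflow;
};

// Multiplies a and b and reports which floating-point status flags were
// raised. The volatile variables keep the multiplication from being
// evaluated at compile time, so the flags reflect the run-time operation.
FloatStatus CheckedProduct(double a, double b, double& result) {
    std::feclearexcept(FE_ALL_EXCEPT);    // start with a clean set of flags
    volatile double va = a;
    volatile double vb = b;
    volatile double product = va * vb;
    result = product;
    FloatStatus status;
    status.overflow  = std::fetestexcept(FE_OVERFLOW)  != 0;
    status.underflow = std::fetestexcept(FE_UNDERFLOW) != 0;
    return status;
}
```

The flags are sticky: once raised they stay set until cleared, which is what makes it possible to test them once after a block of computations instead of after every operation.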

Fig. E13.5.1(g) Public portion of the CNFToolBox class


The NRCompute function is shown in Fig. E13.5.1(h). The RMCompute function operates in a similar manner and is not listed here.

Fig. E13.5.1(h) Function that implements Newton-Raphson calculations


A revised version of the code presented in Example Program 6.4.1 is used here for the Newton-Raphson Method. Note that
the Newton-Raphson update takes place as
$$x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)}, \quad i = 1, 2, \ldots \qquad (13.5.1)$$
where a polynomial of order $n$ has $(n+1)$ terms: $f(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$.
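
The update in Eqn. (13.5.1) can be sketched for a polynomial as shown below. This is not the NRCompute listing: the function names, the enum NRError, the coefficient ordering (highest power first), the default tolerances and the use of Horner's rule are assumptions made for illustration. The two thrown values mirror the "derivative close to zero" and "unable to converge" errors identified for the toolbox.

```cpp
#include <cmath>
#include <vector>

enum class NRError { DERIVATIVEISZERO, UNABLETOCONVERGE };   // hypothetical error codes

// Coefficients are ordered a_n, ..., a_1, a_0 (highest power first).
// Horner's rule gives f(x) and f'(x) in one pass.
void EvaluatePolynomial(const std::vector<double>& a, double x,
                        double& f, double& df) {
    f = 0.0;
    df = 0.0;
    for (double coeff : a) {
        df = df * x + f;
        f = f * x + coeff;
    }
}

double NewtonRaphson(const std::vector<double>& a, double x0,
                     int maxIterations = 50, double tolerance = 1.0e-10) {
    double x = x0;
    double f = 0.0, df = 0.0;
    for (int i = 0; i < maxIterations; ++i) {
        EvaluatePolynomial(a, x, f, df);
        if (std::fabs(f) <= tolerance) return x;     // converged
        if (std::fabs(df) <= 1.0e-14)                // guard the division
            throw NRError::DERIVATIVEISZERO;
        x -= f / df;                                 // Eqn. (13.5.1)
    }
    EvaluatePolynomial(a, x, f, df);                 // check the last estimate
    if (std::fabs(f) <= tolerance) return x;
    throw NRError::UNABLETOCONVERGE;                 // detected outside the loop
}
```

For example, with the coefficients {1, 0, -2} (that is, $x^2 - 2$) and a starting value of 1, the function returns an estimate of $\sqrt{2}$, while $x^2 + 1$ started from 0 triggers the zero-derivative error on the first iteration.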
The first error condition (DERIVATIVEISZERO) is detected in lines 43-44 so that a potential divide by zero or an overflow condition
does not take place when Eqn. (13.5.1) is used. The second error condition (UNABLETOCONVERGE) is detected outside the iterative
loop, in line 67. Finally, instead of checking every single floating-point computation, overflow and underflow conditions
(FLOATEXCEPTION) are detected in lines 63-64. The reader is urged to look at the two functions, bool
CNFToolBox::OverFlowDetected () and bool CNFToolBox::UnderFlowDetected (), to see what system function calls are
necessary to obtain the floating-point exceptions. The main program is shown below.
main.cpp

A menu of test options (lines 19-22) is shown in line 28 via the GetInteractive function in line 27. It should be relatively easy
to add more methods to the toolbox and, hence, to the menu and test cases. The test cases are created and solved in
the try block (lines 33-47). The files TestCases.h and TestCases.cpp contain the code for the classes that act as an intermediary
between the main program and the numerical toolbox. These intermediary test classes are not strictly needed, but they help put
the user-defined functions in identifiable classes. Adding new functions to these test cases is as simple as commenting and
uncommenting lines in the source code.


The catch block is in four parts: the toolbox errors are caught in lines 45-48, the CVector and CMatrix errors are caught in lines
50-53, the system-generated errors are caught in lines 55-58, and the rest of the errors that do not belong to these three categories
are caught in the catch-all block in lines 60-63.
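
The order of the handlers matters when the toolbox error class is derived from std::exception: the more specific handler must come before the std::exception handler. The sketch below illustrates this with hypothetical names (ToolboxError, ArrayError, Classify); it is not the main program from the example.

```cpp
#include <exception>
#include <new>
#include <string>

// Hypothetical toolbox error type derived from std::exception.
class ToolboxError : public std::exception {
public:
    const char* what() const noexcept override { return "toolbox error"; }
};

enum class ArrayError { VECTOR_INDEX_OUT_OF_BOUNDS };   // hypothetical array error code

std::string Classify(int which) {
    try {
        switch (which) {
            case 0: throw ToolboxError();
            case 1: throw ArrayError::VECTOR_INDEX_OUT_OF_BOUNDS;
            case 2: throw std::bad_alloc();      // a system-generated exception
            case 3: throw 42;                    // anything else
            default: return "no error";
        }
    } catch (const ToolboxError&) {      // must precede the std::exception handler
        return "toolbox";
    } catch (ArrayError) {
        return "array";
    } catch (const std::exception&) {    // std::bad_alloc and other standard exceptions
        return "system";
    } catch (...) {                      // catch-all for everything else
        return "unknown";
    }
}
```

If the std::exception handler were listed first, it would also intercept ToolboxError objects, and the toolbox-specific handler would never be reached.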

Finally, two executions of the program are shown. A rather unique polynomial (Fig. E13.5.1(i)) whose root is difficult to find is
used to illustrate the exception handling concepts, and the use of Ridders' Method in computing derivatives is shown in Fig.
E13.5.1(j).

Fig. E13.5.1(i) Sample output from Example Program 13.5.1

Fig. E13.5.1(j) Sample output from Example Program 13.5.1
A more substantial example is shown in Chapter 10 of the companion textbook Rajan, Intermediate Structural Analysis & Design,
where the same ideas discussed in this section are used in a finite element program for the analysis of space trusses.


Summary
This chapter completes the formal treatment of C++ related object-oriented techniques in this book. In the rest of the chapters,
we will see how to use these OO ideas to develop and implement algorithms and solve engineering and scientific problems
numerically.

Where to go from here?


This is the last chapter dealing with C++ ideas. The rest of the chapters deal with more advanced numerical techniques. Hence,
before reading the following chapters, it is important that the reader consolidate the knowledge gained so far and start
constructing a library of useful numerical techniques using object-oriented ideas. The exercises that follow suggest the types of
classes to develop. In fact, the author would suggest that the reader develop an STL-style numerical library. Those who are
interested in a GUI program for freehand drawing that uses some of the advanced ideas discussed in this and previous chapters
should download and look carefully at the weDraw program.


Exercises
As a reader, learner and practitioner, you have reached a high level of maturity in organizing your thoughts, breaking
a non-trivial problem into small, manageable pieces, developing specifications, and implementing detailed
algorithms via C++. The problems below involve this higher level of thinking. There is no unique, best answer.
There are poor and then there are well thought-out solutions!

Appetizers
Problem 13.1
Review Section 6.1. Write the specifications for a class that can be used to find the roots of a function. Implement, test and
document the class.
Problem 13.2
Review Section 6.2. Write the specifications for a class that can be used to differentiate a function. Implement, test and document
the class.
Problem 13.3
Review Section 6.3. Write the specifications for a class that can be used to integrate a function. Implement, test and document
the class.

Main Course
Problem 13.4
Review Section 10.3 and your solution to Problem 10.5. Rewrite the specifications for the matrix (template) toolbox class. What
changes did you make and why? Implement, test and document the class.
Problem 13.5
Review Section 11.3. Write the specifications for a class that can be used for data fitting via regression analysis. Implement, test
and document the class.

C++ Concepts
Problem 13.6
Develop a template class (COOMatrix) to store two-dimensional arrays similar to the CMatrix class that we saw in Chapter 9.
However, derive this class from the CVector class and store the elements of the matrix in a vector. Note that if we have a matrix
$A_{m \times n}$, then element $A_{ij}$ is located at $(i-1)n + j$ if the matrix is stored row-wise and at $(j-1)m + i$ if the matrix is stored
column-wise. Write a few test programs to compare the runtime performance of this implementation with the CMatrix
implementation.
Problem 13.7
Rework Example Program 13.5.1 by not using C++ exceptions but error codes, and then compare the two approaches to
handling and managing errors.


References
A good source for a number of C++ libraries: http://en.cppreference.com/w/cpp/links/libs



Chapter 14

Ordinary Differential Equations


“Ability is of little account without opportunity.” Napoleon Bonaparte

“It is all one to me if a man comes from Sing Sing or Harvard. We hire a man, not his history.” Henry Ford

“Science is a differential equation. Religion is a boundary condition.” Alan Turing

A differential equation contains derivatives of the dependent variable. We have an ordinary differential equation (ODE) when
the equation contains only one independent variable. There are several engineering and scientific models that involve derivatives
that are related to other parameters via known expressions. For example, you may have seen in your first course in physics the
velocity, $v$, of an object defined mathematically as $v = \frac{ds}{dt}$, the rate of change of its position. Or, in a college course in
mechanics, the differential equation describing the transverse displacement, $y$, of a beam (see Chapter 11.1) can be expressed
as $\frac{d^4 y(x)}{dx^4} = \frac{w(x)}{EI}$. Solving such differential equations either analytically or numerically provides a wealth of information in
understanding the complex relationship between the dependent and independent variables.

Objectives
• To understand the different types of ordinary differential equations.
• To understand how to use the Runge-Kutta Methods to solve ODEs.


14.1 Ordinary Differential Equations


We have an ordinary differential equation (ODE) when the equation contains only one independent variable. In Chapter 15,
we will see differential equations involving more than one independent variable; these are called partial
differential equations.
Consider the examples of differential equations shown earlier. The relationship between the velocity of an object and its position
with respect to time was expressed as
$$v(t) = \frac{ds(t)}{dt} \qquad (14.1.1)$$
This is a first-order linear differential equation since the highest-order derivative is the first derivative. The independent variable is time $t$.
Both the velocity $v$ and the position $s$ are dependent variables since their values depend on time. The ODE is linear if
the differential equation is a linear function of $v$ and $\frac{ds}{dt}$. In other words, the ODE in its general form
$$F\left(x, y, \frac{dy}{dx}, \frac{d^2 y}{dx^2}, \ldots, \frac{d^n y}{dx^n}\right) = 0 \qquad (14.1.2)$$
is linear if $F$ is a linear function of the variables $y, \frac{dy}{dx}, \frac{d^2 y}{dx^2}, \ldots, \frac{d^n y}{dx^n}$. The differential equation
$$\frac{d^4 y(x)}{dx^4} = \frac{w(x)}{EI} \qquad (14.1.3)$$
is a fourth-order linear differential equation since the highest-order derivative is the fourth derivative. The parameters $E$ and
$I$ in the equation could be constants or functions of the independent variable $x$. Consider another example
$$\frac{dy(x)}{dx} + cy^2 = 0 \qquad (14.1.4)$$
This differential equation is a first-order, nonlinear differential equation since $F$ is a nonlinear function of $y$. In general, we
can write the entire problem as
$$\frac{dy(x)}{dx} = f(x, y(x)) \quad \text{with} \quad y(x_0) = y_0 \qquad (14.1.5)$$
where $y(x_0) = y_0$ is the boundary or initial condition. We saw Taylor series expansion in Chapter 5, and we will examine this
more carefully here. Consider
$$y(x+h) = y(x) + y'(x)h + \frac{y''(x)}{2!}h^2 + \frac{y'''(x)}{3!}h^3 + \cdots + \frac{y^{(n)}(x)}{n!}h^n + \frac{y^{(n+1)}(\xi)}{(n+1)!}h^{n+1} \qquad (14.1.6)$$
If the series is truncated after the second term
$$y(x+h) = y(x) + y'(x)h + \frac{y''(\xi)}{2!}h^2 \qquad (14.1.7)$$
we can solve for the approximation of the first derivative as
$$y'(x) = \frac{y(x+h) - y(x)}{h} + O(h) \qquad (14.1.8)$$
Eqn. (14.1.8) can be used to solve Eqn. (14.1.5) as
$$y_{i+1} = y_i + h f(x_i, y_i) \qquad (14.1.9)$$
taking $y_{i+1} = y(x_i + h)$ and $y_i = y(x_i)$. The attractive feature of Eqn. (14.1.9) is that it is possible to solve for $y_1$ using $y_0$,
$y_2$ using $y_1$, and so on.


14.2 Runge-Kutta Methods


The Runge-Kutta methods overcome several disadvantages of other techniques. For example, they achieve the accuracy of a
higher-order Taylor series approach without requiring derivatives of $f$, they require only one initial point to start the solution
process, and they can be refined to produce more accurate solutions. In general, the Runge-Kutta methods can be described as
$$y_{i+1} = y_i + h\,\phi(x_i, y_i, h) \qquad (14.2.1)$$
where from the initial point, $x_0$, the solution at the next point is computed using the increment $h$ and the increment function
$\phi(x_i, y_i, h)$, and so on. The increment function can be expressed as
$$\phi(x_i, y_i, h) = c_1 k_1 + c_2 k_2 + \cdots + c_n k_n \qquad (14.2.2)$$
where $n$ is the order of the Runge-Kutta method, the $c_i$'s are constants and the $k_i$'s are recurrence relations given by
$$k_1 = f(x_i, y_i) \qquad (14.2.1)$$
$$k_2 = f(x_i + p_2 h, \; y_i + a_{21} h k_1) \qquad (14.2.2)$$
$$k_n = f(x_i + p_n h, \; y_i + a_{n1} h k_1 + a_{n2} h k_2 + \cdots + a_{n,n-1} h k_{n-1}) \qquad (14.2.3)$$
In general, we can use Eqns. (14.2.1)-(14.2.3) to generate a general solution as
$$y_{i+1} = y_i + h\,\phi(x_i, y_i, h) = y_i + h\sum_{j=1}^{n} c_j k_j = y_i + h\sum_{j=1}^{n} c_j f\left(x_i + p_j h, \; y_i + h\sum_{l=1}^{j-1} a_{jl} k_l\right) \qquad (14.2.4)$$
We will now see how to customize Eqn. (14.2.4) to generate various forms of Runge-Kutta methods.
14.2.1 First-Order Runge-Kutta Method
Using the general form of the method, Eqn. (14.2.4) with $n = 1$, we have the expression for the first-order Runge-Kutta method
$(c_1 = 1)$ as
$$y_{i+1} = y_i + h c_1 k_1 = y_i + h f(x_i, y_i) \qquad (14.2.1.1)$$

Example Program 14.2.1 First Order Runge-Kutta Method


Consider the following problem
$$\frac{dy}{dx} = x^2 - 5x + 10, \quad 0 \le x \le 2, \quad y(x=0) = 1.$$
The analytical solution to the problem is $y(x) = \frac{x^3}{3} - \frac{5x^2}{2} + 10x + 1$. We will now solve the problem using the first-order
Runge-Kutta method with $h = 0.5$ and $h = 0.1$. Take $h = 0.5$, $y_0 = 1$. Then
$$y_1 = y_0 + h f(x_0, y_0) = 1 + 0.5\left(x^2 - 5x + 10\right)\Big|_{x=0.5} = 4.875$$
$$y_2 = y_1 + h f(x_1, y_1) = 4.875 + 0.5\left(x^2 - 5x + 10\right)\Big|_{x=1} = 7.875$$
and so on. Note that in these two steps the slope is evaluated at the end of each step ($x = 0.5$ and $x = 1$). The rest of the
calculations are shown in Fig. 14.2.1.


Fig. 14.2.1 Solution to the given ODE with two different h values

14.2.2 Second-Order Runge-Kutta Method


To develop the second-order Runge-Kutta method $(n = 2)$, we start with a Taylor series expansion similar to Eqns. (14.1.6)-
(14.1.9) as
$$y_{i+1} = y_i + h y_i' + \frac{h^2}{2} y_i'' + O(h^3) \qquad (14.2.2.1)$$
Noting that
$$y_i' = \frac{dy(x_i)}{dx} = f(x_i, y_i) = f_i \qquad (14.2.2.2)$$
$$y_i'' = \frac{d^2 y(x_i)}{dx^2} = \frac{df(x_i, y_i)}{dx} = \left(\frac{\partial f}{\partial x} + \frac{\partial f}{\partial y}\frac{dy}{dx}\right)_{x_i} = \left(f_x + f_y f\right)_i \qquad (14.2.2.3)$$
and substituting Eqns. (14.2.2.2) and (14.2.2.3) in (14.2.2.1) we have
$$y_{i+1} = y_i + h f_i + \frac{h^2}{2} f_{x_i} + \frac{h^2}{2} f_i f_{y_i} + O(h^3) = y_i + h c_1 k_1 + h c_2 k_2 \qquad (14.2.2.4)$$
Comparing Eqn. (14.2.2.4) and (14.2.4) we can conclude that
$$k_1 = f(x_i, y_i) \qquad (14.2.2.5)$$
$$k_2 = f(x_i + p_2 h, \; y_i + a_{21} h k_1) \qquad (14.2.2.6)$$
We can use Taylor's series expansion to expand Eqn. (14.2.2.6) as
$$k_2 = f(x_i + p_2 h, \; y_i + a_{21} h k_1) = f_i + p_2 h f_{x_i} + a_{21} h f_{y_i} f_i + O(h^2) \qquad (14.2.2.7)$$
and
$$y_{i+1} = y_i + h f_i (c_1 + c_2) + h^2 f_{x_i} (c_2 p_2) + h^2 f_i f_{y_i} (c_2 a_{21}) + O(h^3) \qquad (14.2.2.8)$$


Comparing Eqns. (14.2.2.4) and (14.2.2.8) we have
$$c_1 + c_2 = 1 \qquad (14.2.2.9)$$
$$c_2 p_2 = \frac{1}{2} \qquad (14.2.2.10)$$
$$c_2 a_{21} = \frac{1}{2} \qquad (14.2.2.11)$$
Since there are three equations and four unknowns, there is no unique solution. The Modified Euler's method (also known as the
Mid-point method) is generated by taking $c_2 = 1$. This results in $c_1 = 0$, $p_2 = 1/2$, $a_{21} = 1/2$. There are other variations:
with $c_2 = 1/2$ (Heun's method), we have $c_1 = 1/2$, $p_2 = 1$, $a_{21} = 1$, and with $c_2 = 2/3$ (Ralston's method), we have
$c_1 = 1/3$, $p_2 = 3/4$, $a_{21} = 3/4$.

14.2.3 Fourth-Order Runge-Kutta Method


The most popular Runge-Kutta method is the 4th-order method, which provides a good balance between accuracy and efficiency. In
general, the classical variation can be expressed as
$$y_{i+1} = y_i + \frac{h}{6}\left(k_1 + 2k_2 + 2k_3 + k_4\right) \qquad (14.2.2.12)$$
where
$$k_1 = f(x_i, y_i) \qquad (14.2.2.13)$$
$$k_2 = f\left(x_i + \frac{h}{2}, \; y_i + \frac{h k_1}{2}\right) \qquad (14.2.2.14)$$
$$k_3 = f\left(x_i + \frac{h}{2}, \; y_i + \frac{h k_2}{2}\right) \qquad (14.2.2.15)$$
$$k_4 = f(x_i + h, \; y_i + h k_3) \qquad (14.2.2.16)$$
We will use this version to develop an example program that implements the 4th order Runge-Kutta method.

Example Program 14.2.3 Runge-Kutta Method for First-Order ODE


We will develop an object-oriented solution to solving a first-order ordinary differential equation using the 4th order Runge-
Kutta method. The following two problems will be solved with h  0.1 :
(1) $\frac{dy(x)}{dx} = 2y$, $0 \le x \le 2.5$, $y(x=0) = 2$, and
(2) $\frac{dy}{dx} = x^2 - 5x + 10$, $0 \le x \le 2$, $y(x=0) = 1$.
Note that the second problem was solved in Example Program 14.2.1 using the first-order Runge-Kutta method.

The main program details are explained first. The variables associated with the two problems are as follows: the value of $h$
(lines 23 and 32), the range of $x$ values (lines 24-25 and 33-34) and the initial value of $y$ (lines 24 and 33) for each problem.
The problem to be solved is selected in line 16. The rest of the required details are computed next. The range of $x$ values is
divided into equal parts using the value of $h$ (line 37), and the computed number of points is used to allocate the memory
required to store the $y$ and $dy/dx$ values at each point in line 38. The ODE solver object is defined in line 41 by invoking the
overloaded constructor. The public member function GoCompute is called to obtain the solution (line 42), and the results are
displayed in a tabular form (lines 44-55).


main.cpp

The errors thrown by the overloaded constructor are caught and handled in lines 58-65.

This design of the ODE solver class makes it possible to hide the details of the solver from the user. The big question to ask is
“Where does the function evaluation take place to compute $f_i$ at $(x_i, y_i)$?”. The GoCompute function (file
ODESolver.cpp) reveals where the 4th-order Runge-Kutta method class is used in obtaining the solution (lines 33-36).


Note the last argument in the Solve function: it is the name of the function where $f_i$ is computed. The member function
UserFE is where the function evaluation takes place: $(x_i, y_i)$ come in as input and the $y_i$ is updated with the computed $f_i$
value. Finally, we will look at the Runge-Kutta class. The class has three public functions: the constructor, the destructor and the
Solve function (RK4OM.h file).

The Solve functionalities can be divided into three parts. In the first part, the function value at the initial point is computed
(lines 24-29).

The second part involves the loop where the solution at all the other points is computed. This involves computing the four $k_i$
values given by Eqns. (14.2.2.13)-(14.2.2.16) in lines 34-55. For each $k_i$, the user function is called to obtain the $f_i(x_i, y_i)$
value.


In the final part, the recurrence formula, Eqn. (14.2.2.12), is used to compute the value at the next point, $y_{i+1}$ (line 58), and
preparations are made for computations at the next point.

14.2.4 Solving a System of ODEs


So far, we have discussed the solution of a single first-order differential equation. How do we use the solution
strategy to solve problems with higher-order derivatives? To understand the process, consider the following problem
$$a\frac{d^2 y(x)}{dx^2} + b\frac{dy(x)}{dx} + cy = 0 \qquad (14.3.1)$$
Define a new variable $z(x)$ such that
$$z = \frac{dy}{dx} \qquad (14.3.2a)$$
Substituting Eqn. (14.3.2a) into Eqn. (14.3.1), we have
$$a\frac{dz}{dx} + bz + cy = 0 \qquad (14.3.2b)$$
With $y_1 = y$ and $y_2 = z$, the two equations in Eqn. (14.3.2a-b) can be rewritten as
$$\frac{dy_1}{dx} = f_1(y_1, y_2) = y_2 \qquad (14.3.3)$$
$$\frac{dy_2}{dx} = f_2(y_1, y_2) = -\frac{b y_2 + c y_1}{a} \qquad (14.3.4)$$
The process can be easily generalized into $n$ variables for an original problem involving an $n$th-order ODE with $n$ initial conditions.

Example Program 14.2.4 Runge-Kutta Method for System of ODEs


Develop a computer program to solve a system of ODEs using the 4th order Runge-Kutta method and solve the following problem
$$\frac{d^2 y(x)}{dx^2} + \frac{dy(x)}{dx} - 5y = 0, \quad 0 \le x \le 2 \quad \text{and} \quad \frac{dy}{dx}(x=0) = 1, \quad y(x=0) = 0.$$
dx 2 dx dx


We will solve the problem using $h = 0.2$. Converting the original problem into two first-order differential equations and using
$y_1 = y$, we have an equivalent problem of the form
$$\frac{dy_1}{dx} = y_2, \quad y_1(x=0) = 0$$
$$\frac{dy_2}{dx} = -y_2 + 5y_1, \quad y_2(x=0) = 1$$
The slopes at the beginning of the interval are $\left(f_j = f_j(x, y_1, y_2)\right)$
$$k_{1,1} = f_1(0, 0, 1) = 1 \qquad k_{1,2} = f_2(0, 0, 1) = -1 + 5(0) = -1$$
The first values at the mid-point are
$$y_1 + \frac{h}{2}k_{1,1} = 0 + \frac{0.2}{2}(1) = 0.1 \qquad y_2 + \frac{h}{2}k_{1,2} = 1 + \frac{0.2}{2}(-1) = 0.9$$
from which the mid-point slopes can be computed as
$$k_{2,1} = f_1(0.1, 0.1, 0.9) = 0.9 \qquad k_{2,2} = f_2(0.1, 0.1, 0.9) = -0.9 + 5(0.1) = -0.4$$
The second set of mid-point values can be computed as
$$y_1 + \frac{h}{2}k_{2,1} = 0 + \frac{0.2}{2}(0.9) = 0.09 \qquad y_2 + \frac{h}{2}k_{2,2} = 1 + \frac{0.2}{2}(-0.4) = 0.96$$
from which the second mid-point slopes can be computed as
$$k_{3,1} = f_1(0.1, 0.09, 0.96) = 0.96 \qquad k_{3,2} = f_2(0.1, 0.09, 0.96) = -0.96 + 5(0.09) = -0.51$$
The values at the end of the interval can then be computed as
$$y_1 + k_{3,1}h = 0 + 0.96(0.2) = 0.192 \qquad y_2 + k_{3,2}h = 1 + (-0.51)(0.2) = 0.898$$
from which the end-point slopes can be computed as
$$k_{4,1} = f_1(0.2, 0.192, 0.898) = 0.898 \qquad k_{4,2} = f_2(0.2, 0.192, 0.898) = -0.898 + 5(0.192) = 0.062$$
And finally, the next point can be computed as
$$y_1(0.2) = 0 + \frac{0.2}{6}\left[1 + 2(0.9) + 2(0.96) + 0.898\right] = 0.187267$$
$$y_2(0.2) = 1 + \frac{0.2}{6}\left[-1 + 2(-0.4) + 2(-0.51) + 0.062\right] = 0.9080666$$
The main program is shown first.


Problem setup takes place in lines 19-21. The problem selection is in line 19 with the problem details defined in lines 23-44. The
ODE solver object is created in line 50 and the system of ODEs solved in line 51.

The output showing the solution takes place in lines 53-70. The errors detected in the constructor in CODESolver are caught in
lines 73-80.


Summary
Differential equations are used to understand problems in a number of engineering and scientific disciplines. In this chapter, we
saw a very powerful technique, the Runge-Kutta Method, to solve a system of ordinary differential equations. In the next chapter,
we will examine partial differential equations, where there is more than one independent variable. In some cases, the PDEs
can be converted into a system of ODEs.

Where to go from here?


The exercises that follow require the reader to generate the solution to the problems both by hand (paper and pencil?) and by
writing a computer program. It may be a worthwhile exercise to integrate the classes for solving a single ODE and a system of
ODEs into the numerical toolbox developed at the end of Chapter 13.


Exercises
Appetizers
Problem 14.1
Solve the following problem: $\frac{dy(x)}{dx} = x^2 y$, $0 \le x \le 2$ with $y(x=0) = 1$, by hand using $h = 0.1$.
Problem 14.2
Modify the program Example_14_2_3 to solve Problem 14.1.

Main Course
Problem 14.3
d2 y dy dy
Solve the following second-order differential equation: 4 2
 5  3 y  0 with y( x  0)  3, ( x  0)  1 in the
dx dx dx
interval 0  x  5 .
Problem 14.4
The motion of a damped spring-mass system is given by $m\frac{d^2 x}{dt^2} + c\frac{dx}{dt} + kx = 0$, where $x$ is the displacement from the
equilibrium position, $m$ is the mass, $c$ is the damping coefficient, $k$ is the spring stiffness, and $t$ is time. The problem
description is complete if the initial $(t = 0)$ displacement and velocity are known, and the time interval over which the solution
is required, i.e. $0 \le t \le t_{final}$, is specified. Write a program that uses the 4th order Runge-Kutta method to solve this problem.

Numerical Analysis Concepts


Problem 14.5
An improvement over the fourth-order Runge-Kutta Method is called the Runge-Kutta-Fehlberg Method. Modify the program
Example_14_2_3 by adding this method as the second method in the CRK4OM class.



Chapter 15

Partial Differential Equations


"The central enemy of reliability is complexity." Geer et al.

"Measuring programs by counting the lines of code is like measuring aircraft quality by weight." Anon

“The purpose of computing is insight, NOT numbers.” Richard Hamming

In Chapter 14, we saw differential equations where the dependent variable was a function of a single independent variable. In
scientific and engineering applications, there are numerous problems where a dependent variable is a function of more than one
independent variable (and their derivatives). For example, if $y = y(x_1, x_2, \ldots, x_n)$, then a partial differential equation (PDE)
may be written as
$$f\left(x_1, x_2, \ldots, x_n, \frac{\partial y}{\partial x_1}, \frac{\partial y}{\partial x_2}, \ldots, \frac{\partial y}{\partial x_n}, \frac{\partial^2 y}{\partial x_1^2}, \ldots, \frac{\partial^2 y}{\partial x_n^2}, \ldots\right) = 0.$$
This chapter is geared toward scientists and engineers who wish to learn a powerful numerical technique called the finite element
method. This textbook is not devoted exclusively to the finite element method; the main aim of this chapter is to show how the
method can be used to solve one-dimensional PDEs.

Objectives
• To understand the basic concepts associated with partial differential equations.
• To understand the different engineering examples of PDEs.
• To understand the basics of the finite element method (FEM) and, specifically, the Galerkin’s Method.
• To understand and practice the numerical solution of one-dimensional PDEs via the finite element method.

SUPPLEMENTAL MATERIAL
A Windows-based program, 1DBVP© can be downloaded from the book’s web site and used to solve the problems discussed
in this chapter. The program is based on the finite element method.


15.1 Background
One-dimensional boundary-value problems arise in a variety of engineering areas. Some of the more popular examples are listed
below.

Specialty Area     Problem Description
Solid Mechanics    Transverse deflection of a cable
Hydrodynamics      One-dimensional flow in an inviscid, incompressible fluid
Magnetostatics     One-dimensional magnetic potential distribution
Heat Conduction    One-dimensional heat flow in a solid medium
Electrostatics     One-dimensional electric potential distribution

Consider a one-dimensional boundary value problem (also known as an equilibrium problem) given by
$$\frac{d^2 y}{dx^2} + P(x)\frac{dy}{dx} + Q(x)\,y = F(x) \tag{15.1.1}$$
If $P$ and $Q$ are constants, then
$$\frac{d^2 y}{dx^2} + P\frac{dy}{dx} + Qy = F(x) \tag{15.1.2}$$
The total solution is
$$y(x) = Ae^{\lambda_1 x} + Be^{\lambda_2 x} + C_0 F(x) + C_1 F'(x) + \cdots \tag{15.1.3}$$
where the constants of integration $A$ and $B$ can be found by substituting the two boundary conditions into Eqn. (15.1.3).
Note that we need two boundary conditions for the problem to be well-posed. The boundary conditions are of three types.
(a) The function $y$ may be specified. This is known as the Dirichlet or Essential boundary condition.
(b) The derivative $y'$ may be specified. This is known as the Neumann or Natural boundary condition.
(c) A combination of the function $y$ and the derivative $y'$ may be specified. This is known as the Mixed or Robin boundary condition.
In the FE solution, an approximate or trial solution $\tilde{y}(x)$ is constructed and solved for¹. The FE approach has three distinct
operations.
(a) A trial solution $\tilde{y}(x)$ is constructed.
(b) An optimizing criterion is applied to $\tilde{y}(x)$.
(c) An estimation of the accuracy of $\tilde{y}(x)$ is made.

Trial Solution
The trial solution is constructed with a finite number of terms as
$$\tilde{y}(x; a) = \phi_0(x) + a_1\phi_1(x) + a_2\phi_2(x) + \cdots + a_n\phi_n(x) \tag{15.1.4}$$
where $\phi_i(x)$ are known trial or basis functions and the coefficients $a_i$ are undetermined parameters known as degrees of
freedom (DOF). The purpose of the trial function $\phi_0(x)$ is to satisfy some or all of the boundary conditions. The most
common form of trial solution is a polynomial. We will see more about this later.
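
As a minimal sketch of Eqn. (15.1.4), the following C++ fragment stores the trial functions as callable objects and evaluates the trial solution for a given set of DOF. The class name `TrialSolution`, its layout, and the helper `makeExampleTrialSolution` are illustrative assumptions and are not taken from the book's programs; the polynomial trial functions in the helper are the ones that appear later in Example 15.1.1.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Trial solution y~(x; a) = phi0(x) + sum_i a[i] * phi[i](x), Eqn. (15.1.4).
// The class name and layout are illustrative only.
class TrialSolution
{
public:
    using Func = std::function<double(double)>;

    TrialSolution(Func phi0, std::vector<Func> phi)
        : m_phi0(std::move(phi0)), m_phi(std::move(phi)) {}

    // Evaluate the trial solution at x for the degrees of freedom a.
    double operator()(double x, const std::vector<double>& a) const
    {
        assert(a.size() == m_phi.size()); // one DOF per trial function
        double y = m_phi0(x);
        for (std::size_t i = 0; i < a.size(); ++i)
            y += a[i] * m_phi[i](x);
        return y;
    }

private:
    Func m_phi0;             // satisfies some or all of the boundary conditions
    std::vector<Func> m_phi; // known trial (basis) functions
};

// Hypothetical helper: the polynomial trial functions of Example 15.1.1.
TrialSolution makeExampleTrialSolution()
{
    return TrialSolution(
        [](double x) { return 2.0 - 0.25 * (x - 1.0); },
        { [](double x) { return (x - 1.0) * (x - 3.0); },
          [](double x) { return (x - 1.0) * (x * x + x - 11.0); } });
}
```

Because both trial functions in the helper vanish at $x = 1$, the value returned at $x = 1$ is 2 for any choice of the DOF, which is how $\phi_0(x)$ carries the essential boundary condition.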

¹ Those familiar with the finite difference solution will note that there are two approaches to solving the boundary-value ODE: the shooting (initial-value) method and the equilibrium (boundary-value) method.

Optimizing Criterion
The optimizing criterion is used to generate the appropriate equations so that we can solve for the numerical values of the
coefficients $a_i$. As you can guess, the optimizing criterion is not unique and different approaches define what is meant by the
“best possible approximation” to the exact solution. The two most common forms are
(a) The Method of Weighted Residuals (MWR) – applicable when the problem is described by differential equations,
and
(b) The Ritz Variational Method (RVM) - applicable when the problem is described by integral (or, variational) equations.

In this lesson, the focus is on the former method. In the MWR, the criteria minimize an expression of error in the differential
equation. In the RVM, an attempt is made to extremize (typically, minimize) a physical quantity. We will see this approach in
the second module of this course.

Method of Weighted Residuals


There are at least four different optimizing criteria and we will see them in this lesson. Consider the differential equation (15.1.1)
rewritten as
$$\frac{d^2 y}{dx^2} + P(x)\frac{dy}{dx} + Q(x)\,y - F(x) = 0 \tag{15.1.5a}$$
or,
$$\mathcal{L}(y) - F(x) = 0 \tag{15.1.5b}$$
where $\mathcal{L}$ denotes the differential operator acting on $y$ in Eqn. (15.1.5a). Substituting the trial solution, we have, in general
$$R(x; a) = \mathcal{L}(\tilde{y}) - F(x) \neq 0 \tag{15.1.6}$$
$R(x; a)$ is known as the residual or error in the solution. In the MWR, the optimizing criterion is to find the numerical values
for $a_i$ which will make $R(x; a)$ as close to zero as possible for all values of $x$ throughout the domain of the problem. Once
a specific criterion is applied, a set of algebraic equations is produced. As you can see, the process is to transform the original
(linear) ODE to a set of linear algebraic equations.

Collocation Method
In this method, for each parameter $a_i$ a point $x_i$ is chosen and the residual is set to zero at that point.
$$R(x_i; a) = 0, \qquad i = 1, \ldots, n \tag{15.1.7}$$
The points $x_i$ are known as the collocation points. Note that we are setting the residual (the error in satisfying the differential
equation), not the error in the solution, to zero.

Subdomain Method
In this method, for each parameter $a_i$ an interval $\Delta x_i$ is chosen and the average of the residual over that interval is set to zero.
$$\frac{1}{\Delta x_i}\int_{\Delta x_i} R(x; a)\,dx = 0, \qquad i = 1, \ldots, n \tag{15.1.8}$$
The intervals $\Delta x_i$ are called the subdomains.

Least-squares Method
In this method, the integral over the entire domain $\Omega$ of the square of the residual is minimized with respect to each $a_i$.
$$\frac{\partial}{\partial a_i}\int_{\Omega} R^2(x; a)\,dx = 0, \qquad i = 1, \ldots, n \tag{15.1.9}$$

The Galerkin’s Method


In this method, for each $a_i$ we require that a weighted average of the residual over the entire domain be zero. The weighting
functions are the trial functions $\phi_i(x)$ associated with each $a_i$.
$$\int_{\Omega} R(x; a)\,\phi_i(x)\,dx = 0, \qquad i = 1, \ldots, n \tag{15.1.10}$$

The natural question is “Which of these techniques is the most appropriate?” A detailed answer is outside the scope of this text.
However, experience has shown that the Galerkin’s Method is the most suitable for the type of finite element applications
discussed in this text.
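
As a minimal sketch of how the four criteria differ, the following C++ fragment applies each of them to a single-parameter residual $R(x; a) = -\frac{1}{4} - \frac{2}{x^2} + 4(x-1)a$ on $1 \le x \le 2$. This is the residual that a one-parameter (quadratic) trial solution $\tilde{y} = 2 - \frac{1}{4}(x-1) + a(x-1)(x-3)$ produces for the problem of Example 15.1.1. The function names, the use of Simpson's rule for the integrals, the caller-supplied collocation point, and the use of the whole domain as the single subdomain are illustrative assumptions, not part of the text.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// The residual is linear in a:  R(x; a) = r0(x) + a * r1(x).
double r0(double x) { return -0.25 - 2.0 / (x * x); }
double r1(double x) { return 4.0 * (x - 1.0); }

// Composite Simpson's rule on [lo, hi] with n (even) intervals.
double simpson(const std::function<double(double)>& f, double lo, double hi, int n = 200)
{
    assert(n > 0 && n % 2 == 0);
    const double h = (hi - lo) / n;
    double sum = f(lo) + f(hi);
    for (int i = 1; i < n; ++i)
        sum += f(lo + i * h) * ((i % 2 == 1) ? 4.0 : 2.0);
    return sum * h / 3.0;
}

// Solve  integral_1^2 w(x) * (r0(x) + a * r1(x)) dx = 0  for a.
double weightedResidual(const std::function<double(double)>& w)
{
    const double num = simpson([&](double x) { return w(x) * r0(x); }, 1.0, 2.0);
    const double den = simpson([&](double x) { return w(x) * r1(x); }, 1.0, 2.0);
    return -num / den;
}

// Collocation: R(xc; a) = 0 at the chosen point xc (xc must not equal 1).
double collocation(double xc) { return -r0(xc) / r1(xc); }

// Subdomain: average of R over the single subdomain [1, 2] is zero (w = 1).
double subdomain() { return weightedResidual([](double) { return 1.0; }); }

// Least squares: d/da of the integral of R^2 is zero, i.e. w = dR/da = r1.
double leastSquares() { return weightedResidual(r1); }

// Galerkin: the weight is the trial function phi_1(x) = (x - 1)(x - 3).
double galerkin()
{
    return weightedResidual([](double x) { return (x - 1.0) * (x - 3.0); });
}
```

Under these assumptions the Galerkin criterion gives $a \approx 0.427$, the least-squares criterion $a \approx 0.383$, the subdomain criterion $a = 0.625$, and collocation at $x = 1.5$ gives $a \approx 0.569$, so each criterion defines a different "best" approximation.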
Example 15.1.1
Let us look at an example to see how the Galerkin’s Method works. Consider the problem

DE: $$\frac{d}{dx}\left(x\frac{dy(x)}{dx}\right) = \frac{2}{x^2}, \qquad 1 < x < 2 \tag{15.1.11}$$
BC: $$y(x = 1) = 2 \tag{15.1.12}$$
$$\left(-x\frac{dy}{dx}\right)_{x=2} = \frac{1}{2} \tag{15.1.13}$$

Classical (or, Theoretical) Solution


Let us assume the trial solution as
$$\tilde{y} = a_1 + a_2 x + a_3 x^2 + a_4 x^3 \tag{15.1.14a}$$
Hence,
$$\frac{d\tilde{y}}{dx} = a_2 + 2a_3 x + 3a_4 x^2 \tag{15.1.14b}$$
In order to satisfy the BC, we need
$$\tilde{y}(x = 1) = 2 = a_1 + a_2(1) + a_3(1)^2 + a_4(1)^3$$
or,
$$a_1 + a_2 + a_3 + a_4 = 2 \tag{15.1.15}$$
And,
$$\left(-x\frac{d\tilde{y}}{dx}\right)_{x=2} = \frac{1}{2} = -2a_2 - 8a_3 - 24a_4$$
or,
$$a_2 + 4a_3 + 12a_4 = -\frac{1}{4} \tag{15.1.16}$$
Eqns. (15.1.15) and (15.1.16) are known as constraint equations since the 4 $a_i$’s in Eqn. (15.1.14a) are no longer independent.
The two constraint equations render only 2 out of the 4 $a_i$’s as independent parameters (or, DOF). Using the two constraint
equations to write $a_1$ and $a_2$ in terms of the other two, we have
$$a_1 = 2 - a_2 - a_3 - a_4 \tag{15.1.17}$$
$$a_2 = -\frac{1}{4} - 4a_3 - 12a_4 \tag{15.1.18}$$
Substituting these two equations in (15.1.14a)
$$\tilde{y} = 2 - \frac{1}{4}(x - 1) + a_3(x - 1)(x - 3) + a_4(x - 1)(x^2 + x - 11) \tag{15.1.19}$$
This equation is now in the familiar form
$$\tilde{y}(x; a) = \phi_0(x) + a_1\phi_1(x) + a_2\phi_2(x) \tag{15.1.20}$$
where
$$\phi_0(x) = 2 - \frac{1}{4}(x - 1), \qquad \phi_1(x) = (x - 1)(x - 3), \qquad \phi_2(x) = (x - 1)(x^2 + x - 11)$$
and we will assume for the sake of convenience that $a_1$ and $a_2$ in (15.1.20) are the same as $a_3$ and $a_4$ in (15.1.19). We will now
write the residual as
$$R(x; a) = \frac{d}{dx}\left(x\frac{d\tilde{y}(x)}{dx}\right) - \frac{2}{x^2} \neq 0 \tag{15.1.21}$$
Substituting Eqn. (15.1.20) in the above equation, we have
$$R(x; a) = -\frac{1}{4} + 4(x - 1)a_1 + 3(3x^2 - 4)a_2 - \frac{2}{x^2} \tag{15.1.22}$$
Now using the Galerkin’s Method (Eqn. (15.1.10))
$$\int_1^2 \left[-\frac{1}{4} + 4(x - 1)a_1 + 3(3x^2 - 4)a_2 - \frac{2}{x^2}\right](x - 1)(x - 3)\,dx = 0$$
$$\int_1^2 \left[-\frac{1}{4} + 4(x - 1)a_1 + 3(3x^2 - 4)a_2 - \frac{2}{x^2}\right](x - 1)(x^2 + x - 11)\,dx = 0 \tag{15.1.23a}$$
Integrating yields two equations
$$\begin{bmatrix} -\frac{5}{3} & -\frac{41}{5} \\ -\frac{41}{5} & -\frac{81}{2} \end{bmatrix} \begin{Bmatrix} a_1 \\ a_2 \end{Bmatrix} = \begin{Bmatrix} \frac{29}{6} - 8\ln 2 \\ \frac{211}{16} - 24\ln 2 \end{Bmatrix} \tag{15.1.23b}$$
And solving
$a_1 = 2.138$ and $a_2 = -0.348$
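
The two Galerkin equations can also be generated and solved numerically. The sketch below evaluates the integrals in Eqn. (15.1.23a) with composite Simpson's rule and solves the resulting 2x2 system by Cramer's rule. The function names, the quadrature rule and the number of intervals are assumptions made for illustration; the text itself integrates the equations in closed form.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <functional>

using Func = std::function<double(double)>;

// Composite Simpson's rule on [lo, hi] with n (even) intervals.
double simpson(const Func& f, double lo, double hi, int n = 400)
{
    assert(n > 0 && n % 2 == 0);
    const double h = (hi - lo) / n;
    double sum = f(lo) + f(hi);
    for (int i = 1; i < n; ++i)
        sum += f(lo + i * h) * ((i % 2 == 1) ? 4.0 : 2.0);
    return sum * h / 3.0;
}

// Galerkin coefficients (a1, a2) for the cubic trial solution of Example 15.1.1.
std::array<double, 2> galerkinCubic()
{
    // Trial (and weighting) functions phi_1, phi_2 of Eqn. (15.1.20).
    const Func phi[2] = {
        [](double x) { return (x - 1.0) * (x - 3.0); },
        [](double x) { return (x - 1.0) * (x * x + x - 11.0); } };
    // Residual R = c(x) + a1 * r[0](x) + a2 * r[1](x), Eqn. (15.1.22).
    const Func c = [](double x) { return -0.25 - 2.0 / (x * x); };
    const Func r[2] = {
        [](double x) { return 4.0 * (x - 1.0); },
        [](double x) { return 3.0 * (3.0 * x * x - 4.0); } };

    double K[2][2], F[2];
    for (int i = 0; i < 2; ++i) {
        for (int j = 0; j < 2; ++j)
            K[i][j] = simpson([&](double x) { return r[j](x) * phi[i](x); }, 1.0, 2.0);
        F[i] = -simpson([&](double x) { return c(x) * phi[i](x); }, 1.0, 2.0);
    }

    // Cramer's rule for the 2x2 system K a = F.
    const double det = K[0][0] * K[1][1] - K[0][1] * K[1][0];
    assert(std::fabs(det) > 1e-12);
    return { (F[0] * K[1][1] - K[0][1] * F[1]) / det,
             (K[0][0] * F[1] - K[1][0] * F[0]) / det };
}
```

With these assumptions `galerkinCubic()` returns values close to 2.138 and -0.348. The determinant of the coefficient matrix is small compared with its entries, so the integrals need to be evaluated accurately for the coefficients to be trustworthy.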
Hence, substituting in Eqn. (15.1.20) yields
$$\tilde{y}(x; a) = 2 - \frac{1}{4}(x - 1) + 2.138(x - 1)(x - 3) - 0.348(x - 1)(x^2 + x - 11) = -0.348x^3 + 2.138x^2 - 4.629x + 4.839 \tag{15.1.24}$$
There is an important derived quantity, the flux, that we have not yet introduced. Flux is defined as
$$\tau(x) = -x\frac{dy}{dx} \tag{15.1.25}$$
Flux has a physical interpretation and we will discuss the concepts in the next topic. Substituting in Eqn. (15.1.25), we have
$$\tilde{\tau}(x; a) = \frac{1}{2} + \frac{1}{4}(x - 2) - 4.276x(x - 2) + 1.043x(x - 2)(x + 2) = 1.043x^3 - 4.276x^2 + 4.629x \tag{15.1.26}$$

Finally, note that this solution is called the theoretical (or, classical) solution because of the manner in which the two boundary
conditions were imposed. In this methodology, the BCs were imposed on the trial solution itself so that the trial solution would
satisfy the BCs exactly. However, this in no way implies that the solution is correct in the interior of the problem domain.

Some Important Observations


• Why did we start with a trial solution that is a cubic polynomial? Since there are two BCs that must be imposed, the lowest
order polynomial that we could have used would have been a quadratic polynomial. Imposing the BCs then would have
left one free parameter (or, one DOF). The choice of using a cubic polynomial was an arbitrary choice and we were left
with two free parameters.
• The classical way of applying the BCs can become extremely cumbersome with more complex problems.
• How do we know whether the solution is good? One way to answer this question is to look at the concept of convergence.
The exact solution to the problem is
$$y(x) = \frac{2}{x} + \frac{1}{2}\ln x \tag{15.1.27a}$$
and
$$\tau(x) = \frac{2}{x} - \frac{1}{2} \tag{15.1.27b}$$
Convergence can be checked by starting with a low-order polynomial and increasing the order of the polynomial gradually.
The different trial functions should converge to a solution. If we had started with a quadratic polynomial, the Galerkin’s
solution would have been
$$\tilde{y}(x; a) = 2 - \frac{1}{4}(x - 1) + 0.427(x - 1)(x - 3) = 0.427x^2 - 1.958x + 3.531 \tag{15.1.28}$$
and
$$\tilde{\tau} = \frac{1}{2} + \frac{1}{4}(x - 2) - 0.854x(x - 2) = -0.854x^2 + 1.958x \tag{15.1.29}$$
In the previous section, with the trial solution as a cubic polynomial the solution was
$$\tilde{y}(x; a) = 2 - \frac{1}{4}(x - 1) + 2.138(x - 1)(x - 3) - 0.348(x - 1)(x^2 + x - 11) = -0.348x^3 + 2.138x^2 - 4.629x + 4.839$$
and
$$\tilde{\tau}(x; a) = \frac{1}{2} + \frac{1}{4}(x - 2) - 4.276x(x - 2) + 1.043x(x - 2)(x + 2) = 1.043x^3 - 4.276x^2 + 4.629x$$
What if we had started with a quartic polynomial? The solution is then
$$\tilde{y}(x; a) = 2 - \frac{1}{4}(x - 1) + 3.3725(x - 1)(x - 3) - 0.8881(x - 1)(x^2 + x - 11) + 0.0864(x - 1)(x^3 + x^2 + x - 31)$$
$$= 0.0864x^4 - 0.888x^3 + 3.3725x^2 - 5.848x + 5.277 \tag{15.1.30}$$
and
$$\tilde{\tau}(x; a) = \frac{1}{2} + \frac{1}{4}(x - 2) - 6.745x(x - 2) + 2.664x(x - 2)(x + 2) - 0.346x(x - 2)(x^2 + 2x + 4)$$
$$= -0.346x^4 + 2.664x^3 - 6.745x^2 + 5.848x \tag{15.1.31}$$
We will now plot the three trial functions and the exact solution both for the function and the flux (Fig. 15.1.1).

Fig. 15.1.1 Comparison of the function ($y$ versus $x$ for $1 \le x \le 2$: quadratic, cubic and quartic trial solutions and the exact solution)


While some difference can be noted between the quadratic and the exact solution, there is very little difference between the
cubic, quartic and the exact solutions. All the solutions have the same function value at the left boundary point, $x = 1$.

Fig. 15.1.2 Comparison of the flux (flux versus $x$ for $1 \le x \le 2$: quadratic, cubic and quartic trial solutions and the exact solution)


The differences between the exact solution and the three Galerkin’s solutions with respect to the flux are a different matter
altogether (Fig. 15.1.2). The error in the flux from the quadratic solution is large. However, the error decreases with the cubic
and the quartic functions. Note that all the solutions have the same flux value at the right boundary point $x = 2$. The message
here is quite clear: (a) small errors in the function do not translate to small errors in the flux, which involves the derivatives of the
function, and (b) trial functions of increasing polynomial order yield better and converging solutions².
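
The convergence check can be sketched in a few lines of C++. The fragment below codes the exact solution and flux together with the quadratic, cubic and quartic Galerkin results, using the coefficients exactly as printed above (rounded to three or four digits), and measures the largest difference on a uniform grid over $1 \le x \le 2$. The function names and the grid size are illustrative assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Exact solution and flux of Example 15.1.1.
double exactY(double x)   { return 2.0 / x + 0.5 * std::log(x); }
double exactTau(double x) { return 2.0 / x - 0.5; }

// Galerkin trial solutions, coefficients as printed in the text.
double quadY(double x)  { return 0.427 * x * x - 1.958 * x + 3.531; }
double cubicY(double x) { return ((-0.348 * x + 2.138) * x - 4.629) * x + 4.839; }
double quartY(double x) { return (((0.0864 * x - 0.888) * x + 3.3725) * x - 5.848) * x + 5.277; }

// Corresponding fluxes, tau = -x dy/dx.
double quadTau(double x)  { return -0.854 * x * x + 1.958 * x; }
double cubicTau(double x) { return (1.043 * x - 4.276) * x * x + 4.629 * x; }
double quartTau(double x) { return ((-0.346 * x + 2.664) * x - 6.745) * x * x + 5.848 * x; }

// Largest absolute difference between approx and exact on a uniform grid over [1, 2].
template <typename F, typename G>
double maxError(F approx, G exact, int samples = 101)
{
    assert(samples >= 2);
    double worst = 0.0;
    for (int i = 0; i < samples; ++i) {
        const double x = 1.0 + static_cast<double>(i) / (samples - 1);
        worst = std::max(worst, std::fabs(approx(x) - exact(x)));
    }
    return worst;
}
```

For example, `maxError(quadTau, exactTau)` is several times larger than `maxError(cubicTau, exactTau)`, which in turn is larger than `maxError(quartTau, exactTau)`, while the function errors are an order of magnitude smaller than the flux errors for the same polynomial order.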
The residual function is the function corresponding to the original differential equation (with all the terms on the LHS) in which
an approximate solution is substituted. It measures how close the approximation is to satisfying the DE but does not tell us
how close the approximation is to the exact solution. The Method of Weighted Residuals converts the original DE into a set of
algebraic equations that are much easier to solve. There are different approaches used in terms of weighting the residual. Of all
the techniques discussed, the Galerkin’s Method is by far the most suitable for the type of problems encountered in engineering
analysis.
In the lessons dealing with this topic, we saw how to assume the approximate solution called the trial solution and use the
Galerkin’s Method to generate the algebraic equations. The trial solution is usually a polynomial because of the properties that
polynomials possess (continuous, differentiable, easy to handle etc.). We saw how to enforce the boundary conditions on the trial solution.
We also looked at the flux term. Finally, we also saw how to obtain more accurate solutions by increasing the order of the
polynomial in the trial solution.
There are two problems with this classical (or, theoretical) approach. First, the trial solution is valid for the entire problem
domain. Hence it is not possible to accurately model problems in which there are known discontinuities in the solution and the
flux. Second, the manner in which the boundary conditions are enforced can become cumbersome for more complex problems.
In the next topic, we will see how to overcome both these drawbacks.

² Later we will see that the trial functions must have certain properties for this to be true. If we keep on increasing the order of the trial solution,
will we converge to the exact solution for this problem?

Exercises
Problem 15.1.1
Consider the differential equation
d2 y y
 0 0 x 
dx 2 4
with y(0)  1 and y( )  0 . Find the solution using the Galerkin’s Method by assuming the trial solution as
y ( x ; a )  a 1  a 2 x  a 3 x 2  a 4 x 3 . Compare with the exact solution.

Problem 15.1.2
Consider the differential equation
d  2 dy  1
   30x  204 x  351x  110x  0x 4
4 3 2
x
dx  dx  12
with y(0)  1 and y(4)  0 . Find the solution using the Galerkin’s Method by assuming the trial solution as (i) a
quadratic polynomial, and (ii) a cubic polynomial. Compare to the exact solution.

Problem 15.1.3
Consider the differential equation
d  dy( x ) 
  x  1 0 1 x  2
dx  dx 
 dy 
with y( x  1)  1  ( x  1)   1
 dx  x  2
(a) Using the Galerkin’s method with a quadratic polynomial for the trial solution obtain an approximate solution for both
the function and the flux.
(b) Obtain a second approximate solution using a cubic polynomial for the trial solution.
(c) Compare the two solutions with each other and the exact solution.


15.2 The Element Concept


Earlier we looked at applying the Galerkin’s Method in the classical (or, theoretical) sense. To make the approach general and
useful we need to deviate to a new format as shown below.
Step 1: Let the trial solution be assumed as
$$\tilde{y}(x; a) = \phi_0(x) + a_1\phi_1(x) + a_2\phi_2(x) + \cdots + a_n\phi_n(x) = \phi_0(x) + \sum_{i=1}^{n} a_i\phi_i(x) \tag{15.2.1}$$
We will stick with the example from the previous topic. Using the definition of the residual, we have
$$R(x; a) = \frac{d}{dx}\left(x\frac{d\tilde{y}(x)}{dx}\right) - \frac{2}{x^2} \tag{15.2.2}$$
Since we have an $n$-parameter trial function, we need $n$ residual equations (see Eqn. (15.1.10))
$$\int_{x_a}^{x_b} R(x; a)\,\phi_i(x)\,dx = 0, \qquad i = 1, \ldots, n \tag{15.2.3}$$
Notice that we have changed the limits of integration (instead of using 1 and 2 as the limits) just to make the derivation general
and more useful. Substituting, we have
$$\int_{x_a}^{x_b} \left[\frac{d}{dx}\left(x\frac{d\tilde{y}(x)}{dx}\right) - \frac{2}{x^2}\right]\phi_i(x)\,dx = 0, \qquad i = 1, \ldots, n \tag{15.2.4}$$
Step 2: Integrate by parts the highest derivative term in the residual equations (which would be the first term containing the second-
order derivative). Rearranging the terms, we have
$$\int_{x_a}^{x_b} \frac{d\tilde{y}}{dx}\,x\,\frac{d\phi_i}{dx}\,dx = -\int_{x_a}^{x_b} \frac{2}{x^2}\,\phi_i\,dx - \left[\left(-x\frac{d\tilde{y}}{dx}\right)\phi_i\right]_{x_a}^{x_b}, \qquad i = 1, \ldots, n \tag{15.2.5}$$
Three observations here: (a) the highest order derivative in Eqn. (15.2.5) is now lower (only first-order derivatives compared
to second-order in Eqn. (15.2.4)), (b) the “stiffness” term is on the LHS and the “loading” terms are on the RHS, and (c)
the loading term contains the boundary flux term.
Step 3: Substituting Eqn. (15.2.1) in Eqn. (15.2.5) and noting that
$$\frac{d\tilde{y}(x; a)}{dx} = \frac{d\phi_0(x)}{dx} + \sum_{i=1}^{n} a_i\frac{d\phi_i(x)}{dx} \tag{15.2.6}$$
we have
$$\sum_{j=1}^{n}\left(\int_{x_a}^{x_b} \frac{d\phi_i}{dx}\,x\,\frac{d\phi_j}{dx}\,dx\right)a_j = -\int_{x_a}^{x_b} \frac{2}{x^2}\,\phi_i\,dx - \left[\left(-x\frac{d\tilde{y}}{dx}\right)\phi_i\right]_{x_a}^{x_b} - \int_{x_a}^{x_b} \frac{d\phi_i}{dx}\,x\,\frac{d\phi_0}{dx}\,dx, \qquad i = 1, \ldots, n \tag{15.2.7}$$
Let
$$K_{ij} = \int_{x_a}^{x_b} \frac{d\phi_i}{dx}\,x\,\frac{d\phi_j}{dx}\,dx$$
and
$$F_i = -\int_{x_a}^{x_b} \frac{2}{x^2}\,\phi_i\,dx - \left[\left(-x\frac{d\tilde{y}}{dx}\right)\phi_i\right]_{x_a}^{x_b} - \int_{x_a}^{x_b} \frac{d\phi_i}{dx}\,x\,\frac{d\phi_0}{dx}\,dx \tag{15.2.8}$$
Then we can rewrite Eqn. (15.2.7) in the matrix notation as
$$\begin{bmatrix} K_{11} & K_{12} & \cdots & K_{1n} \\ K_{21} & K_{22} & \cdots & K_{2n} \\ \vdots & \vdots & & \vdots \\ K_{n1} & K_{n2} & \cdots & K_{nn} \end{bmatrix} \begin{Bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{Bmatrix} = \begin{Bmatrix} F_1 \\ F_2 \\ \vdots \\ F_n \end{Bmatrix} \tag{15.2.9}$$
or,
$$\mathbf{K}_{n \times n}\,\mathbf{a}_{n \times 1} = \mathbf{F}_{n \times 1} \tag{15.2.10}$$
The stiffness matrix $\mathbf{K}$ is symmetric since
$$K_{ji} = \int_{x_a}^{x_b} \frac{d\phi_j}{dx}\,x\,\frac{d\phi_i}{dx}\,dx = K_{ij} \tag{15.2.8b}$$

Step 4: Use a specific form of the trial solution


In the previous topic we used a polynomial as the trial solution. Polynomials are easy to deal with and we will stick with that
approach here. Let us assume the solution as
$$\tilde{y} = a_1 + a_2 x + a_3 x^2 = \sum_{j=1}^{n} a_j\phi_j(x) \tag{15.2.11}$$
implying that
$$\phi_1(x) = 1, \qquad \phi_2(x) = x, \qquad \phi_3(x) = x^2 \tag{15.2.12}$$
The only difference between Eqn. (15.2.1) and the above equation is the $\phi_0$ term. In this modified approach we will be applying
the BCs numerically and hence there is no need for the $\phi_0$ term. Now we are ready to compute the terms in Eqn. (15.2.8).
Using (15.2.12)
$$\frac{d\phi_1}{dx} = 0, \qquad \frac{d\phi_2}{dx} = 1, \qquad \frac{d\phi_3}{dx} = 2x \tag{15.2.13}$$
We will compute a typical term in the stiffness matrix
$$K_{23} = \int_{x_a}^{x_b} (1)(x)(2x)\,dx = \frac{2}{3}\left(x_b^3 - x_a^3\right) \tag{15.2.14}$$
and a force term (in two parts)
$$F_2^{int} = -\int_{x_a}^{x_b} \frac{2}{x^2}\,x\,dx = -2\ln\frac{x_b}{x_a} \tag{15.2.15}$$
and
$$F_2^{bnd} = \left(-x\frac{d\tilde{y}}{dx}\right)_{x_a} x_a - \left(-x\frac{d\tilde{y}}{dx}\right)_{x_b} x_b \tag{15.2.16}$$
where $F_2^{int}$ is the interior load term and $F_2^{bnd}$ is the boundary load term. Similarly, the flux can be expressed as
$$\tilde{\tau} = -x\frac{d\tilde{y}}{dx} = -a_2 x - 2a_3 x^2 \tag{15.2.17}$$
Step 5: Generate the algebraic equations as
$$\begin{bmatrix} 0 & 0 & 0 \\ 0 & \frac{1}{2}\left(x_b^2 - x_a^2\right) & \frac{2}{3}\left(x_b^3 - x_a^3\right) \\ 0 & \frac{2}{3}\left(x_b^3 - x_a^3\right) & x_b^4 - x_a^4 \end{bmatrix} \begin{Bmatrix} a_1 \\ a_2 \\ a_3 \end{Bmatrix} = \begin{Bmatrix} 2\left(\frac{1}{x_b} - \frac{1}{x_a}\right) \\ -2\ln\frac{x_b}{x_a} \\ -2(x_b - x_a) \end{Bmatrix} + \begin{Bmatrix} \tilde{\tau}|_{x_a} - \tilde{\tau}|_{x_b} \\ \tilde{\tau}|_{x_a}\,x_a - \tilde{\tau}|_{x_b}\,x_b \\ \tilde{\tau}|_{x_a}\,x_a^2 - \tilde{\tau}|_{x_b}\,x_b^2 \end{Bmatrix} \tag{15.2.18}$$
We can now proceed further by substituting the problem data.
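Step 6: Substituting the numerical data, $x_a = 1$ and $x_b = 2$,
$$\begin{bmatrix} 0 & 0 & 0 \\ 0 & \frac{3}{2} & \frac{14}{3} \\ 0 & \frac{14}{3} & 15 \end{bmatrix} \begin{Bmatrix} a_1 \\ a_2 \\ a_3 \end{Bmatrix} = \begin{Bmatrix} -1 \\ -2\ln 2 \\ -2 \end{Bmatrix} + \begin{Bmatrix} \tilde{\tau}|_{x=1} - \tilde{\tau}|_{x=2} \\ \tilde{\tau}|_{x=1} - \tilde{\tau}|_{x=2}\,(2) \\ \tilde{\tau}|_{x=1} - \tilde{\tau}|_{x=2}\,(4) \end{Bmatrix} \tag{15.2.19}$$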
Step 7: Applying the BCs
First we will apply the natural boundary condition $\left(-x\frac{dy}{dx}\right)_{x=2} = \frac{1}{2}$
$$\begin{bmatrix} 0 & 0 & 0 \\ 0 & \frac{3}{2} & \frac{14}{3} \\ 0 & \frac{14}{3} & 15 \end{bmatrix} \begin{Bmatrix} a_1 \\ a_2 \\ a_3 \end{Bmatrix} = \begin{Bmatrix} -1 \\ -2\ln 2 \\ -2 \end{Bmatrix} + \begin{Bmatrix} \tilde{\tau}|_{x=1} - \frac{1}{2} \\ \tilde{\tau}|_{x=1} - 1 \\ \tilde{\tau}|_{x=1} - 2 \end{Bmatrix} \tag{15.2.20}$$
Since the NBC is applied to the system equations it does not guarantee that the final solution will satisfy the NBC. In the classical
approach, we enforced the NBC up front on the trial solution itself so that the final solution did satisfy the NBC.
Applying the EBC $y(x = 1) = 2$ is different than what we saw in the Direct Stiffness Method (Topic 2). This is because there
is no equation directly associated with $y(x)$. Imposing the EBC on the trial solution itself (Eqn. (15.2.11))
$$a_1 + a_2 + a_3 = 2 \tag{15.2.21}$$
We can write $a_3$ in terms of the other two parameters as
$$a_3 = 2 - a_1 - a_2 \tag{15.2.22}$$
Eliminating $a_3$ from Eqns. (15.2.20) using the above equation, we have
$$\begin{bmatrix} 0 & 0 \\ -\frac{14}{3} & \frac{3}{2} - \frac{14}{3} \\ -15 & \frac{14}{3} - 15 \end{bmatrix} \begin{Bmatrix} a_1 \\ a_2 \end{Bmatrix} = \begin{Bmatrix} \tilde{\tau}|_{x=1} - \frac{3}{2} \\ \tilde{\tau}|_{x=1} - 2\ln 2 - \frac{31}{3} \\ \tilde{\tau}|_{x=1} - 34 \end{Bmatrix} \tag{15.2.23}$$
Eliminating the last equation (from above, Eqn. (1)-Eqn. (3), Eqn. (2)-Eqn. (3))
$$\begin{bmatrix} 15 & \frac{31}{3} \\ \frac{31}{3} & \frac{43}{6} \end{bmatrix} \begin{Bmatrix} a_1 \\ a_2 \end{Bmatrix} = \begin{Bmatrix} \frac{65}{2} \\ \frac{71}{3} - 2\ln 2 \end{Bmatrix} \tag{15.2.24}$$
The final equations are symmetric, do not contain any boundary terms and can be solved numerically.
Step 8: Solving the system equations
$$a_1 = 3.719, \qquad a_2 = -2.254 \tag{15.2.25}$$
Substituting in Eqn. (15.2.22),
$$a_3 = 0.535 \tag{15.2.26}$$
Hence
$$\tilde{y} = 3.719 - 2.254x + 0.535x^2 \tag{15.2.27}$$
and
$$\tilde{\tau} = 2.254x - 1.070x^2 \tag{15.2.28}$$
While we have overcome some obstacles with this procedure, the process has still a major drawback. We again revisit the issue
of imposing the boundary conditions.
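
Steps 5 through 8 can be sketched in C++ as follows. Instead of eliminating $a_3$ and the unknown boundary flux by hand, this sketch treats the flux at $x_a$ as a fourth unknown alongside $a_1, a_2, a_3$ and appends the EBC as a fourth equation; this is an equivalent reformulation chosen for brevity, not the procedure used in the text. The names `GlobalQuadraticResult` and `solveGlobalQuadratic` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <utility>

// Unknowns: a1, a2, a3 of y~ = a1 + a2 x + a3 x^2 and the boundary flux at x = xa.
struct GlobalQuadraticResult { double a1, a2, a3, tauA; };

GlobalQuadraticResult solveGlobalQuadratic(double xa = 1.0, double xb = 2.0,
                                           double yA = 2.0, double tauB = 0.5)
{
    constexpr int N = 4;
    double A[N][N + 1] = {};  // augmented matrix [A | b]

    // Rows 0..2: Galerkin equations for phi_i = x^i (i = 0, 1, 2), Eqn. (15.2.18).
    for (int i = 0; i < 3; ++i) {
        for (int j = 1; j < 3; ++j)
            if (i > 0)  // K_ij = i*j*(xb^(i+j) - xa^(i+j))/(i+j); zero when i or j is 0
                A[i][j] = i * j * (std::pow(xb, i + j) - std::pow(xa, i + j)) / (i + j);
        A[i][3] = -std::pow(xa, i);  // unknown flux at xa times phi_i(xa), moved to the LHS
        double fint;                 // interior load: -integral of (2/x^2) x^i
        if (i == 0)      fint = 2.0 * (1.0 / xb - 1.0 / xa);
        else if (i == 1) fint = -2.0 * std::log(xb / xa);
        else             fint = -2.0 * (xb - xa);
        A[i][N] = fint - tauB * std::pow(xb, i);  // NBC: known flux at xb
    }
    // Row 3: EBC imposed on the trial solution, a1 + a2 xa + a3 xa^2 = yA.
    A[3][0] = 1.0; A[3][1] = xa; A[3][2] = xa * xa; A[3][N] = yA;

    // Gaussian elimination with partial pivoting.
    for (int col = 0; col < N; ++col) {
        int piv = col;
        for (int r = col + 1; r < N; ++r)
            if (std::fabs(A[r][col]) > std::fabs(A[piv][col])) piv = r;
        assert(std::fabs(A[piv][col]) > 1e-14);
        for (int c = 0; c <= N; ++c) std::swap(A[col][c], A[piv][c]);
        for (int r = col + 1; r < N; ++r) {
            const double m = A[r][col] / A[col][col];
            for (int c = col; c <= N; ++c) A[r][c] -= m * A[col][c];
        }
    }
    // Back substitution.
    double x[N];
    for (int r = N - 1; r >= 0; --r) {
        double s = A[r][N];
        for (int c = r + 1; c < N; ++c) s -= A[r][c] * x[c];
        x[r] = s / A[r][r];
    }
    return { x[0], x[1], x[2], x[3] };
}
```

With the default arguments the sketch reproduces $a_1 \approx 3.719$, $a_2 \approx -2.254$ and $a_3 \approx 0.535$. It also returns a boundary flux of 1.5 at $x = 1$, which follows directly from the first Galerkin equation.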

The Element Concept


In the preceding section, the trial solution was assumed to be applicable for the entire problem domain. We will depart from
this approach and go through the process of discretization (or, creating a mesh with elements). This will enable us to develop a
very general methodology making it convenient to do a variety of things including applying the boundary conditions.

One-Element Solution
Step 4: We will once again stick with the same problem as in the last section. Let us assume that the domain is discretized into a
single element. A typical element is shown in Fig. 15.2.1.

Fig. 15.2.1 A typical element (nodes 1 and 2 located at $x_1$ and $x_2$, element length $L$, nodal values $y_1$ and $y_2$)

The element is described by two nodes that are labeled 1 and 2 and are located at $x_1$ and $x_2$ respectively, with the length of
the element as $L$. Let us also assume that the solution varies linearly over the element and is $y_1$ at node 1 and $y_2$ at node 2
noting that at this stage these values (called nodal values) are unknowns. We can describe the variation of the solution over the
element as a linear interpolation using the two nodal values. In other words
$$\tilde{y}(x) = a_1 + a_2 x = \phi_1(x)\,y_1 + \phi_2(x)\,y_2 \tag{15.2.29}$$
This is indeed the trial solution that we have used as the starting point in the preceding sections. Using the end
conditions
$$y(x = x_1) = y_1, \qquad y(x = x_2) = y_2 \tag{15.2.30}$$
and substituting in the first part of Eqn. (15.2.29) we obtain
$$y_1 = a_1 + a_2 x_1, \qquad y_2 = a_1 + a_2 x_2 \tag{15.2.31}$$
Solving for the $a_i$’s, we have
$$a_1 = \frac{x_2 y_1 - x_1 y_2}{x_2 - x_1}, \qquad a_2 = \frac{y_2 - y_1}{x_2 - x_1} \tag{15.2.32}$$
Therefore,
$$\tilde{y}(x) = \frac{x_2 y_1 - x_1 y_2}{x_2 - x_1} + \frac{y_2 - y_1}{x_2 - x_1}\,x \tag{15.2.33}$$
Rearranging
$$\tilde{y}(x) = \frac{x_2 - x}{x_2 - x_1}\,y_1 + \frac{x - x_1}{x_2 - x_1}\,y_2 = \frac{x_2 - x}{L}\,y_1 + \frac{x - x_1}{L}\,y_2 \tag{15.2.34}$$
The trial functions
$$\phi_1 = \frac{x_2 - x}{L}, \qquad \phi_2 = \frac{x - x_1}{L} \tag{15.2.35}$$
are known as the shape functions. They have special properties as shown in Fig. 15.2.2.

Fig. 15.2.2 Shape functions of the two-node element ($\phi_1$ is 1 at node 1 and 0 at node 2; $\phi_2$ is 0 at node 1 and 1 at node 2)
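
A minimal C++ sketch of the linear shape functions of Eqn. (15.2.35) and the interpolation of Eqn. (15.2.34) is given below. The struct name `LinearElement` and its member names are illustrative assumptions.

```cpp
#include <cassert>

// Two-node linear element with nodes at x1 < x2.
struct LinearElement
{
    double x1, x2;

    double length() const { return x2 - x1; }

    // Shape functions, Eqn. (15.2.35).
    double phi1(double x) const { return (x2 - x) / length(); }  // 1 at node 1, 0 at node 2
    double phi2(double x) const { return (x - x1) / length(); }  // 0 at node 1, 1 at node 2

    // Linear interpolation of the nodal values, Eqn. (15.2.34).
    double interpolate(double x, double yNode1, double yNode2) const
    {
        assert(x2 > x1);
        return phi1(x) * yNode1 + phi2(x) * yNode2;
    }
};
```

For an element spanning $1 \le x \le 2$, `interpolate(1.5, 2.0, 1.0)` returns 1.5, and the two shape functions sum to one at every point of the element.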

Going back to Step 4 in the preceding section, we have no $\phi_0(x)$³ and the $\phi_i$’s are given by Eqn. (15.2.35). Hence for a typical
element
$$\begin{bmatrix} k_{11} & k_{12} \\ k_{21} & k_{22} \end{bmatrix} \begin{Bmatrix} y_1 \\ y_2 \end{Bmatrix} = \begin{Bmatrix} F_1 \\ F_2 \end{Bmatrix} \tag{15.2.36}$$
Let us evaluate a couple of typical terms.
$$k_{12} = \int_{x_1}^{x_2} \frac{d\phi_1}{dx}\,x\,\frac{d\phi_2}{dx}\,dx = \int_{x_1}^{x_2} \left(-\frac{1}{L}\right)(x)\left(\frac{1}{L}\right)dx = -\frac{x_1 + x_2}{2L} \tag{15.2.37}$$
$$F_1^{int} = -\int_{x_1}^{x_2} \frac{2}{x^2}\,\phi_1\,dx = -\frac{2}{x_1} + \frac{2}{x_2 - x_1}\ln\frac{x_2}{x_1} \tag{15.2.38}$$
$$F_2^{bnd} = -\left(-x\frac{d\tilde{y}}{dx}\right)_{x_2} \tag{15.2.39}$$
Step 5: With all the other terms evaluated similarly,
$$\frac{1}{2L}\begin{bmatrix} (x_1 + x_2) & -(x_1 + x_2) \\ -(x_1 + x_2) & (x_1 + x_2) \end{bmatrix} \begin{Bmatrix} y_1 \\ y_2 \end{Bmatrix} = \begin{Bmatrix} -\frac{2}{x_1} + \frac{2}{L}\ln\frac{x_2}{x_1} \\ \frac{2}{x_2} - \frac{2}{L}\ln\frac{x_2}{x_1} \end{Bmatrix} + \begin{Bmatrix} \tilde{\tau}|_{x_1} \\ -\tilde{\tau}|_{x_2} \end{Bmatrix} \tag{15.2.40}$$
These are the element equations similar to Eqn. (T2L2-4) etc. The flux expression is of the form
$$\tilde{\tau} = -x\frac{d\tilde{y}}{dx} = \frac{x}{x_2 - x_1}\,(y_1 - y_2) \tag{15.2.41}$$
Step 6: Substituting the numerical values $x_1 = 1$ and $x_2 = 2$, we have
$$\frac{1}{2}\begin{bmatrix} 3 & -3 \\ -3 & 3 \end{bmatrix} \begin{Bmatrix} y_1 \\ y_2 \end{Bmatrix} = \begin{Bmatrix} -2 + 2\ln 2 \\ 1 - 2\ln 2 \end{Bmatrix} + \begin{Bmatrix} \tilde{\tau}|_{x=1} \\ -\tilde{\tau}|_{x=2} \end{Bmatrix} \tag{15.2.42}$$
Step 7: Applying the boundary conditions
First we will apply the natural boundary condition $\tau(x = 2) = \frac{1}{2}$. This results in
$$\frac{1}{2}\begin{bmatrix} 3 & -3 \\ -3 & 3 \end{bmatrix} \begin{Bmatrix} y_1 \\ y_2 \end{Bmatrix} = \begin{Bmatrix} -2 + 2\ln 2 \\ 1 - 2\ln 2 \end{Bmatrix} + \begin{Bmatrix} \tilde{\tau}|_{x=1} \\ -\frac{1}{2} \end{Bmatrix} \tag{15.2.43}$$
Now applying the EBC $y(x = 1) = y_1 = 2$ is done in a manner described in the section containing Eqn. (T2L2-39). The
equations reduce to
$$\begin{bmatrix} 1 & 0 \\ 0 & \frac{3}{2} \end{bmatrix} \begin{Bmatrix} y_1 \\ y_2 \end{Bmatrix} = \begin{Bmatrix} 2 \\ \frac{7}{2} - 2\ln 2 \end{Bmatrix} \tag{15.2.43}$$

³ It is not necessary to have this term since the BCs will be imposed numerically.

Step 8: Solving
$$y_1 = 2, \qquad y_2 = 1.409 \tag{15.2.44}$$
Hence, the approximate solution over the element is found by substituting these values in Eqns. (15.2.34) and (15.2.41) yielding
$$\tilde{y} = 2.591 - 0.591x \tag{15.2.45}$$
and
$$\tilde{\tau} = 0.591x \tag{15.2.46}$$

Comparing this solution to the exact solution shows that the results are quite in error. This is because of the linear trial solution
that was assumed and because of the fact that we used only one element.
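
The element equations (15.2.40), the flux expression (15.2.41) and the one-element solution can be sketched in C++ as shown below. The names `ElementEquations`, `elementEquations`, `oneElementY2` and `elementFlux` are hypothetical, and the boundary flux terms are handled separately from the interior load, which is a choice made for this sketch.

```cpp
#include <cassert>
#include <cmath>

// Element stiffness and interior load for d/dx(x dy/dx) = 2/x^2, Eqn. (15.2.40).
struct ElementEquations { double k[2][2]; double fint[2]; };

ElementEquations elementEquations(double x1, double x2)
{
    assert(x2 > x1 && x1 > 0.0);
    const double L = x2 - x1;
    const double s = (x1 + x2) / (2.0 * L);
    const double logTerm = 2.0 / L * std::log(x2 / x1);
    return { { { s, -s }, { -s, s } },
             { -2.0 / x1 + logTerm, 2.0 / x2 - logTerm } };
}

// One-element model: EBC y1 = yLeft at x1, NBC flux = tauRight at x2.
// Returns the unknown nodal value y2.
double oneElementY2(double x1 = 1.0, double x2 = 2.0,
                    double yLeft = 2.0, double tauRight = 0.5)
{
    const ElementEquations e = elementEquations(x1, x2);
    // Second row: k21*y1 + k22*y2 = fint2 - tau(x2); the known y1 moves to the RHS.
    return (e.fint[1] - tauRight - e.k[1][0] * yLeft) / e.k[1][1];
}

// Element flux, Eqn. (15.2.41): tau = x (y1 - y2) / (x2 - x1).
double elementFlux(double x, double x1, double x2, double yNode1, double yNode2)
{
    return x * (yNode1 - yNode2) / (x2 - x1);
}
```

With the default arguments `oneElementY2()` returns approximately 1.409, and the element flux evaluated at $x = 2$ is approximately 1.182 compared with the specified value of 0.5, which illustrates why the one-element results are poor.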

Two-Element Solution
The finite element mesh for the two-element solution is shown in Fig. 15.2.3.

Fig. 15.2.3 Two-element mesh (nodes 1, 2 and 3 at $x = 1$, $x = 1.5$ and $x = 2$, with nodal values $y_1$, $y_2$ and $y_3$)

This is a uniform mesh since both the elements are geometrically identical. The unknowns at the three nodes are labeled
$y_1, y_2, y_3$. Just as in the Direct Stiffness Method, we will generate the element equations.
Element 1: $x_1 = 1$ and $x_2 = 1.5$
$$\frac{1}{2(0.5)}\begin{bmatrix} (1 + 1.5) & -(1 + 1.5) \\ -(1 + 1.5) & (1 + 1.5) \end{bmatrix} \begin{Bmatrix} y_1 \\ y_2 \end{Bmatrix} = \begin{Bmatrix} -\frac{2}{1} + \frac{2}{0.5}\ln\frac{1.5}{1} \\ \frac{2}{1.5} - \frac{2}{0.5}\ln\frac{1.5}{1} \end{Bmatrix} + \begin{Bmatrix} \left(\tilde{\tau}|_{x=1}\right)_1 \\ -\left(\tilde{\tau}|_{x=1.5}\right)_1 \end{Bmatrix} \tag{15.2.47}$$
Element 2: $x_1 = 1.5$ and $x_2 = 2$
$$\frac{1}{2(0.5)}\begin{bmatrix} (1.5 + 2) & -(1.5 + 2) \\ -(1.5 + 2) & (1.5 + 2) \end{bmatrix} \begin{Bmatrix} y_2 \\ y_3 \end{Bmatrix} = \begin{Bmatrix} -\frac{2}{1.5} + \frac{2}{0.5}\ln\frac{2}{1.5} \\ \frac{2}{2} - \frac{2}{0.5}\ln\frac{2}{1.5} \end{Bmatrix} + \begin{Bmatrix} \left(\tilde{\tau}|_{x=1.5}\right)_2 \\ -\left(\tilde{\tau}|_{x=2}\right)_2 \end{Bmatrix} \tag{15.2.48}$$
We need to expand the notation to differentiate between the two elements, i.e. $\left(\tilde{\tau}|_{x=1}\right)_1$ represents the flux at $x = 1$ for
element 1. Assembling the two equations to create the system equations, we obtain
$$\begin{bmatrix} 2.5 & -2.5 & 0 \\ -2.5 & 6.0 & -3.5 \\ 0 & -3.5 & 3.5 \end{bmatrix} \begin{Bmatrix} y_1 \\ y_2 \\ y_3 \end{Bmatrix} = \begin{Bmatrix} -2 + 4\ln\frac{3}{2} \\ 4\ln\frac{8}{9} \\ 1 - 4\ln\frac{4}{3} \end{Bmatrix} + \begin{Bmatrix} \left(\tilde{\tau}|_{x=1}\right)_1 \\ 0 \\ -\left(\tilde{\tau}|_{x=2}\right)_2 \end{Bmatrix} \tag{15.2.49}$$
Note that the second term in the boundary load vector is zero. This is because when we assemble the system equations we
assume that
$$\left(\tilde{\tau}|_{x=1.5}\right)_1 = \left(\tilde{\tau}|_{x=1.5}\right)_2 \tag{15.2.50}$$
In other words the flux is assumed to be continuous across the two elements. As before, since this constraint is not enforced at
the system level (but a mere substitution is made for the flux on the RHS), the final solution will not show this flux continuity
across the elements⁴.
Now we need to impose the boundary conditions. Imposing the NBC first, we have
$$\begin{bmatrix} 2.5 & -2.5 & 0 \\ -2.5 & 6.0 & -3.5 \\ 0 & -3.5 & 3.5 \end{bmatrix} \begin{Bmatrix} y_1 \\ y_2 \\ y_3 \end{Bmatrix} = \begin{Bmatrix} -2 + 4\ln\frac{3}{2} \\ 4\ln\frac{8}{9} \\ 1 - 4\ln\frac{4}{3} \end{Bmatrix} + \begin{Bmatrix} \tilde{\tau}|_{x=1} \\ 0 \\ -\frac{1}{2} \end{Bmatrix} \tag{15.2.51}$$
Now imposing the EBC yields
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 6.0 & -3.5 \\ 0 & -3.5 & 3.5 \end{bmatrix} \begin{Bmatrix} y_1 \\ y_2 \\ y_3 \end{Bmatrix} = \begin{Bmatrix} 2 \\ 4\ln\frac{8}{9} + 5 \\ \frac{1}{2} - 4\ln\frac{4}{3} \end{Bmatrix} \tag{15.2.52}$$
Solving the equations yields
$$y_1 = 2, \qquad y_2 = 1.551, \qquad y_3 = 1.365 \tag{15.2.53}$$
Hence, for element 1 (using Eqns. (15.2.34) and (15.2.41))
$$\left(\tilde{y}(x)\right)_1 = 2\left(\frac{1.5 - x}{0.5}\right) + 1.551\left(\frac{x - 1}{0.5}\right) = -0.898x + 2.898 \tag{15.2.54}$$
$$\left(\tilde{\tau}(x)\right)_1 = 0.898x \tag{15.2.55}$$
and for element 2
$$\left(\tilde{y}(x)\right)_2 = 1.551\left(\frac{2.0 - x}{0.5}\right) + 1.365\left(\frac{x - 1.5}{0.5}\right) = -0.372x + 2.109 \tag{15.2.56}$$
$$\left(\tilde{\tau}(x)\right)_2 = 0.372x \tag{15.2.57}$$
Let’s compare the one-element and the two-element solutions as shown in Fig. 15.2.4.

⁴ This is similar to the imposition of the natural boundary condition at the system level.

Fig. 15.2.4 Comparison of the solutions ($\tilde{y}(x)$ and $\tilde{\tau}(x)$ versus $x$ for $1 \le x \le 2$: one-element and two-element models)
(1) Both the solutions satisfy the EBC at $x = 1$. They should, as we have imposed the EBC on the final system equations.
(2) There is an improvement in the two-element solution with respect to $y(x)$ (an error of 4.6% and 1.3% for the one-
element and two-element solutions, respectively, at $x = 2$).
(3) The error in the one-element flux is large throughout the domain. The error in the flux from the two-element solution is
much less (an error of 136% and 49% for the one-element and two-element solutions, respectively, at $x = 2$).
(4) While the function is continuous throughout the domain, the flux is discontinuous for the two-element model at the
element interface at $x = 1.5$.
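
The one-element and two-element calculations generalize to a uniform mesh with any number of linear elements. The following C++ sketch assembles the element equations (15.2.40) into a tridiagonal system, applies the NBC and then the EBC as described above, and solves with the Thomas algorithm. The function name `solveLinearElements`, the uniform mesh, and the choice of a tridiagonal solver are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Nodal values for numElements equal linear elements on [xa, xb] for the problem
// d/dx(x dy/dx) = 2/x^2 with EBC y(xa) = yLeft and NBC flux(xb) = tauRight.
std::vector<double> solveLinearElements(int numElements, double xa = 1.0, double xb = 2.0,
                                        double yLeft = 2.0, double tauRight = 0.5)
{
    assert(numElements >= 1 && xa > 0.0 && xb > xa);
    const int n = numElements + 1;  // number of nodes
    const double L = (xb - xa) / numElements;
    // lower[i] = A(i, i-1), diag[i] = A(i, i), upper[i] = A(i, i+1)
    std::vector<double> lower(n, 0.0), diag(n, 0.0), upper(n, 0.0), rhs(n, 0.0);

    // Assemble the element equations; interior boundary-flux terms are assumed to cancel.
    for (int e = 0; e < numElements; ++e) {
        const double x1 = xa + e * L, x2 = x1 + L;
        const double s = (x1 + x2) / (2.0 * L);
        const double logTerm = 2.0 / L * std::log(x2 / x1);
        diag[e] += s;       upper[e] -= s;
        lower[e + 1] -= s;  diag[e + 1] += s;
        rhs[e] += -2.0 / x1 + logTerm;
        rhs[e + 1] += 2.0 / x2 - logTerm;
    }
    rhs[n - 1] -= tauRight;  // NBC at the right end

    // EBC at the first node: replace its equation by y1 = yLeft and move the
    // known value to the right-hand side of the neighbouring equation.
    rhs[1] -= lower[1] * yLeft;
    lower[1] = 0.0;
    diag[0] = 1.0; upper[0] = 0.0; rhs[0] = yLeft;

    // Thomas algorithm: forward elimination followed by back substitution.
    for (int i = 1; i < n; ++i) {
        const double m = lower[i] / diag[i - 1];
        diag[i] -= m * upper[i - 1];
        rhs[i] -= m * rhs[i - 1];
    }
    std::vector<double> y(n);
    y[n - 1] = rhs[n - 1] / diag[n - 1];
    for (int i = n - 2; i >= 0; --i)
        y[i] = (rhs[i] - upper[i] * y[i + 1]) / diag[i];
    return y;
}
```

Under these assumptions `solveLinearElements(1)` returns nodal values close to 2 and 1.409, and `solveLinearElements(2)` returns values close to 2, 1.551 and 1.365, in agreement with the hand calculations. Increasing the number of elements moves the value at $x = 2$ toward the exact value $1 + \frac{1}{2}\ln 2$.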

Review of the Steps


Step 1: Assume the trial solution of the form
$$\tilde{y}(x; a) = a_0 + a_1 x + \dots + a_n x^n = \sum_{i=1}^{n} y_i\,\phi_i(x)$$
where $n$ is equal to the number of DOF in the element, $y_i$ are the nodal values and $\phi_i(x)$ are the shape functions. Construct the residual and substitute the trial solution in the residual equations. Note that there are as many residual equations as there are DOF in the element.
Step 2: Integrate by parts the highest derivative term. Integrating by parts will not only lower the highest derivative term but also
generate a boundary (flux) term.
Step 3: Rewrite the equations so that the stiffness related terms are on the left and the load terms (interior and boundary) are on
the right.
Step 4: To generate the equations completely we need to assume the exact form of the trial solution, i.e. fix the value of $n$. This will enable us to generate the terms in the stiffness matrix and the load vector. We can now generate the element equations in the form
$$\mathbf{k}_{n \times n}\,\mathbf{u}_{n \times 1} = \mathbf{f}_{n \times 1}$$
We also need to generate the expression for the flux.


Step 5: Using the problem data, we can generate the element equations for all the elements in the model. These equations are then assembled into the system or global equations of the form
$$\mathbf{K}_{N \times N}\,\mathbf{U}_{N \times 1} = \mathbf{F}_{N \times 1}$$
for a model with $N$ system DOF.
Step 6: Using the problem data, we impose the NBCs first. Then we impose the EBCs resulting in equations of the form
$$\mathbf{K}_{N \times N}\,\mathbf{U}_{N \times 1} = \mathbf{F}_{N \times 1}$$
Step 7: Solve the system equations for the primary nodal unknowns U .
Step 8: Using the primary unknowns we can compute the flux in each element.
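Steps 6 and 7 can be sketched in a few lines of C++. The following is only an illustrative sketch: the names `imposeEBC`, `solve` and `twoElementSolution` are hypothetical, Gaussian elimination with partial pivoting is used simply because it is short, and the numbers are those of the two-element system in Eqn. (15.2.52). The first load entry is a placeholder because that row is overwritten when the EBC $y_1 = 2$ is imposed.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;
using Vector = std::vector<double>;

// Step 6: impose the EBC U[dof] = value by elimination. The known column is
// moved to the right-hand side, then row and column 'dof' become identity.
void imposeEBC(Matrix& K, Vector& F, std::size_t dof, double value)
{
    const std::size_t n = K.size();
    assert(dof < n && F.size() == n);
    for (std::size_t i = 0; i < n; ++i) {
        F[i] -= K[i][dof] * value;
        K[i][dof] = 0.0;
        K[dof][i] = 0.0;
    }
    K[dof][dof] = 1.0;
    F[dof] = value;
}

// Step 7: solve K U = F by Gaussian elimination with partial pivoting.
// K and F are passed by value so the caller's copies are left untouched.
Vector solve(Matrix K, Vector F)
{
    const std::size_t n = K.size();
    for (std::size_t c = 0; c < n; ++c) {
        std::size_t p = c;
        for (std::size_t r = c + 1; r < n; ++r)
            if (std::fabs(K[r][c]) > std::fabs(K[p][c])) p = r;
        assert(std::fabs(K[p][c]) > 0.0);  // otherwise the system is singular
        std::swap(K[c], K[p]);
        std::swap(F[c], F[p]);
        for (std::size_t r = c + 1; r < n; ++r) {
            const double m = K[r][c] / K[c][c];
            for (std::size_t j = c; j < n; ++j) K[r][j] -= m * K[c][j];
            F[r] -= m * F[c];
        }
    }
    Vector U(n);
    for (std::size_t i = n; i-- > 0;) {  // back substitution
        double s = F[i];
        for (std::size_t j = i + 1; j < n; ++j) s -= K[i][j] * U[j];
        U[i] = s / K[i][i];
    }
    return U;
}

// Two-element system of this section (assumed values; F[0] is a placeholder).
Vector twoElementSolution()
{
    Matrix K = {{2.5, -2.5, 0.0}, {-2.5, 6.0, -3.5}, {0.0, -3.5, 3.5}};
    Vector F = {0.0, 4.0 * std::log(8.0 / 9.0), 0.5 - 4.0 * std::log(4.0 / 3.0)};
    imposeEBC(K, F, 0, 2.0);
    return solve(K, F);
}
```

Calling `twoElementSolution()` returns the three nodal values, which can be compared with Eqn. (15.2.53). Step 8 then follows by substituting the nodal values into the flux expression of each element.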


Exercises
In the following problems the quadratic element is to be used. To obtain the shape functions for the quadratic element refer to
Section 15.6.
Problem 15.2.1
Consider the differential equation
d  dy( x ) 
  x  1 0 1 x  2
dx  dx 
 dy 
with y( x  1)  1  ( x  1)   1
 dx x  2
(a) Using the element concept, use the linear polynomial as the trial solution. Apply the steps outlined in this section to obtain
an approximate solution for both the function and the flux.
(b) Repeat the problem but now use a quadratic polynomial. Compare the two solutions.
Problem 15.2.2
Consider the differential equation
d  2 dy  1
   30x  204 x  351x  110x  0x 4
4 3 2
x
dx  dx  12
with y(0)  1 and y(4)  0 . Using a quadratic polynomial as the (element) trial solution, obtain an approximate solution for
both the function and the flux. Compare this to the solution obtained earlier.


15.3 One-Dimensional Boundary Value Problem


The one-dimensional BVP is described by (with the positive $x$ direction left to right)
DE: $$-\frac{d}{dx}\left(\alpha(x)\frac{dy(x)}{dx}\right) + \beta(x)\,y(x) = f(x) \qquad x_a < x < x_b \qquad (15.2.56)$$
BCs: At $x = x_a$ either $y = y_a$ or $\tau = -g y + c_a$
At $x = x_b$ either $y = y_b$ or $\tau = h y - c_b$ (15.2.57)
where $\alpha(x)$, $\beta(x)$, and $f(x)$ are known functions, $y_a$, $y_b$, $c_a$, $c_b$, $g$ and $h$ are constants, and $\tau = -\alpha\,\dfrac{dy}{dx}$ is the flux. The BCs are EBC, NBC or mixed. Note that the mixed BC reduces to a NBC by setting $g = 0$ and $h = 0$. We will look at specific engineering problems that are governed by this differential equation in the next lesson.
We will use Galerkin's Method to generate the element equations. The following steps pertain to a typical element in the mesh.
Step 1: Assume the trial solution as
$$\tilde{y}(x; a) = \sum_{j=1}^{n} y_j\,\phi_j(x) \qquad (15.2.58)$$
As in the previous section, we do not have the $\phi_0(x)$ term since the boundary conditions will be imposed numerically. We will drop the ~ (tilde) notation for convenience. Substituting in the residual equations and integrating over the domain $\Omega$ of the element, we have
$$\int_{\Omega} \left[ -\frac{d}{dx}\left(\alpha(x)\frac{dy(x)}{dx}\right) + \beta(x)\,y(x) - f(x) \right] \phi_i(x)\,dx = 0 \qquad i = 1, 2, \dots, n \qquad (15.2.59)$$

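Step 2: Integrating by parts the highest-order derivative, we have
$$\int_{\Omega} \left( \alpha(x)\frac{dy}{dx}\frac{d\phi_i}{dx} + \beta(x)\,y(x)\,\phi_i \right) dx = \int_{\Omega} f(x)\,\phi_i\,dx - \left[\tau\,\phi_i\right]_{\Gamma} \qquad i = 1, 2, \dots, n \qquad (15.2.60)$$
where $\Gamma$ represents the boundary of the element.
Step 3: Using Eqn. (15.2.58) in (15.2.60)
$$\sum_{j=1}^{n} \left[ \int_{\Omega} \left( \frac{d\phi_i}{dx}\,\alpha(x)\,\frac{d\phi_j}{dx} + \phi_i(x)\,\beta(x)\,\phi_j(x) \right) dx \right] y_j = \int_{\Omega} f(x)\,\phi_i(x)\,dx - \left[\tau\,\phi_i\right]_{\Gamma} \qquad i = 1, 2, \dots, n \qquad (15.2.61)$$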

Step 4: Let us use the linear interpolation two-node element from the earlier section ($n = 2$)
$$\phi_1 = \frac{x_2 - x}{L} \qquad \phi_2 = \frac{x - x_1}{L} \qquad (15.2.62)$$
Substituting Eqn. (15.2.62) in Eqn. (15.2.61), we have the element equations as
$$\begin{bmatrix} k_{11} & k_{12} \\ k_{21} & k_{22} \end{bmatrix} \begin{Bmatrix} y_1 \\ y_2 \end{Bmatrix} = \begin{Bmatrix} F_1^{int} \\ F_2^{int} \end{Bmatrix} + \begin{Bmatrix} F_1^{bnd} \\ F_2^{bnd} \end{Bmatrix} \qquad (15.2.63)$$
Let us look at a typical stiffness term first.
$$k_{11} = \int_{x_1}^{x_2} \left(\frac{-1}{x_2 - x_1}\right) \alpha(x) \left(\frac{-1}{x_2 - x_1}\right) dx + \int_{x_1}^{x_2} \left(\frac{x_2 - x}{x_2 - x_1}\right) \beta(x) \left(\frac{x_2 - x}{x_2 - x_1}\right) dx \qquad (15.2.64)$$


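To evaluate the above equation, we need to know $\alpha(x)$ and $\beta(x)$. The functions can then be substituted and the integral evaluated. Instead, we will assume that we know the numerical value of these two functions at the centroid of the element, i.e. at $x = \dfrac{x_1 + x_2}{2}$. For this element in which the solution is assumed to vary linearly, this is the most accurate point to evaluate the integral. Let us denote the centroidal values (constants) as $\bar{\alpha}$ and $\bar{\beta}$. Now, integrating
$$k_{11} = \frac{\bar{\alpha}}{L} + \frac{\bar{\beta} L}{3} \qquad (15.2.65)$$
We can use a similar strategy with the load terms. When all the terms are evaluated, we have
$$\begin{bmatrix} \dfrac{\bar{\alpha}}{L} + \dfrac{\bar{\beta} L}{3} & -\dfrac{\bar{\alpha}}{L} + \dfrac{\bar{\beta} L}{6} \\ -\dfrac{\bar{\alpha}}{L} + \dfrac{\bar{\beta} L}{6} & \dfrac{\bar{\alpha}}{L} + \dfrac{\bar{\beta} L}{3} \end{bmatrix} \begin{Bmatrix} y_1 \\ y_2 \end{Bmatrix} = \begin{Bmatrix} \dfrac{\bar{f} L}{2} \\ \dfrac{\bar{f} L}{2} \end{Bmatrix} - \begin{Bmatrix} \left[\tau\,\phi_1\right]_{\Gamma} \\ \left[\tau\,\phi_2\right]_{\Gamma} \end{Bmatrix} \qquad (15.2.66)$$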
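The term that requires special treatment here is the last term. In general
$$-\left[\tau\,\phi_i\right]_{\Gamma} = -\left(\tau\,\phi_i\right)\big|_{x_2} + \left(\tau\,\phi_i\right)\big|_{x_1} \qquad (15.2.67)$$
From Eqn. (15.2.57), we have $\tau = -g y + c$ for $x = x_a$ and $\tau = h y - c$ for $x = x_b$.
Substituting this in Eqn. (15.2.67) for $i = 1$
$$-\left[\tau\,\phi_1\right]_{\Gamma} = -\left(\tau\,\phi_1\right)\big|_{x_2} + \left(\tau\,\phi_1\right)\big|_{x_1} = 0 + \tau\big|_{x_1} = \left(-g y + c\right)\big|_{x_1} = -g_1 y_1 + c_1 \qquad (15.2.68)$$
Similarly, for $i = 2$
$$-\left[\tau\,\phi_2\right]_{\Gamma} = -\left(\tau\,\phi_2\right)\big|_{x_2} + \left(\tau\,\phi_2\right)\big|_{x_1} = -\tau\big|_{x_2} + 0 = -\left(h y - c\right)\big|_{x_2} = -h_2 y_2 + c_2 \qquad (15.2.69)$$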
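Note that the unknown $y$ appears in the two equations and must be moved to the LHS. Therefore, the modified element equations are
$$\left( \begin{bmatrix} \dfrac{\bar{\alpha}}{L} + \dfrac{\bar{\beta} L}{3} & -\dfrac{\bar{\alpha}}{L} + \dfrac{\bar{\beta} L}{6} \\ -\dfrac{\bar{\alpha}}{L} + \dfrac{\bar{\beta} L}{6} & \dfrac{\bar{\alpha}}{L} + \dfrac{\bar{\beta} L}{3} \end{bmatrix} + g_1 \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + h_2 \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \right) \begin{Bmatrix} y_1 \\ y_2 \end{Bmatrix} = \begin{Bmatrix} \dfrac{\bar{f} L}{2} \\ \dfrac{\bar{f} L}{2} \end{Bmatrix} + \begin{Bmatrix} c_1 \\ c_2 \end{Bmatrix} \qquad (15.2.70)$$
The $g_1$ term exists provided $x_1 = x_a$. Similarly, the $h_2$ term exists provided $x_2 = x_b$. The flux at the center of the element is given by
$$\tau = -\bar{\alpha}\frac{dy}{dx} = -\frac{\bar{\alpha}}{L}\left(y_2 - y_1\right) \qquad (15.2.71)$$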
The element equations are ready, and we now need to look at engineering problems that are described as 1D BVP to illustrate
these steps and the rest of the solution.
Finally, a warning – boundary conditions can be dangerous to your health! Applying the BCs incorrectly is one of the most
common forms of errors.
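The element equations of Eqn. (15.2.70) and the centroidal flux of Eqn. (15.2.71) translate almost directly into code. The sketch below is illustrative only: the struct and function names are hypothetical, the centroidal values of $\alpha$, $\beta$ and $f$ are assumed to be supplied by the caller, and the boundary parameters default to zero so that they are included only when node 1 or node 2 actually lies on the domain boundary. It also assumes that $c_1$ and $c_2$ are added to the load vector with a positive sign.

```cpp
#include <array>
#include <cassert>

// Matrices of the two-node linear element, following Eqn. (15.2.70).
struct LinearElement {
    std::array<std::array<double, 2>, 2> k{};  // stiffness, incl. g1 and h2 terms
    std::array<double, 2> f{};                 // load, incl. c1 and c2 terms
};

// alpha, beta, f : centroidal values of the coefficient functions
// L              : element length
// g1, c1         : mixed/natural BC data at node 1 (only if x1 == xa)
// h2, c2         : mixed/natural BC data at node 2 (only if x2 == xb)
LinearElement linearElementEquations(double alpha, double beta, double f, double L,
                                     double g1 = 0.0, double c1 = 0.0,
                                     double h2 = 0.0, double c2 = 0.0)
{
    assert(L > 0.0);
    const double a = alpha / L;       // alpha-bar / L
    const double b = beta * L / 6.0;  // beta-bar * L / 6
    LinearElement e;
    e.k[0][0] =  a + 2.0 * b + g1;
    e.k[0][1] = -a + b;
    e.k[1][0] = -a + b;
    e.k[1][1] =  a + 2.0 * b + h2;
    e.f[0] = 0.5 * f * L + c1;
    e.f[1] = 0.5 * f * L + c2;
    return e;
}

// Flux at the element centroid, Eqn. (15.2.71).
double centroidFlux(double alpha, double L, double y1, double y2)
{
    assert(L > 0.0);
    return -alpha * (y2 - y1) / L;
}
```

For instance, `linearElementEquations(20.0, 0.0, 0.0, 0.3, 25.0, 25.0 * 800.0)` produces a first diagonal term of $20/0.3 + 25$ and a first load term of $25(800)$, with the remaining terms unaffected by the boundary data.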


15.4 Solid Mechanics

Axial Deformation of an Elastic Rod


Consider the elastic rod as shown in figure below. The axial displacement is denoted u  u( x ) and the axial loading on the rod
is w  w ( x ) . Fig. 15.4.1 shows example boundary conditions where u  0 (EBC) on the left end and the force F (NBC) on
the right end is zero.

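[Figure: elastic rod along the $x$ axis with axial displacement $u$ and distributed axial load $w$; $u = 0$ at the left end and $F = 0$ at the right end.]

Fig. 15.4.1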
The governing differential equation is given as
$$-\frac{d}{dx}\left(A(x)E(x)\frac{du(x)}{dx}\right) = w(x)A(x) \qquad (15.4.1)$$
with the boundary conditions as
At $x = x_a$, $u(x = x_a) = u_a$ or NBC (15.4.2)
At $x = x_b$, $u(x = x_b) = u_b$ or NBC (15.4.3)
The natural BC is of the form
$$X = n_x \sigma_x \qquad (15.4.3a)$$
where $n_x$ is the direction cosine of the outward normal, $X$ is the force per unit area and is positive if it acts in the positive $x$ direction. Let us look at some possibilities with respect to the boundary conditions.
Rod has a known displacement u a (incl. zero) at the left end
u  ua
There is a concentrated force Fa applied at the left end in the positive x direction
du du
Fa   AE or a   E
dx x xa dx x xa

There is a concentrated force Fb applied at the right end in the positive x direction
du du
Fb  AE or b  E
dx x  xb dx x  xb

Using the process discussed in the previous lesson, Eqn. (15.2.70) reduces to
$$\frac{AE}{L} \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix} \begin{Bmatrix} u_1 \\ u_2 \end{Bmatrix} = \begin{Bmatrix} \dfrac{wAL}{2} \\ \dfrac{wAL}{2} \end{Bmatrix} + \begin{Bmatrix} F_1 \\ F_2 \end{Bmatrix} \qquad (15.4.4)$$
$$\mathbf{k}_{2 \times 2}\,\mathbf{u}_{2 \times 1} = \mathbf{f}_{2 \times 1} \qquad (15.4.5)$$
A dimensional analysis will show that $A \equiv L^2$, $E \equiv \dfrac{F}{L^2}$, $u \equiv L$ and $wA \equiv \dfrac{F}{L}$. The LHS and the RHS have units of $F$.
While we will look formally at systems modeled with discrete structural elements (truss and frame) in the second module, it
should be noted that the stiffness matrix (in the local coordinate system aligned with the axis of the element) is the stiffness
matrix for a truss element provided
(a) the ends of the element are pin connections,
(b) the cross-sectional area and the modulus of elasticity are constant within the element, and
(c) f ( x )  0 since no element loads are allowed on a truss member.
Example 15.4.1
Fig. 15.4.2(a) shows a bar made of a material with modulus of elasticity of 200 GPa. Compute the nodal displacements, element
stresses and support reactions.

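[Figure: stepped bar with cross-sectional areas of 250 mm² and 400 mm², segment lengths of 150 mm, 150 mm and 300 mm, and an applied force P = 300 kN.]

Fig. 15.4.2(a)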
Solution: Let us use m and N as the problem units. Let us use a three-element model placing a node where the concentrated
force acts5. The FE model is shown in Fig. 15.4.2(b). We have EBCs at the left and the right ends.

[Figure: FE model with nodes 1 to 4 and elements 1 to 3 along the X axis; element lengths 0.15 m, 0.15 m and 0.3 m.]
Fig. 15.4.2(b)

5This is not necessary since we can use a Dirac Delta function w ( x ) to define the concentrated force and then compute the equivalent forces
acting at the nodes of the element.


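Element 1: $E = 200(10^9)\ \mathrm{N/m^2}$, $A = 250(10^{-6})\ \mathrm{m^2}$, $L = 0.15\ \mathrm{m}$, $w = 0$. Hence the element equations are
$$\begin{bmatrix} 3.333(10^8) & -3.333(10^8) \\ -3.333(10^8) & 3.333(10^8) \end{bmatrix} \begin{Bmatrix} U_1 \\ U_2 \end{Bmatrix} = \begin{Bmatrix} F_1^{(1)} \\ F_2^{(1)} \end{Bmatrix}$$
Element 2: $E = 200(10^9)\ \mathrm{N/m^2}$, $A = 250(10^{-6})\ \mathrm{m^2}$, $L = 0.15\ \mathrm{m}$, $w = 0$. Hence the element equations are
$$\begin{bmatrix} 3.333(10^8) & -3.333(10^8) \\ -3.333(10^8) & 3.333(10^8) \end{bmatrix} \begin{Bmatrix} U_2 \\ U_3 \end{Bmatrix} = \begin{Bmatrix} F_1^{(2)} \\ F_2^{(2)} \end{Bmatrix}$$
Element 3: $E = 200(10^9)\ \mathrm{N/m^2}$, $A = 400(10^{-6})\ \mathrm{m^2}$, $L = 0.30\ \mathrm{m}$, $w = 0$. Hence the element equations are
$$\begin{bmatrix} 2.667(10^8) & -2.667(10^8) \\ -2.667(10^8) & 2.667(10^8) \end{bmatrix} \begin{Bmatrix} U_3 \\ U_4 \end{Bmatrix} = \begin{Bmatrix} F_1^{(3)} \\ F_2^{(3)} \end{Bmatrix}$$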
Assembling the equations, we have
$$10^8 \begin{bmatrix} 3.333 & -3.333 & 0 & 0 \\ -3.333 & 6.667 & -3.333 & 0 \\ 0 & -3.333 & 6 & -2.667 \\ 0 & 0 & -2.667 & 2.667 \end{bmatrix} \begin{Bmatrix} U_1 \\ U_2 \\ U_3 \\ U_4 \end{Bmatrix} = \begin{Bmatrix} F_1^{(1)} \\ 300(10^3) \\ 0 \\ F_2^{(3)} \end{Bmatrix}$$
Imposing the boundary conditions, we have
$$10^8 \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 6.667 & -3.333 & 0 \\ 0 & -3.333 & 6 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{Bmatrix} U_1 \\ U_2 \\ U_3 \\ U_4 \end{Bmatrix} = \begin{Bmatrix} 0 \\ 300(10^3) \\ 0 \\ 0 \end{Bmatrix}$$
Solving, the nodal displacements are
$$\{U_1, U_2, U_3, U_4\} = \{0, 6.23, 3.46, 0\}(10^{-4})\ \mathrm{m}$$
The force in each element is computed as follows. Note that flux is negative of the member force.
Element 1
$$F_1 = \frac{A_1 E_1}{L_1}(U_2 - U_1) = \frac{5(10^7)}{0.15}(6.23)(10^{-4}) = 2.077(10^5)\ \mathrm{N} \quad \text{(Tension)}$$
This should also be equal and opposite to the support reaction at the left end, i.e. $R_{left} = 2.077(10^5)\ \mathrm{N}$ acting in the negative $x$ direction.
Element 2
$$F_2 = \frac{A_2 E_2}{L_2}(U_3 - U_2) = \frac{5(10^7)}{0.15}(3.46 - 6.23)(10^{-4}) = -9.2333(10^4)\ \mathrm{N} \quad \text{(Compression)}$$
Element 3
$$F_3 = \frac{A_3 E_3}{L_3}(U_4 - U_3) = \frac{8(10^7)}{0.30}(0 - 3.46)(10^{-4}) = -9.2333(10^4)\ \mathrm{N} \quad \text{(Compression)}$$
This should also be equal and opposite to the support reaction at the right end, i.e. $R_{right} = 9.2333(10^4)\ \mathrm{N}$ acting in the negative $x$ direction.
We can now examine the results. The displacements satisfy the EBC at the left and right ends. The force equilibrium is satisfied since the sum of the two support reactions is equal to the applied force. In addition, the forces in elements 2 and 3 are equal.
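The numbers in this example are easy to check with a short program. The sketch below is not the book's program; `BarResult` and `solveStepBar` are hypothetical names, and it assumes the force $P$ acts at node 2 in the positive $x$ direction. Because $U_1 = U_4 = 0$, only rows 2 and 3 of the assembled equations remain, and that 2 by 2 system is solved here by Cramer's rule.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

struct BarResult {
    std::array<double, 4> U{};  // nodal displacements (m)
    std::array<double, 3> N{};  // element axial forces (N), tension positive
};

BarResult solveStepBar()
{
    const double E = 200.0e9;                                        // N/m^2
    const std::array<double, 3> A = {250.0e-6, 250.0e-6, 400.0e-6};  // m^2
    const std::array<double, 3> L = {0.15, 0.15, 0.30};              // m
    const double P = 300.0e3;                                        // N, at node 2

    // Element stiffness AE/L
    std::array<double, 3> k{};
    for (std::size_t e = 0; e < 3; ++e) k[e] = A[e] * E / L[e];

    // Reduced system after imposing U1 = U4 = 0:
    //   [K22 K23] {U2}   {P}
    //   [K23 K33] {U3} = {0}
    const double K22 = k[0] + k[1];
    const double K23 = -k[1];
    const double K33 = k[1] + k[2];
    const double det = K22 * K33 - K23 * K23;
    assert(std::fabs(det) > 0.0);

    BarResult r;
    r.U[1] = P * K33 / det;
    r.U[2] = -K23 * P / det;

    // Member force in each element: (AE/L)(U_right - U_left)
    for (std::size_t e = 0; e < 3; ++e) r.N[e] = k[e] * (r.U[e + 1] - r.U[e]);
    return r;
}
```

The difference `N[0] - N[1]` recovers the applied force at node 2, and `N[1]` equals `N[2]`, which mirrors the equilibrium checks made in the text. Small differences from the printed element forces are expected because the text rounds the displacements before computing the forces.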


15.5 Heat Transfer

One-Dimensional Heat Conduction and Convection Problem


Consider a long, thin rod as shown in Fig. 15.5.1. Here $T = T(x)$ is the temperature, $q = q(x)$ is the heat flux, $k = k(x)$ is the thermal conductivity, $Q = Q(x)$ is the interior volume heat source, $A = A(x)$ is the cross-sectional area, $h = h(x)$ is the convective heat transfer coefficient, $l = l(x)$ is the circumference, and $T_\infty$ is the ambient temperature. A sample set of boundary conditions is shown with NBC on the left end and EBC on the right end. We could also have a case involving a mixed BC on the left or right ends.
[Figure: rod segment of length $dx$ with cross-sectional areas $A$ and $A + dA$, perimeter $l$, heat fluxes $q$ and $q + dq$, interior heat source $Q$, convective heat loss to the ambient temperature $T_\infty$, heat flux $q_a$ at the left end and $T = T_0$ at the right end.]

Fig. 15.5.1
The governing differential equation is given as
$$-\frac{d}{dx}\left(A(x)k(x)\frac{dT(x)}{dx}\right) + h(x)\,l(x)\,T(x) = Q(x)A(x) + h(x)\,l(x)\,T_\infty \qquad (15.5.1a)$$
or, for a cylindrical rod with a constant cross-sectional area
$$-\frac{d}{dx}\left(k(x)\frac{dT(x)}{dx}\right) + \frac{hl}{A}T(x) = Q(x) + \frac{hl}{A}T_\infty \qquad (15.5.1b)$$
with the boundary conditions as
At x  x a , T  Ta or q  c a or q  ha ( T  Ta ) (15.5.2a)
At x  x b , T  Tb or q  c b or q  hb ( T  T ) b

(15.5.2b)
Looking ahead to two and three-dimensional problems, these boundary conditions are special cases of the general form (for
specified heat flow)
q x n x  q y n y  qz nz  q S (15.5.3a)

if heat qS is flowing into the surface S , and  n x , n y , nz  are the direction cosines of the outward normal from the surface.
Similarly, for free convection from surface S , we have
q x n x  q y n y  qz nz  h ( TS  T ) (15.5.3b)
Let us look at some possibilities with respect to the boundary conditions at the left end.
Left end is at a known temperature, T  Ta


T  Ta
Heat ( qa ) is flowing into the left end
q  qa
Left end is insulated
q0
Free convection is taking place at the left end (ambient temperature is Ta and the convective coef is ha , T  Ta )
q  ha T  ha Ta or q  ha T  ha Ta
Let us look at some possibilities with respect to the boundary conditions at the right end.
Right end is at a known temperature, T  Tb
T  Tb
Heat ( qb ) is flowing out of the right end
q  qb
Right end is insulated
q0
Free convection is taking place at the right end (ambient temperature is Tb and the convective coef is hb , T  Tb )
q  hb T  hb Tb
Comparing these equations to the general form of the 1D BVP⁶,
$$y(x) = T(x) \qquad \alpha(x) = k(x) \qquad \beta(x) = \frac{hl}{A} \qquad (15.5.4a)$$
$$f(x) = Q(x) + \frac{hl}{A}T_\infty \qquad \tau = -k\frac{dT}{dx} = q \qquad (15.5.4b)$$
$$g_1 = h_a \qquad h_2 = h_b \qquad (15.5.4c)$$
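Using the natural and mixed boundary conditions shown above, Eqn. (15.2.70) reduces to
$$\left( \begin{bmatrix} \dfrac{k}{L} + \dfrac{hlL}{3A} & -\dfrac{k}{L} + \dfrac{hlL}{6A} \\ -\dfrac{k}{L} + \dfrac{hlL}{6A} & \dfrac{k}{L} + \dfrac{hlL}{3A} \end{bmatrix} + h_1 \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + h_2 \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \right) \begin{Bmatrix} T_1 \\ T_2 \end{Bmatrix} = \frac{L}{2} \begin{Bmatrix} Q + \dfrac{hl}{A}T_\infty \\ Q + \dfrac{hl}{A}T_\infty \end{Bmatrix} + \begin{Bmatrix} q_1 \\ q_2 \end{Bmatrix} + \begin{Bmatrix} h_1 T_{\infty 1} \\ h_2 T_{\infty 2} \end{Bmatrix} \qquad (15.5.5a)$$
$$\mathbf{k}_{2 \times 2}\,\mathbf{u}_{2 \times 1} = \mathbf{f}_{2 \times 1} \qquad (15.5.5b)$$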

6 The sign convention is as follows – energy (or, heat flow) into the surface or boundary is positive.


Note that on the LHS and the RHS some of the components will be zero depending on whether the boundary condition is NBC or mixed. A dimensional analysis will show that $T \equiv T$, $A \equiv L^2$, $k \equiv \dfrac{E}{tLT}$, $h \equiv \dfrac{E}{tL^2T}$, $l \equiv L$, $Q \equiv \dfrac{E}{tL^3}$, $c \equiv \dfrac{E}{tL^2}$, and the LHS and the RHS have the units $\dfrac{E}{tL^2}$.
Example 15.5.1
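Fig. 15.5.2(a) shows a composite wall made of three materials. The outer temperature is $20\ ^\circ\mathrm{C}$. Convection heat transfer takes place on the inner surface of the wall with $T_\infty = 800\ ^\circ\mathrm{C}$ and $h = 25\ \mathrm{W/(m^2 \cdot {}^\circ C)}$. The thermal conductivities are $k_1 = 20\ \mathrm{W/(m \cdot {}^\circ C)}$, $k_2 = 30\ \mathrm{W/(m \cdot {}^\circ C)}$, $k_3 = 50\ \mathrm{W/(m \cdot {}^\circ C)}$. We need to determine the temperature distribution in the wall.

[Figure: composite wall with layers of conductivity $k_1$, $k_2$ and $k_3$ and thicknesses 0.3 m, 0.15 m and 0.15 m; convection ($h$, $T_\infty$) on the inner surface and $T = 20\ ^\circ\mathrm{C}$ on the outer surface.]

Fig. 15.5.2(a)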
Solution: We will use a three-element model as shown in Fig. 15.5.2(b). Note the following - (a) This is a problem where the convective heat exchange (gain) takes place from the left end (mixed BC). (b) There is no convective heat exchange from the top and the bottom of the wall, which is assumed to be very tall compared to its thickness. Hence, $h = 0 = l$. We will assume a unit area for all computations, i.e. $A = 1\ \mathrm{m^2}$. There is no internal heat generation, i.e. $Q = 0$. (c) The right end is at a specified temperature (EBC). For the sake of convenience, we will not include the boundary flux load terms (assuming inter-element flux continuity).

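[Figure: FE model with nodes 1 to 4 and elements 1 to 3 along the X axis; element lengths 0.3 m, 0.15 m and 0.15 m.]
Fig. 15.5.2(b)
Element 1: $k = 20\ \mathrm{W/(m \cdot {}^\circ C)}$, $L = 0.3\ \mathrm{m}$, $A = 1\ \mathrm{m^2}$, $h = 0$, $l = 0$, $h_1 = 25\ \mathrm{W/(m^2 \cdot {}^\circ C)}$, $T_{\infty 1} = 800\ ^\circ\mathrm{C}$, $h_2 = 0$, $Q = 0$, $c_1 = 0$ and $c_2 = 0$. The element equations are as follows.
$$\begin{bmatrix} \dfrac{20}{0.3} + 25 & -\dfrac{20}{0.3} \\ -\dfrac{20}{0.3} & \dfrac{20}{0.3} \end{bmatrix} \begin{Bmatrix} T_1 \\ T_2 \end{Bmatrix} = \begin{Bmatrix} 25(800) \\ 0 \end{Bmatrix}$$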


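Element 2: $k = 30\ \mathrm{W/(m \cdot {}^\circ C)}$, $L = 0.15\ \mathrm{m}$, $A = 1\ \mathrm{m^2}$, $h = 0$, $l = 0$, $h_1 = 0$, $h_2 = 0$, $Q = 0$, $c_1 = 0$ and $c_2 = 0$. The element equations are as follows.
$$\begin{bmatrix} \dfrac{30}{0.15} & -\dfrac{30}{0.15} \\ -\dfrac{30}{0.15} & \dfrac{30}{0.15} \end{bmatrix} \begin{Bmatrix} T_2 \\ T_3 \end{Bmatrix} = \begin{Bmatrix} 0 \\ 0 \end{Bmatrix}$$
Element 3: $k = 50\ \mathrm{W/(m \cdot {}^\circ C)}$, $L = 0.15\ \mathrm{m}$, $A = 1\ \mathrm{m^2}$, $h = 0$, $l = 0$, $h_1 = 0$, $h_2 = 0$, $Q = 0$, $c_1 = 0$ and $c_2 = 0$. The element equations are as follows.
$$\begin{bmatrix} \dfrac{50}{0.15} & -\dfrac{50}{0.15} \\ -\dfrac{50}{0.15} & \dfrac{50}{0.15} \end{bmatrix} \begin{Bmatrix} T_3 \\ T_4 \end{Bmatrix} = \begin{Bmatrix} 0 \\ 0 \end{Bmatrix}$$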
Assembling the equations
$$\begin{bmatrix} 91.6667 & -66.6667 & 0 & 0 \\ -66.6667 & 266.6667 & -200.0 & 0 \\ 0 & -200.0 & 533.3333 & -333.3333 \\ 0 & 0 & -333.3333 & 333.3333 \end{bmatrix} \begin{Bmatrix} T_1 \\ T_2 \\ T_3 \\ T_4 \end{Bmatrix} = \begin{Bmatrix} 20{,}000 \\ 0 \\ 0 \\ 0 \end{Bmatrix}$$
There is no natural boundary condition (the mixed BC was taken care of when the element equations were generated). To enforce the EBC $T_4 = 20$, we use the elimination approach. The modified equations are
$$\begin{bmatrix} 91.6667 & -66.6667 & 0 & 0 \\ -66.6667 & 266.6667 & -200.0 & 0 \\ 0 & -200.0 & 533.3333 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{Bmatrix} T_1 \\ T_2 \\ T_3 \\ T_4 \end{Bmatrix} = \begin{Bmatrix} 20{,}000 \\ 0 \\ 6666.6667 \\ 20 \end{Bmatrix}$$
Solving the equations we have (note decreasing temperature from node 1 to 4),
$$\{T_1, T_2, T_3, T_4\} = \{304.76, 119.05, 57.14, 20\}\ ^\circ\mathrm{C}$$
We can now compute the flux in each element.
Element 1:
$$\tau_1 = -\frac{k_1}{L_1}(T_2 - T_1) = -\frac{20}{0.3}(119.05 - 304.76) = 12380.95\ \mathrm{W/m^2}$$
Element 2:
$$\tau_2 = -\frac{k_2}{L_2}(T_3 - T_2) = -\frac{30}{0.15}(57.14 - 119.05) = 12380.95\ \mathrm{W/m^2}$$
Element 3:
$$\tau_3 = -\frac{k_3}{L_3}(T_4 - T_3) = -\frac{50}{0.15}(20.0 - 57.14) = 12380.95\ \mathrm{W/m^2}$$
We will now examine the results. The solution satisfies the EBC at the right end. The flux is constant throughout the domain, as it should be.
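The assembled equations of this example are tridiagonal, so they can be solved with very little code. The sketch below is an illustration under stated assumptions rather than the book's program: `solveTridiagonal` (the Thomas algorithm) and `wallTemperatures` are hypothetical names, a unit area is assumed, convection acts only on the left face, and the right face has a prescribed temperature imposed by elimination.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Thomas algorithm for a tridiagonal system (no pivoting).
// lower[i] multiplies x[i-1], diag[i] multiplies x[i], upper[i] multiplies x[i+1].
std::vector<double> solveTridiagonal(std::vector<double> lower, std::vector<double> diag,
                                     std::vector<double> upper, std::vector<double> rhs)
{
    const std::size_t n = diag.size();
    assert(n > 0 && lower.size() == n && upper.size() == n && rhs.size() == n);
    for (std::size_t i = 1; i < n; ++i) {  // forward elimination
        const double m = lower[i] / diag[i - 1];
        diag[i] -= m * upper[i - 1];
        rhs[i]  -= m * rhs[i - 1];
    }
    std::vector<double> x(n);
    x[n - 1] = rhs[n - 1] / diag[n - 1];
    for (std::size_t i = n - 1; i-- > 0;)  // back substitution
        x[i] = (rhs[i] - upper[i] * x[i + 1]) / diag[i];
    return x;
}

// Nodal temperatures of a layered wall (unit area): convection (hLeft, TinfLeft)
// on the left face and a prescribed temperature Tright on the right face.
std::vector<double> wallTemperatures(const std::vector<double>& k,
                                     const std::vector<double>& L,
                                     double hLeft, double TinfLeft, double Tright)
{
    const std::size_t ne = k.size();
    const std::size_t nn = ne + 1;
    assert(ne > 0 && L.size() == ne);
    std::vector<double> lower(nn, 0.0), diag(nn, 0.0), upper(nn, 0.0), rhs(nn, 0.0);

    // Assemble the element matrices (k/L)[1 -1; -1 1]
    for (std::size_t e = 0; e < ne; ++e) {
        const double c = k[e] / L[e];
        diag[e]      += c;   upper[e]    -= c;
        lower[e + 1] -= c;   diag[e + 1] += c;
    }

    // Mixed BC at the left end: h is added to the diagonal, h*Tinf to the load
    diag[0] += hLeft;
    rhs[0]  += hLeft * TinfLeft;

    // EBC at the right end by elimination
    rhs[nn - 2] -= upper[nn - 2] * Tright;
    upper[nn - 2] = 0.0;
    lower[nn - 1] = 0.0;
    diag[nn - 1]  = 1.0;
    rhs[nn - 1]   = Tright;

    return solveTridiagonal(lower, diag, upper, rhs);
}
```

Calling `wallTemperatures({20.0, 30.0, 50.0}, {0.3, 0.15, 0.15}, 25.0, 800.0, 20.0)` returns the four nodal temperatures of this example; the element flux then follows from $-(k/L)(T_{right} - T_{left})$ for each layer.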


Example 15.5.2
Fig. 15.5.3 shows a circular cross-section pin fin. It has a diameter of 0.3125” and a length of 5”. At the root, the temperature is $T_0 = 150\ ^\circ\mathrm{F}$. The ambient temperature is $T_\infty = 80\ ^\circ\mathrm{F}$, the convective coefficient is $h = 6\ \mathrm{BTU/(h \cdot ft^2 \cdot {}^\circ F)}$, and the thermal conductivity is $k = 24.8\ \mathrm{BTU/(h \cdot ft \cdot {}^\circ F)}$. Determine the temperature distribution in the fin.

[Figure: pin fin of diameter $d$ with root temperature $T_0$.]

Fig. 15.5.3
Solution: We will use a two-element model to solve the problem. The FE model is shown in Fig. 15.5.4.

[Figure: FE model with nodes 1 to 3 and elements 1 and 2 along the X axis; each element is 2.5” long.]
Fig. 15.5.4
Note the following - (a) In this problem the left end is tied to an EBC. (b) There is no convective heat exchange from the right end (NBC with $q = 0$). (c) There is no internal heat generation, i.e. $Q = 0$. We will select ft as the units for length.
Element 1: $k = 24.8\ \mathrm{BTU/(h \cdot ft \cdot {}^\circ F)}$, $L = 0.208\ \mathrm{ft}$, $A = \dfrac{\pi d^2}{4} = 5.326(10^{-4})\ \mathrm{ft^2}$, $T_\infty = 80\ ^\circ\mathrm{F}$, $h = 6\ \mathrm{BTU/(h \cdot ft^2 \cdot {}^\circ F)}$, $l = \pi d = 0.0818\ \mathrm{ft}$, $h_1 = 0$, $h_2 = 0$, $Q = 0$, $c_1 = 0$ and $c_2 = 0$. The element equations are as follows.
$$\begin{bmatrix} \dfrac{24.8}{0.208} + \dfrac{6(0.0818)(0.208)}{3(5.326\mathrm{e}{-4})} & -\dfrac{24.8}{0.208} + \dfrac{6(0.0818)(0.208)}{6(5.326\mathrm{e}{-4})} \\ -\dfrac{24.8}{0.208} + \dfrac{6(0.0818)(0.208)}{6(5.326\mathrm{e}{-4})} & \dfrac{24.8}{0.208} + \dfrac{6(0.0818)(0.208)}{3(5.326\mathrm{e}{-4})} \end{bmatrix} \begin{Bmatrix} T_1 \\ T_2 \end{Bmatrix} = \frac{0.208}{2} \begin{Bmatrix} \dfrac{6(0.0818)}{5.326\mathrm{e}{-4}}(80) \\ \dfrac{6(0.0818)}{5.326\mathrm{e}{-4}}(80) \end{Bmatrix}$$
or,
$$\begin{bmatrix} 183.12 & -87.285 \\ -87.285 & 183.12 \end{bmatrix} \begin{Bmatrix} T_1 \\ T_2 \end{Bmatrix} = \begin{Bmatrix} 7667 \\ 7667 \end{Bmatrix}$$
Element 2: Same as element 1 except we use T2 and T3 .
Assembling, we have
$$\begin{bmatrix} 183.12 & -87.285 & 0 \\ -87.285 & 366.25 & -87.285 \\ 0 & -87.285 & 183.12 \end{bmatrix} \begin{Bmatrix} T_1 \\ T_2 \\ T_3 \end{Bmatrix} = \begin{Bmatrix} 7667 \\ 15334 \\ 7667 \end{Bmatrix}$$
The NBC term is zero. To apply the EBC we use the elimination technique resulting in
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 366.245 & -87.285 \\ 0 & -87.285 & 183.12 \end{bmatrix} \begin{Bmatrix} T_1 \\ T_2 \\ T_3 \end{Bmatrix} = \begin{Bmatrix} 150 \\ 28427 \\ 7667 \end{Bmatrix}$$
Solving, we have
$$\{T_1, T_2, T_3\} = \{150, 98.82, 88.97\}\ ^\circ\mathrm{F}$$
The temperature decreases from left to right as expected. The element flux is computed as follows.
Element 1:
$$\tau_1 = -\frac{k_1}{L_1}(T_2 - T_1) = -\frac{24.8}{0.208}(98.82 - 150) = 6102\ \mathrm{BTU/(h \cdot ft^2)}$$
Element 2:
$$\tau_2 = -\frac{k_2}{L_2}(T_3 - T_2) = -\frac{24.8}{0.208}(88.97 - 98.82) = 1174\ \mathrm{BTU/(h \cdot ft^2)}$$
Let us carry out a few checks. The solution satisfies the EBC of $150\ ^\circ\mathrm{F}$ at $x = 0$. The flux at the right end must be zero. However, with this crude two-element model the error is large. To obtain better accuracy we must refine the mesh, i.e. add more elements to the mesh. We will look at this aspect in the next two sections.

15.6 Higher Order Elements


So far, we have seen the element concept illustrated using linear interpolation on an element described by two nodes. We also
saw in the classical Galerkin’s solution methodology that superior results were obtained using higher order interpolation. Can
we not tie this concept to the element concept? The answer is a resounding “Yes”.
Fig. 15.6.1 shows the linear element (a) on which the solution is defined by a linear polynomial, (b) is described by two nodes with nodal values $y_1$ and $y_2$, and (c) the resulting shape functions that use the nodal values to interpolate the solution.
$$\tilde{y}(x) = a_1 + a_2 x = \phi_1(x)\,y_1 + \phi_2(x)\,y_2$$
[Figure: linear element of length $L$ with nodes 1 and 2 at $x_1$ and $x_2$, nodal values $y_1$ and $y_2$, and the shape functions $\phi_1$ and $\phi_2$, each with a unit value at its own node.]

Fig. 15.6.1
Fig. 15.6.2 shows the quadratic element (a) on which the solution is defined by a quadratic polynomial, (b) is described by three nodes with nodal values $y_1$, $y_2$ and $y_3$, and (c) the resulting shape functions that use the nodal values to interpolate the solution.
$$\tilde{y}(x) = a_1 + a_2 x + a_3 x^2 = \phi_1(x)\,y_1 + \phi_2(x)\,y_2 + \phi_3(x)\,y_3 \qquad (15.6.1)$$
[Figure: quadratic element with nodes 1, 2 and 3 at $x_1$, $x_2$ and $x_3$ spaced $L/2$ apart, nodal values $y_1$, $y_2$ and $y_3$, and the shape functions $\phi_1$, $\phi_2$ and $\phi_3$, each with a unit value at its own node.]

Fig. 15.6.2 Quadratic element shape functions


A few comments are in order.
(a) For a one-dimensional element, we need to have nodes at the ends of the elements so that one element can be tied to the
element next to it. So we need a minimum of two nodes.
(b) When we have one degree of freedom per node, the number of nodes in the element is equal to the number of coefficients
in the trial solution. In other words, we need a total of two nodes for the linear element and a total of three nodes for the
quadratic element. Only then will we have sufficient equations to write the coefficients of the polynomials in terms of the
nodal values.
(c) The number of shape functions is also equal to the number of nodal values. We should now look at the properties of the shape functions. First, the trial solution must be complete – a quadratic polynomial must have the constant, linear and quadratic terms (e.g. having the constant, quadratic and cubic terms is not allowed). This is done to ensure that the element can assume all possible solution modes. Second, the shape functions satisfy the following conditions
$$\phi_i(x_j) = \delta_{ij} \qquad (15.6.2)$$
where $\delta_{ij}$ is the Kronecker delta. In other words, the shape function will have a unit value at the node it is associated with and zeros at the other nodes (see the plots shown above). This also ensures that the shape functions are linearly independent.
Now on with the quadratic element. Using Eqn. (15.6.1) and the nodal conditions, we have
$$\begin{bmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ 1 & x_3 & x_3^2 \end{bmatrix} \begin{Bmatrix} a_1 \\ a_2 \\ a_3 \end{Bmatrix} = \begin{Bmatrix} y_1 \\ y_2 \\ y_3 \end{Bmatrix} \qquad (15.6.3)$$
These equations are similar to Eqn. (15.2.31). Solving for the $a_i$'s and collecting like terms, we have
$$\tilde{y}(x) = \frac{(x - x_2)(x - x_3)}{(x_1 - x_2)(x_1 - x_3)}\,y_1 + \frac{(x - x_1)(x - x_3)}{(x_2 - x_1)(x_2 - x_3)}\,y_2 + \frac{(x - x_1)(x - x_2)}{(x_3 - x_1)(x_3 - x_2)}\,y_3 \qquad (15.6.4)$$
Hence,
$$\phi_1(x) = \frac{(x - x_2)(x - x_3)}{(x_1 - x_2)(x_1 - x_3)} \qquad (15.6.5a)$$
$$\phi_2(x) = \frac{(x - x_1)(x - x_3)}{(x_2 - x_1)(x_2 - x_3)} \qquad (15.6.5b)$$
$$\phi_3(x) = \frac{(x - x_1)(x - x_2)}{(x_3 - x_1)(x_3 - x_2)} \qquad (15.6.5c)$$
There is an easier way to compute the shape functions that we shall see in the next module. We still have two questions to answer before we can generate the element equations for the quadratic element. First, “Where is node 2 located within the element?” Again, a detailed answer will be generated in Module 2. For the time being let us assume that it is located at the center of the element. Hence,
$$(x_2 - x_1) = (x_3 - x_2) = \frac{L}{2} \qquad (x_3 - x_1) = L \qquad (15.6.6)$$
Second, “How do we handle the $\alpha(x)$, $\beta(x)$ and $f(x)$ terms?” We will develop a sophisticated technique in Module 2. For the time being let us assume that they are constants within the element. Hence,
$$\left( \frac{\bar{\alpha}}{3L} \begin{bmatrix} 7 & -8 & 1 \\ -8 & 16 & -8 \\ 1 & -8 & 7 \end{bmatrix} + \frac{\bar{\beta} L}{30} \begin{bmatrix} 4 & 2 & -1 \\ 2 & 16 & 2 \\ -1 & 2 & 4 \end{bmatrix} + g_1 \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} + h_3 \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \right) \begin{Bmatrix} y_1 \\ y_2 \\ y_3 \end{Bmatrix} = \frac{\bar{f} L}{6} \begin{Bmatrix} 1 \\ 4 \\ 1 \end{Bmatrix} + \begin{Bmatrix} c_1 \\ 0 \\ c_3 \end{Bmatrix} \qquad (15.6.7)$$
The flux in the element is given by
$$\tilde{\tau} = -\alpha(x)\frac{d\tilde{y}}{dx} = -\bar{\alpha} \left[ \frac{2(2x - x_2 - x_3)}{L^2}\,y_1 + \frac{(-4)(2x - x_1 - x_3)}{L^2}\,y_2 + \frac{2(2x - x_1 - x_2)}{L^2}\,y_3 \right]$$
$$= -\frac{\bar{\alpha}}{L^2} \left[ x\left(4y_1 - 8y_2 + 4y_3\right) + x_1\left(4y_2 - 2y_3\right) - 2x_2\left(y_1 + y_3\right) - 2x_3\left(y_1 - 2y_2\right) \right] \qquad (15.6.8)$$
In a similar manner, we can generate higher order elements involving cubic, quartic, quintic etc. trial functions with four, five, six etc. nodes per element. The manner in which we identify these elements is usually as follows. These elements are designated as 1D-$C^m$ interpolation order element where $C^m$ denotes that the problem variable and its derivatives up to order $m$ are continuous across element boundaries. In the element formulation that we have discussed so far, only the problem variable is continuous across the element boundaries. Hence the elements are designated 1D-$C^0$ linear element, or 1D-$C^0$ quadratic element etc.
Example 15.6.1
Let us re-solve the last problem from the previous section. This time we will use the quadratic element. Let us assume that the number of nodes is the same. The FE model is shown in Fig. 15.6.3.


[Figure: FE model with nodes 1, 2 and 3 and a single quadratic element 1 along the X axis; the nodes are spaced 2.5” apart.]
Fig. 15.6.3
Element 1: $k = 24.8\ \mathrm{BTU/(h \cdot ft \cdot {}^\circ F)}$, $L = 0.416\ \mathrm{ft}$, $A = \dfrac{\pi d^2}{4} = 5.326(10^{-4})\ \mathrm{ft^2}$, $T_\infty = 80\ ^\circ\mathrm{F}$, $h = 6\ \mathrm{BTU/(h \cdot ft^2 \cdot {}^\circ F)}$, $l = \pi d = 0.0818\ \mathrm{ft}$, $h_1 = 0$, $h_2 = 0$, $Q = 0$, $c_1 = 0$ and $c_2 = 0$. The element equations are as follows ($\alpha = k$, $\beta = \dfrac{hl}{A}$, $f = \dfrac{hlT_\infty}{A}$)
$$\left( \frac{24.8}{3(0.416)} \begin{bmatrix} 7 & -8 & 1 \\ -8 & 16 & -8 \\ 1 & -8 & 7 \end{bmatrix} + \frac{(921.52)(0.416)}{30} \begin{bmatrix} 4 & 2 & -1 \\ 2 & 16 & 2 \\ -1 & 2 & 4 \end{bmatrix} \right) \begin{Bmatrix} T_1 \\ T_2 \\ T_3 \end{Bmatrix} = \frac{(73721.4)(0.416)}{6} \begin{Bmatrix} 1 \\ 4 \\ 1 \end{Bmatrix}$$
Or,
$$\begin{bmatrix} 181.7 & -116.38 & -1.4256 \\ -116.38 & 488.33 & -116.38 \\ -1.4256 & -116.38 & 181.7 \end{bmatrix} \begin{Bmatrix} T_1 \\ T_2 \\ T_3 \end{Bmatrix} = \begin{Bmatrix} 5111.3 \\ 20445 \\ 5111.3 \end{Bmatrix}$$
Applying the boundary conditions (EBC for $T_1$), we have
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 488.33 & -116.38 \\ 0 & -116.38 & 181.7 \end{bmatrix} \begin{Bmatrix} T_1 \\ T_2 \\ T_3 \end{Bmatrix} = \begin{Bmatrix} 150 \\ 37902 \\ 5325.2 \end{Bmatrix}$$

S. D. Rajan, 2000-24 15-414


P A R T I A L D I F F E R E N T I A L E Q U A T I O N S

 T1 , T2 , T3    150, 99.84, 93.26



F
The flux in the element is computed using Eqn (15.6.8) and are given by
BTU
 ( x  0.0879')  6382
h  ft 2
BTU
 ( x  0.3281')  383.1
h  ft 2
The reason for selecting the two locations will be explained in the next module. Bear in mind that the flux distribution in this element is, as Eqn. (15.6.8) shows, linear unlike the linear element in which the flux is a constant. Let’s compare the two results.

        Temperature (°F)                       Flux (BTU/(h·ft²))
Node    Linear Elements   Quadratic Element    Linear Elements   Quadratic Element
1       150               150
2       98.82             99.84                6102              6382
3       88.97             93.26                1174              383.1

The nodal temperatures are close, but the flux values are quite different.
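The quadratic-element flux values quoted in this example can be checked by coding Eqn. (15.6.8) directly. The function name `quadraticFlux` below is hypothetical, and the sketch assumes, as in Eqn. (15.6.6), that the middle node sits at the center of the element and that $\alpha$ is constant within the element.

```cpp
#include <array>
#include <cassert>

// Flux in a three-node quadratic element, Eqn. (15.6.8).
// x     : location at which the flux is evaluated
// alpha : constant value of alpha(x) in the element
// x1    : coordinate of node 1; nodes 2 and 3 are at x1 + L/2 and x1 + L
// y     : nodal values y1, y2, y3
double quadraticFlux(double x, double alpha, double x1, double L,
                     const std::array<double, 3>& y)
{
    assert(L > 0.0);
    const double x2 = x1 + 0.5 * L;
    const double x3 = x1 + L;
    // dy/dx obtained by differentiating the three shape functions
    const double dydx = (2.0 * (2.0 * x - x2 - x3) * y[0]
                       - 4.0 * (2.0 * x - x1 - x3) * y[1]
                       + 2.0 * (2.0 * x - x1 - x2) * y[2]) / (L * L);
    return -alpha * dydx;
}
```

With $\alpha = 24.8$, $L = 0.416$ ft and the nodal temperatures $\{150, 99.84, 93.26\}$, evaluating the function at $x = 0.0879$ ft and $x = 0.3281$ ft gives values close to the two fluxes listed above. The small differences come from the rounding of the nodal temperatures.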

15.7 Mesh Refinement and Convergence


In the previous lessons we learnt two important facts. First, increasing the number of degrees of freedom increases the accuracy
of the solution. In other words, if we keep on increasing the number of elements (and nodes) in the FE model, we should
obtain better solutions that converge to the exact result. This is known as h-convergence. The h notation refers to the size of the
elements. Second, increasing the order of the polynomial in the trial solution also increases the accuracy of the solution. In other
words, if we keep on increasing the element order (and hold the number of elements constant), we should obtain better solutions
that converge to the exact result. This is known as p-convergence. The p notation refers to the polynomial order. We could also
combine the two and obtain what is known as hp-convergence.
The advantages of low-order finite elements are (a) the size of the element matrices is small, (b) the computational ease with
which the elements can be generated, and (c) the size of the half-bandwidth of the structural stiffness matrix is small. The
major disadvantage is that the convergence is slow. The comments pertaining to higher-order elements are just the opposite.
The FE mesh need not be uniform nor contain just one type of element. It may be advantageous to use the lower-order elements
where the solution does not change rapidly (flux has a low value) and use the higher-order elements in regions where higher
accuracy is required. By the same token, the mesh can be finer where higher accuracy is required.
The biggest disadvantage of any FE solution is the computational expense in solving the system equations. Hand calculations
(with help from equation solvers in calculators) are tedious if the number of unknowns is greater than about 10. However, the
finite element method is a numerical technique that is ideal for computer-based solutions. As we mentioned in the first topic,
the amount of human time taken to create the data and examine the results is far greater than the time taken to solve most FE
problems.
We will illustrate these ideas using the results from the computer program. If you wish you may jump to the section “Using the
1DBVP Program” to familiarize yourself with the program before reading the next section.

Example 15.7.1
Let us solve Example 15.5.2 using the concepts discussed earlier. We will monitor three response quantities – temperature at the
right end, the flux at the left end that is the highest flux in the model and the flux at the right end that should converge to zero.
The results are summarized below.


Model ID   Number of   Element      Temperature at   Flux at     Flux at
           elements    type         right end        right end   left end
1          2           Linear       89               1174        6102
2          4           Linear       90.5             542         7804
3          8           Linear       90.9             266         8970
4          16          Linear       91               132         9664
5          32          Linear       91               66          10044
6          64          Linear       91               33          10244
7          128         Linear       91               17          10346

1          2           Quadratic    91.1             417         8039
2          4           Quadratic    91               220         9137
3          8           Quadratic    91               111         9767
4          16          Quadratic    91               55          10102
5          32          Quadratic    91               28          10275
6          64          Quadratic    91               14          10362

The temperature at the right end converges very rapidly to 91 °F. The flux at the right end appears to converge linearly: its value,
which should be zero, is roughly halved each time the number of elements is doubled. The flux at the left end converges much more
slowly for the linear element compared to the quadratic element. Figs. 15.7.1 and 15.7.2 show the plots for the flux at the right
and left ends for the linear and quadratic elements.
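The linear rate noted above can be checked directly from the table. Since the exact flux at the right end is zero, each tabulated value is itself the error. The following is a minimal sketch; the function name ObservedOrder is hypothetical and is not part of the 1DBVP program.

```cpp
#include <cassert>
#include <cmath>

// Observed order of convergence when the number of elements is doubled.
// errCoarse and errFine are the absolute errors on the coarse and the
// refined mesh; both must be positive.
// (ObservedOrder is an illustrative name, not a function from the text.)
double ObservedOrder (double errCoarse, double errFine)
{
    assert (errCoarse > 0.0 && errFine > 0.0);
    return std::log2 (errCoarse / errFine);
}
```

For the linear-element models with 16 and 32 elements, ObservedOrder(132.0, 66.0) gives an order of 1, which is consistent with linear convergence of the flux.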
Fig. 15.7.1 Flux at right end vs. number of elements (linear and quadratic elements)
Fig. 15.7.2 Flux at the left end vs. number of elements (linear and quadratic elements)

We achieved two primary goals with the chapter. First, we generalized Galerkin’s Method by tying it to the element concept.
This made it easier to generate the trial solutions that are valid over an arbitrary problem subdomain (the element!). The side
effect is that we can handle problems with known discontinuities in the solution, e.g. the flux. Second, we looked at the manner
in which the boundary conditions could be applied more easily. While the problem’s essential boundary conditions were satisfied
exactly, the higher-order boundary conditions were satisfied only in the limit.
The derivation of the element equations is perhaps the most important step when implementing the FE method. We saw how
we could generate a family of trial solutions and the associated elements. There are two important issues that we will deal with
in the next module that will make it easier to generate the element equations – a more rational way to generate the shape
functions, and the isoparametric formulation.
It should be pointed out that seemingly different engineering problems are all governed by the same differential equation.
However, one still needs to contend with the physics behind the parameters and the equations.
Better solutions are obtained by (a) increasing the number of elements in the mesh or, (b) by increasing the order of the element
(trial solution), or (c) both. With the easy availability of FE computer programs, it is now possible to obtain accurate solutions
in a relatively short period of time. The 1DBVP program illustrates (in its own manner) the aspects associated with pre-
processing, solution and post-processing.


Summary
As we saw in this chapter, there are a number of engineering problems described by partial differential equations. The finite
element method is a powerful numerical technique that is routinely used in solving PDEs in a variety of engineering and scientific
areas. Very powerful computer programs capable of solving multi-dimensional, multi-physics problems are used routinely in
the design of airplanes, automobiles, electronic components, consumer goods etc.

Where to go from here?


Examine the source code in PlanarTruss© and FEAT© computer programs where object-oriented ideas are used in the context
of finite element technique. Use the 1DBVP© program to understand how the finite element method works in solving one-
dimensional boundary value problems.


Exercises
In all the problems below, generate all the steps by hand except for solving the system equations, which may be done using a
programmable calculator or a computer program. Finally, check your answers using the 1DBVP© program. Explain the differences
in answers, if any.
Problem 15.1
Consider the 4-in. bar shown in Fig. P15.1 that is loaded by an axial surface traction given by the function $w(x) = x^2$ lb/in.

Fig. P15.1
The bar properties are as follows: $E = 30(10^6)$ psi and $A = 2$ in$^2$. Determine the stresses in the bar using

(a) One linear element.


(b) Two linear elements.
(c) Four linear elements.

Comment on the results by comparing the answers with the analytical solution.
Problem 15.2
Resolve Problem 15.1 but use quadratic elements instead.

Problem 15.3
The structure shown in Fig. P15.3 is subjected to an increase in temperature $\Delta T = 80$ °C. In addition, the loads are given as
follows: $P_1 = 60$ kN, $P_2 = 75$ kN. Determine the displacements, stresses, and support reactions using linear elements. The
material properties are as follows.

Material    A (mm^2)   E (GPa)   α (mm/mm/°C)
Bronze      2400       83        18.9e-6
Aluminum    1200       70        23e-6
Steel       600        200       11.7e-6

Fig. P15.3 [figure: bronze, aluminum, and steel segments; loads $P_1$ and $P_2$; dimensions 800 mm, 600 mm, 400 mm]


Problem 15.4
Consider a brick wall (Fig. P15.4) of thickness $L = 30$ cm and thermal conductivity $k = 0.7$ W/(m·°C). The inner surface is at
28 °C and the outer surface is exposed to cold air at $T_\infty = 15$ °C. The heat transfer coefficient associated with the outside
surface is $h = 40$ W/(m$^2$·°C). Determine the steady-state temperature distribution within the wall and also the heat flux through
the wall. Assume a one-dimensional heat flow. Use a model with two linear elements.

Fig. P15.4 [figure: wall of thickness 30 cm; inner surface at 28 °C; convection $h$, $T_\infty$ on the outer surface]

Problem 15.5
Fig. P15.5 shows a thin, cylindrical rod, 1 m long, composed of two different materials. The two end sections, each 40 cm long,
are made of steel ($k = 0.12$ cal·cm/(s·cm$^2$·°C)), and the center section, made of copper ($k = 0.92$ cal·cm/(s·cm$^2$·°C)),
is 20 cm long. The cross section is circular, with a radius of 2 cm. Heat is flowing into the left end at a steady rate of
0.1 cal/(s·cm$^2$). The temperature of the right end is maintained at a constant 0 °C. The rod is in contact with air at an
ambient temperature of 20 °C so there is free convection from the lateral surface. The convective coefficient is given
as $1.5 \times 10^{-4}$ cal/(s·cm$^2$·°C). Determine the temperature and flux distribution in the rod. Use both the linear and
quadratic elements.

Fig. P15.5 [figure: rod with a heat flux of 0.1 cal/(s·cm$^2$) entering at the left end]


Problem 15.6
A variable-area, rectangular cross-section fin transmits heat away from a mass as indicated in Fig. P15.6. The thickness of the fin
in the direction perpendicular to the paper is ten times that shown in the plane of the paper. There is convection on the entire
lateral surface.

Fig. P15.6 [figure: variable-area fin attached to a mass at temperature $T_0$; labels $3h_0$, $h_0$, $L_1$, $L_2$; convection $h_c$, $T_\infty$ on the lateral surfaces; $T = T_\infty$ at the far end]
With $h_0 = 5$ cm, $L_1 = L_2 = 10$ cm, $T_0 = 400$ °C, $T_\infty = 100$ °C, $h_c = 10^{-3}$ W/(mm$^2$·°C) and $k = 0.30$ W/(mm·°C), find the
temperature and flux distribution.


References
Alan Jeffrey, Applied Partial Differential Equations, Academic Press, 2003.
Subramaniam Rajan, Introduction to Structural Analysis & Design, e-notes, 2017.
David Burnett, Finite Element Analysis: From Concepts to Applications, Addison Wesley, 1987.



16
E I G E N S Y S T E M S

Chapter

Eigensystems
“Excellence is a continuous process and not an accident.” A. P. J. Abdul Kalam

"When a man sits with a pretty girl for an hour, it seems like a minute. But let him sit on a hot stove for a minute and it's
longer than any hour. That's relativity." Albert Einstein

There are several dynamic systems where energy stored in system components gives rise to the dynamic nature of the system.
Examples include rotating masses, compressed springs etc. Without external excitation, the energy within the system will decay
to a minimum state or will oscillate between extreme states. The rate of decay of the natural modes and the frequency of
oscillation are determined by the eigenvalues of the matrix that represents the system. In this chapter, we will examine
eigensystems with the emphasis being on numerical techniques to compute eigenvalues and eigenvectors.

Objectives
 To understand what eigenpairs represent and their properties.
 To understand how to compute eigenpairs numerically – vector iterations methods and Jacobi Method.
 Extend our knowledge of PDEs from Chapter 15 to look at one-dimensional eigenproblem suitable for modeling
engineering problems.


16.1 Eigenproblems
A square matrix $A_{n \times n}$ has an eigenpair $(\lambda, x)$ consisting of the eigenvalue $\lambda$ and the corresponding eigenvector $x_{n \times 1}$ if
$A_{n \times n}\, x_{n \times 1} = \lambda\, x_{n \times 1}$    (16.1.1)
From the above equation it should be clear that if $x$ is an eigenvector then any nonzero multiple of $x$ is also an eigenvector
(though not a distinct eigenvector). Eqn. (16.1.1) has a nontrivial solution ($x \ne 0$) if and only if
$\det(A - \lambda I) = 0$    (16.1.2)
Hence one way of computing the eigenpairs is to expand Eqn. (16.1.2) into an nth-order polynomial whose roots are the eigenvalues
$\lambda$. These roots could be real or complex depending on the properties of $A$. Finding the roots of an nth-order polynomial is
neither easy nor efficient. Hence, in the rest of this chapter, we will see increasingly more efficient techniques to compute the
eigenpairs.
The system of equations described by Eqn. (16.1.1) is called the standard eigenproblem. There is another class of problem
described by
$A_{n \times n} X_{n \times n} = \Lambda_{n \times n} B_{n \times n} X_{n \times n}$    (16.1.3)
that is called the generalized eigenproblem where $\Lambda_{n \times n}$ is a diagonal matrix containing the eigenvalues and $X_{n \times n}$ is a matrix
where every column contains an eigenvector. Note that the generalized eigenproblem is the same as the standard eigenproblem
if $B = I$.
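Example 16.1
Compute the eigenpairs of the matrix $\begin{bmatrix} 2 & 1 \\ 2 & 6 \end{bmatrix}$.
Using Eqn. (16.1.2) we have
$\det \begin{bmatrix} 2-\lambda & 1 \\ 2 & 6-\lambda \end{bmatrix} = (2-\lambda)(6-\lambda) - 2 = \lambda^2 - 8\lambda + 10 = 0$
The quadratic polynomial has the following roots: $\lambda_{1,2} = \dfrac{8 \mp \sqrt{64-40}}{2} = 1.55051,\ 6.44949$.
For $\lambda_1 = 1.55051$
$\begin{bmatrix} 2 & 1 \\ 2 & 6 \end{bmatrix} \begin{Bmatrix} x_1 \\ x_2 \end{Bmatrix} = 1.55051 \begin{Bmatrix} x_1 \\ x_2 \end{Bmatrix} \Rightarrow x_1 = -2.22474\, x_2$. Or, $x_{2 \times 1} = \begin{Bmatrix} x_1 \\ x_2 \end{Bmatrix} = \begin{Bmatrix} 1 \\ -0.44949 \end{Bmatrix}$
For $\lambda_2 = 6.44949$
$\begin{bmatrix} 2 & 1 \\ 2 & 6 \end{bmatrix} \begin{Bmatrix} x_1 \\ x_2 \end{Bmatrix} = 6.44949 \begin{Bmatrix} x_1 \\ x_2 \end{Bmatrix} \Rightarrow x_1 = 0.224745\, x_2$. Or, $x_{2 \times 1} = \begin{Bmatrix} x_1 \\ x_2 \end{Bmatrix} = \begin{Bmatrix} 0.224745 \\ 1 \end{Bmatrix}$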
We have normalized both the eigenvectors by making the largest element in the eigenvector equal to 1.
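For a 2 x 2 matrix the characteristic polynomial of Eqn. (16.1.2) is a quadratic, so the eigenvalues can be written in closed form. The following is a minimal sketch of that calculation; the function name Eigenvalues2x2 is hypothetical, and the code assumes the eigenvalues are real.

```cpp
#include <cassert>
#include <cmath>
#include <utility>

// Real eigenvalues of the 2x2 matrix [a b; c d] from the roots of
// lambda^2 - (a + d) lambda + (a d - b c) = 0, in ascending order.
// Assumes a non-negative discriminant, i.e. real eigenvalues.
// (Eigenvalues2x2 is an illustrative name, not a function from the text.)
std::pair<double, double> Eigenvalues2x2 (double a, double b,
                                          double c, double d)
{
    const double trace = a + d;
    const double det = a * d - b * c;
    const double disc = trace * trace - 4.0 * det;
    assert (disc >= 0.0);
    const double root = std::sqrt (disc);
    return { 0.5 * (trace - root), 0.5 * (trace + root) };
}
```

Calling Eigenvalues2x2(2.0, 1.0, 2.0, 6.0) reproduces the two roots of Example 16.1. As noted in the text, this approach does not scale to larger matrices.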
To better understand some of the engineering applications, consider the spring-mass system shown in Fig. 16.1.1 that
is initially at rest and where the displacement u  u( t ) .


Fig. 16.1.1 System and its free-body diagram
Let us assume that the system is perturbed somehow. From the FBD, using D’Alembert’s Principle
$-ku - m\ddot{u} = 0$    (16.1.4)
or, $m\ddot{u} + ku = 0$    (16.1.5)
Let $\omega^2 = \dfrac{k}{m}$. Substituting in the above equation we have
$\ddot{u} + \omega^2 u = 0$    (16.1.6)
The solution to the above differential equation is of the form (sum of two harmonics)
$u(t) = C_1 \cos \omega t + C_2 \sin \omega t$    (16.1.7)
The term $\omega = \sqrt{k/m}$ is the angular frequency (expressed in rad/s), the term $f = \omega / 2\pi$ is the natural frequency (expressed in
Hz) and $T = 1/f$ is the natural period (expressed in s). To obtain the constants in Eqn. (16.1.7) we need the initial conditions.
At $t = 0$, $u = u_0$ which is the initial displacement, and
$\dot{u} = \dot{u}_0$ which is the initial velocity.
Using these initial conditions, $C_1 = u_0$ and $C_2 = \dfrac{\dot{u}_0}{\omega}$. Hence,
$u = u_0 \cos \omega t + \dfrac{\dot{u}_0}{\omega} \sin \omega t$    (16.1.8)
Or, $u = A \cos(\omega t - \alpha)$    (16.1.9)
where $A = \sqrt{u_0^2 + \left(\dfrac{\dot{u}_0}{\omega}\right)^2}$ is the amplitude of vibration and $\alpha = \tan^{-1} \dfrac{\dot{u}_0}{\omega u_0}$ is the phase angle. Eqns. (16.1.8) and
(16.1.9) represent the solution to free, undamped vibration.
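The quantities defined above can be evaluated with a few lines of code. The sketch below is illustrative only: the struct and function names are hypothetical, v0 stands for the initial velocity, the initial conditions are taken at t = 0, and atan2 is used in place of the inverse tangent so that the phase angle lands in the correct quadrant.

```cpp
#include <cassert>
#include <cmath>

// Free, undamped vibration of a spring-mass system.
// (All names here are illustrative, not from the text.)
struct FreeVibration
{
    double omega;      // angular frequency, rad/s
    double frequency;  // natural frequency, Hz
    double period;     // natural period, s
    double amplitude;  // A in Eqn. (16.1.9)
    double phase;      // phase angle in Eqn. (16.1.9), rad
};

FreeVibration SolveFreeVibration (double k, double m, double u0, double v0)
{
    assert (k > 0.0 && m > 0.0);
    const double pi = std::acos (-1.0);
    FreeVibration r;
    r.omega = std::sqrt (k / m);
    r.frequency = r.omega / (2.0 * pi);
    r.period = 1.0 / r.frequency;
    r.amplitude = std::sqrt (u0 * u0 + (v0 / r.omega) * (v0 / r.omega));
    r.phase = std::atan2 (v0 / r.omega, u0);
    return r;
}

// Displacement at time t from Eqn. (16.1.8).
double Displacement (const FreeVibration& r, double u0, double v0, double t)
{
    return u0 * std::cos (r.omega * t)
         + (v0 / r.omega) * std::sin (r.omega * t);
}
```

With k = 4, m = 1, u0 = 1 and v0 = 2, the angular frequency is 2 rad/s and the amplitude is the square root of 2; evaluating Eqn. (16.1.8) and Eqn. (16.1.9) at the same time gives the same displacement.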
Consider a harmonic forcing function
$m\ddot{u} + ku = P \sin \Omega t$    (16.1.10)
or, $\ddot{u} + \omega^2 u = p_m \sin \Omega t$    (16.1.11)
where $p_m = \dfrac{P}{m}$ (force per unit mass) and $\Omega$ is the forcing frequency. The total solution consists of a general solution for the
homogeneous part and a particular solution that satisfies the whole equation. The particular solution is
$u = C_3 \sin \Omega t$    (16.1.12)
Substituting into Eqn. (16.1.11), $C_3 = \dfrac{p_m}{\omega^2 - \Omega^2}$. The total solution is then
$u = C_1 \cos \omega t + C_2 \sin \omega t + C_3 \sin \Omega t$    (16.1.13)
where the first two terms arise from the free vibration and the last term arises from the forced vibration. Hence, the forced part
of the response is
$u = \left[ \dfrac{1}{1 - (\Omega/\omega)^2} \right] \dfrac{P \sin \Omega t}{k}$    (16.1.14)
The first part of the RHS is the “magnification factor” ($\beta$) and the second part is the equivalent “static” load, and the solution
represents the steady state forced response as shown in Fig. 16.1.2.

Fig. 16.1.2 Plot of $\beta$ vs $\Omega/\omega$ (the resonance state occurs at $\Omega/\omega = 1$)
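The behavior of the magnification factor in Eqn. (16.1.14) is easy to explore numerically. The following is a minimal sketch; the function name MagnificationFactor and its parameter names are hypothetical, and the resonance state is excluded because the undamped factor is unbounded there.

```cpp
#include <cassert>
#include <cmath>

// Magnification factor 1 / (1 - r^2) of Eqn. (16.1.14), where r is the
// ratio of the forcing frequency to the natural (angular) frequency.
// (MagnificationFactor is an illustrative name, not from the text.)
double MagnificationFactor (double forcingFreq, double naturalFreq)
{
    assert (naturalFreq > 0.0);
    const double ratio = forcingFreq / naturalFreq;
    assert (std::fabs (ratio) != 1.0);   // resonance state excluded
    return 1.0 / (1.0 - ratio * ratio);
}
```

A frequency ratio of 0.5 gives a factor of 4/3, a ratio of zero recovers the static response (factor of 1), and a ratio of 1.5 gives a negative factor (-0.8), meaning the response is out of phase with the load.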

16.2 Properties of Eigensystems


The solution to the generalized eigenproblem1
$K_{n \times n} X_{n \times n} = \Lambda_{n \times n} M_{n \times n} X_{n \times n}$    (16.2.1)
generates the eigenpairs, i.e. the eigenvalues and their corresponding eigenvectors. We will assume that K is symmetric and
positive definite whereas M is symmetric. The general properties of the eigenvalue problem as described above are as follows.
• There are n real eigenvalues and eigenvectors such that
$0 < \lambda_1 \le \lambda_2 \le \ldots \le \lambda_n$    (16.2.2)
• The eigenvector $x_i$ corresponding to the eigenvalue $\lambda_i$ is such that
$K x_i = \lambda_i M x_i$    (16.2.3)
• The eigenvectors are such that
$x_i^T K x_j = 0, \quad i \ne j$    (16.2.4)
$x_i^T M x_j = 0, \quad i \ne j$    (16.2.5)
• The eigenvectors are generally normalized so that
$x_i^T M x_i = 1$    (16.2.6)
$x_i^T K x_i = \lambda_i$    (16.2.7)
Other normalization schemes are possible such as making the largest entry in the eigenvector equal to unity. We will now look
at two methods to compute the eigenpairs. These methods have limitations in terms of the size of the problem that can be
handled efficiently. It should be noted that solving an eigenproblem is computationally more expensive than solving algebraic
equations.

1The motivation for changing the notation of the matrices from A, B to K, M is that K is normally associated with stiffness and M with
mass both of which are associated with scientific and engineering problems.

16.3 Vector Iteration Methods


These iteration methods use the properties of the Rayleigh quotient. The Rayleigh quotient $Q(x)$ is defined as
$Q(x) = \dfrac{x^T K x}{x^T M x}$    (16.3.1)
where $x$ is an arbitrary vector. The basic property of the Rayleigh quotient is that
$\lambda_1 \le Q(x) \le \lambda_n$    (16.3.2)
where $\lambda_1$ is the smallest eigenvalue and $\lambda_n$ is the largest eigenvalue. Power iteration, inverse iteration, and subspace iteration
methods use the property of the Rayleigh quotient to evaluate the eigenpairs. Power iteration can be used to find the largest
eigenvalue.
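The Rayleigh quotient of Eqn. (16.3.1) takes only a few lines to evaluate. The sketch below is illustrative: it stores matrices as nested std::vector objects rather than in the matrix and vector classes used by the example programs, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vector = std::vector<double>;
using Matrix = std::vector<Vector>;

// Computes the quadratic form x^T A x for a square matrix A.
double QuadraticForm (const Matrix& A, const Vector& x)
{
    assert (A.size () == x.size ());
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size (); i++)
        for (std::size_t j = 0; j < x.size (); j++)
            sum += x[i] * A[i][j] * x[j];
    return sum;
}

// Rayleigh quotient Q(x) = (x^T K x) / (x^T M x), Eqn. (16.3.1).
// (RayleighQuotient is an illustrative name, not from the text.)
double RayleighQuotient (const Matrix& K, const Matrix& M, const Vector& x)
{
    const double denom = QuadraticForm (M, x);
    assert (denom > 0.0);
    return QuadraticForm (K, x) / denom;
}
```

For any nonzero trial vector the value returned lies between the smallest and largest eigenvalues, as stated by Eqn. (16.3.2), which is why it serves as the eigenvalue estimate in the iteration methods.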
The Inverse Iteration Method is used to evaluate the lowest eigenvalue. The process starts with an assumed (initial guess) $x_0$.
The algorithm is as follows.
Step 1: Assume the initial guess for the eigenvector as $x_0$. With $k$ as the iteration counter, set $k = 0$.
Step 2: Set $k = k + 1$.
Step 3: Solve $K \hat{x}_k = M x_{k-1}$ for $\hat{x}_k$. Since the RHS will be used later, store the RHS as $y_{k-1} = M x_{k-1}$.
Step 4: Estimate the eigenvalue $\lambda_k = \dfrac{\hat{x}_k^T y_{k-1}}{\hat{x}_k^T \hat{y}_k}$ where $\hat{y}_k = M \hat{x}_k$.
Step 5: Normalize the eigenvector $x_k = \dfrac{\hat{x}_k}{\left( \hat{x}_k^T \hat{y}_k \right)^{1/2}}$.
Step 6: Check for convergence using $\dfrac{\left| \lambda_k - \lambda_{k-1} \right|}{\lambda_k} \le$ tolerance. If this condition is satisfied, then $x_k$ is the eigenvector. Otherwise
go to Step 2.
One should be careful in selecting the initial guess for the eigenvector $x_0$. Convergence will not be possible if this vector is
orthogonal to the actual eigenvector. Later on, we will see how it may be possible to generate the initial guess knowing the
physics of the problem. Step 3 is the most expensive step in the procedure. However, if we use Cholesky Decomposition, then
K is factored only once and only the forward and backward substitution phases need to be repeated with each new RHS vector
that is created every iteration.
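The six steps above can be sketched as follows. This is not the CInverseIteration class from the example program: the function names are hypothetical, matrices are stored as nested std::vector objects, and for brevity Step 3 uses Gaussian elimination with partial pivoting at every iteration instead of a one-time Cholesky factorization.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Vector = std::vector<double>;
using Matrix = std::vector<Vector>;

// y = A x
Vector Multiply (const Matrix& A, const Vector& x)
{
    Vector y (A.size (), 0.0);
    for (std::size_t i = 0; i < A.size (); i++)
        for (std::size_t j = 0; j < x.size (); j++)
            y[i] += A[i][j] * x[j];
    return y;
}

double Dot (const Vector& a, const Vector& b)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size (); i++) sum += a[i] * b[i];
    return sum;
}

// Solves A x = b by Gaussian elimination with partial pivoting.
// A and b are taken by value because they are overwritten.
Vector Solve (Matrix A, Vector b)
{
    const std::size_t n = b.size ();
    for (std::size_t c = 0; c < n; c++)
    {
        std::size_t p = c;                       // pivot row
        for (std::size_t r = c + 1; r < n; r++)
            if (std::fabs (A[r][c]) > std::fabs (A[p][c])) p = r;
        std::swap (A[c], A[p]);
        std::swap (b[c], b[p]);
        assert (A[c][c] != 0.0);
        for (std::size_t r = c + 1; r < n; r++)
        {
            const double f = A[r][c] / A[c][c];
            for (std::size_t j = c; j < n; j++) A[r][j] -= f * A[c][j];
            b[r] -= f * b[c];
        }
    }
    Vector x (n);
    for (std::size_t i = n; i-- > 0;)            // back substitution
    {
        double sum = b[i];
        for (std::size_t j = i + 1; j < n; j++) sum -= A[i][j] * x[j];
        x[i] = sum / A[i][i];
    }
    return x;
}

// Inverse iteration (Steps 1-6). On entry x is the initial guess; on
// return it is the eigenvector estimate scaled so that x^T M x = 1.
// Returns the eigenvalue estimate; converged reports the Step 6 check.
double InverseIteration (const Matrix& K, const Matrix& M, Vector& x,
                         double tol, int maxIter, bool& converged)
{
    double lambda = 0.0, lambdaOld = 0.0;
    converged = false;
    for (int k = 1; k <= maxIter && !converged; k++)
    {
        const Vector y = Multiply (M, x);        // y_{k-1} = M x_{k-1}
        const Vector xhat = Solve (K, y);        // Step 3
        const Vector yhat = Multiply (M, xhat);  // yhat_k = M xhat_k
        const double denom = Dot (xhat, yhat);
        lambda = Dot (xhat, y) / denom;          // Step 4
        const double scale = std::sqrt (denom);  // Step 5
        for (std::size_t i = 0; i < x.size (); i++) x[i] = xhat[i] / scale;
        converged = k > 1
                 && std::fabs (lambda - lambdaOld) <= tol * std::fabs (lambda);
        lambdaOld = lambda;
    }
    return lambda;
}
```

Because the eigenvector is normalized with the square root in Step 5, the returned vector satisfies Eqn. (16.2.6). The eigenvalue estimate does not depend on how the vector is scaled.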


Example 16.3.1 Inverse Iteration

Compute the lowest eigenpair of the following problem $K_{3 \times 3} X_{3 \times 3} = \Lambda_{3 \times 3} M_{3 \times 3} X_{3 \times 3}$ with
$K_{3 \times 3} = \begin{bmatrix} 3 & 2 & 1 \\ 2 & 2 & 1 \\ 1 & 1 & 1 \end{bmatrix}$ and $M_{3 \times 3} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$.

We start with the initial guess for the eigenvector as $x_0 = \{1 \;\; 1 \;\; 1\}^T$ and $\varepsilon = 10^{-5}$.

Iteration 1 ($k = 1$):
$\hat{x}_k = \{0, \; 0, \; 1\}^T$, $\hat{y}_k = \{0, \; 0, \; 1\}^T$, $\lambda_k = 1$, $x_k = \{0, \; 0, \; 1\}^T$
Iteration 2 ($k = 2$):
$\hat{x}_k = \{0, \; -1, \; 2\}^T$, $\hat{y}_k = \{0, \; -1, \; 2\}^T$, $\lambda_k = 0.4$, $x_k = \{0, \; -0.2, \; 0.4\}^T$
Iteration 3 ($k = 3$):
$\hat{x}_k = \{0.2, \; -0.8, \; 1\}^T$, $\hat{y}_k = \{0.2, \; -0.8, \; 1\}^T$, $\lambda_k = 0.333333$, $x_k = \{0.119048, \; -0.47619, \; 0.595238\}^T$
Iteration 4 ($k = 4$):
$\hat{x}_k = \{0.595238, \; -1.66667, \; 1.66667\}^T$, $\hat{y}_k = \{0.595238, \; -1.66667, \; 1.66667\}^T$, $\lambda_k = 0.314149$, $x_k = \{0.100719, \; -0.282014, \; 0.282014\}^T$
Iteration 5 ($k = 5$):
$\hat{x}_k = \{0.382734, \; -0.946763, \; 0.846043\}^T$, $\hat{y}_k = \{0.382734, \; -0.946763, \; 0.846043\}^T$, $\lambda_k = 0.309414$, $x_k = \{0.217631, \; -0.538351, \; 0.48108\}^T$
Iteration 6 ($k = 6$):
$\hat{x}_k = \{0.755983, \; -1.77541, \; 1.50051\}^T$, $\hat{y}_k = \{0.755983, \; -1.77541, \; 1.50051\}^T$, $\lambda_k = 0.308309$, $x_k = \{0.126521, \; -0.297134, \; 0.251126\}^T$
Iteration 10 ($k = 10$):
$\hat{x}_k = \{0.799589, \; -1.8007, \; 1.44809\}^T$, $\hat{y}_k = \{0.799589, \; -1.8007, \; 1.44809\}^T$, $\lambda_k = 0.307979$, $x_k = \{0.133737, \; -0.30118, \; 0.242203\}^T$

Hence the solution (lowest eigenpair) is $(\lambda, x) = \left( 0.307979, \; \{0.133737, \; -0.30118, \; 0.242203\}^T \right)$.
Note that the vectors $x_k$ listed above are scaled by $\hat{x}_k^T \hat{y}_k$ rather than by its square root, so the final vector is not
normalized as per Eqn. (16.2.6); the eigenvalue estimate in Step 4 is unaffected by this scaling.
Example Program 16.3.1 Inverse Iteration Method
Implement the Inverse Iteration Method and solve Example 16.3.1.
The algorithm presented earlier is implemented in the program. We start by examining the main program.
main.cpp

The public member function in the CInverseIteration class that computes the lowest eigenpair, GoCompute, requires 8
arguments – matrix K, matrix M, the lowest eigenvalue, a vector that goes in as the initial guess for the lowest eigenvector and
is returned as the lowest eigenvector, the maximum number of iterations, convergence tolerance, a variable (bVerbose) that is
set to true if the intermediate calculations during the iterations are to be displayed, and the output stream where the intermediate
calculations are outputted. Note the way the K and M matrices are initialized in lines 22-27. The first two arguments are the
number of rows and columns in the matrix. The values are specified row-wise. The initial guess is specified in line 28 syntactically
in a similar manner to K and M matrices but without the size of the vector being specified.
At the end of the program, we check if the computed values are accurate enough by computing the residual vector
$R = K \phi_1 - \lambda_1 M \phi_1$    (16.3.3)
whose elements should be close to zero. To facilitate these calculations, copies of the K and M matrices are made in lines 36
and 37. The residual vector is computed in line 46 and displayed in line 47.
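The residual check of Eqn. (16.3.3) can be sketched independently of the example program. The function name ResidualMaxNorm is hypothetical, and matrices are stored as nested std::vector objects rather than in the matrix class used by the program.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vector = std::vector<double>;
using Matrix = std::vector<Vector>;

// Largest absolute entry of the residual R = K x - lambda M x.
// (ResidualMaxNorm is an illustrative name, not from the text.)
double ResidualMaxNorm (const Matrix& K, const Matrix& M,
                        double lambda, const Vector& x)
{
    assert (K.size () == x.size () && M.size () == x.size ());
    double worst = 0.0;
    for (std::size_t i = 0; i < x.size (); i++)
    {
        double r = 0.0;
        for (std::size_t j = 0; j < x.size (); j++)
            r += (K[i][j] - lambda * M[i][j]) * x[j];
        worst = std::max (worst, std::fabs (r));
    }
    return worst;
}
```

Applied to the eigenpair reported in Example 16.3.1, the largest residual entry is of the order of 1e-4, which reflects the convergence tolerance used on the eigenvalue and the number of digits reported for the eigenvector.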

Errors arising from the eigenpair calculations are thrown in the CInverseIteration class and caught in the main program.
The program listing of the CInverseIteration class is not shown here. The program output is shown in Fig. 16.3.1.

Fig. 16.3.1. Program output from Example Program 16.3.1

16.4 Transformation Methods


While the Vector Iteration Method is a powerful technique to compute the lowest eigenpair, it cannot be used directly if all the
eigenpairs are required. Transformation methods are general and useful if we are interested in computing all the eigenvalues and
eigenvectors of a matrix.
16.4.1 Jacobi Method
Let us again assume that A is symmetric (but not necessarily positive definite) in Eqn. (16.1.1). Solving for the eigenvalues is
simpler if somehow A can be converted to a diagonal matrix such that the diagonal elements are in fact the eigenvalues. Clearly
this process should be such that during the diagonalization process the original eigenvalues are preserved. Let $P_{n \times n}$ be a
nonsingular matrix. Form $B_{n \times n}$ as
$B = P^{-1} A P$    (16.4.1)
Such a transformation is called a similarity transformation, and it preserves the original eigenvalues but not the eigenvectors. If
P is orthogonal, then $P^{-1} = P^T$. By performing a sequence of orthogonal similarity transformations, the symmetric matrix A
can be transformed into a diagonal matrix B. A rotation matrix R is used in this transformation. Rotation matrices
are such that $R^T R = I$ and $R^{-1} = R^T$. Assume that we need to generate R so that the resulting transformation via Eqn.
(16.4.1) makes $A_{ij} = A_{ji} = 0$. This is possible if R is the identity matrix except for the four entries
$R_{ii} = R_{jj} = \cos \theta$    (16.4.2a)
$R_{ij} = -R_{ji} = -\sin \theta$    (16.4.2b)
In other words, one would apply the rotation matrix to carry out the following transformation
$B = R^T A R$    (16.4.3)
Since A is a symmetric matrix, one can also infer that B is symmetric. Eqns. (16.4.2) and (16.4.3) can be solved for the rotation
angle yielding
$\theta = \dfrac{1}{2} \tan^{-1} \left( \dfrac{2 A_{ij}}{A_{ii} - A_{jj}} \right)$ or, $\theta = \dfrac{\pi}{4}$ if $A_{ii} = A_{jj}$    (16.4.4)
The basic idea then is to apply the rotation matrix to each off-diagonal element in A making it zero one at a time. A later rotation
can make a previously zeroed element nonzero again, so the process is continued until all the off-diagonal elements are
numerically small. The algorithm is as follows.
Step 1: With $k$ as the iteration counter, set $k = 0$. Let $P = I$. Set values for $\varepsilon$ and $k_{max}$.
Step 2: Scan A and locate the off-diagonal element with the largest magnitude, $A_{ij}$.
Step 3: Compute $\theta$ as per Eqn. (16.4.4). Set the four elements of the R matrix using Eqn. (16.4.2).
Step 4: Update P and A as $P = PR$ and $A = R^T A R$.
Step 5: Convergence check: If $\left| A_{ij} \right|_{max} \le \varepsilon$, go to Step 6. Set $k = k + 1$. If $k = k_{max}$, go to Step 6. Otherwise go to Step 2.
Step 6: The eigenvalues are the diagonal elements of A and the columns of P are the eigenvectors that can then be scaled
appropriately.
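The steps above can be sketched as follows. This is not the CJacobiMethod class from the example program: the function name JacobiEigen is hypothetical, matrices are stored as nested std::vector objects, and the convergence test is made before each rotation rather than after it.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Jacobi method for a symmetric matrix A (Steps 1-6). On return the
// diagonal of A holds the eigenvalues and the columns of P hold the
// corresponding eigenvectors. Returns the number of rotations applied.
// (JacobiEigen is an illustrative name, not from the text.)
int JacobiEigen (Matrix& A, Matrix& P, double eps, int maxIter)
{
    const std::size_t n = A.size ();
    const double pi = std::acos (-1.0);
    P.assign (n, std::vector<double> (n, 0.0));
    for (std::size_t i = 0; i < n; i++) P[i][i] = 1.0;   // Step 1: P = I

    int k = 0;
    for (; k < maxIter; k++)
    {
        // Step 2: locate the largest off-diagonal entry A[p][q]
        std::size_t p = 0, q = 0;
        double big = 0.0;
        for (std::size_t i = 0; i < n; i++)
            for (std::size_t j = i + 1; j < n; j++)
                if (std::fabs (A[i][j]) > big)
                {
                    big = std::fabs (A[i][j]); p = i; q = j;
                }
        if (big <= eps) break;                           // Step 5

        // Step 3: rotation angle from Eqn. (16.4.4)
        const double theta = (A[p][p] == A[q][q]) ? pi / 4.0
            : 0.5 * std::atan (2.0 * A[p][q] / (A[p][p] - A[q][q]));
        const double c = std::cos (theta), s = std::sin (theta);

        // Step 4: A = R^T A R and P = P R, where R is the identity
        // except R_pp = R_qq = c, R_pq = -s, R_qp = s.
        for (std::size_t i = 0; i < n; i++)              // A = A R
        {
            const double aip = A[i][p], aiq = A[i][q];
            A[i][p] = c * aip + s * aiq;
            A[i][q] = -s * aip + c * aiq;
        }
        for (std::size_t j = 0; j < n; j++)              // A = R^T A
        {
            const double apj = A[p][j], aqj = A[q][j];
            A[p][j] = c * apj + s * aqj;
            A[q][j] = -s * apj + c * aqj;
        }
        for (std::size_t i = 0; i < n; i++)              // P = P R
        {
            const double pip = P[i][p], piq = P[i][q];
            P[i][p] = c * pip + s * piq;
            P[i][q] = -s * pip + c * piq;
        }
    }
    return k;
}
```

Only rows and columns p and q change in each rotation, so the full matrix products in Step 4 are never formed explicitly. The eigenvalues appear on the diagonal in no particular order.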

Example 16.4.1 Jacobi Method

Compute the eigenpairs of the matrix $\begin{bmatrix} 2 & -3 \\ -3 & 10 \end{bmatrix}$.
This is a simple problem since there is only one off-diagonal element to zero out. The calculations will converge in one iteration.
$i = 1, j = 2$: $\theta = \dfrac{1}{2} \tan^{-1} \left( \dfrac{2(-3)}{2 - 10} \right) = 0.321751$
$R_{2 \times 2} = \begin{bmatrix} 0.948683 & -0.316228 \\ 0.316228 & 0.948683 \end{bmatrix}$
$A = R^T A R = \begin{bmatrix} 1 & 0 \\ 0 & 11 \end{bmatrix}$
$P = PR = IR = R = \begin{bmatrix} 0.948683 & -0.316228 \\ 0.316228 & 0.948683 \end{bmatrix}$
Hence $(\lambda_1, x_1) = \left( 1, \begin{Bmatrix} 0.948683 \\ 0.316228 \end{Bmatrix} \right)$ and $(\lambda_2, x_2) = \left( 11, \begin{Bmatrix} -0.316228 \\ 0.948683 \end{Bmatrix} \right)$
Example Program 16.4.1 Jacobi Method
Implement the Jacobi Method and solve Example 16.4.1.
The algorithm presented earlier is implemented in the program. We start by examining the main program.
main.cpp


The public member function in the CJacobiMethod class that computes all the eigenpairs, GoCompute, requires 7 arguments –
matrix A, a vector to store all the computed eigenvalues, a matrix to store all the computed eigenvectors, convergence tolerance,
the maximum number of iterations, a variable (bVerbose) that is set to true if the intermediate calculations during the iterations
are to be displayed, and the output stream where the intermediate calculations are outputted. Errors arising from the eigenpair
calculations are thrown in the CJacobiMethod class and caught in the main program. The program listing of the CJacobiMethod
class is not shown here. The program output is shown in Fig. 16.4.1.

Fig. 16.4.1. Program output from Example Program 16.4.1


16.4.2 Generalized Jacobi Method
The Generalized Jacobi Method is used to solve the generalized eigenproblem given by Eqn. (16.1.3). The basic idea is the same
as that used in the Jacobi Method: successively use the rotation matrix to convert the off-diagonal elements in K and M to
zero. When the results have converged after $k$ iterations, we have the following.
$P = P_1 P_2 \ldots P_k$    (16.4.5)
$\hat{K} = P^T K P$    (16.4.6)
$\hat{M} = P^T M P$    (16.4.7)
$X = P \hat{M}^{-1/2}$    (16.4.8)
$\Lambda = \hat{M}^{-1} \hat{K}$    (16.4.9)
where
$\hat{M}^{-1} = \mathrm{diag} \left( \hat{M}_{11}^{-1}, \; \hat{M}_{22}^{-1}, \; \ldots, \; \hat{M}_{nn}^{-1} \right)$    (16.4.10)
$\hat{M}^{-1/2} = \mathrm{diag} \left( \hat{M}_{11}^{-1/2}, \; \hat{M}_{22}^{-1/2}, \; \ldots, \; \hat{M}_{nn}^{-1/2} \right)$    (16.4.11)
It should be noted that as $k \to \infty$, $K \to \Lambda$ and $M \to I$.
One of the more practical methods is the threshold Jacobi Method in which the off-diagonal elements are tested against a cutoff
value before it is determined whether they need to be zeroed out or not. Typically, the coupling between rows and columns i
and j is tested. In other words, if
$\left( \dfrac{K_{ij}^2}{K_{ii} K_{jj}} \right)^{1/2} \le tol$    (16.4.12)
is satisfied, then the rotation matrix is not generated and used. As a convergence check we use the following equations where
$s$ represents some tolerance value.
$\dfrac{\left| K_{ii}^{(k+1)} - K_{ii}^{(k)} \right|}{K_{ii}^{(k+1)}} \le 10^{-s}, \quad i = 1, 2, \ldots, n$    (16.4.13)
$\left[ \dfrac{\left( K_{ij}^{(k+1)} \right)^2}{K_{ii}^{(k+1)} K_{jj}^{(k+1)}} \right]^{1/2} \le 10^{-s}, \quad \forall \, i, j; \; i < j$    (16.4.14)
The rotation matrix $P_k$ for the kth iteration to zero out the off-diagonal elements $K_{ij}$ and $M_{ij}$ is the identity matrix except
for the entry $\alpha$ in row i, column j and the entry $\gamma$ in row j, column i.
$P_k = \begin{bmatrix} 1 & & & & & & \\ & \ddots & & & & & \\ & & 1 & \cdots & \alpha & & \\ & & \vdots & \ddots & \vdots & & \\ & & \gamma & \cdots & 1 & & \\ & & & & & \ddots & \\ & & & & & & 1 \end{bmatrix}$ (rows and columns i and j)    (16.4.15)
To get the two off-diagonal elements $K_{ij}$ and $M_{ij}$ as zero, we need
$\alpha K_{ii} + (1 + \alpha \gamma) K_{ij} + \gamma K_{jj} = 0$
$\alpha M_{ii} + (1 + \alpha \gamma) M_{ij} + \gamma M_{jj} = 0$    (16.4.16)
Solving the equations yields the required values for $\alpha$ and $\gamma$. Let
$a = K_{ii} M_{ij} - M_{ii} K_{ij}$    (16.4.17a)
$b = K_{jj} M_{ij} - M_{jj} K_{ij}$    (16.4.17b)
$c = K_{ii} M_{jj} - M_{ii} K_{jj}$    (16.4.17c)
Then
if $a \ne 0, b \ne 0$: $\alpha = \dfrac{-0.5c + \mathrm{sgn}(c) \sqrt{0.25c^2 + ab}}{a}$, $\gamma = -\dfrac{a \alpha}{b}$    (16.4.18)
if $a = 0$: $\alpha = -\dfrac{K_{ij}}{K_{ii}}$, $\gamma = 0$    (16.4.19)
if $b = 0$: $\alpha = 0$, $\gamma = -\dfrac{K_{ij}}{K_{jj}}$    (16.4.20)
If $a = b = 0$, then either Eqn. (16.4.19) or (16.4.20) may be used. The overall algorithm is the same as that used for the Jacobi
Method. In the implementation, we will assume that both K and M are symmetric and positive definite.
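The calculation of the two rotation parameters for a single (i, j) pair can be sketched as follows. The struct and function names are hypothetical, and the two parameters are named alpha and gamma. In the special cases the code sets one parameter to zero and obtains the other directly from the first of the two conditions that zero out $K_{ij}$ and $M_{ij}$.

```cpp
#include <cassert>
#include <cmath>

// Rotation parameters for one (i, j) pair in the Generalized Jacobi Method.
// (RotationPair and ComputeAlphaGamma are illustrative names.)
struct RotationPair
{
    double alpha;   // entry of P_k in row i, column j
    double gamma;   // entry of P_k in row j, column i
};

RotationPair ComputeAlphaGamma (double Kii, double Kij, double Kjj,
                                double Mii, double Mij, double Mjj)
{
    const double a = Kii * Mij - Mii * Kij;
    const double b = Kjj * Mij - Mjj * Kij;
    const double c = Kii * Mjj - Mii * Kjj;
    RotationPair r;
    if (a == 0.0)
    {
        // gamma = 0 leaves alpha*Kii + Kij = 0
        r.alpha = -Kij / Kii;
        r.gamma = 0.0;
    }
    else if (b == 0.0)
    {
        // alpha = 0 leaves Kij + gamma*Kjj = 0
        r.alpha = 0.0;
        r.gamma = -Kij / Kjj;
    }
    else
    {
        const double disc = 0.25 * c * c + a * b;
        assert (disc >= 0.0);
        const double sgn = (c >= 0.0) ? 1.0 : -1.0;
        r.alpha = (-0.5 * c + sgn * std::sqrt (disc)) / a;
        r.gamma = -a * r.alpha / b;
    }
    return r;
}
```

A quick check is to substitute the returned values into the two conditions alpha*Kii + (1 + alpha*gamma)*Kij + gamma*Kjj = 0 and alpha*Mii + (1 + alpha*gamma)*Mij + gamma*Mjj = 0; both should vanish to within round-off.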

Example Program 16.4.2 Generalized Jacobi Method


Compute all the eigenpairs for the problem described in Example 16.3.1 by developing a class to implement the Generalized
Jacobi Method.
The algorithm presented earlier is implemented in the program. We start by examining the main program.
main.cpp


Lines 15-28 show statements similar to Example Program 16.3.1 except that a matrix (PHI) is declared to store the computed
eigenvectors and a vector (LAMBDA) is declared to store the computed eigenvalues. The CGenJacobiMethod class object is declared
in line 39 and the GoCompute member function is called in lines 40-41 to compute all the eigenvalues and eigenvectors.

The accuracy of the computations is evaluated by computing the residual (stored as a matrix in dMResid) in lines 51-61.
$R = K \Phi - \Lambda M \Phi$    (16.4.21)

The program output is shown in Fig. 16.4.2.

Fig. 16.4.2. Program output from Example Program 16.4.2


16.5 One-Dimensional Eigenproblem


The governing differential equation is of the form
$-\dfrac{d}{dx} \left( \alpha(x) \dfrac{du(x)}{dx} \right) + \beta(x) u(x) - \lambda \gamma(x) u(x) = 0, \quad x_a < x < x_b$    (16.5.1)
with the boundary conditions as
At $x_a$: $u(x_a) = 0$ or $\tau(x_a) = 0$    (16.5.2a)
At $x_b$: $u(x_b) = 0$ or $\tau(x_b) = 0$    (16.5.2b)
where $\tau = -\alpha \, du/dx$ denotes the flux. A few points are in order when we compare this differential equation to the one-dimensional
BVP. First, there is no driving force, i.e. $f(x) = 0$. Second, there is an additional term, $\lambda \gamma(x) u(x)$ where $\gamma(x)$ describes a
physical property of the system (usually mass or mass density) and the scalar $\lambda$ is called the eigenvalue. Third, there are several
solutions called eigensolutions to this problem. The eigensolutions consist of pairs of eigenfunction $u(x)$ and eigenvalue $\lambda$. Both
of these are unknowns. Lastly, in the absence of driving forces, the condition of the system changes. This is the resonant or natural
state where the internal energy oscillates back and forth between different forms, e.g. kinetic and potential, without energy exchange
with the surroundings.
Once again, we will use Galerkin’s Approach to solve the problem.
Step 1: Residual Equations
In a typical element with the approximate solution as $\tilde{u} = u(x)$ (dropping the tilde notation for convenience)
$\int_{x_1}^{x_n} \left[ -\dfrac{d}{dx} \left( \alpha(x) \dfrac{du(x)}{dx} \right) + \beta(x) u(x) - \lambda \gamma(x) u(x) \right] \phi_i(x) \, dx = 0, \quad i = 1, 2, \ldots, n$    (16.5.3)

Step 2: Integrate by parts the highest order derivative
$\int_{x_1}^{x_n} \dfrac{d\phi_i(x)}{dx} \alpha(x) \dfrac{d\tilde{u}}{dx} \, dx + \int_{x_1}^{x_n} \phi_i(x) \beta(x) \tilde{u} \, dx - \lambda \int_{x_1}^{x_n} \phi_i(x) \gamma(x) \tilde{u} \, dx = -\left[ \left( -\alpha(x) \dfrac{d\tilde{u}}{dx} \right) \phi_i(x) \right]_{x_1}^{x_n}$    (16.5.4)
where $x_1$ and $x_n$ are the coordinates of the ends of the element. The last term must vanish since the boundary conditions
either are essential or homogeneous (meaning zero valued) natural BC’s.
Step 3: Trial solution
Let the trial solution be represented as $\tilde{u}(x; a) = \sum_{j=1}^{n} a_j \phi_j(x)$. Hence
$\sum_{j=1}^{n} \left[ \int_{x_1}^{x_n} \dfrac{d\phi_i(x)}{dx} \alpha(x) \dfrac{d\phi_j(x)}{dx} \, dx + \int_{x_1}^{x_n} \phi_i(x) \beta(x) \phi_j(x) \, dx \right] a_j - \lambda \sum_{j=1}^{n} \left[ \int_{x_1}^{x_n} \phi_i(x) \gamma(x) \phi_j(x) \, dx \right] a_j = 0, \quad i = 1, 2, \ldots, n$    (16.5.5)
Writing the above equation in a compact form
$k_{n \times n} a_{n \times 1} - \lambda \, m_{n \times n} a_{n \times 1} = 0$    (16.5.6)

Step 4: Element equations for the 1D $C^0$ linear element
Considering the $C^0$ linear element we have
$\phi_1(x) = \dfrac{x_2 - x}{x_2 - x_1}$ and $\phi_2(x) = \dfrac{x - x_1}{x_2 - x_1}$    (16.5.7)
The terms in k were evaluated earlier. We will handle the mass matrix here. Taking $\gamma(x) = \gamma$ as constant over the element and
$L = x_2 - x_1$,
$m_{11} = \int_{x_1}^{x_2} \phi_1(x) \gamma(x) \phi_1(x) \, dx = \gamma \int_{x_1}^{x_2} \dfrac{x_2 - x}{x_2 - x_1} \, \dfrac{x_2 - x}{x_2 - x_1} \, dx = \dfrac{\gamma L}{3} = m_{22}$
$m_{12} = \int_{x_1}^{x_2} \phi_1(x) \gamma(x) \phi_2(x) \, dx = \dfrac{\gamma L}{6} = m_{21}$
Hence
$m_{2 \times 2} = \dfrac{\gamma L}{6} \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$
Once the element equations are assembled into the system equations, we obtain the system eigenproblem as
$K \Phi = \Lambda M \Phi$    (16.5.8)
that can then be solved for the eigenvalues $\Lambda$ and the corresponding eigenvectors $\Phi$.
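The element mass matrix derived in Step 4 can be verified numerically. The sketch below evaluates the integrals with two-point Gauss quadrature, which is exact here because the integrand is quadratic when the mass-type coefficient (named gamma in the code) is constant over the element. The function and type names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Matrix2 = std::array<std::array<double, 2>, 2>;

// Element mass matrix m_ij = integral of phi_i * gamma * phi_j over
// [x1, x2] for the C0 linear element, using two-point Gauss quadrature.
// gamma is assumed constant over the element.
// (ElementMassMatrix is an illustrative name, not from the text.)
Matrix2 ElementMassMatrix (double gamma, double x1, double x2)
{
    assert (x2 > x1);
    const double L = x2 - x1;
    const double g = 1.0 / std::sqrt (3.0);
    const double xi[2] = { -g, g };      // Gauss points; both weights are 1
    Matrix2 m = {};                      // zero-initialized
    for (double s : xi)
    {
        // map the Gauss point from [-1, 1] to [x1, x2]
        const double x = 0.5 * (x1 + x2) + 0.5 * L * s;
        const double phi[2] = { (x2 - x) / L, (x - x1) / L };
        for (int i = 0; i < 2; i++)
            for (int j = 0; j < 2; j++)
                m[i][j] += phi[i] * gamma * phi[j] * 0.5 * L;
    }
    return m;
}
```

For gamma = 6 and an element from x = 1 to x = 3 (L = 2), the diagonal entries are 4 and the off-diagonal entries are 2, which matches the closed-form result (gamma L / 6) [2 1; 1 2].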


Summary
In this chapter, we extended the ideas from the previous chapter on partial differential equations to the one-dimensional
eigenproblem. We tied the solution of eigenproblems, i.e. computing eigenvalues and eigenvectors, to a number of numerical
techniques – Inverse Iteration, Jacobi, and Generalized Jacobi techniques. Larger eigenvalue problems in the area of finite
element analysis are usually solved using the Subspace Iteration and Lanczos methods.

Where to go from here?


As the reader may have recognized, solving eigenproblems is computationally more expensive than solving algebraic equations.
As we did with algebraic equations, it may be worthwhile when developing a computer program (Problems 16.4, 16.5) to
compute the eigenvalues and eigenvectors, to find the quality of the solution by computing the residual vector for each eigenpair.


Exercises
Appetizers
Problem 16.1
Compute all the eigenpairs of the following matrices.
(a) $A_{3 \times 3} = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \\ 3 & 4 & 5 \end{bmatrix}$    (b) $A_{3 \times 3} = \begin{bmatrix} 1 & 3 & 4 \\ 3 & 1 & 2 \\ 4 & 2 & 1 \end{bmatrix}$

Problem 16.2
Check and show that $(\lambda, x) = \left( 0.195126, \; \{0.0927686, \; 0.191219, \; 0.12226\}^T \right)$ is the lowest eigenpair to the problem solved in Example 16.3.1.

Main Course
Problem 16.3
Develop a class to implement the Inverse Iteration Method. Use the class to find the lowest eigenvalue and eigenvector for the
following generalized eigenproblem.
 2 1 0 0  1  0  1 
 1 2 1 0     2   
   2       2 
 0 1 2 1 3  4 4
 0  3 
   
 0 0 1 1  44 4 41  1 44 4 41

Problem 16.4
Develop a class to implement the Generalized Jacobi Method. Use the class to find all the eigenvalues and eigenvectors for the
following generalized eigenproblem.
 2 1 0   0.5 
 1 4 1     1  
  3 3 3 3   3 3

 0 1 2  33  0.5  33

Numerical Analysis Concepts


Problem 16.5
The QR algorithm is a method to compute all the eigenvalues and eigenvectors of a matrix. Unlike the Jacobi method, the
matrix need not be symmetric. Research the theory of the method and implement the method in a class.



Chapter 17

Numerical Optimization
“Be happy for this moment. This moment is your life.” Omar Khayyam

“Success is getting what you want. Happiness is wanting what you get.” Dale Carnegie

“Happiness lies in the joy of achievement and the thrill of creative effort.” Franklin D. Roosevelt

Engineers and scientists are quite often interested in finding a solution to problems that are described by a central goal or
objective, and whose solution is restricted by constraints. Engineering design problems are examples of this scenario. When
these problems are defined in terms of a small number of variables, answers can be obtained using a paper-and-pencil approach.
However, for most practical problems, it is necessary to resort to numerical techniques. The area of numerical optimization is
vast. We will deal with introductory material in this chapter with particular emphasis on nonlinear programming (NLP)
problems.

Objectives
 Become familiar with the language of design optimization.
 Understand how to formulate and analytically solve simple unconstrained and constrained minimization problems.
 Learn the basics of Genetic Algorithms (GA).
 Learn how to formulate and solve nonlinear programming problems using GA.


17.1 Numerical Optimization


The simplest form of a mathematical programming problem is
Find $\mathbf{x} \in R^n$ (17.1-1a)
To minimize $f(\mathbf{x})$ (17.1-1b)
Mathematics provides us with a very powerful language to express our ideas. In the above equations, $\mathbf{x}$ represents the vector
of design variables. These are the variables we seek in order to complete the design process. The notation $\mathbf{x} \in R^n$ indicates that the
design variables are real-valued and that there are $n$ variables: $x_1, x_2, \ldots, x_n$. The function $f(\mathbf{x})$ is the objective function. This is
the function that drives the design process and is either directly or indirectly a function of the $n$ design variables. Such a problem
is called an unconstrained minimization problem.
Consider the following problem.
Find $x$
To minimize $f(x) = (x - 10)^2 + 1$
Since there is only one design variable, the problem is referred to as a one-dimensional unconstrained minimization problem.
[Figure: plot of $f(x)$ versus $x$, with the minimum marked at $x = x^*$]
Fig. 17.1.1
From an earlier calculus course, we know that for continuous, differentiable functions, the necessary condition for a minimum
of a function $f(x)$ is $\dfrac{df}{dx} = 0$. This condition yields the stationary point(s). Therefore, $\dfrac{df}{dx} = 0 = 2(x - 10)$. Solving, we have
$x = 10$. The sufficient condition that this point, $x = x^*$, corresponds to a minimum is $\dfrac{d^2 f(x = x^*)}{dx^2} > 0$. Checking,
$\dfrac{d^2 f(x = x^*)}{dx^2} = 2 > 0$. Hence, the solution to the above problem is as follows: the optimal solution is at $x^* = 10$ and the lowest
value of the objective function is $f(x^*) = 1$. Fig. 17.1.1 shows the problem and the solution graphically.


[Figure: plot of $f(x)$ versus $x$ marking a local maximum, the local and global maximum, a local minimum, and the local and global minimum]
Fig. 17.1.2
There are other characteristics of even simple problems that we must be aware of. For example, in Fig. 17.1.3, the function
$f(x) = x^4 - 3x^2 + x$ has multiple local minima. Such a function is called a multimodal function (the function in Fig. 17.1.1 is a
unimodal function). Each trough or valley captures a minimum that is local. Among all these local minima, there are one or more
points that have the absolute minimum value. These points are called the global minima as identified in Fig. 17.1.2.
[Figure: plot of $x^4 - 3x^2 + x$ versus $x$ for $-3 \le x \le 3$]
Fig. 17.1.3 A multimodal function $f(x) = x^4 - 3x^2 + x$


We can use the calculus theorems seen earlier to solve this problem.
$\dfrac{df}{dx} = 0 = 4x^3 - 6x + 1 \Rightarrow x = -1.300839567,\ 0.1699384435,\ 1.130901123$
Using the sufficient condition, $\dfrac{d^2 f}{dx^2} = 12x^2 - 6$, we find that $\dfrac{d^2 f(x = -1.300839567)}{dx^2} > 0$, $\dfrac{d^2 f(x = 0.1699384435)}{dx^2} < 0$,
and $\dfrac{d^2 f(x = 1.130901123)}{dx^2} > 0$. Hence the points $x^* = -1.3008$ and $x^* = 1.1309$ are local minima. Moreover, the
point $x^* = -1.3008$ is also a global minimum point; the function has its lowest value at this point. As the functions become
“more nonlinear”, the task of analytically finding the minimum value becomes more difficult. One must then resort to a
numerical technique.
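As a hedged illustration of such a numerical technique, the sketch below applies Newton-Raphson iteration to $df/dx = 0$ for $f(x) = x^4 - 3x^2 + x$ and classifies each stationary point with the sign of the second derivative. The function names and the starting guesses suggested afterwards are assumptions for this sketch; the iteration fails if it lands where $d^2f/dx^2 = 0$.

```cpp
#include <cmath>

double F(double x)   { return x*x*x*x - 3.0*x*x + x; }    // f(x)
double dF(double x)  { return 4.0*x*x*x - 6.0*x + 1.0; }  // df/dx
double d2F(double x) { return 12.0*x*x - 6.0; }           // d2f/dx2

// Newton-Raphson iteration on df/dx = 0 starting from the guess x0.
double StationaryPoint(double x0, int maxIter = 50, double tol = 1.0e-12)
{
    double x = x0;
    for (int i = 0; i < maxIter; ++i)
    {
        const double step = dF(x) / d2F(x);
        x -= step;
        if (std::fabs(step) < tol)
            break;
    }
    return x;
}

// Sufficient condition: a positive second derivative indicates a local minimum.
bool IsLocalMinimum(double x) { return d2F(x) > 0.0; }
```

Starting guesses such as -1.5, 0.0 and 1.5 lead to the three stationary points, and comparing `F` at the two local minima identifies the global one.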
Let us now turn our attention to two variable problems. Since there is now more than one design variable, the problem is
referred to as a multi-dimensional unconstrained minimization problem. As an example, consider the following problem.


Find $\{x_1, x_2\}$
To minimize¹ $f(x_1, x_2) = 100(x_2 - x_1^2)^2 + (1 - x_1)^2$
The function is shown in Fig. 17.1.4. To find the minimum, we can use the first-order condition² as
$\dfrac{\partial f}{\partial x_1} = 0 = -400x_1(x_2 - x_1^2) - 2(1 - x_1)$
$\dfrac{\partial f}{\partial x_2} = 0 = 200(x_2 - x_1^2)$
Solving, $\{x_1, x_2\} = \{1, 1\}$. The second-order condition involves computing $\mathbf{H}$, the Hessian matrix that contains the second-order
(partial) derivatives of $f(\mathbf{x})$.
$\mathbf{H}_{2\times 2} = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} \\ \dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} \end{bmatrix} = \begin{bmatrix} 1200x_1^2 - 400x_2 + 2 & -400x_1 \\ -400x_1 & 200 \end{bmatrix}$
At the point (1,1)
$\mathbf{H} = \begin{bmatrix} 802 & -400 \\ -400 & 200 \end{bmatrix}$
This matrix is positive definite (all the eigenvalues are positive), implying that the point (1,1) is a minimum point.

Fig. 17.1.4 Two-variable unconstrained minimization problem

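1 This function is popularly known as Rosenbrock’s function.

2 The syntactically correct condition is $\nabla f(\mathbf{x}) = \mathbf{0}$, where $\nabla f(\mathbf{x}) = \left\{ \dfrac{\partial f}{\partial x_1}\ \ \dfrac{\partial f}{\partial x_2}\ \ \ldots\ \ \dfrac{\partial f}{\partial x_n} \right\}$ is called the gradient vector.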


The graphical solution, while providing a visual look at the design space, is difficult to use in locating the precise minimum. The
purpose of this example is to show that the complexity of a problem increases exponentially with increasing dimension of the
design space and that obtaining an analytical solution is cumbersome, if not impractical.
It is difficult to formulate most engineering problems as unconstrained minimization problems. Typical engineering design
problems posed in the mathematical programming format are usually of the following form.
Find $\mathbf{x} \in R^n$ (17.1-2a)
To minimize $f(\mathbf{x})$ (17.1-2b)
Subject to $g_i(\mathbf{x}) \le 0$, $i = 1, 2, \ldots, l$ (17.1-2c)
$h_j(\mathbf{x}) = 0$, $j = 1, 2, \ldots, m$ (17.1-2d)
$x_k^L \le x_k \le x_k^U$, $k = 1, 2, \ldots, n$ (17.1-2e)
Performance requirements, manufacturing constraints or even the permissible range of values for the design variables can be
specified only through constraints. The constraints $g_i(\mathbf{x})$ are inequality constraints while $h_j(\mathbf{x})$ are equality constraints. The constraint
functions, similar to the objective function, can be linear or nonlinear, continuous, discontinuous or piecewise continuous,
differentiable or non-differentiable. Eqns. (17.1-2e) establish the lower and upper bounds on the permissible values of the $n$
design variables. These constraints are usually referred to as bound constraints or side constraints. A problem posed in the above form
is called a constrained minimization problem. An example of the design space for a two-variable constrained problem is shown in
Fig. 17.1.5.
The design variables are $\{x_1, x_2\}$. There are two inequality constraints $g_1$ and $g_2$. The side constraints are $x_1 \ge 0$ and
$x_2 \ge 0$. The hatch marks on the lines and curves representing these four constraints indicate the constraint boundary marking
the barrier between the feasible domain and the infeasible domain. A feasible domain contains all the design points that satisfy all
the constraints. In Fig. 17.1.5, the feasible domain is bounded by these four lines. The vertices of the feasible domain are labeled
A-B-C-D. The rest of the design space is infeasible. The objective function, $f$, is represented in the figure as isocost contours
or curves that have a constant $f$ value. As this example indicates, by sliding the isocost contours in the direction of decreasing
objective function value, we can locate the optimal solution. The constraint $g_1$ controls the design and the optimal solution is
indicated with an x. The objective function has a value $c_2$ at this point.
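[Figure: two-variable design space with axes $x_1$ and $x_2$ showing the constraints $g_1$ and $g_2$, the side constraints $x_1 = 0$ and $x_2 = 0$, the feasible domain with vertices A, B, C and D, and the isocost contours $f = c_1$, $f = c_2$ and $f = c_3$ with the direction of decreasing $f$ values]
Fig. 17.1.5 Constrained design problem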


Some of the different techniques to solve simple constrained minimization problems are exhaustive search, graphical, trial-and-error,
and ‘constraint controlled’ optimal design. Exhaustive search is too expensive for solving practical problems. The
graphical technique can work at most for two-variable problems. The trial-and-error approach is tedious and unlikely to find even a
local optimum. The ‘constraint controlled’ approach is ad hoc. We will formalize the approach in Section 17.3. Before doing so,
we will look at the different types of constrained minimization problems.

17.2 Types of Mathematical Programming Problems


It would be easy to state that there is one standard type of mathematical programming (MP) problem. It would perhaps be easy
then to create one or more solution techniques to solve that problem. The reality is that there are numerous types of MP
problems. We will categorize the problems now.
Types of design variables: The design variables commonly encountered in engineering applications can be of different types.
Continuous design variables are those that vary continuously. For example, the height and width of a concrete beam can be taken
as continuous design variables since they can, in theory, be cast in any size. Similarly, plate girders can be manufactured to any
practical dimensions.
Discrete design variables are those that are available in discrete or predefined values. For example, steel I-beams are usually
manufactured in predefined sizes and dimensions. The AISC sections are examples of such beams.
Integer design variables assume only integer values. An example of an integer design variable is the number of panel points in a roof
truss. This number is a positive integer (greater than zero).
Zero-one design variables, as the name suggests, have either a zero or one value. In structural design, one could interpret the zero-
one values as being equivalent to present-absent state. For example, we could designate the presence or absence of a member
in a truss as being a zero-one design variable. If the design variable value is one, then the member is assumed to be a part of the
truss.
Note that the properties of the design space or domain are dictated by the design variable type. With continuous design variables,
the design space can be continuous. With the other design variable types, the design space is discontinuous.
Types of functions: The functions that are used to designate the objective and the constraint functions can be of several types.
Note that the independent parameters are the design variables.
The simplest function is a linear function.
Nonlinear functions are those where the relationship between the function and the independent variable (e.g., design variable) is
nonlinear. Note that a nonlinear function is not necessarily a polynomial. For example, $f(x) = x^2 + 3x$ is nonlinear and so is
$f(\mathbf{x}) = \dfrac{e^{x_1}}{\sin(x_2)}$.
Posynomial functions have a special form given by
$f(\mathbf{x}) = C_1 x_1^{a_{11}} x_2^{a_{12}} \cdots x_n^{a_{1n}} + C_2 x_1^{a_{21}} x_2^{a_{22}} \cdots x_n^{a_{2n}} + \ldots + C_s x_1^{a_{s1}} x_2^{a_{s2}} \cdots x_n^{a_{sn}}$ (17.2-1)
where $C_i > 0$, $a_{ij}$ is a known coefficient, and $x_i > 0$.
Types of constraints: There are two types of constraints.
Equality constraints are used to express the relationship between two or more design variables using the equality operator. The
function used to describe an equality constraint can be of any one of the forms defined earlier.
Inequality constraints are used to express the relationship between two or more design variables using the greater than, less than,
greater than or equal to, or less than or equal to operators. The function used to describe an inequality constraint can be of any
one of the forms defined earlier.
Types of objective functions: There are two major types of objective functions – single objective and multi-objective
functions.
Single objective. Design problems discussed in this text are driven primarily by a single, primary objective.
Multi-objective. However, there are engineering problems where more than one objective is important and needs to be minimized
simultaneously. For example, an unconstrained minimization problem with multiple objectives can be stated as
Find $\mathbf{x} \in R^n$ (17.2-2)


To minimize $\left[ f_1(\mathbf{x}), f_2(\mathbf{x}), \ldots, f_r(\mathbf{x}) \right]$ (17.2-3)


An example of a multi-objective structural design problem is:
Find The design variables
To minimize [Project Cost, Construction Time]
There are two objective functions – the project cost and the construction time. In this example, it should be recognized that
minimizing construction time can lead to an increase in the overall project cost.
Using the different types of objective and constraint functions, and design variables, different types of mathematical
programming problems can be described.
Linear Programming (LP) Problem: Consider the following problem:
Find $\mathbf{x}$ (17.2-4a)
To maximize $\sum_{k=1}^{n} c_k x_k$ (17.2-4b)
Subject to $a_{i1}x_1 + a_{i2}x_2 + \ldots + a_{in}x_n \le b_i$, $i = 1, \ldots, l$ (17.2-4c)
$a_{i1}x_1 + a_{i2}x_2 + \ldots + a_{in}x_n \ge b_i$, $i = l+1, \ldots, m$ (17.2-4d)
$a_{i1}x_1 + a_{i2}x_2 + \ldots + a_{in}x_n = b_i$, $i = m+1, \ldots, r$ (17.2-4e)
$x_1 \ge 0, x_2 \ge 0, \ldots, x_n \ge 0$ (17.2-4f)
where the coefficients $c_k$ and $a_{ij}$ are constants, and $b_i$ are fixed real constants that are required to be nonnegative.
As you can see from the above problem formulation, the objective and the constraint functions are linear functions of the
design variables; hence the problem is called a linear programming problem. To solve the above problem, the problem definition
is transformed so that all the constraints are equality constraints and $\mathbf{b} \ge \mathbf{0}$. The standard LP problem is then:
Find $\mathbf{x}$ (17.2-5a)
To maximize $\mathbf{c}^T \mathbf{x}$ (17.2-5b)
Subject to $\mathbf{A}_{r\times n}\,\mathbf{x}_{n\times 1} = \mathbf{b}_{r\times 1}$ (17.2-5c)
$\mathbf{x} \ge \mathbf{0}$ (17.2-5d)
Plastic designs (encountered in the design of steel structural systems) can, under restrictive conditions, be posed as an LP problem.
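As a small illustration of how inequality constraints are turned into equality constraints in the standard LP form, consider the sketch below. The numbers are invented for this illustration and do not come from the text, and the slack variable $s_1$ and surplus variable $s_2$ are names assumed here.

```latex
\begin{aligned}
\text{Original:}\quad & \text{maximize } 3x_1 + 2x_2 \\
& x_1 + x_2 \le 4, \qquad x_1 - x_2 \ge 1, \qquad x_1, x_2 \ge 0 \\[4pt]
\text{Standard form:}\quad & \text{maximize } 3x_1 + 2x_2 + 0\,s_1 + 0\,s_2 \\
& x_1 + x_2 + s_1 = 4 \\
& x_1 - x_2 - s_2 = 1 \\
& x_1, x_2, s_1, s_2 \ge 0
\end{aligned}
```

With the enlarged vector of unknowns $\{x_1, x_2, s_1, s_2\}^T$, the constraints take the form $\mathbf{A}\mathbf{x} = \mathbf{b}$ with $\mathbf{b} \ge \mathbf{0}$.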
Dynamic Programming (DP) Problem: DP methodology is applicable to engineering problems that can be broken into
stages and exhibit Markovian3 property. While not widely used in the area of structural design, one could design a problem that
could be solved effectively as a DP problem. Consider a building system consisting of a roof system, a set of floors consisting
of beams and columns, and a foundation system. The load path typically starts at the roof and is finally transmitted to the
foundation via the floor system starting at the top floor and progressing to the bottommost floor. Hence one could design the
building in stages, starting at the roof and progressing to the foundation.
Non-Linear Programming (NLP) Problem: As we had mentioned earlier in this chapter, most engineering problems require
constraints to be satisfied. These constraints, and frequently, the objective function, are nonlinear leading to an NLP problem.
Find $\mathbf{x} \in R^n$ (17.2-6a)
To minimize $f(\mathbf{x})$ (17.2-6b)
Subject to $g_i(\mathbf{x}) \le 0$, $i = 1, 2, \ldots, l$ (17.2-6c)
$h_j(\mathbf{x}) = 0$, $j = 1, 2, \ldots, m$ (17.2-6d)
$x_k^L \le x_k \le x_k^U$, $k = 1, 2, \ldots, n$ (17.2-6e)

3 A Markovian property is encompassed in a process if the decisions for optimal return at a stage in the process depend only on the current state of the system and subsequent decisions.


Special cases of this formulation include those where the design variables are integers (Integer Programming Problem), or
boolean (Zero-One Programming Problem) or discrete (Discrete Programming Problem). Perhaps the biggest difference
between the NLP and LP problems is the likelihood of multiple solutions with NLP problems and the difficulty of finding the
solution effectively.
Our focus in this chapter is to look at NLP problems. Simple NLP problems can be solved ‘by hand’. However, most
engineering problems must be solved numerically.

17.3 Non-Linear Programming (NLP) Problem


As we saw in the previous section, the non-linear programming problem is by far the most commonly encountered structural
design problem. We will restate the nonlinear programming problem as follows.
Find $\mathbf{x} \in R^n$ (17.3-1a)
To minimize $f(\mathbf{x})$ (17.3-1b)
Subject to $g_i(\mathbf{x}) \le 0$, $i = 1, 2, \ldots, l$ (17.3-1c)
$h_j(\mathbf{x}) = 0$, $j = 1, 2, \ldots, m$ (17.3-1d)
$x_k^L \le x_k \le x_k^U$, $k = 1, 2, \ldots, n$ (17.3-1e)
In addition, we will assume that (i) $\mathbf{x}$ is continuous and real-valued, and (ii) $f(\mathbf{x})$, $g_i(\mathbf{x})$ and $h_j(\mathbf{x})$ are continuous and
differentiable. The objective is to find $\mathbf{x} = \mathbf{x}^*$ so that the objective function has the lowest possible value without violating the
constraints.
Mathematical Background: In Chapter 6, we examined the concept of derivatives or gradients looking at these quantities as
scalar values. We must now extend this idea to multiple dimensions. The building block is a vector. For example, in two-dimensional
space a vector can be expressed as $\mathbf{g}_{2\times 1} = \{g_x \ \ g_y\}^T$. An example of a vector in two dimensions is a vector of
direction cosines (or, unit vector): $\mathbf{g}_{2\times 1} = \{0.6 \ \ 0.8\}^T$. If $f = f(\mathbf{x})$, then the gradient vector of $f(\mathbf{x})$ is written as $\nabla f(\mathbf{x})$.
For example, let $f(\mathbf{x}) = x_1^2 + 2x_1x_2 + x_2^3$. Then $\dfrac{\partial f}{\partial x_1} = 2x_1 + 2x_2$ and $\dfrac{\partial f}{\partial x_2} = 2x_1 + 3x_2^2$, and we can write the gradient of
$f(\mathbf{x})$ as $\nabla f(\mathbf{x})_{2\times 1} = \{2x_1 + 2x_2 \ \ \ 2x_1 + 3x_2^2\}^T$.

Consider the problem of minimizing $f(\mathbf{x})$ subject to $h_j(\mathbf{x}) = 0$. A point $\mathbf{x}^*$ is a regular point provided $\mathbf{h}(\mathbf{x}^*) = \mathbf{0}$ and the
gradients of all the constraints at $\mathbf{x}^*$ are linearly independent. Linear independence means that no two gradients
are parallel to each other, nor is it possible to write a gradient as a linear combination of two or more of the other gradients. For
example, let $h_1(x_1, x_2) = 2x_1^2 + 3x_2$ and $h_2(x_1, x_2) = 12x_1^2 + 18x_2$. Then $\nabla h_1 = \{4x_1 \ \ 3\}^T$ and $\nabla h_2 = \{24x_1 \ \ 18\}^T$.
Let $\mathbf{x}^* = \{2, 1\}$. Then $\nabla h_1(\mathbf{x}^*) = \{8 \ \ 3\}^T$ and $\nabla h_2(\mathbf{x}^*) = \{48 \ \ 18\}^T$. The two vectors $\nabla h_1$ and $\nabla h_2$ are parallel (or,
linearly dependent) to each other since $\nabla h_2 = 6\nabla h_1$.
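For two constraints in two variables, the linear dependence check reduces to testing whether the determinant formed by the two gradient vectors vanishes. The sketch below shows one way to code the test; the function name `AreParallel2D` and the relative tolerance are assumptions made for this illustration.

```cpp
#include <cmath>

// Returns true if the 2-D vectors a = {a1, a2} and b = {b1, b2} are parallel,
// i.e. linearly dependent. The determinant a1*b2 - a2*b1 is compared with a
// tolerance scaled by the vector lengths; a zero vector is reported as dependent.
bool AreParallel2D(double a1, double a2, double b1, double b2,
                   double tol = 1.0e-12)
{
    const double det = a1*b2 - a2*b1;
    const double scale = std::hypot(a1, a2) * std::hypot(b1, b2);
    return std::fabs(det) <= tol * scale;
}
```

For the gradients $\{4x_1 \ \ 3\}$ and $\{24x_1 \ \ 18\}$ of the example, the determinant is $4x_1(18) - 3(24x_1) = 0$ for every $x_1$, so no point can be a regular point for this pair of constraints.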


17.3.1 Kuhn-Tucker Conditions
Let us consider the following problem.
Find $\mathbf{x} \in R^n$ (17.3.1-1a)
To minimize $f(\mathbf{x})$ (17.3.1-1b)
Subject to $h_j(\mathbf{x}) = 0$, $j = 1, 2, \ldots, m$ (17.3.1-1c)
Lagrange is credited with developing a simple but effective way of solving the problem. He suggested that a function L (called
the Lagrangian) be developed as


$L = f(\mathbf{x}) - \sum_{j=1}^{m} \lambda_j h_j(\mathbf{x})$ (17.3.1-2)
where $\lambda_j$ is called the Lagrange multiplier. The regular point $\mathbf{x}^*$ is a stationary point if
$\dfrac{\partial L}{\partial x_i} = 0 = \dfrac{\partial f}{\partial x_i} - \sum_{j=1}^{m} \lambda_j \dfrac{\partial h_j(\mathbf{x})}{\partial x_i}$, $i = 1, 2, \ldots, n$ (17.3.1-3)
and $h_j(\mathbf{x}) = 0$, $j = 1, 2, \ldots, m$ (17.3.1-4)
In other words, solving Eqns. (17.3.1-3) and (17.3.1-4) is equivalent to solving the original problem given by Eqns. (17.3.1-1a)-
(17.3.1-1c). Note that there are $(m + n)$ unknowns in these $(m + n)$ equations. However, the form of these equations is
dependent on the form of the objective function and the equality constraints, and the equations are usually not linear equations.
These conditions are known as first-order necessary conditions. The second-order sufficient conditions establish whether $\mathbf{x}^*$
is a local minimum or not. Treatment of these second-order conditions is outside the scope of this text⁴.
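Example 17.3-1 Constrained Minimization with Equality Constraint
Find $\{x_1, x_2\}$
To minimize $4x_1 + x_2$
Subject to $2x_1^2 + x_2^2 = 1$

Solution
Step 1: Form the Lagrangian
$L = 4x_1 + x_2 - \lambda_1 (2x_1^2 + x_2^2 - 1)$
Step 2: Using Eqn. (17.3.1-3)
$\dfrac{\partial L}{\partial x_1} = 0 = 4 - 4\lambda_1 x_1 \Rightarrow x_1 = \dfrac{1}{\lambda_1}$
$\dfrac{\partial L}{\partial x_2} = 0 = 1 - 2\lambda_1 x_2 \Rightarrow x_2 = \dfrac{1}{2\lambda_1}$
Substituting in Eqn. (17.3.1-4) or the constraint equation, we have
$2\left(\dfrac{1}{\lambda_1}\right)^2 + \left(\dfrac{1}{2\lambda_1}\right)^2 = 1$
Solving, $\lambda_1 = \pm\dfrac{3}{2}$. For each root, we have
$\lambda_1 = \dfrac{3}{2}$: $x_1^* = \dfrac{2}{3}$ and $x_2^* = \dfrac{1}{3}$. $f(\mathbf{x}^*) = 4\left(\dfrac{2}{3}\right) + \dfrac{1}{3} = 3$
$\lambda_1 = -\dfrac{3}{2}$: $x_1^* = -\dfrac{2}{3}$ and $x_2^* = -\dfrac{1}{3}$. $f(\mathbf{x}^*) = -3$
Obviously, the minimum is at $x_1^* = -\dfrac{2}{3}$, $x_2^* = -\dfrac{1}{3}$ and $f(\mathbf{x}^*) = -3$.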
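The stationary points of this example can be cross-checked numerically. The sketch below is not part of the original example: it parameterizes the constraint $2x_1^2 + x_2^2 = 1$ as $x_1 = \cos t/\sqrt{2}$, $x_2 = \sin t$ and scans $t$ for the smallest value of $4x_1 + x_2$. The names `Point2` and `MinimizeOnEllipse` and the number of samples are assumptions made for this illustration.

```cpp
#include <cmath>

struct Point2 { double x1, x2, f; };

// Brute-force scan of 4*x1 + x2 along the ellipse 2*x1^2 + x2^2 = 1.
Point2 MinimizeOnEllipse(int numSamples = 200000)
{
    const double twoPi = 2.0 * std::acos(-1.0);
    Point2 best{0.0, 0.0, 1.0e300};
    for (int i = 0; i < numSamples; ++i)
    {
        const double t = twoPi * i / numSamples;
        const double x1 = std::cos(t) / std::sqrt(2.0);  // satisfies the constraint
        const double x2 = std::sin(t);
        const double f = 4.0*x1 + x2;
        if (f < best.f)
            best = {x1, x2, f};
    }
    return best;
}
```

The scan should return a point close to $\{-2/3, -1/3\}$, where $4x_1 + x_2 = -3$.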
Now let us consider the following problem with equality and inequality constraints.
Find $\mathbf{x} \in R^n$ (17.3.1-5a)

4 See a book on optimal design. A comprehensive list is provided in the Bibliography at the end of the text.


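To minimize $f(\mathbf{x})$ (17.3.1-5b)
Subject to $h_j(\mathbf{x}) = 0$, $j = 1, 2, \ldots, m$ (17.3.1-5c)
$g_i(\mathbf{x}) \le 0$, $i = 1, 2, \ldots, l$ (17.3.1-5d)
The Lagrangian function for this problem can be defined as
$L = f(\mathbf{x}) - \sum_{j=1}^{m} \lambda_j h_j(\mathbf{x}) + \sum_{i=1}^{l} \mu_i g_i(\mathbf{x})$ (17.3.1-6)
where $\lambda_j$ and $\mu_i$ are the Lagrange multipliers. The regular point $\mathbf{x}^*$ is a stationary point if
$\dfrac{\partial L}{\partial x_k} = 0 = \dfrac{\partial f}{\partial x_k} - \sum_{j=1}^{m} \lambda_j \dfrac{\partial h_j(\mathbf{x})}{\partial x_k} + \sum_{i=1}^{l} \mu_i \dfrac{\partial g_i(\mathbf{x})}{\partial x_k}$, $k = 1, 2, \ldots, n$ (17.3.1-7)
$h_j(\mathbf{x}) = 0$, $j = 1, 2, \ldots, m$ (17.3.1-8)
$g_i(\mathbf{x}) \le 0$, $i = 1, 2, \ldots, l$ (17.3.1-9)
$\mu_i g_i(\mathbf{x}) = 0$, $i = 1, 2, \ldots, l$ (17.3.1-10)
$\mu_i \ge 0$, $i = 1, 2, \ldots, l$ (17.3.1-11)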
In other words, solving Eqns. (17.3.1-7)-(17.3.1-11) is equivalent to solving the original problem given by Eqns. (17.3.1-5a)-
(17.3.1-5d). We will illustrate the usage of these conditions using an example.
Example 17.3-2 Constrained Minimization with inequality constraint
Find $\{x_1, x_2\}$
To minimize $(x_1 - 3)^2 + (x_2 - 4)^2$
Subject to $x_1 + x_2 - 5 \le 0$
$x_1 \ge 0, x_2 \ge 0$
Solution
Step 1: We will not use the last two constraints requiring that the design variables be nonnegative but will use them to select the
optimal solution. Form the Lagrangian as
$L = (x_1 - 3)^2 + (x_2 - 4)^2 + \mu_1 (x_1 + x_2 - 5)$
Step 2: Hence, the necessary conditions to be satisfied are
$\dfrac{\partial L}{\partial x_1} = 0 = 2x_1 - 6 + \mu_1$ (1)
$\dfrac{\partial L}{\partial x_2} = 0 = 2x_2 - 8 + \mu_1$ (2)
$\mu_1 (x_1 + x_2 - 5) = 0$ (3)
$x_1 + x_2 - 5 \le 0$ (4)
$\mu_1 \ge 0$ (5)
Step 3: The key to solving these conditions is to recognize the importance of Eqn. (3). Eqn. (17.3.1-10) represents the switching
conditions since from those equations either $\mu_i = 0$ or $g_i = 0$. Using the switching conditions, the possibilities for this
problem are discussed below.
Case 1: $\mu_1 = 0$ (meaning that the inequality constraint $g_1$ is not active)


From (1) and (2), $2x_1 - 6 = 0$ and $2x_2 - 8 = 0$, yielding $x_1 = 3$ and $x_2 = 4$. However, this solution does not satisfy Eqn. (4).
Hence this is an unacceptable case.
Case 2: $g_1 = 0$ (meaning that the inequality constraint $g_1$ is active)
Hence,
$2x_1 - 6 + \mu_1 = 0$
$2x_2 - 8 + \mu_1 = 0$
$x_1 + x_2 - 5 = 0$
Solving these linear simultaneous equations, $x_1 = 2$, $x_2 = 3$ and $\mu_1 = 2$. These values satisfy Eqns. (4), (5) and the
requirements that $x_1 \ge 0, x_2 \ge 0$. Hence $\mathbf{x}^* = \{2, 3\}$. For these values, the objective function is $f(\mathbf{x}^*) = 2$.
Example 17.3-3 Constrained Minimization with inequality constraints
Find $\{x_1, x_2\}$
To maximize $-4x_1^2 - 2x_2$
Subject to $2x_1 + x_2 \le 4$
$x_1 + 2x_2 \ge 2$
$x_1 \ge 0, x_2 \ge 0$

Solution
Step 1: The first thing to notice about the problem is that it is not in the standard form. We must first convert the problem to a
minimization problem. Maximizing $f(\mathbf{x})$ is equivalent to minimizing $-f(\mathbf{x})$. Hence, $f(\mathbf{x}) = 4x_1^2 + 2x_2$. Second, the second
constraint must be transformed to a $\le 0$ constraint. Note that $g(\mathbf{x}) \ge a$ is equivalent to $-g(\mathbf{x}) \le -a$. Hence
$g_2 \equiv -x_1 - 2x_2 + 2 \le 0$. As with the previous problem, we will not use the last two constraints requiring that the design
variables be nonnegative but will use them to select the optimal solution. Form the Lagrangian as
$L = 4x_1^2 + 2x_2 + \mu_1 (2x_1 + x_2 - 4) + \mu_2 (-x_1 - 2x_2 + 2)$
Step 2: Hence, the necessary conditions to be satisfied are
$\dfrac{\partial L}{\partial x_1} = 0 = 8x_1 + 2\mu_1 - \mu_2$ (1)
$\dfrac{\partial L}{\partial x_2} = 0 = 2 + \mu_1 - 2\mu_2$ (2)
$\mu_1 (2x_1 + x_2 - 4) = 0$ (3)
$\mu_2 (-x_1 - 2x_2 + 2) = 0$ (4)
$2x_1 + x_2 - 4 \le 0$ (5)
$-x_1 - 2x_2 + 2 \le 0$ (6)
$\mu_1 \ge 0$, $\mu_2 \ge 0$ (7)
Step 3: Using the switching conditions, there are four possibilities.
Case 1: $\mu_1 \ne 0$, $\mu_2 \ne 0$ (meaning that both inequality constraints are active, $g_1 = 0$ and $g_2 = 0$)


Hence, $2x_1 + x_2 - 4 = 0$
$-x_1 - 2x_2 + 2 = 0$
$8x_1 + 2\mu_1 - \mu_2 = 0$
$2 + \mu_1 - 2\mu_2 = 0$
From the first two equations, $x_1 = 2$ and $x_2 = 0$. The last two equations then give $\mu_1 = -10$ and $\mu_2 = -4$, which violate Eqn. (7). This is an unacceptable solution.
Case 2: $\mu_1 \ne 0$, $\mu_2 = 0$ (meaning that the inequality constraint $g_1 = 0$)
Hence, $2x_1 + x_2 - 4 = 0$
$8x_1 + 2\mu_1 = 0$
$2 + \mu_1 = 0$
From the last equation, $\mu_1 = -2 < 0$, and this is an unacceptable solution.
Case 3: $\mu_1 = 0$, $\mu_2 = 0$ (meaning that the inequality constraints are inactive)
$8x_1 = 0$
$2 = 0$
The second equation expresses an invalid condition, and the solution is unacceptable.
Case 4: $\mu_1 = 0$, $\mu_2 \ne 0$ (meaning that the inequality constraint $g_2 = 0$)
Hence, $-x_1 - 2x_2 + 2 = 0$
$8x_1 - \mu_2 = 0$
$2 - 2\mu_2 = 0$
Solving, $x_1 = 1/8$, $x_2 = 15/16$ and $\mu_2 = 1$. The constraint $g_1$ is satisfied. Hence, $\mathbf{x}^* = \left\{ \dfrac{1}{8}, \dfrac{15}{16} \right\}$ and $f(\mathbf{x}^*) = 1.9375$ for the minimization form of the problem.
The examples illustrate the strengths and weaknesses of the Kuhn-Tucker (K-T) conditions. Now, a few notes about the
first-order K-T conditions.
(a) A regular point is a candidate K-T point. In other words, a non-regular point cannot be used to satisfy the K-T conditions.
(b) Points that satisfy the K-T conditions are candidate local minimum points, since the first-order conditions are necessary
conditions. However, not all local minimum points are K-T points. In other words, the K-T conditions may not be able to locate all
local minimum points.
(c) It may not be possible to obtain an analytical solution to the K-T conditions. This is because the resulting equations may be
nonlinear equations requiring a numerical method.


Exercises
Appetizers
Problem 17.3.1
An aluminum can manufacturer needs to design a (closed) cylindrical soft drink can. The volume of the can must be 400 ml.
The height, h, of the can must be at least twice the diameter, d, and cannot be more than thrice the diameter. Packaging
considerations restrict the height to 25 cm, and the can should use the least amount of sheet metal. Formulate the optimal design
problem by clearly identifying the design variables, objective function and constraints.
Problem 17.3.2
A box frame is made of steel angle and hollow tube sections as shown in Fig. P17.3.2. It must be designed for least cost. The
angle sections cost $200/m and the tube sections cost $500/m. The box frame must enclose a space of 1000 m³. A side of the
box cannot be less than 2 m. Formulate the optimal design problem by clearly identifying the design variables, objective function
and constraints.

[Figure: box frame with dimensions l, b and h, showing the vertical hollow tube members and the horizontal angle section members]
Fig. P17.3.2
Problem 17.3.3
A factory makes two products labeled A and B. The manufacturing of these products takes place in two stages – Stage One and
Stage Two. Departments 1 and 2 handle Stage One and Stage Two respectively. Product A requires 2 hours for Stage One and
2.25 hours for Stage Two. Product B requires 2 hours for Stage One and 1.75 hours for Stage Two. Each of the two Departments
can be operational for a total of 20 hours per day, seven days a week. The profit from the sale of product A is $1.35 per unit
and for product B is $1.2 per unit. How many units of A and B should be produced per week so as to maximize the profit?

Main Course
Problem 17.3.4
Solve Problem 17.3.1 using a graphical technique. Verify your answer using K-T conditions.
Problem 17.3.5
Solve Problem 17.3.2 using the K-T conditions.
Problem 17.3.6
Solve Problem 17.3.3 using a graphical technique. Verify your answer using K-T conditions.

Structural Concepts
Problem 17.3.7
Answer True or False. If False, state the reason(s) why.


(a) A feasible design is the one with the lowest objective function value.
(b) An optimal design problem must have constraints.
(c) The number of inequality constraints in a problem cannot be greater than the number of design variables for a problem to
be well-posed.
(d) Every optimal design problem has one or more optimum solutions.
(e) An equality constraint $h(\mathbf{x}) = 0$ can be expressed as two inequality constraints $h(\mathbf{x}) \le 0$ and $h(\mathbf{x}) \ge 0$.


17.4 Genetic Algorithm


As you have seen earlier on in this book, problems such as root finding or solution of linear algebraic equations can be carried
out by more than one solution technique. Different techniques have different assumptions, restrictions, strengths and
weaknesses. Similarly, there are several numerical techniques that can potentially be used to solve NLP problems. They are
broadly classified as either gradient-based techniques or direct-search techniques.
Gradient-Based Techniques: The nonlinear nature of the problem makes it difficult to solve the problem directly. Instead,
the solution is obtained iteratively. The original nonlinear problem is transformed into a more manageable subproblem whose
solution can be obtained numerically. Solution to these subproblems requires that not only the function values be known but
also that their gradients (or, derivatives) with respect to the design variables be known. Some of the popular solution techniques
are Sequential Linear Programming (SLP) technique, Sequential Quadratic Programming (SQP) technique, Feasible Directions
Method, Generalized Reduced Gradient (GRG) Method, Augmented Lagrangian technique, Sequential Unconstrained
Minimization Technique (SUMT) etc. Treatment of one or more of these techniques can be found in any book on optimization
techniques.
Direct Search Techniques: The direct search techniques do not require derivatives. Hence, they can solve problems where
the derivatives are discontinuous, and they have a better chance of finding the global minimum.
This advantage is offset by an increase in computational requirement, since the function values are usually required at a very large
number of locations in the design space. Some of the popular solution techniques are the Hooke and Jeeves Method, Powell’s Method
of Conjugate Directions, Simulated Annealing (SA), Genetic Algorithm (GA), etc.
In the next section we will look at the Genetic Algorithm in some detail and understand how to use the method for solving
constrained minimization problems.
Genetic Algorithm. Genetic Algorithm (GA) is a search strategy based on the rules of natural genetic evolution. Before the
traits of genetic systems were used in solving optimization problems, biologists had used digital computers to perform
simulations of genetic systems as early as the early 1950s. The application of genetic algorithms for adaptive systems was first
proposed by John Holland (University of Michigan) in 1962, and the term "Genetic Algorithms" was first used in his student’s
dissertation.
GA’s have been used to solve a variety of structural design problems. They include the optimal design of all the structural
systems that we have seen in this text and more. Because of their discrete nature, GA’s lend themselves well to the process of
automating the design of skeletal structures. GA’s do not require gradient or derivative information. For this reason alone, they
have been applied by researchers to solve discrete, non-differentiable, combinatorial and global optimization engineering
problems, such as transient optimization of gas pipelines, topology design of general elastic mechanical systems, time scheduling,
circuit layout design, composite panel design, pipe network optimization, and several hundred others. GA’s are recognized to
be different from traditional gradient-based optimization techniques in the following four major ways [Goldberg, 1989].
1. GA’s work with coding the design variables and parameters in the problem, rather than with the actual parameters themselves.
2. GA’s make use of population-type search. Many different design points are evaluated during each iteration instead of
sequentially moving from one point to the next.
3. GA’s need only a fitness or objective function value. No derivatives or gradients are necessary.
4. GA’s use probabilistic transition rules to find new design points for exploration rather than using deterministic rules based
on gradient information to find these new points.
The idea behind GA is to simulate the behavior of natural evolutionary selection. Although there exist many different variations
of GA’s, the basic structure is the same as shown in Fig. 17.4.1.

Initial Randomly Generated Population
→ Fitness Evaluation
→ Competition (Fitter individuals survive)
→ Mating Pool (Reproduction phase)
→ Crossover and Mutation (Exchange of information)
→ Offspring (New generation)
→ Fitness Evaluation
→ Stop? (if No, the cycle repeats)

Fig. 17.4.1 Flow in a Simple Genetic Algorithm (SGA)

We will explain the basic flow shown in Fig. 17.4.1 next.
17.4.1 The Basic Algorithm
The genetic algorithm is used to solve the following problem.
Minimize $\hat{f}(x)$ (17.4.1-1a)
Subject to $x_k^L \le x_k \le x_k^U, \quad k = 1, 2, \ldots, n$ (17.4.1-1b)
Note that the problem is primarily an unconstrained minimization problem with lower and upper bounds on the design
variables. We will first present the necessary background before detailing the algorithm. In the language of GA, we will be
computing $\hat{f}(x)$, the fitness function, not $f(x)$, the objective function. The two functions are related and the distinction
between the two will be made later.

Binary Encoding and Decoding of Design Variables


Binary encoding is the most popular way of encoding the design variables. A binary number is represented as $(b_m \ldots b_1 b_0)_2$ where
$b_i$ is either 0 or 1. For example, $101_2$ is a three-digit or three-bit binary number. To understand this concept, we will first
explain the relationship between binary and decimal numbers.
$(b_m \ldots b_1 b_0)_2 = (2^0 b_0 + 2^1 b_1 + \ldots + 2^m b_m)_{10}$ (17.4.1-2)
Hence, as an example
$101_2 = (2^0 \times 1 + 2^1 \times 0 + 2^2 \times 1)_{10} = (5)_{10}$
In other words, the binary number $101_2$ is equal to decimal 5.

The process of taking a decimal number and constructing its binary representation (not value) is called encoding. Decoding is
the inverse process of taking the binary encoded value and constructing its decimal equivalent.
Continuous Design Variables: A design variable $x_i$ is between $x_i^L$ and $x_i^U$. Note that $x_i$ is a decimal number. If m bits are
available to represent $x_i$, then the precision $p_i$ with which the number is represented is given by
$p_i = \frac{x_i^U - x_i^L}{2^m - 1}$ (17.4.1-3)
To understand the term in the denominator, let us look at the example with 3 bits. The possible binary representations with 3
bits are 000, 001, 010, 011, 100, 101, 110 and 111. Or, 8 possible combinations. Similarly, with 4 bits we have 0000, 0001, 0010,
0011, 0100, 0101, 0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101, 1110 and 1111. Or, 16 possible combinations. In other
words, if there are m bits then there are $2^m$ combinations or $2^m - 1$ intervals. The range of values between $x_i^L$ and $x_i^U$ is
divided into $2^m - 1$ intervals.
For example, if $x^L = 1$, $x^U = 8$ and $m = 3$, then $p = \frac{8 - 1}{7} = 1$. The following table shows the relationship between the
binary representation and their decimal equivalents with this example.
Binary Representation    Decimal Equivalent
$(b_2 b_1 b_0)_2$        $x = x^L + p\,(b_2 b_1 b_0)_2$
000                      1.0
001                      2.0
010                      3.0
011                      4.0
100                      5.0
101                      6.0
110                      7.0
111                      8.0

The decoding is achieved using
$x = x^L + p\,(b_m \ldots b_1 b_0)_2$ (17.4.1-4)
where decimal values are used. The previous example is quite straightforward since the bounds and the precision are integers.
Now consider the following example. Let $x^L = 1$, $x^U = 10$ and $m = 3$. Then $p = \frac{10 - 1}{7} = 1.28571$ if we assume that the
precision is finite. The new relationship is as follows.
Binary Representation    Decimal Equivalent
000                      1.0
001                      2.28571
010                      3.57143
011                      4.85714
100                      6.14286
101                      7.42857
110                      8.71429
111                      10.0
Integer Design Variable: The above example illustrates the encoding and decoding problems when the range is not a multiple of
$2^m - 1$. With integer design variables, one approach is to apply Eqn. (17.4.1-3) with the precision p being 1 and compute the
least number of bits required to achieve the precision. The number of bits obviously is an integer. For example, let $x^L = 1$,
$x^U = 10$ and $p = 1$. Using
$2^m - 1 = \frac{x_i^U - x_i^L}{p} \;\Rightarrow\; m = \frac{\log(x_i^U - x_i^L + 1)}{\log(2)}$ (17.4.1-5)
we have $m = \log(10 - 1 + 1)/\log(2) = 3.32193$. We will round 3.32193 to the next highest integer, 4. Hence, the new precision
with 4 bits is $p = \frac{10 - 1}{2^4 - 1} = 0.6$. Once we compute the decimal equivalent, we can either truncate or round the value to an
integer.
Binary Representation    Decimal Equivalent                    Integer Equivalent
$(b_3 b_2 b_1 b_0)_2$    $x = x^L + p\,(b_3 b_2 b_1 b_0)_2$    (rounded value)
0000                     1.0                                   1
0001                     1.6                                   2
0010                     2.2                                   2
0011                     2.8                                   3
0100                     3.4                                   3
0101                     4.0                                   4
0110                     4.6                                   5
0111                     5.2                                   5
1000                     5.8                                   6
1001                     6.4                                   6
1010                     7.0                                   7
1011                     7.6                                   8
1100                     8.2                                   8
1101                     8.8                                   9
1110                     9.4                                   9
1111                     10.0                                  10
While problems exist with the procedure (note that some integers, namely 2, 3, 5, 6, 8 and 9, appear more than once), experience
has shown that the procedure works quite well for most problems.
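The bit count of Eqn. (17.4.1-5) and the rounding step can be sketched as follows. The names BitsRequired and DecodeInteger are hypothetical, and rounding (rather than truncation) is assumed, matching the table above.

```cpp
#include <cmath>
#include <stdexcept>

// Least number of bits m such that 2^m - 1 >= (xU - xL)/p, Eqn. (17.4.1-5).
// (Hypothetical helper; a tiny tolerance guards against round-off when
// the logarithm is an exact integer.)
int BitsRequired (double xL, double xU, double p)
{
    if (p <= 0.0 || xU <= xL)
        throw std::invalid_argument ("need p > 0 and xU > xL");
    const double m = std::log2 ((xU - xL)/p + 1.0);
    return static_cast<int>(std::ceil (m - 1.0e-12));
}

// Integer value represented by the binary code when m bits are used:
// decode with Eqn. (17.4.1-4), then round to the nearest integer.
long DecodeInteger (unsigned long code, double xL, double xU, int m)
{
    if (m < 1 || m > 31)
        throw std::invalid_argument ("expected between 1 and 31 bits");
    const double p = (xU - xL) / static_cast<double>((1UL << m) - 1UL);
    return std::lround (xL + p * static_cast<double>(code));
}
```

For the example in the text, BitsRequired (1.0, 10.0, 1.0) gives 4 bits, and DecodeInteger maps the codes 0001 and 0010 to the same integer, 2, which is the repetition noted above.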
Discrete Design Variable: The representation is similar to integer design variables with $x^L = 1$ and $x^U = q$ where there are q
possible discrete values. The discrete values are usually stored in a table in some sorted manner and the integer value between 1
and q is used as an index to obtain the corresponding value(s) from the table.
Zero-One (Binary) Design Variable: There is nothing special that needs to be done since we need exactly one bit to represent a zero-
one design variable.
Chromosome: To represent all the design variables in a problem, we need to create the chromosome for the problem. A
chromosome is a concatenated binary string of all the binary representations of the design variables. If there are n design
variables with $m = 3$ bits to represent each design variable, then the chromosome looks as shown in Fig. 17.4.1.1 with x being 0 or
1.
xxx   xxx   xxx   ......   xxx
x1    x2    x3             xn
Fig. 17.4.1.1 Possible chromosome (or, gene)
The number of bits does not have to be equal for all the design variables, nor do the design variables have to be ordered from 1 to n
in the chromosome.
The basic steps in the algorithm are discussed next.
Initial Population: The first step is to create the initial population. Unlike gradient-based methods where the search for the
optimal solution takes place by moving from one point to the next, in a GA the traits of a population (of members) are used to
move from one generation to the next. Fig. 17.4.1.2 shows an initial population consisting of z members. The initial population
is usually created randomly.

Member 1   xxx   xxx   xxx   ......   xxx
           x1    x2    x3             xn
Member 2   xxx   xxx   xxx   ......   xxx
           x1    x2    x3             xn
...........
Member z   xxx   xxx   xxx   ......   xxx
           x1    x2    x3             xn
Fig. 17.4.1.2 Initial population
With the example in Fig. 17.4.1.2, the size of the chromosome is 3n bits. A random number generator can be used to generate
a random number between 0.0 and 1.0. Invoking the random number generator 3n times, we can generate each member of
the population as follows – if the random number is less than 0.5, a 0 is assigned to that bit; otherwise, a 1
is assigned to that bit.
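The random generation of the initial population can be sketched as below. This is only an illustration under stated assumptions: std::mt19937 stands in for whatever random number generator is actually used, chromosomes are stored as strings of '0' and '1', and the names RandomChromosome and InitialPopulation are hypothetical.

```cpp
#include <random>
#include <string>
#include <vector>

// One chromosome of nBits bits; each bit is 0 if the random number
// is less than 0.5 and 1 otherwise. (Hypothetical helper.)
std::string RandomChromosome (int nBits, std::mt19937& rng)
{
    std::uniform_real_distribution<double> uniform (0.0, 1.0);
    std::string chromosome (nBits, '0');
    for (int i = 0; i < nBits; ++i)
        if (uniform (rng) >= 0.5) chromosome[i] = '1';
    return chromosome;
}

// Population of z members, each with nBits bits. The seed makes the
// population reproducible from run to run.
std::vector<std::string> InitialPopulation (int z, int nBits, unsigned seed)
{
    std::mt19937 rng (seed);
    std::vector<std::string> population;
    population.reserve (z);
    for (int i = 0; i < z; ++i)
        population.push_back (RandomChromosome (nBits, rng));
    return population;
}
```

For n = 3 design variables with 3 bits each, InitialPopulation (z, 9, seed) creates z members of 9 bits.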
We will study the effect of the size of the population, z at the end of this section.
Fitness Evaluation
Once the initial population is generated, the actual search process starts. The chromosome is decoded to obtain the values of
the design variables x, and the fitness function value is computed for each member of the population. In other words, there
are z fitness values $\hat{f}(x)$ that are calculated.
Reproduction
To generate the members of the next generation, the reproduction phase has at least three distinct steps. First the mating pool
is created. Typically, the weaker members (higher fitness values) are replaced with stronger members (lower fitness values). To
produce offspring, two members from the mating pool are selected and a crossover operation is carried out to create the
chromosome of the offspring. Finally, to bring diversity into the population, the mutation operation is carried out.
Mating Pool: The mating pool is constructed by selecting members from the population. We will describe two commonly used
methods. In the roulette wheel selection, the chance of being selected is based on fitness value. The individual members of the
population are mapped to segments of a line such that the length of the segment is related to its fitness value.
Individual   Fitness   Scaled Fitness   Selection Probability   Cumulative Value
1            1.20      41.25            0.18                    0.18
2            1.50      33.00            0.14                    0.32
3            1.70      29.12            0.13                    0.45
4            2.20      22.50            0.10                    0.54
5            2.50      19.80            0.09                    0.63
6            2.70      18.33            0.08                    0.71
7            3.90      12.69            0.05                    0.76
8            4.50      11.00            0.05                    0.81
9            4.70      10.53            0.05                    0.86
10           5.20      9.52             0.04                    0.90
11           5.50      9.00             0.04                    0.94
12           6.10      8.11             0.04                    0.97
13           7.80      6.35             0.03                    1.00
Sum          49.50     231.21           1.00
The sum of the fitness values is $S = \sum_{i=1}^{13} \hat{f}_i = 49.5$. A scaled fitness value is created as $\hat{f}_i^s = S / \hat{f}_i$ (footnote 5). Let
$SS = \sum_{i=1}^{13} \hat{f}_i^s = 231.21$. The selection probability is $p_i = \hat{f}_i^s / SS$. As can be seen from the table, the length of the
segment is more for lower fitness values than for larger fitness values.

5 To generate a general procedure the fitness values are transformed, if necessary, so that all the values are greater than zero.

Line from 0 to 1.0 with segment boundaries at 0.18, 0.32, 0.45, 0.54, 0.63, 0.71, 0.76, 0.81, 0.86, 0.90, 0.94 and 0.97.
For selecting the individual into the mating pool, a random number between 0 and 1.0 is generated. For example, if 7 random
numbers are generated as {0.79, 0.10, 0.33, 0.01, 0.99, 0.51, 0.83}, then the individuals selected are {8, 1, 3, 1, 13, 4, 9}.
In the tournament selection method, using a random number generator, two members of the population are selected. Their fitness
values are compared head-to-head and the one with the lower fitness value is put into the mating pool. This is done z times to
create the mating pool of size z . In a “double elimination” tournament selection method, all the individuals in the population
are placed in a bag. Two individuals are chosen at random. Their fitness values are compared head-to-head and the one with
the lower fitness value is put into the mating pool. These two individuals are then eliminated from the bag and the process is
repeated until the bag is empty. This will occur when the mating pool is half full. To complete the mating pool, the process is
repeated once again.
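Both tournament variants can be sketched as below. This is an illustration only: the function names are hypothetical, std::mt19937 is an assumed random number generator, the mating pool stores member indices (numbered from 0), and the population size is assumed to be even for the double elimination variant.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Tournament selection: z head-to-head comparisons between two randomly
// chosen members; the one with the lower fitness value joins the pool.
std::vector<std::size_t> TournamentSelection (const std::vector<double>& fitness,
                                              std::mt19937& rng)
{
    const std::size_t z = fitness.size();
    std::vector<std::size_t> pool;
    if (z == 0) return pool;
    std::uniform_int_distribution<std::size_t> pick (0, z - 1);
    for (std::size_t k = 0; k < z; ++k)
    {
        const std::size_t a = pick (rng), b = pick (rng);
        pool.push_back (fitness[a] <= fitness[b] ? a : b);
    }
    return pool;
}

// "Double elimination" variant: the bag is emptied two individuals at a
// time (a random pairing), and the whole process is carried out twice.
std::vector<std::size_t> DoubleEliminationSelection (const std::vector<double>& fitness,
                                                     std::mt19937& rng)
{
    const std::size_t z = fitness.size();      // assumed to be even
    std::vector<std::size_t> bag (z), pool;
    std::iota (bag.begin(), bag.end(), std::size_t{0});
    for (int pass = 0; pass < 2; ++pass)
    {
        std::shuffle (bag.begin(), bag.end(), rng);
        for (std::size_t k = 0; k + 1 < z; k += 2)
        {
            const std::size_t a = bag[k], b = bag[k + 1];
            pool.push_back (fitness[a] <= fitness[b] ? a : b);
        }
    }
    return pool;
}
```

One consequence of the double elimination scheme, visible in the sketch, is that every member competes exactly twice, so the best member always appears twice in the mating pool and the worst member never appears.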
In a simple GA, once the mating pool is constructed, two parents are selected, and the reproduction process is carried out using
the crossover and mutation operators.
Crossover: There are several types of crossover operators. We will illustrate the three most commonly discussed operators.
One-point crossover. Consider two chromosomes selected randomly from the mating pool. They are labeled Parent 1 and
Parent 2 in Fig. 17.4.1.3.
Parent 1 10001001
Parent 2 00110111
Fig. 17.4.1.3 Parents selected for the crossover operation
Based on a predetermined probability a single crossover point is chosen. If the length of the chromosome is nc bits, then a
random number is generated between 1 and nc . This point or location is used as the crossover point. Two offspring are formed
and they become part of the next generation. The first offspring is formed by taking the front or left section of Parent 1 and
the rear or right section of Parent 2. The second offspring is formed by taking the front or left section of Parent 2 and the rear
or right section of Parent 1. The results are shown in Fig. 17.4.1.4.

Parent 1 10001001
Parent 2 00110111

Offspring 1 10010111
Offspring 2 00101001

Fig. 17.4.1.4 Offspring resulting from one-point crossover operation occurring at location 3
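A one-point crossover on bit strings can be sketched as follows. The function name is hypothetical, and the crossover location is interpreted as the number of leading bits that each offspring keeps from its own front parent, which is the interpretation that reproduces Fig. 17.4.1.4.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// One-point crossover. Offspring 1 = front of parent 1 + rear of parent 2;
// offspring 2 = front of parent 2 + rear of parent 1.
// 'location' is the number of leading bits in the front section.
std::pair<std::string, std::string>
OnePointCrossover (const std::string& parent1, const std::string& parent2,
                   std::size_t location)
{
    if (parent1.size() != parent2.size() || location > parent1.size())
        throw std::invalid_argument ("incompatible parents or location");
    return { parent1.substr (0, location) + parent2.substr (location),
             parent2.substr (0, location) + parent1.substr (location) };
}
```

Calling OnePointCrossover ("10001001", "00110111", 3) yields the two offspring shown in the figure.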
Two-point crossover. The idea of the single point crossover can be extended to include multi-point crossover locations. The
section between the first variable and the first crossover point is not exchanged. However, the bits between every other
successive crossover point are exchanged between the two parents. This process is illustrated with a two-point crossover
example (Fig. 17.4.1.5).

Parent 1 10001001
Parent 2 00110111

Offspring 1 10110001
Offspring 2 00001111

Fig. 17.4.1.5 Offspring resulting from two-point crossover operation occurring at locations 2 and 5
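The two-point case can be sketched in the same way. The function name is hypothetical; the two locations are taken as bit counts measured from the left, so that the bits after the first location and up to the second location are exchanged.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Two-point crossover: the section between the two crossover locations
// is exchanged between the parents; the outer sections are not.
std::pair<std::string, std::string>
TwoPointCrossover (const std::string& parent1, const std::string& parent2,
                   std::size_t first, std::size_t second)
{
    if (parent1.size() != parent2.size() || first > second || second > parent1.size())
        throw std::invalid_argument ("incompatible parents or locations");
    std::string child1 = parent1, child2 = parent2;
    for (std::size_t i = first; i < second; ++i)
    {
        child1[i] = parent2[i];
        child2[i] = parent1[i];
    }
    return { child1, child2 };
}
```

With locations 2 and 5, the parents of Fig. 17.4.1.5 give the offspring 10110001 and 00001111.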
Uniform crossover. In uniform crossover, every location is a potential crossover point. First, a crossover mask is created
randomly. This mask has the same length as the chromosome and the bit value (parity) is used to select which parent will supply
the offspring with the bit. If the mask value is 0 then the bit is taken from the first parent; otherwise, the bit is taken from parent
2 as shown in Fig. 17.4.1.6.
Parent 1 10001001
Parent 2 00110111
Mask 00101011
Inverse mask 11010100
Offspring 1 10100011
Offspring 2 00011101
Fig. 17.4.1.6 Example showing uniform crossover
If two offspring are needed, the mask is used with the parents to create the first offspring and the inverse of the mask is used
to create the second offspring.
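Uniform crossover with a mask and its inverse can be sketched as below. The function names are hypothetical, and the mask is assumed to be supplied as a string of '0' and '1' characters of the same length as the chromosome (in a GA it would be generated randomly).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Offspring built bit by bit: mask bit 0 takes the bit from parent 1,
// mask bit 1 takes the bit from parent 2.
std::string UniformCrossover (const std::string& parent1, const std::string& parent2,
                              const std::string& mask)
{
    if (parent1.size() != parent2.size() || parent1.size() != mask.size())
        throw std::invalid_argument ("parents and mask must have equal length");
    std::string child (parent1.size(), '0');
    for (std::size_t i = 0; i < mask.size(); ++i)
        child[i] = (mask[i] == '0') ? parent1[i] : parent2[i];
    return child;
}

// Inverse of a mask: every 0 becomes 1 and every 1 becomes 0.
std::string InverseMask (std::string mask)
{
    for (char& bit : mask) bit = (bit == '0') ? '1' : '0';
    return mask;
}
```

Applying the mask 00101011 gives the first offspring of Fig. 17.4.1.6, and applying InverseMask of that mask gives the second.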
Mutation: This operator occurs much less frequently both in nature and in GA. Offspring variables are mutated by small
random changes with a low probability. The basic idea is to introduce some diversity into the population; in other words, to delay
the situation where the entire population becomes so homogeneous that no further improvement is possible. If the length of the
chromosome is nc bits, then a random number is generated between 1 and nc. The bit at that location is switched. An example
is shown in Fig. 17.4.1.7.
Before 10010111
After 10000111
Fig. 17.4.1.7 Example showing mutation taking place at location 4
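The mutation operator can be sketched as follows. The names FlipBit and Mutate are hypothetical, std::mt19937 is an assumed random number generator, and the mutation probability is assumed to be applied once per chromosome.

```cpp
#include <cstddef>
#include <random>
#include <stdexcept>
#include <string>

// Switch the bit at the given location (locations are numbered from 1).
std::string FlipBit (std::string chromosome, std::size_t location)
{
    if (location < 1 || location > chromosome.size())
        throw std::out_of_range ("location must be between 1 and nc");
    char& bit = chromosome[location - 1];
    bit = (bit == '0') ? '1' : '0';
    return chromosome;
}

// With the given (low) probability, switch one randomly chosen bit;
// otherwise return the chromosome unchanged.
std::string Mutate (const std::string& chromosome, double probability,
                    std::mt19937& rng)
{
    std::uniform_real_distribution<double> uniform (0.0, 1.0);
    if (chromosome.empty() || uniform (rng) >= probability)
        return chromosome;
    std::uniform_int_distribution<std::size_t> pick (1, chromosome.size());
    return FlipBit (chromosome, pick (rng));
}
```

FlipBit ("10010111", 4) reproduces the example of Fig. 17.4.1.7.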

Next Generation
The new generation is formed when sufficient offspring are generated in the reproduction phase. The whole process of fitness
evaluation and reproduction starts all over again with this new population. Obviously, somewhere along the evolutionary
procedure the iterative process is stopped. Typically, this is done if a predetermined number of iterations have been completed
or if the fitness function does not change appreciably. Unlike most gradient-based techniques, there is no convergence criterion
for the iterative process associated with the GA.
17.4.2 Problem Formulation
GA’s were developed to tackle unconstrained optimization problems. However, as was mentioned before, most engineering
and structural design problems are constrained optimization problems. The standard approach is to transform the original
constrained problem to an unconstrained problem as follows.
Find x
To minimize $\hat{f}(x) = f(x) + \sum_{i=1}^{l} c_i \max(0, g_i) + \sum_{j=1}^{m} c_j |h_j|$ (17.4.2-1)
Subject to $x_k^L \le x_k \le x_k^U, \quad k = 1, 2, \ldots, n$
where $c_i$ and $c_j$ are penalty parameters. The selection of appropriate penalty weights $c_i$ and $c_j$ is always problematic even in
traditional NLP schemes. Typically, a large value is used initially for these parameters. These values are then reduced as the
design iterations continue. This feature is implemented in the EDO-GUIWB program.
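The penalized fitness function of Eqn. (17.4.2-1) can be sketched as below. This is not the EDO-GUIWB implementation; the function name and argument layout are hypothetical, and the inequality constraints are assumed to be written so that $g_i \le 0$ is feasible.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Fitness = objective + penalties for violated inequality constraints
// (g_i > 0) and for nonzero equality constraints (h_j != 0).
// cg and ch hold one penalty parameter per constraint.
double PenalizedFitness (double f,
                         const std::vector<double>& g, const std::vector<double>& cg,
                         const std::vector<double>& h, const std::vector<double>& ch)
{
    if (g.size() != cg.size() || h.size() != ch.size())
        throw std::invalid_argument ("one penalty parameter per constraint is required");
    double fitness = f;
    for (std::size_t i = 0; i < g.size(); ++i)
        fitness += cg[i] * std::max (0.0, g[i]);
    for (std::size_t j = 0; j < h.size(); ++j)
        fitness += ch[j] * std::fabs (h[j]);
    return fitness;
}
```

A satisfied inequality constraint contributes nothing, so the fitness and objective function values coincide for a feasible design with no equality constraint violation.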

17.5 Design Examples


In this section we solve several design problems using the Genetic Algorithm as implemented in the EDO-GUIWB program6.
Default values are chosen for the parameters associated with the GA with the option of changing these values if required. The
GA parameters under the control of the user are shown in Fig. 17.5.1.

Fig. 17.5.1 User-controllable GA parameters


Probability Values
Crossover: The probability value used to determine if crossover should or should not take place.
Mutation: The probability value used to determine if mutation should or should not take place.
Crossover for Design Variables
There are three types of crossovers – one point, two point and uniform, and they can be selected for use with Boolean,
discrete/integer, and real/continuous design variables.
Strategy
Selection: Selection of a member using either Roulette Wheel or Tournament selection strategies.
Replacement: None, or Elitist where the best member of the current population is carried over to the next generation, or Top
Few where a specified % of the top population is carried over to the next generation.
Miscellaneous Values
Seed Value: Seed value used in random number generation.
Population Factor: The size of the population is taken as Population Factor times the chromosome size.
Generation Factor: The total number of generations is taken as Generation Factor times the chromosome size.

6 The EDO-GUIWB Tutorial and User’s Manual is in the manual folder of the directory where the program is installed.

S. D. Rajan, 2000-24 17-462


N U M E R I C A L O P T I M I Z A T I O N

We will present a few guidelines for formulating and solving problems using GA’s.
(1) It is a good idea to start solving a problem with as few design variables as possible. It is easier to debug the problem
formulation with a manageable number of design variables.
(2) The selection of the lower and upper bounds must be done with care. It is necessary to have some prior knowledge of the
possible range of values that the design variables can assume. One approach is to start with a wide range and obtain the
solution. Once a solution is obtained, one can reduce the range by increasing the lower bound or decreasing the upper
bound or both.
(3) The penalty approach to handling constrained optimization problems works best if the constraints are normalized. For
example, consider a problem where 0  x 1  10 and 10  x 2  5 . Instead of writing the following two constraints as
g 1 ( x )  x 12  4 x 23  12000  0
1000
g 2 ( x )  40 x 1 x 22  0
x 2  20
one can rewrite them as
x 12  4 x 23
g1( x )  1 0
12000

g (x) 
x x x
1
2
2 2 1 20 
0 
2 4
10 400
The basic idea is to avoid very large positive and negative values.
(4) Avoid using equality constraints. More often than not, equality constraints can be rewritten expressing one design variable
in terms of the others. In other words, a design variable can be eliminated from the problem. Consider a problem where
an equality constraint is
24 x 3  4 x 4  36  0
The constraint can be rewritten as
x 4  6x 3  9
and x 4 can be eliminated as a problem parameter.
(5) Changing the default GA parameters: For those occasions when the GA does not lead to a feasible solution or leads to an
unsatisfactory solution, it may be worthwhile changing the default values of the GA parameters.
Example 17.5.1 Constrained Minimization (Box Design Problem)
Find the optimal solution to the following problem.
Find $\{x_1, x_2, x_3\}$
To minimize $f(x) = -x_1 x_2 x_3$
Subject to $g_1(x) = 42 - x_1 \ge 0$
$g_2(x) = 42 - x_2 \ge 0$
$g_3(x) = 42 - x_3 \ge 0$
$g_4(x) = x_1 + 2x_2 + 2x_3 \ge 0$
$g_5(x) = 72 - (x_1 + 2x_2 + 2x_3) \ge 0$
$0 \le x_i \le 100, \quad i = 1, 2, 3$

Solution
Step 1: We will rewrite the problem as follows, normalizing the constraints.
Find $\{x_1, x_2, x_3\}$
To minimize $f(x) = -x_1 x_2 x_3$
Subject to $g_1(x) = 1 - \frac{x_1}{42} \ge 0$
$g_2(x) = 1 - \frac{x_2}{42} \ge 0$
$g_3(x) = 1 - \frac{x_3}{42} \ge 0$
$g_4(x) = \frac{x_1 + 2x_2 + 2x_3}{500} \ge 0$
$g_5(x) = 1 - \frac{x_1 + 2x_2 + 2x_3}{72} \ge 0$
$0 \le x_i \le 100, \quad i = 1, 2, 3$
Step 2: Based on the problem formulation, the following variables and functions are necessary.

Variable Name Remarks


x1 Design variable x1
x2 Design variable x2
x3 Design variable x3

Function Expression Remarks


-x1*x2*x3 Objective function
1-x1/42 Constraint g1
1-x2/42 Constraint g2
1-x3/42 Constraint g3
(x1+2*x2+2*x3)/500 Constraint g4
1-(x1+2*x2+2*x3)/72 Constraint g5

Design Variable   Lower Bound   Upper Bound   Precision
x1                0             100           0.0002
x2                0             100           0.0002
x3                0             100           0.0002

Step 3: The result of executing the GA option in the EDO-GUIWB program is shown in Fig. E17.5.1(a) after 70 generations.
The obtained solution is
$x = \{24.1954, 12.547, 11.7152\}$
The constraint values are
$g = \{0.423919, 0.701262, 0.721067, 0.14544, -0.00999722\}$

Fig. E17.5.1(a)    Fig. E17.5.1(b)

To see whether we can obtain a better solution, we will reduce the upper bound of all the design variables to 30.
Step 4: With the new upper bound, the result of executing the GA option in the EDO-GUIWB program is shown in Fig.
E17.5.1(b) after another 70 generations. The obtained solution is $f(x) = -3559.7$. The values of the design variables are
$x = \{24.188, 12.3136, 11.9516\}$
and the values of the constraints are
$g = \{0.424095, 0.706819, 0.715438, 0.145437, -0.00997778\}$

The optimal solution is $x = \{24, 12, 12\}$ with $f(x) = -3456$. The 5th constraint ($g_5$) controls the design, i.e., is active at the
optimum.

Example 17.5.2 Column Design


Fig. E17.5.2(a) shows a column with a rectangular cross-section that must support an axial force of 100 kN. The column must
not fail due to axial stress or Euler buckling (in the X-Y plane). The allowable axial stress is 20 kN/cm². The modulus
of elasticity of the material is 1 GPa. The primary objective is to minimize the material volume.

Fig. E17.5.2(a) Column design (axial load P, column length 200 cm)

Solution
Step 1: The design problem can be formulated as follows. Converting all quantities to cm and kN,
(i) the volume of material can be expressed as $200bh$ cm³,
(ii) the axial stress requirement is $\frac{100}{bh} \le 20$ kN/cm², and
(iii) the Euler buckling requirement can be stated as $P_{cr} = \frac{\pi^2 EI}{L^2} \ge 100$ kN.
Hence,
Find $x = \{b, h\}$
to minimize $f(x) = 200bh$
subject to $g_1(x) = \frac{5}{bh} - 1 \le 0$
$g_2(x) = 1 - 2.056(10^{-5})\,bh^3 \le 0$
$g_3(x) = \frac{b}{h} - 1 \le 0$
$b, h \ge 0$
Note that all constraints are normalized.

Step 2: Based on the problem formulation, the following variables and functions are necessary.

Variable Name Remarks


b Cross-section width
h Cross-section height

Function Expression Remarks


200*b*h Objective function : column volume
5/(b*h)-1 Axial stress constraint
1-2.056e-5*b*h**3 Euler buckling constraint
b/h-1 Cross-section shape

Design Variable Lower Bound Upper Bound Precision


b 1 30 0.2
h 1 30 0.2

The choice of lower and upper bounds should be based on some knowledge of the problem. In this example, the precision is
the smallest (or, finest) precision that the program will allow. One should also ask the question – When the column is fabricated
or constructed, what is the precision (or, tolerance) with which it will be made? The basic strategy is to solve the problem in
stages. If at the end of the first stage, a refined solution is needed, then you can increase the lower bound, or decrease the upper
bound or both. The net effect is that you can then reduce the precision value and (hopefully) obtain a better solution.
Step 3: The result of executing the GA option in the EDO-GUIWB program is shown in Fig. E17.5.2(b) after 70 generations.
The obtained solution is $b = 4.6$ cm and $h = 22$ cm and $f(x) = 20240$ cm³. This solution is very close to the optimal solution.

Fig. E17.5.2(b)
The 2nd constraint ($g_2$) controls the design, i.e., it is active at the optimum.
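The reported column design can be checked with a short calculation. The sketch below is not part of the EDO-GUIWB program; the struct and function names are hypothetical, and the moment of inertia is assumed to be $bh^3/12$, which is consistent with the coefficient $2.056(10^{-5})$ in the buckling constraint.

```cpp
#include <cmath>

// Objective and normalized constraint values for the column example.
struct ColumnDesign
{
    double f;    // material volume, cm^3
    double g1;   // axial stress constraint
    double g2;   // Euler buckling constraint
    double g3;   // cross-section shape constraint
};

// Units are cm and kN. (Hypothetical helper for checking a design.)
ColumnDesign EvaluateColumn (double b, double h)
{
    const double pi = std::acos (-1.0);
    const double P = 100.0;           // axial force, kN
    const double L = 200.0;           // column length, cm
    const double E = 100.0;           // 1 GPa expressed in kN/cm^2
    const double sigmaAllow = 20.0;   // allowable axial stress, kN/cm^2

    const double I = b*h*h*h/12.0;             // moment of inertia, cm^4
    const double Pcr = pi*pi*E*I/(L*L);        // Euler buckling load, kN

    ColumnDesign d;
    d.f  = L*b*h;
    d.g1 = (P/(b*h))/sigmaAllow - 1.0;         // 5/(bh) - 1
    d.g2 = 1.0 - Pcr/P;                        // 1 - 2.056e-5*b*h^3
    d.g3 = b/h - 1.0;
    return d;
}
```

Evaluating EvaluateColumn (4.6, 22.0) gives a volume of 20240 cm³ and a buckling constraint value that is negative but very close to zero, which agrees with the statement that the 2nd constraint is active.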

Example 17.5.3 Beam Design


Fig. E17.5.3(a) shows a simply-supported beam. The beam is made of Douglas fir (modulus of elasticity = 1,800 ksi, mass
density = 1.0 slugs/ft³). Find the width b and height h of the lightest beam such that the maximum normal stress is less than
2 ksi and the shear stress is less than 0.1 ksi. The height of the beam should not exceed twice the
width. Neglect self-weight of the beam.

Fig. E17.5.3(a) (simply-supported beam AB, span 15 ft, distributed load 1 k/ft)

Solution
Step 1: Using lb, in as the units, the design problem can be formulated as follows.
Find $\{b, h\}$
To minimize $f(b, h) = 180bh$
Subject to $g_1(x) = \frac{1.0085(10^3)}{bh^2} - 1 \le 0$
$g_2(x) = \frac{112.05}{bh} - 1 \le 0$
$g_3(x) = 1 - \frac{2b}{h} \le 0$
$7\ \text{in} \le h, b \le 15\ \text{in}$
Step 2: Based on the problem formulation, the following variables and functions are necessary.

Variable Name Remarks


b Cross-section width
h Cross-section height

Function Expression Remarks


180*b*h Objective function : beam volume
1008.5/(b*h**2)-1 Normal stress constraint
112.05/(b*h)-1 Shear stress constraint
1-(2*b/h) Cross-section shape

Design Variable   Lower Bound   Upper Bound   Precision
b                 7             15            Auto
h                 7             15            Auto

Step 3: The result of executing the GA option in the EDO-GUIWB program is shown in Fig. E17.5.3(b) after 50 generations.

Fig. E17.5.3(b)

The obtained solution is $b = 14.3$ in and $h = 7.79$ in and $f(x) = 20051$ in³. The 2nd constraint ($g_2$) and the 3rd constraint
($g_3$) control the design.

Example 17.5.4 Truss Member Design


Fig. E17.5.4(a) shows a two-bar truss. Each member is made of a solid square cross-section. The modulus of elasticity, E, is
$30(10^6)$ psi. The allowable normal stress is 10,000 psi. The axial stress in the members should be less than the allowable normal
stress and the members should satisfy the Euler buckling requirement. The primary objective is to minimize the material volume.

Fig. E17.5.4(a) Two-Bar Truss design (figure labels: 10000 lb load, points A and B, dimensions 120 in and 120 in, angle 45°)

Solution
Step 1: Using lb and in as the problem units, the optimal design problem can be stated as follows.
Find $x = \{a_{AB}, a_{BC}\}$
to minimize $f(x) = V = 120(a_{AB}^2 + a_{BC}^2)$
subject to $g_1(x) = \frac{P_{AB}}{10000\,a_{AB}^2} - 1 \le 0$
$g_2(x) = \frac{P_{BC}}{10000\,a_{BC}^2} - 1 \le 0$
$g_3(x) = 1 - \frac{(P_{cr})_{AB}}{P_{AB}} \le 0$
$g_4(x) = 1 - \frac{(P_{cr})_{BC}}{P_{BC}} \le 0$
$a_{AB}, a_{BC} \ge 0$
where $a_{AB}, a_{BC}$ are the cross-section sides for members AB and BC,
$P_{AB}, P_{BC}$ are the magnitudes of the axial forces in members AB and BC, and
$P_{cr} = \frac{\pi^2 EI}{L^2}$ represents the Euler buckling capacity of the member in compression.
Using the method of joints, $P_{AB} = 10,000$ lb and $P_{BC} = 14,150$ lb with both the members in compression.

Step 2: Based on the problem formulation, the following variables and functions are necessary.

Variable Name   Type      Expression    Remarks
a1              Simple                  Cross-section side for member AB
a2              Simple                  Cross-section side for member BC
E               Derived   30e6          Modulus of Elasticity
pi              Derived   3.1415926     Pi
I1              Derived   a1**4/12      Moment of inertia (member AB)
I2              Derived   a2**4/12      Moment of inertia (member BC)
stress1         Derived   10000/a1**2   Stress in member AB
stress2         Derived   14150/a2**2   Stress in member BC

Function Expression Remarks


120*(a1**2+a2**2) Objective function : Truss volume
stress1/10000-1 Axial stress constraint : member AB
stress2/10000-1 Axial stress constraint : member BC
1-(pi**2*E*I1)/(120**2*10000) Euler Buckling : member AB
1-(pi**2*E*I2)/(120**2*14150) Euler Buckling : member BC

Design Variable Lower Bound Upper Bound Precision


a1 1 2 Auto
a2 1 2 Auto

This example illustrates how to use simple and derived variables to reduce the amount of hand-calculations necessary to
formulate the design problem. We could have used more derived variables than shown above.
Step 3: The result of executing the GA option in the EDO-GUIWB program is shown in Fig. E17.5.4(b) after 70 generations.

Fig. E17.5.4(b)

The obtained solution is $a_1 = 1.55$ in and $a_2 = 1.69$ in and $f(x) = 631.8$ in³. The 3rd constraint ($g_3$) and the 4th constraint
($g_4$) control the design.

Summary
Numerical optimization is increasingly becoming an integral part of engineering design software systems. Design engineers find
numerical optimization tools to be of immense help in the design of next generation phones, automobiles, aircraft,
transportation systems, and so on. In this chapter, we examined in great detail one of the population-based techniques to solve
NLP problems – Genetic Algorithm (GA). While GA can be used to solve a wide variety of problems, there are other numerical
techniques that are perhaps better at obtaining the solution to a narrower class of problems. For example, gradient-based
techniques are excellent in obtaining the local minima for problems that are continuous and differentiable.

Where to go from here?


The EDO-GUIWB© program is an excellent program to learn (a) how to pose an NLP problem, and (b) how to obtain the
solution using three different optimization techniques – the Genetic Algorithm, Differential Evolution, and the Method of
Feasible Directions. Structural optimization can be carried out using the GS-USA Frame© program.

Exercises
Appetizers
Solve the following problems using pencil and paper.
Problem 17.5.1
Fig. P17.5.1 shows a determinate planar beam. The cross-section is rectangular (height h and width w ). The allowable normal
stress is 12 MPa and the shear stress is 5 MPa. Design the minimum volume beam so that the height is not more than twice the
width.
10 kN

30

5m 5m 3m
C
A B
Fig. P17.5.1
Problem 17.5.2
How would you formulate and solve the previous problem if the weight density of the beam material was given as 6000 N/m³?

Main Course
Problem 17.5.3
A steel pipe is moved into place by a crane using the system shown in Fig. P17.5.3. The inner pipe diameter is 40 in and the wall
thickness is 0.375 in. The length of the pipe is 24 ft. The maximum axial load capacity of the cable is 200 000 lb. Find the
distance d between the lifting points to minimize the maximum bending stress in the pipe.

d
24 ft

Fig. P17.5.3
Problem 17.5.4
Fig. P17.5.4 shows a planar two-bar truss. Support C is located directly above A. The allowable stress in member AB is 100
MPa and in member BC is 200 MPa. Find the cross-sectional areas of members AB and BC, and the distance to design the
minimum volume truss.

A B
4m

10 kN
Fig. P17.5.4

C++ Concepts
Problem 17.5.5
Develop a class CSGA that implements a simple GA, using object-oriented concepts.
Solve the following problems using computer software such as EDO-GUIWB©.
Problem 17.5.6
It is required to design a support bracket as shown in Fig. P17.5.5. Member ADC is W16x31. Member BD has a circular hollow
cross-section. Supports A and B are pin supports and connection at D is a pin connection. Design the lightest steel member
BD so that the normal stress in the member is less than 10 ksi and Euler buckling is prevented with a safety factor of 2. The
wall thickness of the pipe cannot be less than 15% of the inner radius. Also find the optimal values for x and d .

10 lb/in
A C

D
x

B
d

10 ft

Fig. P17.5.5

Chapter 18

Computer Graphics
“Ambition is a lust that is never quenched, but grows more inflamed and madder by enjoyment.” Thomas Otway

“Build a better mousetrap and the world will beat a path to your door.” Ralph Waldo Emerson

“It is not the greatness of a man's means that makes him independent, so much as the smallness of his wants.” William
Cobbett

Simply defined, computer graphics implies the use of computers in creating graphical images. For engineers and scientists, these
could be as simple as an x-y graph all the way to a rendition of virtual reality scenes. In this chapter we will study some of the
basics of computer graphics using Microsoft’s Microsoft Foundation Classes (MFC)1 and Fast Light Toolkit (FLTK)2.

Objectives
 To understand and practice the concepts associated with computer graphics.
 To draw and manipulate three-dimensional wireframe images using MFC.
 To draw using FLTK.

1 https://learn.microsoft.com/en-us/cpp/mfc/mfc-desktop-applications?view=msvc-170
2 https://www.fltk.org/


18.1 Introduction
Computer-generated images are of immense help to engineers and scientists as they help in the visualization and understanding
of concepts. A simple x-y graph shows the relationship between two variables. Fig. 18.1.1 shows the relationship between the
length of a cantilever beam and its tip displacement for a given tip load and beam.

Fig. 18.1.1 A simple x-y graph


A wireframe rendition of a three-dimensional object is one showing the object without the hidden lines removed. This may be
acceptable for viewing certain objects like a space truss shown in Fig. 18.1.2(a). However, for truly three-dimensional objects,
the wireframe image is usually not suitable (Fig. 18.1.2(b)).

Fig. 18.1.2 Wireframe image of (a) a 3D truss (b) a 3D bracket


More sophisticated three-dimensional displays involve removing the hidden lines (Fig. 18.1.3(a)) and rendering and shading the
object with one or more light sources (Fig. 18.1.3(b)).


Fig. 18.1.3 Rendition of 3D bracket using OpenGL™ (a) Hidden lines removed (b) Shading with one light source

In the rest of the chapter, we will examine how to display wireframe images on a computer screen. Most computer screen
displays are either liquid crystal displays (LCDs) or organic light-emitting diode (OLED) displays. The resolution of these devices
(display resolution) is expressed in pixels and dot pitch. For example, today's high-end displays can have a resolution of 3440
x 1440 pixels at 60 Hz and a pixel pitch of 0.233 mm x 0.233 mm. These displays are driven by video cards that support the
display resolution and other characteristics.

18.2 Graphics Operations


In its simplest form, manipulating an image requires two operations.
Drawing a line: A straight line is drawn from a start location $(x_1, y_1)$ to an end location $(x_2, y_2)$.

Displaying text: A string of characters is displayed at a specified location $(x, y)$ using specified font characteristics.
These operations require several lower-level instructions to generate the required information.
Obtain Drawing Area Dimensions: Query the system to obtain the dimensions of the rectangular area in which the display will be
generated.
Select Pen: Define a pen with a drawing style (solid, dashed etc.), a width (number of pixels) and a color, and select it as the current
pen to use.
Move: The instruction is to move the current position to the point specified by its x and y coordinates.
Draw: The instruction is to draw a line from the current position to the point specified by its x and y coordinates using the
currently selected pen.
Output text: The instruction is to write a character string at the specified location using the currently selected font, background
color, and text color.
In Section 18.5, we will see how to use these functions from the Microsoft Foundation Classes.
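The Move and Draw instructions can be tried out without any graphics library. The following is only a minimal sketch: the `Canvas` class, the `Segment` struct and the `DrawRectangle` helper are hypothetical names introduced here (the member names borrow from MFC, but these are not MFC calls), and lines are recorded in a vector instead of being drawn.

```cpp
#include <vector>

// Hypothetical stand-in for a device context: it records line segments
// instead of drawing them, so the move/draw logic can be checked anywhere.
struct Segment { int x1, y1, x2, y2; };

class Canvas
{
    public:
        // Move: update the current position without drawing.
        void MoveTo (int x, int y) { m_x = x; m_y = y; }

        // Draw: record a line from the current position to (x, y) and
        // make (x, y) the new current position.
        void LineTo (int x, int y)
        {
            m_segments.push_back ({m_x, m_y, x, y});
            m_x = x; m_y = y;
        }

        const std::vector<Segment>& Segments () const { return m_segments; }

    private:
        int m_x = 0, m_y = 0;            // current position
        std::vector<Segment> m_segments; // everything "drawn" so far
};

// Outline of an axis-aligned rectangle: one move followed by four draws.
inline void DrawRectangle (Canvas& c, int left, int top, int right, int bottom)
{
    c.MoveTo (left, top);
    c.LineTo (right, top);
    c.LineTo (right, bottom);
    c.LineTo (left, bottom);
    c.LineTo (left, top);
}
```

Calling `DrawRectangle (canvas, 10, 20, 110, 70)` on a fresh `Canvas` records four segments, the last of which ends back at (10, 20). Keeping a current position means that a connected outline needs only one Move.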

18.3 Transformations and Projections


Graphical images consist of points that can be manipulated in a variety of forms.

Translation
Translation of a point (x, y, z) through (a, b, c) can be simply achieved by computing the new coordinates as
\[
\begin{Bmatrix} x \\ y \\ z \end{Bmatrix}_{new} =
\begin{Bmatrix} x \\ y \\ z \end{Bmatrix} +
\begin{Bmatrix} a \\ b \\ c \end{Bmatrix} \qquad (18.3.1)
\]

Rotation about Coordinate Axes


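The transformation matrices for rotation through an angle $\theta$ about the three coordinate axes are given below.
Rotation about x
\[
T_{3\times3} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & \sin\theta \\ 0 & -\sin\theta & \cos\theta \end{bmatrix} \qquad (18.3.2)
\]
Rotation about y
\[
T_{3\times3} = \begin{bmatrix} \cos\theta & 0 & -\sin\theta \\ 0 & 1 & 0 \\ \sin\theta & 0 & \cos\theta \end{bmatrix} \qquad (18.3.3)
\]
Rotation about z
\[
T_{3\times3} = \begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \qquad (18.3.4)
\]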
Example 18.3.1: Consider a triangle lying on the x-y plane with coordinates (3,-1,0), (4,1,0) and (2,1,0). Consider a
counterclockwise (or positive) rotation of 90° about the z-axis. The transformation matrix is then
\[
T_{3\times3} = \begin{bmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]
Hence, the new coordinates of the three points can be computed as $P_{3\times3} T_{3\times3}$, where $P_{3\times3}$ is the matrix with each row containing
the (x, y, z) coordinates of the corresponding point. Hence
\[
\begin{bmatrix} 3 & -1 & 0 \\ 4 & 1 & 0 \\ 2 & 1 & 0 \end{bmatrix}
\begin{bmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} =
\begin{bmatrix} 1 & 3 & 0 \\ -1 & 4 & 0 \\ -1 & 2 & 0 \end{bmatrix}
\]
Example 18.3.2: Consider a triangle lying on the x-z plane with coordinates (0,0,0), (0,0,3) and (5,0,0). Consider a
counterclockwise (or positive) rotation of 90° about the y-axis. The transformation matrix is then
\[
T_{3\times3} = \begin{bmatrix} 0 & 0 & -1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}
\]
Hence, the new coordinates of the three points can be computed as $P_{3\times3} T_{3\times3}$, where $P_{3\times3}$ is the matrix with each row containing
the (x, y, z) coordinates of the corresponding point. Hence
\[
\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 3 \\ 5 & 0 & 0 \end{bmatrix}
\begin{bmatrix} 0 & 0 & -1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix} =
\begin{bmatrix} 0 & 0 & 0 \\ 3 & 0 & 0 \\ 0 & 0 & -5 \end{bmatrix}
\]
In the following section we will see these operations in use.
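The rotation examples can be checked numerically. The sketch below assumes the row-vector convention used in this section (points stored one per row and multiplied as P T); the names `Mat3`, `RotationZ` and `Multiply` are introduced here for illustration only.

```cpp
#include <array>
#include <cmath>

using Mat3 = std::array<std::array<double, 3>, 3>;

// Rotation about the z-axis for the row-vector convention P * T
// (angle in degrees, counterclockwise positive), as in Eqn. (18.3.4).
inline Mat3 RotationZ (double degrees)
{
    const double pi = std::acos (-1.0);
    const double t = degrees * pi / 180.0;
    const double c = std::cos (t), s = std::sin (t);
    Mat3 T{};                       // value-initialized to zeros
    T[0] = {  c,   s, 0.0};
    T[1] = { -s,   c, 0.0};
    T[2] = {0.0, 0.0, 1.0};
    return T;
}

// Multiply a matrix of points (one point per row) by a transformation matrix.
inline Mat3 Multiply (const Mat3& P, const Mat3& T)
{
    Mat3 R{};                       // accumulates the products
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++)
            for (int k = 0; k < 3; k++)
                R[i][j] += P[i][k] * T[k][j];
    return R;
}
```

With the rows of P set to the triangle of Example 18.3.1, `Multiply (P, RotationZ (90.0))` reproduces the rotated coordinates of that example to within floating-point round-off (cos 90° is evaluated as a very small number rather than exactly zero).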


18.4 Three-Dimensional Graphics


Finally, we can put all the pieces of the wireframe display together by taking the (x, y, z) coordinates of the points that define
the object and transforming each set of (x, y, z) coordinates into the current display’s (x, y) coordinates. In order to facilitate the
transformations, we need to define three coordinate systems. The first is the physical or model coordinate system as
shown in Fig. 18.4.1. The units of this coordinate system can be any length units, e.g. inches, meters, etc.

Fig. 18.4.1 Physical or model coordinate system (MCS)


Based on the current viewing direction, these coordinates can be transformed into a device-neutral coordinate system called the
virtual coordinate system (VCS). This is a normalized, unitless two-dimensional coordinate system as shown in Fig. 18.4.2. The
coordinates could range from (0, 0) to (1000, 1000), or (0.0, 0.0) to (1.0, 1.0). Finally, the coordinates from the VCS can be used
to generate the display on the screen in the two-dimensional device coordinate system (DCS), an example of which is shown
in Fig. 18.4.3. The DCS units are typically pixels.

Fig. 18.4.2 Virtual coordinate system (axes $x_v$ and $y_v$; corners at (0, 0) and (1000, 1000))
Fig. 18.4.3 An example device coordinate system (axes $x_d$ and $y_d$; corners at (0, 0) and (1024, 768))
Here are the key concepts as the display of the object moves from MCS to VCS to DCS. The range of x, y, z values must be
scaled so that the initial display of the object fits into the square VCS. This ensures that initially, the entire object will fit into the
screen with the center of the object centered in the VCS coordinate system. The final step is then to transform the (x, y)
coordinate of each key point in the VCS to its appropriate location in the DCS. Usually, the display is not a square and hence,
the shorter dimension (usually the height) is used in computing the scaling factor from VCS to DCS so that the object is centered
and fully visible on the screen. As we will show in the wireframe display algorithm, the scaling from VCS to DCS can be isotropic
so that there is no distortion (a square in the VCS remains a square in the DCS but there may be a large dead area on the left
and right sides of the display), or anisotropic (so that the object completely occupies the screen).


The algorithm for the wireframe display of a space truss is given below. Note that A is the dimension of the VCS and a is the
border on the four sides in which no drawing takes place. For the DCS, $(x_d^{range}, y_d^{range})$ represents the range of device coordinates
(pixels) available in the x and y directions.

Algorithm
Step 1 Compute model limits in physical or model coordinates
Loop through all nodes to compute $(X_{min}, X_{max})$, $(Y_{min}, Y_{max})$ and $(Z_{min}, Z_{max})$.

Compute the center of the model as $(X_{mid}, Y_{mid}, Z_{mid})$. For example,
\[
X_{mid} = \frac{X_{min} + X_{max}}{2} \qquad (18.4.1)
\]
Set $T_{3\times3} = I_{3\times3}$ as the initial transformation matrix.
Step 2 Compute scaling factor
Find the coordinate translation to center the viewing box in the virtual coordinate system using the formula
\[
\begin{Bmatrix} X_i \\ Y_i \\ Z_i \end{Bmatrix}_{3\times1} = T_{3\times3}
\left( \begin{Bmatrix} X \\ Y \\ Z \end{Bmatrix} - \begin{Bmatrix} X_{mid} \\ Y_{mid} \\ Z_{mid} \end{Bmatrix} \right) \qquad (18.4.2)
\]
Apply the above formula to the 8 corners of the viewing box and compute $(X^v_{min}, X^v_{max})$, $(Y^v_{min}, Y^v_{max})$ and
$(Z^v_{min}, Z^v_{max})$ using those 8 transformed coordinates.
Compute the coordinates of the center of the model as
\[
\begin{Bmatrix} X_b \\ Y_b \\ Z_b \end{Bmatrix} = \frac{1}{2}
\begin{Bmatrix} X^v_{min} + X^v_{max} \\ Y^v_{min} + Y^v_{max} \\ Z^v_{min} + Z^v_{max} \end{Bmatrix}_{3\times1} \qquad (18.4.3)
\]
Compute the scaling factor
\[
s = \min\left( \frac{A - a}{X^v_{max} - X^v_{min}}, \; \frac{A - a}{Y^v_{max} - Y^v_{min}} \right) \qquad (18.4.4)
\]
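Step 3 Draw the truss
Loop through all elements.
For the start node, compute the virtual coordinates.
\[
\begin{Bmatrix} X_i \\ Y_i \\ Z_i \end{Bmatrix}_{3\times1} = T_{3\times3}
\left( \begin{Bmatrix} X \\ Y \\ Z \end{Bmatrix} - \begin{Bmatrix} X_{mid} \\ Y_{mid} \\ Z_{mid} \end{Bmatrix} \right) \qquad (18.4.5)
\]
\[
\begin{Bmatrix} x_v \\ y_v \end{Bmatrix} = s
\left( \begin{Bmatrix} X_i \\ Y_i \end{Bmatrix} - \begin{Bmatrix} X_b \\ Y_b \end{Bmatrix} \right) +
\begin{Bmatrix} A/2 \\ A/2 \end{Bmatrix} \qquad (18.4.6)
\]
Now compute the device coordinates using one of the following formulae (the first is the isotropic mapping and the second is the anisotropic mapping):
\[
\begin{Bmatrix} x_d \\ y_d \end{Bmatrix} = \begin{Bmatrix} \alpha x_v + 0.5 \\ \alpha y_v + 0.5 \end{Bmatrix}
\qquad \text{or} \qquad
\begin{Bmatrix} x_d \\ y_d \end{Bmatrix} = \begin{Bmatrix} \alpha_x x_v + 0.5 \\ \alpha_y y_v + 0.5 \end{Bmatrix} \qquad (18.4.6)
\]
where
\[
\alpha_x = \frac{x_d^{range}}{A}, \quad \alpha_y = \frac{y_d^{range}}{A} \qquad (18.4.7a)
\]
and
\[
\alpha = \min(\alpha_x, \alpha_y) \qquad (18.4.7b)
\]
Move to $(x_d, y_d)$.

Repeat these computations for the end node and draw to $(x_d, y_d)$.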

Example 18.4.1 (Planar Truss)


Consider the truss shown in Fig. E18.4.1.1. Draw the initial view of the truss assuming that $A = 1000$, $a = 100$ and
$(x_d^{range}, y_d^{range}) = (1024, 768)$.

Fig. E18.4.1.1 (planar truss with nodes 1, 2 and 3; horizontal dimensions 3 m and 3 m; height 4 m)
Solution
Step 1: Here are the results.
$(X_{min}, X_{max}) = (-3, 3)$, $(Y_{min}, Y_{max}) = (0, 4)$ and $(Z_{min}, Z_{max}) = (0, 0)$.

$(X_{mid}, Y_{mid}, Z_{mid}) = (0, 2, 0)$.


Step 2: The corners of the viewing box have the following coordinates before and after translating to the virtual coordinate
system.

Corner    Physical Coordinates    Virtual Coordinates
1         (-3, 0, 0)              (-3, -2, 0)
2         (-3, 0, 0)              (-3, -2, 0)
3         (-3, 4, 0)              (-3, 2, 0)
4         (-3, 4, 0)              (-3, 2, 0)
5         (3, 0, 0)               (3, -2, 0)
6         (3, 0, 0)               (3, -2, 0)
7         (3, 4, 0)               (3, 2, 0)
8         (3, 4, 0)               (3, 2, 0)


Hence, $(X^v_{min}, X^v_{max}) = (-3, 3)$, $(Y^v_{min}, Y^v_{max}) = (-2, 2)$ and $(Z^v_{min}, Z^v_{max}) = (0, 0)$. Also, $(X_b, Y_b, Z_b) = (0, 0, 0)$. The scaling
factor can then be computed as
\[
s = \min\left( \frac{1000 - 100}{3 + 3}, \; \frac{1000 - 100}{2 + 2} \right) = 150
\]

Step 3: For each element, here are the computed numbers.


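Element 1, Start Node
\[
\begin{Bmatrix} X_i \\ Y_i \\ Z_i \end{Bmatrix}_{3\times1} = T_{3\times3}
\left( \begin{Bmatrix} -3 \\ 0 \\ 0 \end{Bmatrix} - \begin{Bmatrix} 0 \\ 2 \\ 0 \end{Bmatrix} \right) =
\begin{Bmatrix} -3 \\ -2 \\ 0 \end{Bmatrix}
\]
\[
\begin{Bmatrix} x_v \\ y_v \end{Bmatrix} = 150
\left( \begin{Bmatrix} -3 \\ -2 \end{Bmatrix} - \begin{Bmatrix} 0 \\ 0 \end{Bmatrix} \right) +
\begin{Bmatrix} 500 \\ 500 \end{Bmatrix} = \begin{Bmatrix} 50 \\ 200 \end{Bmatrix}
\]
Isotropic Mapping
\[
\begin{Bmatrix} x_d \\ y_d \end{Bmatrix} = \begin{Bmatrix} \alpha x_v + 0.5 \\ \alpha y_v + 0.5 \end{Bmatrix} =
\begin{Bmatrix} 0.768(50) + 0.5 \\ 0.768(200) + 0.5 \end{Bmatrix} \approx \begin{Bmatrix} 39 \\ 154 \end{Bmatrix}
\]
Anisotropic Mapping
\[
\begin{Bmatrix} x_d \\ y_d \end{Bmatrix} = \begin{Bmatrix} \alpha_x x_v + 0.5 \\ \alpha_y y_v + 0.5 \end{Bmatrix} =
\begin{Bmatrix} 1.024(50) + 0.5 \\ 0.768(200) + 0.5 \end{Bmatrix} \approx \begin{Bmatrix} 52 \\ 154 \end{Bmatrix}
\]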

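Element 1, End Node
\[
\begin{Bmatrix} X_i \\ Y_i \\ Z_i \end{Bmatrix}_{3\times1} = T_{3\times3}
\left( \begin{Bmatrix} 0 \\ 4 \\ 0 \end{Bmatrix} - \begin{Bmatrix} 0 \\ 2 \\ 0 \end{Bmatrix} \right) =
\begin{Bmatrix} 0 \\ 2 \\ 0 \end{Bmatrix}
\]
\[
\begin{Bmatrix} x_v \\ y_v \end{Bmatrix} = 150
\left( \begin{Bmatrix} 0 \\ 2 \end{Bmatrix} - \begin{Bmatrix} 0 \\ 0 \end{Bmatrix} \right) +
\begin{Bmatrix} 500 \\ 500 \end{Bmatrix} = \begin{Bmatrix} 500 \\ 800 \end{Bmatrix}
\]
Isotropic Mapping
\[
\begin{Bmatrix} x_d \\ y_d \end{Bmatrix} = \begin{Bmatrix} \alpha x_v + 0.5 \\ \alpha y_v + 0.5 \end{Bmatrix} =
\begin{Bmatrix} 0.768(500) + 0.5 \\ 0.768(800) + 0.5 \end{Bmatrix} \approx \begin{Bmatrix} 385 \\ 615 \end{Bmatrix}
\]
Anisotropic Mapping
\[
\begin{Bmatrix} x_d \\ y_d \end{Bmatrix} = \begin{Bmatrix} \alpha_x x_v + 0.5 \\ \alpha_y y_v + 0.5 \end{Bmatrix} =
\begin{Bmatrix} 1.024(500) + 0.5 \\ 0.768(800) + 0.5 \end{Bmatrix} \approx \begin{Bmatrix} 513 \\ 615 \end{Bmatrix}
\]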

Element 2, Start Node


The results are the same as end node for element 1.


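Element 2, End Node
\[
\begin{Bmatrix} X_i \\ Y_i \\ Z_i \end{Bmatrix}_{3\times1} = T_{3\times3}
\left( \begin{Bmatrix} 3 \\ 0 \\ 0 \end{Bmatrix} - \begin{Bmatrix} 0 \\ 2 \\ 0 \end{Bmatrix} \right) =
\begin{Bmatrix} 3 \\ -2 \\ 0 \end{Bmatrix}
\]
\[
\begin{Bmatrix} x_v \\ y_v \end{Bmatrix} = 150
\left( \begin{Bmatrix} 3 \\ -2 \end{Bmatrix} - \begin{Bmatrix} 0 \\ 0 \end{Bmatrix} \right) +
\begin{Bmatrix} 500 \\ 500 \end{Bmatrix} = \begin{Bmatrix} 950 \\ 200 \end{Bmatrix}
\]
Isotropic Mapping
\[
\begin{Bmatrix} x_d \\ y_d \end{Bmatrix} = \begin{Bmatrix} \alpha x_v + 0.5 \\ \alpha y_v + 0.5 \end{Bmatrix} =
\begin{Bmatrix} 0.768(950) + 0.5 \\ 0.768(200) + 0.5 \end{Bmatrix} \approx \begin{Bmatrix} 730 \\ 154 \end{Bmatrix}
\]
Anisotropic Mapping
\[
\begin{Bmatrix} x_d \\ y_d \end{Bmatrix} = \begin{Bmatrix} \alpha_x x_v + 0.5 \\ \alpha_y y_v + 0.5 \end{Bmatrix} =
\begin{Bmatrix} 1.024(950) + 0.5 \\ 0.768(200) + 0.5 \end{Bmatrix} \approx \begin{Bmatrix} 973 \\ 154 \end{Bmatrix}
\]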

Fig. E18.4.1.2 Initial view of the truss on a display extending from (0, 0) to (1024, 768). Isotropic mapping: nodes 1, 2 and 3 at
(39, 154), (385, 615) and (730, 154). Anisotropic mapping: nodes 1, 2 and 3 at (52, 154), (513, 615) and (973, 154).
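The two virtual-to-device mappings used in this example can be sketched as a single function. The names `DevicePoint` and `ToDevice` are hypothetical and are introduced only for this illustration. The sketch truncates after adding 0.5, so a value such as 38.9 becomes 38; individual pixel values can therefore differ by one from hand-rounded values.

```cpp
#include <algorithm>

struct DevicePoint { int x, y; };   // pixel coordinates in the DCS

// Virtual-to-device mapping. Isotropic mapping uses the smaller of the two
// scale factors for both axes; anisotropic mapping scales each axis
// separately so that the drawing fills the whole device range.
inline DevicePoint ToDevice (double xv, double yv, double A,
                             double xRange, double yRange, bool isotropic)
{
    double ax = xRange / A;     // scale factor in x
    double ay = yRange / A;     // scale factor in y
    if (isotropic)
        ax = ay = std::min (ax, ay);

    // Adding 0.5 before truncation rounds to the nearest pixel.
    return { static_cast<int>(ax * xv + 0.5),
             static_cast<int>(ay * yv + 0.5) };
}
```

For the virtual point (950, 200) with A = 1000 and a 1024 x 768 device, the isotropic call returns (730, 154) and the anisotropic call returns (973, 154).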
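The truss appears to be inverted (Fig. E18.4.1.2) and it is! The device y-axis points downward from the origin at the top left, whereas the
model Y-axis points upward. Hence, we need to change the default transformation matrix as follows.
\[
T_{3\times3} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]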

18.5 Case Study: 3D Wireframe Viewer


Finally, we will illustrate the development of the graphical program using MFC through Example 18.5.1.

Example 18.5.1 A Simple Graphics Program


Launch the program and the initial window should look similar to Fig. 18.5.1(a) (the "Count is x" message in the left pane will be different).
This Windows program has two panes: the background color of the left pane is black and that of the right pane is white. In the
program, the left pane is managed by the CGraphicsView class and the right pane is managed by the CGraphicsLegendView
class. Both classes are derived from the CView class. A typical MFC application is event driven. That is, events such as resizing
the window, clicking the left mouse button, clicking the right mouse button, typing on the keyboard etc. generate a message
that is sent to the program. The program responds to these events through message handlers. One of the key functions in any
CView derived class is the OnDraw function. The OnDraw function is called whenever the program window needs to be redrawn,
e.g., if the window is resized, or if the underlying object that is being displayed has changed, or if a window covering the program
window has closed, etc.
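The event-driven idea can be illustrated without MFC. The sketch below is a hypothetical miniature, not MFC code: the `MiniApp` class, its event names and its handler table are invented for this illustration, and `OnDraw` only counts how many times a redraw was requested, in the spirit of the count shown in the left pane.

```cpp
#include <functional>
#include <map>
#include <string>

// Hypothetical miniature of an event-driven program: events are looked up
// in a table of handlers, much like messages are routed to message handlers.
class MiniApp
{
    public:
        MiniApp ()
        {
            // Both events below simply request a redraw in this sketch.
            m_handlers["resize"]  = [this] { OnDraw(); };
            m_handlers["uncover"] = [this] { OnDraw(); };
        }

        // Deliver an event; events without a handler are ignored.
        void Dispatch (const std::string& event)
        {
            auto it = m_handlers.find (event);
            if (it != m_handlers.end())
                it->second();
        }

        int DrawCount () const { return m_nDrawCount; }

    private:
        void OnDraw () { ++m_nDrawCount; }   // stands in for real drawing

        int m_nDrawCount = 0;
        std::map<std::string, std::function<void()>> m_handlers;
};
```

Dispatching "resize", then "keypress", then "uncover" leaves the draw count at 2, because no handler is registered for "keypress". In MFC the routing is done by the framework, and the programmer supplies only the handlers.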
Now, click File, then Open and select the file star.dat. This will read the contents of the file and display the object defined in the
file star.dat (Fig. 18.5.1(b)). Use the Visual Studio editor to examine the contents of the file. You will see that the file defines the
number of points and lines, followed by the (x, y) coordinates of the five points (in pixels), and the (start point, end point) of
the five lines. If you are wondering where the file is read and the data is stored, look for the function void
CGraphicsView::OnFileOpen().
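Reading a file of this kind takes only a few lines. The sketch below is not the code in OnFileOpen; it assumes a layout of two counts, followed by the (x, y) pairs, followed by the (start point, end point) pairs, all separated by whitespace. The exact layout of star.dat, and the names `Point2D`, `Line2D`, `Drawing`, `ReadDrawing` and `ReadDrawingFromString`, are assumptions made for illustration.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct Point2D { double x, y; };       // point location (pixels)
struct Line2D  { int start, end; };    // point numbers at the two ends

struct Drawing
{
    std::vector<Point2D> points;
    std::vector<Line2D>  lines;
};

// Assumed layout: nPoints nLines, then nPoints (x, y) pairs, then nLines
// (start point, end point) pairs. Returns false on malformed input.
inline bool ReadDrawing (std::istream& in, Drawing& d)
{
    int nPoints = 0, nLines = 0;
    if (!(in >> nPoints >> nLines) || nPoints < 0 || nLines < 0)
        return false;

    d.points.resize (nPoints);
    for (Point2D& p : d.points)
        if (!(in >> p.x >> p.y))
            return false;

    d.lines.resize (nLines);
    for (Line2D& l : d.lines)
        if (!(in >> l.start >> l.end))
            return false;

    return true;
}

// Convenience wrapper so that the parser can be tried without a file.
inline bool ReadDrawingFromString (const std::string& text, Drawing& d)
{
    std::istringstream in (text);
    return ReadDrawing (in, d);
}
```

Passing the text "3 2 0 0 10 0 5 8 1 2 2 3" yields three points and two lines, while input that ends before all the declared values are read makes the function return false. To read a real file, a `std::ifstream` can be passed to `ReadDrawing`.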

Fig. 18.5.1 Program screen after adjusting the window height and width a few times (a) before the input file is read
(b) after the file star.dat is read
Let’s look at the source code for the void CGraphicsView::OnDraw(CDC* pDCX) function (open the file graphicsView.cpp) in
bits and pieces. The first part is shown in Fig. 18.5.2.

Fig. 18.5.2 Source code in void CGraphicsView::OnDraw(CDC* pDCX) function


CDC is the class used to create device-context objects, i.e., objects that provide drawing functionality. In order to avoid flickering,
the drawing can be completely done in memory before it is projected to the screen – line 114. In order to better understand
when the OnDraw function is called, we have created a message that shows how many times the function is called – lines 115-
118. TRACE0 is an MFC macro that displays the string in Visual Studio’s Output window. In lines 121-122, we obtain the
rectangular dimensions of the left pane in Fig. 18.5.1. The background is set to black in line 125. In lines 128-129 a white pen
is created. The current pen is replaced with the white pen in lines 132-134. The text color is also set to white – line 137.

Fig. 18.5.3 Source code in void CGraphicsView::OnDraw(CDC* pDCX) function


The next part of updating the screen display is shown in Figs. 18.5.3 and 18.5.4. The objective is to draw a rectangle with the
white pen with a suitable border. Lines 142-143 capture the left pane’s height and width (in pixels). The rectangle is drawn in
lines 145-149. Note the calls to the low-level instructions pDC->MoveTo and pDC->LineTo, which update (move) the
current position and then draw the four sides of the rectangle. Lines 152-161 show how the lines are drawn: move to
the start point location and draw to the end point location. Lines 164-170 show how the point numbers are displayed next to
the location of each point. Finally, in line 172, the message string constructed in line 117 is displayed at location (100, 100). The
pen that is dynamically constructed in this function is deleted and the old pen is restored – lines 173-174.

Fig. 18.5.4 Source code in void CGraphicsView::OnDraw(CDC* pDCX) function


The ScreenCoordinate function is used to compute the screen coordinates (x, y) using the current transformation matrix – see
Eqn. (18.4.2) with the center coordinates as zero.

Fig. 18.5.5 Function ScreenCoordinate


The right pane’s OnDraw function is shown in Fig. 18.5.6. The function displays the current number of points in the defined
object – lines 57-58 using the default text and window properties.


Fig. 18.5.6 Source code in void CGraphicsLegendView::OnDraw(CDC* pDCX) function


You should try to make small changes to the program to better understand how the object is displayed. Here are some
suggestions – (a) change the default transformation matrix from an identity matrix to something else – lines 61-65, and (b) create
a new input file with the definition of a different object.

18.6 Using FLTK


FLTK (pronounced “fulltick”) “is a cross-platform C++ GUI toolkit for UNIX®/Linux® (X11), Microsoft® Windows®,
and macOS®. FLTK provides modern GUI functionality without the bloat and supports 3D graphics via OpenGL® and its
built-in GLUT emulation.
FLTK is designed to be small and modular enough to be statically linked but works fine as a shared library. FLTK also includes
an excellent UI builder called FLUID that can be used to create applications in minutes.”
The very basics of FLTK can be found in Chapter 4 of the FLTK Programming Manual3 and hence, will be lightly covered
here.
(1) The header file Fl.H contains the major declarations and needs to be included in the main program.
(2) A main window is created with the creation of a Fl_Double_Window object.
(3) A widget (button, menu, drawing, etc.) can then be created and displayed with a call to show() member function.
(4) A call to the Fl::run() function puts the program into a loop waiting for user input.
The relationships between the different entities that make the display possible are shown in Fig. 18.6.1.
Fig. 18.6.1 Schematic diagram showing how a typical FLTK program displays widgets (entities shown: Program Window, Popup
Window 1, Popup Window 2 and Display Screen)

3 F. Costantini, D. Gibson, M. Melcher, A. Schlosser, B. Spitzak, and M. Sweet, FLTK 1.3.8 Programming Manual, Rev. 9.8, 2021.


The Program Window is created when the program execution starts. From within the program, multiple Popup Windows can
be launched. Within each Popup Window the widgets supported by FLTK can be displayed, as we will show in Example
18.6.1. Note that the origin of the Popup Window coordinate system is at the top left, similar to what is shown in Fig. 18.4.3.

Example 18.6.1 A Simple Graphics Program


We will develop a program to create and display information in three popup windows. In the first window we will display a
triangle and a circle. In the second window, we will display a series of (x, y) values that are stored in CVector containers. In the
third window, we will use FLTK’s in-built chart widget and display a sine function. We will examine the source code starting
with the main() program shown in Fig. 18.6.2.

Fig. 18.6.2 Source code in main () showing how the first Popup Window is displayed
Lines 168-169 are used to obtain the height and the width of the available screen using Fl::h() and Fl::w(). The display strategy
is to launch the Popup Windows with their initial size being half the screen width and height. In line 175, the first Popup
Window is defined. The term Fl_Double_Window denotes a class where double-buffering is implemented to minimize screen
flickering. In lines 178-179 the first widget is defined. The first widget is a triangle. The DrawTriangle object is used to define a
triangle with vertices at (50, 10), (250, 10), and (150, 300). Similarly, the DrawCircle object is used to define a circle with the
center at (400, 400) with a radius of 150. Note that the values are in integer device coordinates. The cautionary note on line 170
is a reminder that the (x, y) coordinates cannot take arbitrary values. Lines 180 and 185 are used to make the Popup Window
resizable using the mouse to resize the height and width of the window. Finally, line 188 is used to show the widgets drawn in
the first Popup Window on the screen.

Fig. 18.6.3 Source code in main () showing how the second Popup Window is displayed


Line 195 is used to define the second Popup Window (Fig. 18.6.3). Eleven (x, y) data points are defined and stored in lines 197-
198. In lines 199-200, the DrawXY object is used to define a series of straight lines that connect these eleven points with the first
point being at (0, 0) and the last point at (1000, 225). The entire window is used, and the color of the line is red. The default line
type and thickness are used. To change the line type and thickness, the following function needs to be called after setting the
color.
void fl_line_style (int style, int width=0, char* dashes=0);

Finally, Fig. 18.6.4 shows the source code to display the third window. The graph of $\sin(t)$, $0 \le t \le 15$, is created with an
increment of 0.5 (line 216) and is scaled by a factor of 125.0 (line 218) so that the entire range of values is between -125.0 and
125.0 (line 212).
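The data for such a graph can be generated independently of FLTK. The following sketch is not the code in Fig. 18.6.4; the function name `SampleSine` and its parameters are hypothetical, and it only shows one way of producing the scaled samples that a chart widget could then display.

```cpp
#include <cmath>
#include <vector>

// Sample scale * sin(t) for t = tStart, tStart + dt, ..., up to tEnd.
inline std::vector<double> SampleSine (double tStart, double tEnd,
                                       double dt, double scale)
{
    std::vector<double> values;
    // Use an integer counter so that round-off does not accumulate in t.
    const int n = static_cast<int>(std::floor ((tEnd - tStart) / dt + 0.5));
    for (int i = 0; i <= n; i++)
        values.push_back (scale * std::sin (tStart + i * dt));
    return values;
}
```

Calling `SampleSine (0.0, 15.0, 0.5, 125.0)` produces 31 values, all of which lie between -125.0 and 125.0.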
Line 229 shows the call to Fl::run() function that puts the program into a loop waiting for user input either from the keyboard
or from the mouse. In this example program, execution can be terminated simply by closing the Program Window.

Fig. 18.6.4 Source code in main () showing how the third Popup Window is displayed
The actual drawing and display of the widgets takes place in the three classes – DrawTriangle that draws a triangle, DrawCircle
that draws a circle, and DrawXY that draws a series of straight lines connecting n points. Each of these classes is publicly derived
from the Fl_Widget and has two important functions – the overloaded constructor and the draw() function. The reader is urged
to review the code, e.g., lines 40-79 where the triangle is defined and drawn on demand. Finally, the overall screen display is
shown in Fig. 18.6.5.


Fig. 18.6.5 Display on the screen when at line 229


Summary
The basics of computer graphics were covered in this chapter. Functionalities in the Microsoft Foundation Classes were used
to display the wireframe image of a three-dimensional truss. The examples in this chapter illustrate the basic ideas behind a
GUI program. MFC is an older technology that is still supported by Microsoft for a variety of reasons. There are excellent
toolboxes available that support MFC, with the most popular one being the one sold by BCGSoft4. In addition, we also looked
at an open-source library, the Fast Light Toolkit (FLTK), which provides excellent integration with C++ source code.

Where to go from here?


Visualization of data is indispensable in understanding the system associated with the data. Open Graphics Library (OpenGL)
provides a very powerful application programming interface (API) for writing computer programs where speed and the quality of
graphics rendering are extremely important. The images shown in Fig. 18.1.3 have been rendered using OpenGL APIs working in
conjunction with MFC.

4 https://www.bcgsoft.com/


Exercises
Appetizers
Problem 18.1
Use the truss shown in Example 18.4.1 and rotate the display by 300 about the z-axis. Compute the new screen coordinates and
draw the truss.

Main Course
Problem 18.2
Consider the space truss shown in Fig. P18.2. Draw the initial view of the truss assuming that $A = 1000$, $a = 100$ and
$(x_d^{range}, y_d^{range}) = (1024, 768)$.

Fig. P18.2 (labels recovered from the figure: nodes 1 to 4; elements 1 to 3; axes x and y; dimensions 3', 6', 7' and 9')

C++ Concepts
Problem 18.3
Develop an MFC program to display an x-y graph given a set of (x, y) data from an external file.
Problem 18.4
Modify the program ASUTruss© by adding the following capabilities.
(1) Carry out finite element analysis of the truss.
(2) Display both the undeformed and deformed truss using a specified magnification factor.
(3) Support rotation using the mouse left button.
(4) Support zoom in and zoom out using the mouse left button.


References
C. Pokorny, Computer Graphics – An Object-Oriented Approach to the Art and Science, Franklin, Beedle & Associates, Inc., Wilsonville,
OR, 1994.
J. Prosise, Programming with MFC, Microsoft Press, 1999.
E. Kain, The MFC Answer Book: Solutions for Effective Visual C++ Applications, Addison-Wesley Professional, 1998.


Bibliography
C++ Programming Language

Gary Bronson, C++ for Engineers and Scientists, Brooks/Cole Publishing Company, 1999.

Gary Bronson, Program Development and Design Using C++, Brooks/Cole, 2000.

Deitel and Deitel, C++ How to Program, 3rd Edition, Prentice-Hall, 2000.

Cay Horstmann, Computing Concepts with C++ Essentials, Wiley, 1997.

Stanley Lippman, Josee Lajoie and Barbara Moo, C++ Primer, Addison-Wesley, 2005.

Rick Mercer, Computing Fundamentals with C++, 2nd Edition, Franklin, Beedle and Associates, 1998.

Scott Meyers, Effective C++, Addison Wesley, 1999.

Walter Savitch, Absolute C++, Addison Wesley, 2002.

Victor Shtern, Core C++ - A Software Engineering Approach, Prentice-Hall, 2000.

Bjarne Stroustrup, The C++ Programming Language, Addison-Wesley, 1997.

Data Structures

Gilberg and Forouzan, Data Structures: A Pseudocode Approach with C++, Brooks/Cole, 2001.

Kruse and Ryba, Data Structures and Program Design in C++, Prentice Hall, 1999.

Clifford Shaffer, Data Structures and Algorithm Analysis, Prentice-Hall, 1997.

Numerical Analysis

Atkinson, An Introduction to Numerical Analysis, Wiley, 1978.

Burden and Faires, Numerical Analysis, PWS-Kent, 1988.

Press, Flannery, Teukolsky and Vetterling, Numerical Recipes in C, Cambridge Press, 1988.

Mathews and Fink, Numerical Methods Using Matlab, Prentice-Hall, 1999.


Chapra and Canale, Numerical Methods for Engineers, McGraw-Hill, 2002.

Schilling and Harris, Applied Numerical Methods for Engineers Using Matlab and C, Brooks/Cole, 2000.

Rao, Applied Numerical Methods for Engineers and Scientists, Prentice Hall, 2002.

Object-Oriented Programming and Software Engineering

Cockburn, Writing Effective Use Cases (The Crystal Collection for Software Professionals), Addison Wesley, 2000.

Bergin, Data Abstraction – The Object-Oriented Approach Using C++, McGraw-Hill, 1994.

Page-Jones, Fundamentals of Object-Oriented Design in UML, Addison Wesley, 2000.

Pressman, Software Engineering: A Practitioner’s Approach, McGraw-Hill, 2001.

Schach, Classical and Object-Oriented Software Engineering with UML and C++, McGraw-Hill, 1999.

Lee and Tepfenhart, UML and C++: A Practical Guide to Object-Oriented Development, Prentice-Hall, 2001.



Appendix A

Using Microsoft Visual Studio 2022


“It’s deja vu all over again.” Yogi Berra

“The goal of education is the advancement of knowledge and dissemination of truth.” John F. Kennedy

“Give a man a fish and you feed him for a day; teach a man to fish and you feed him for a lifetime.” Maimonides

1.0 Introduction
Microsoft Visual Studio (MSVS) is an Integrated Development Environment (IDE) that programmers use to develop (create,
debug and maintain) and launch computer programs written in a number of high-level languages. In this document we will see
how two types of computer programs – console applications and Microsoft Foundation Classes (MFC) applications can be
developed using C++. We will also see how to use the debugging features of MSVS.
Console applications are programs where user input is text-based typically via a keyboard and/or external file and output is also
text-based typically to the console (computer monitor) and/or external file. As we saw in Chapter 18, MFC (Microsoft
Foundation Class) is a collection of classes that is provided with VS 2022 to help developers build graphical-user interface (GUI)
programs to work with various flavors of Windows OS.
There are several differences between a console application and an MFC application that the reader must be aware of. Here is
a short but important list.
(1) The usual std::cout and std::cin will not work with an MFC application.
(2) There is no main program – MFC applications are event-driven.
(3) MFC provides resources normally associated with Windows applications – toolbar, menu, status bar etc.
(4) MFC applications use precompiled headers.

2.0 Building C++ Applications


In this section we will first see how to develop a console application followed by the steps for an MFC application.
2.1 Building a Console Application Suitable for Debugging
Step 1: Select a Console Application
Launch Visual Studio. The program interface should look similar to Fig. A2.1.1.

Fig. A2.1.1 Graphical User Interface upon launching MSVS 2022


Click File, New and then Project. Make sure that C++, Windows and Empty Project are selected as shown in Fig. A2.1.2(a).

Fig. A2.1.2(a) Selecting a C++ Empty Project


Click Next. Now, type in the Project name (for example, MyFirstPgm) and make sure the location points to the
correct directory. If you wish, check Place solution and project in the same directory – Fig. A2.1.2(b).

Fig. A2.1.2(b) Configuring the project


Click the Create button. Visual Studio creates the project workspace as shown in Fig. A2.1.5.


Fig. A2.1.5 VS2022 user interface showing the system ready for program development. Note that the current
version is Debug and x64 (64-bit application). Both these options can be changed by a different selection from the
dropdown combo-box menu

Step 2: Developing the Source Code


We are now ready to develop the source code one file at a time. In this example, we will show how to create one header file and
one source file. Let us create the source file first. Click Project, followed by Add New Item. Fig. A2.1.6 shows the dialog box
that is launched.

Fig. A2.1.6 Dialog box for creating source code


Select Code and C++ File (.cpp). Enter the Name of the file (for example, main). Click the Add button. This launches the
editor, and you are now ready to type in the source code (Fig. A2.1.7). The file name, main.cpp, appears under the Source Files
section in the Solution Explorer window. In addition, the cursor appears in the first column of the first line of the C++
editor window. Line numbers can be displayed in the editor pane by turning that option on (Tools-Options-Text Editor-All
Languages-General, then check Line numbers).

Fig. A2.1.7 Program interface showing C++ editor ready to accept creation of file called main.cpp
Type in the C++ statements as shown in Fig. A2.1.8. Note that we have not saved the file yet (Hint: main.cpp* indicates that
the file contents have changed and that the file has not been saved).

Fig. A2.1.8 A complete Hello World-type of C++ program

Step 3: Compiling, Linking and Executing the Program


We are now ready to compile, link and execute the program. Click Build and then Build Solution. This compiles and links the
program. In other words, contents of the .cpp file are used by the compiler to create an object file (.obj file) and if successful,
the linker uses the object file to create the executable file (.exe file). Fig. A2.1.9 shows success in compiling and linking the
program to create the executable MyFirstPgm.exe.


Fig. A2.1.9 Successful compile and link steps


Now we are ready to execute the program. Click Debug and then Start Without Debugging. The executable file is run
and, since this is a console application, a DOS-like window appears with the output that we expect to see (Fig. A2.1.10). Press
any key to close the console window.

Fig. A2.1.10 Successful execution of the program

Step 4: Modifying the Program and Adding a Header File


We will modify the main program by creating a function that would output the welcome message to the console. To edit the
main program, double click main.cpp in the Solution Explorer. Make the changes as shown in Fig. A2.1.11.


Fig. A2.1.11 Modified main program


Now we are ready to create the function ShowMessage(): first its declaration in a separate header file, showmessage.h, and then
its definition in ShowMessage.cpp. Click Project and then Add New Item. This time (unlike Fig. A2.1.6), select Header File (.h).
Type the name of the file as showmessage.h. Finally, type in the statements shown in Fig. A2.1.12.

Fig. A2.1.12 showmessage header file


Click Project and then Add New Item. Select C++ File (.cpp). Type the name of the file as ShowMessage. Finally, type in the
statements shown in Fig. A2.1.13.


Fig. A2.1.13 ShowMessage() function contained in file ShowMessage.cpp
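
Since the figures are not reproduced here, the following is a minimal sketch of how the three files might be organized. The message text, the helper WelcomeMessage(), and the include-guard name are assumptions made for illustration; the statements in Figs. A2.1.11 to A2.1.13 may differ. The three files are shown in one listing and separated by comments.

```cpp
// ---- showmessage.h (hypothetical contents) ----
#ifndef SHOWMESSAGE_H
#define SHOWMESSAGE_H

#include <string>

std::string WelcomeMessage();   // returns the text to display (assumed helper)
void ShowMessage();             // writes the welcome message to the console

#endif

// ---- ShowMessage.cpp ----
// #include "showmessage.h"     // required when this is a separate file
#include <cassert>
#include <iostream>

std::string WelcomeMessage()
{
    return "Welcome to C++ programming!";
}

void ShowMessage()
{
    const std::string strMessage = WelcomeMessage();
    assert(!strMessage.empty());   // sanity check, active in Debug builds
    std::cout << strMessage << std::endl;
}

// ---- main.cpp ----
// #include "showmessage.h"
// int main()
// {
//     ShowMessage();
//     return 0;
// }
```

The header carries only declarations, so any source file that includes it can call ShowMessage() while the function body is compiled once, in ShowMessage.cpp.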

Step 5: Compiling and linking the new version of the program


Click Build and then Build Solution. The resulting output is shown in Fig. A2.1.14.

Fig. A2.1.14 Successful build


The built program can now be executed as shown in Step 3 by clicking Debug, then Start Without Debugging.


2.2 Building a Single Document Interface (SDI) Application


Step 1: Select an MFC Project
Launch Visual Studio. In the Create a new project window, select C++ and MFC App as shown in Fig. A2.2.1, and click Next.

Fig. A2.2.1 Selecting an MFC Project


Type in the Name of the project and make sure the location points to the correct directory. Click the Create button (see Fig.
A2.2.2).

Fig. A2.2.2 MFC Application Wizard


Step 2: Specify the type of MFC Project
Follow these steps.
(a) Click Application Type and make the selections as shown below (Fig. A2.2.3).


Fig. A2.2.3 Application Type selections


(b) Click Next and make the selections as shown in Fig. A2.2.4.

Fig. A2.2.4 Document Template Properties

We will use the default options for the rest of the selections. Click the Finish button and the MFC project is created (Fig. A2.2.5).


Fig. A2.2.5 MFC project ready for user customization


Step 3: Build and Execute the Project
Click Debug, and then Start Without Debugging, and the resulting program window is shown in Fig. A2.2.6.

Fig. A2.2.6 MyMFCApplication1 program window


3.0 Interactive Debugging


Debugging is the process of fixing errors in programs that fail to execute properly or fail to build at all. Some of these errors
may be due to incorrect syntax, misspelled keywords, or mismatched types. These errors are called compile-time errors,
and they prevent the program from being built. Once compile-time errors are fixed, the debugger can be used to locate and correct
other errors in logic, sequencing, and interactions among program functions.
The debugging features built into the development environment enable you to test your code. You can set and
manage breakpoints, and view and change variables. During the debugging process, you can execute the program one
statement at a time.
We will illustrate the debugging process through a modified version of the example used in Section 2.1. Before we look at the
debugging process, let us look at the terminology associated with MSVS debugging.

Output Window
As we have seen before, the Output window normally appears at the bottom right corner. Compile and link messages,
including any error messages, appear in this window.

Breakpoints
Breakpoints mark the locations where execution should be suspended so that the user can examine the program code and
variables, make changes, continue execution, or terminate execution. A breakpoint can be toggled on and off with the F9 key.
For the debugger to suspend execution at a location of interest, at least one breakpoint must be set in the code.
Action: Build the program as we discussed in Section 2.1. First, edit the files main.cpp, showmessage.h and showmessage.cpp
as shown in Fig. A3.1. Then, build the solution.

Fig. A3.1 Program ready to debug
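
Since Fig. A3.1 is not reproduced here, the sketch below shows one plausible form of the program being debugged. Only the names AddThem, n1, n2, nA, nB and nSum come from the text; the values, the local variable nResult, and the wrapper RunDebugExample() (standing in for the body of main) are assumptions. The line numbers quoted in the text refer to the book's figure, not to this sketch.

```cpp
#include <cassert>
#include <iostream>

// Hypothetical stand-in for the function defined in showmessage.cpp.
int AddThem(int n1, int n2)
{
    int nResult = n1 + n2;   // step over this line with F10 and watch nResult change
    return nResult;
}

// Hypothetical stand-in for the body of main() in main.cpp.
int RunDebugExample()
{
    int nA = 10;
    int nB = 20;
    int nSum = 0;                 // the text suggests the book leaves nSum uninitialized here
    nSum = AddThem(nA, nB);       // set a breakpoint on this line (F9), then step into it (F11)
    assert(nSum == nA + nB);      // self-check that is active in a Debug build
    std::cout << nA << " + " << nB << " = " << nSum << '\n';
    return nSum;
}
```

Calling RunDebugExample() from main gives the debugger a call to step into and three local variables to inspect in the Autos, Locals and Watch windows.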


Action: Position the cursor at line 9 in the file main.cpp and press the F9 key. A red dot appears in the grey border at the
beginning of the line as shown in Fig. A3.2.

Fig. A3.2 Breakpoint set at line 9


Once one or more breakpoints are set, we can start debugging by selecting one of the options under the Debug menu. Fig.
A3.3 shows the Debug menu and the shortcut keys.

Fig. A3.3 The Debug menu options


The easiest way to initiate the debugging process is to press the F5 key that launches the interactive debugger.


Action: Press the F5 key. The debugger launches our program, and the execution is suspended once execution reaches line 9.
The resulting screen is shown in Fig. A3.4. Note that at any stage pressing the F5 key continues the debugging process until the
next breakpoint is reached, or an error is encountered, or the program terminates.

Fig. A3.4 Execution is suspended at line 9


The yellow arrow indicates which executable statement is about to be executed. Note the new windows that appear at the
bottom – the Call Stack window on the right and several tabbed windows on the left. Individual tabbed windows can be
activated by clicking on the tab.

Call Stack
This window shows all the functions in the stack starting with the function where execution is currently suspended.

Autos Window
The Autos window (Debug -> Windows -> Autos) displays variables and expressions from the current line of code and the
preceding line of code. Note the values of variables nA, nB and nSum (nSum is still uninitialized!).

Locals Window
You can open the locals window from the Debug menu (Windows -> Locals). The locals window will automatically display all
local variables in the current block of code. If you are inside of a method, the locals window will display the method parameters
and locally defined variables.


Watch Window
The Watch window is used to examine user-specified variables and expressions while debugging your program. You can also
modify the value of a variable using this window. Just double-click on the value and type in another value.
Action: We will step into the AddThem function. “Step into” means that the debugging execution will step into the function
and pause before the first statement in that function is executed. Press the F11 key.
The resulting screen is shown in Fig. A3.5.

Fig. A3.5 Execution suspended at the beginning of function AddThem


Action: Press the F10 key. This key is used to step to the next statement. Note the difference between the F10 key (steps over
to the next statement in the same function) and the F11 key (steps into the function called in the current statement). Highlight
the expression n1+n2 and note that the debugger evaluates the expression and shows the value. Keep pressing the F10 key and
trace the execution of the program until the last statement in the main program, after which you should press F5 to let the
program run to completion.


4.0 Miscellaneous Topics


4.1 Specifying Command Line Arguments (Use project Example9_6_1)
Step 1: Press Alt and F7. This launches the Property Pages dialog box (Fig. A4.1.1).

Fig. A4.1.1 Property Pages dialog box


Step 2: Expand the Configuration Properties item. Click the Debugging item. Now enter the command line arguments in the
Command Arguments edit box as shown in Fig. A4.1.2.

Fig. A4.1.2 all.dat hello as Command Arguments
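
The arguments entered in the Command Arguments box reach the program through the parameters of main. The sketch below shows one way to collect them; the helper name CollectArguments is hypothetical and is not taken from project Example9_6_1.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Collects the command line arguments (excluding the program name stored
// in argv[0]) into a vector of strings.
std::vector<std::string> CollectArguments(int argc, const char* const argv[])
{
    assert(argv != nullptr);
    std::vector<std::string> args;
    for (int i = 1; i < argc; ++i)
        args.push_back(argv[i]);
    return args;
}

// Typical use inside main:
// int main(int argc, char* argv[])
// {
//     std::vector<std::string> args = CollectArguments(argc, argv);
//     // With "all.dat hello" in the Command Arguments box,
//     // args[0] is "all.dat" and args[1] is "hello".
//     return 0;
// }
```

Arguments are separated by whitespace, so an argument that itself contains spaces (a file path, for instance) needs to be enclosed in double quotes in the edit box.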


4.2 Creating a Release Version
Step 1: Change the Solution Configurations combo-box entry from Debug to Release.


Fig. A4.2.1 Creating a Release version of the program


Step 2: Click Build and select Rebuild Solution.
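
One visible difference between the two configurations is the treatment of assert. The standard macro NDEBUG disables assert, and the default Release configuration in Visual Studio defines it while the default Debug configuration does not. The sketch below illustrates this; the function names BuildKind and Average are hypothetical.

```cpp
#include <cassert>
#include <string>

// Reports how this file was compiled, based on the standard NDEBUG macro.
std::string BuildKind()
{
#ifdef NDEBUG
    return "Release";
#else
    return "Debug";
#endif
}

// Returns the average of n values. The assert is checked only when
// NDEBUG is not defined, i.e., typically in a Debug build.
double Average(const double* values, int n)
{
    assert(values != nullptr && n > 0);
    double sum = 0.0;
    for (int i = 0; i < n; ++i)
        sum += values[i];
    return sum / n;
}
```

Because the check disappears in a Release build, assert is suited to catching programming errors during development and not to validating user input.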
4.3 Creating a Static Library (see MyLib project)
Step 1: Click File, New and then Project. Make sure that Visual C++, Win32 and Win32 Project are selected. Specify the
Name of the project, e.g., MyLib, and select the appropriate location. Click OK and then Next.
Step 2: Click Application Settings and select the Application type as Static library. Uncheck Precompiled Header. Click
Finish.

Fig. A4.3.1 Creating a static library


Step 3: Add files (either new or existing) to the project as you would with any other project.
Step 4: Click Rebuild Solution or Build Solution to create the library. Note that the default library is of the Debug type.
Change the project to the Release version (as shown in Section 4.2) if you wish to build a release version of the library. A sample
project is shown in Fig. A4.3.2.


Fig. A4.3.2 Screen output showing creation of a static library


4.4 Linking with a Static Library (See UsingMyLib project)
Step 1: Follow Steps 1, 2 and 3 as explained in Section 2.1. A sample project is shown in Fig. A4.4.1 where the project name is
UsingMyLib.

Fig. A4.4.1 Creating a console application by linking it with a static library


Step 2: Now specify the library to link against. Click Project. Then select the last menu item (… Properties) where … is the
name of the project. This launches the “… Property Pages” dialog box as shown in Fig. A4.4.2. Expand Linker, select Input
and type in the name of the library file in the Additional Dependencies field as shown.


Fig. A4.4.2 Linker options


Step 3: Click Build and then Build Solution. This compiles and links the program.



B
S T A N D A R D T E M P L A T E L I B R A R Y

Appendix

Standard Template Library


“Don’t let schooling interfere with your education.” Mark Twain

“It is a miracle that curiosity survives formal education.” Albert Einstein

“Teaching is a very noble profession that shapes the character, caliber, and future of an individual. If the people remember
me as a good teacher, that will be the biggest honour for me.” A. P. J. Abdul Kalam

1.0 Introduction
The Standard Template Library, or STL for short, is a library of data structures and generic algorithms. The initial ideas behind
STL were developed by Alexander Stepanov starting in 1979. Over the intervening years, Stepanov collaborated with a number of
researchers at various locations, including AT&T Bell Labs and HP Research Labs. Finally, working with David Musser and Meng Lee,
Stepanov presented the first draft of the STL to the ANSI/ISO committee for C++ standardization in 1993. Since then almost
all C++ compiler developers have included STL as a part of the C++ compiler suite. STL functionalities can be divided into
three parts: containers, iterators, and algorithms. We will discuss these three elements in some more detail next.

Containers
A container is an object that stores data. There are four major categories of container classes in STL: sequence containers,
container adaptors, associative containers, and unordered associative containers.
Sequence Container: A sequence container contains arbitrary elements (int, float, user-defined objects, etc.) that are ordered, i.e.,
their location in the container is a function of when and how they are inserted and not of their value. Examples of sequence
containers include vector, deque, and list.
Associative Container: An associative container contains arbitrary elements that are sorted, i.e., their location in the container is a
function of their value. Examples of associative containers include set, multiset, map, and multimap.
Unordered Associative Container: An unordered associative container is like an associative container except that, as the name
implies, the elements are not sorted. Examples of unordered associative containers include unordered_set, unordered_multiset,
unordered_map, and unordered_multimap.
Container Adaptor: A container adaptor changes the interface of another container. For example, the stack
container is, by default, an adaptation of deque that provides last-in, first-out behavior.

All container classes have member functions that allow them to work with iterators.
begin() Returns an iterator that points to the beginning of the container.
end() Returns an iterator that points past the end of the elements in the container.
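
The differences among the container categories can be seen by storing the same values in each. The following is a small sketch; the names FirstElements, CompareOrdering, LastPushed and Contains are hypothetical and chosen only for this illustration.

```cpp
#include <cassert>
#include <set>
#include <stack>
#include <unordered_set>
#include <vector>

struct FirstElements
{
    int fromVector;   // first element of the sequence container
    int fromSet;      // first element of the associative container
};

// Stores the same values in a vector and in a set and returns what begin()
// refers to in each: the first value inserted versus the smallest value.
FirstElements CompareOrdering(const std::vector<int>& values)
{
    assert(!values.empty());
    std::vector<int> v(values.begin(), values.end());   // keeps insertion order
    std::set<int> s(values.begin(), values.end());      // keeps values sorted
    return { *v.begin(), *s.begin() };
}

// A stack (container adaptor) exposes only push, pop and top.
int LastPushed(const std::vector<int>& values)
{
    assert(!values.empty());
    std::stack<int> st;
    for (int x : values)
        st.push(x);
    return st.top();   // last in, first out
}

// An unordered_set supports fast lookup but promises no particular order.
bool Contains(const std::vector<int>& values, int key)
{
    std::unordered_set<int> us(values.begin(), values.end());
    return us.find(key) != us.end();
}
```

For the values 3, 1, 2 inserted in that order, the vector begins with 3, the set begins with 1, and the top of the stack is 2.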

Iterators
An iterator is an object that can be used to move through all or part of the elements in a container and represents a specific
position in the container. A pointer is the most widely used form of the iterator. The following operations are supported.
Operator *
Returns the element at the current position. Depending on the type of the element, the -> operator can also be used.


Operator ++
Is used by the iterator to position to the next element.
Operator == and !=
These are boolean operators used to check if two iterators represent the same location or not.
Operator =
Is used to assign a position to an iterator.

For example, to traverse all the elements in a deque object storing float values, the following loop can be used. The loop test
uses != and not <, since != is supported by the iterators of every container.
std::deque<float>::const_iterator dqitAny;
for (dqitAny = dqAny.begin(); dqitAny != dqAny.end(); ++dqitAny)
{
    std::cout << *dqitAny << ", ";
}
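
The same traversal can be packaged as a function, and since a pointer is itself an iterator, the identical loop pattern works on a plain array. The sketch below illustrates both; the function names JoinWithIterator and SumArray are hypothetical.

```cpp
#include <cassert>
#include <deque>
#include <sstream>
#include <string>

// Joins the values in a deque into a comma-separated string by walking
// the container with an explicit const_iterator.
std::string JoinWithIterator(const std::deque<float>& dq)
{
    std::ostringstream os;
    for (std::deque<float>::const_iterator it = dq.begin(); it != dq.end(); ++it)
    {
        if (it != dq.begin())
            os << ", ";
        os << *it;          // operator* yields the element at the current position
    }
    return os.str();
}

// A raw pointer behaves as an iterator: *, ++, != and = are all available.
int SumArray(const int* first, const int* last)
{
    assert(first <= last);  // [first, last) must be a valid range in one array
    int sum = 0;
    for (const int* p = first; p != last; ++p)
        sum += *p;
    return sum;
}
```

In both functions the range is half-open: the loop stops when the iterator equals the one-past-the-end position and never dereferences it.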

Algorithms
An algorithm, as we first saw in Chapter 1, is a set of clearly defined steps that describes how a particular problem can be solved
and that can be readily translated into a computer program. Data stored in containers can be processed via several useful
algorithms for copying, modifying, reordering, searching, sorting, etc. Since these algorithms are available as
template functions, they can be called from any part of a program. As we will see in Section 4, sorting the elements in a vector
can be carried out simply by
std::sort (x.begin(), x.end(), std::greater<int>());
where x is a vector of int values and std::greater is a binary function object that returns true or false depending on the values
of the two arguments compared via the > operator. A function object is an instance of a class in which operator() is defined as a
member function. No additional code needs to be written if the data type is a C++ standard data type (int, float, double, etc.).
For other data types, a customized comparison can be written.
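
As an illustration of a customized comparison, the sketch below sorts objects of a user-defined type with a function object. The type CStudent, its members, and the function object HigherGPA are hypothetical and are used only to show the technique.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical user-defined type.
struct CStudent
{
    std::string name;
    double gpa;
};

// A function object: operator() makes instances of this class callable.
struct HigherGPA
{
    bool operator()(const CStudent& a, const CStudent& b) const
    {
        return a.gpa > b.gpa;   // true if a should be placed before b
    }
};

// Sorts the students in descending order of GPA.
void SortByGPA(std::vector<CStudent>& students)
{
    std::sort(students.begin(), students.end(), HigherGPA());
    assert(std::is_sorted(students.begin(), students.end(), HigherGPA()));
}
```

The comparison must return false when the two arguments are equivalent (here, when the GPAs are equal); using >= in place of > would violate the requirement that std::sort places on its comparison.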


2.0 Deque
An example of a sequence container is deque (pronounced deck) that is short for double-ended queue. Its size can be
dynamically varied by expanding or contracting at its front or back end. While conceptually, a deque appears to be like a vector,
the internal implementations of the two data structures are very different. The biggest difference is that unlike a vector, elements
in a deque may not occupy adjacent or contiguous memory locations. Typically, data stored in a deque can be inserted or
removed more efficiently if these operations take place at the two ends. Here is a list of functionalities associated with the deque
sequence container template.

Example STL_DQ_Advanced
This example illustrates most of the functionalities of the deque container. Note that #include <deque> is needed to use this
template. Snippets of the example code are shown and discussed below.

The objects – a deque container storing float values and an iterator to refer to the deque object are declared in lines 61-62.


Initially 4 values (1.1, 2.0, 3.3, 4.4) are stored at the end of the deque by using the push_back function in lines 65-68. The function
display_deque is then called to show the current contents.

Note the use of a const iterator to iterate through the contents and the dereferencing of the iterator to access the value (line 54).
The size member function is used to display the current size of the deque object. Several operations are shown below to
manipulate the contents.
The [] operator can be used to access elements at known locations (line 73).
New values can be inserted at specified locations using the insert member function (line 78). This function returns an iterator
that points to the first of the newly inserted element(s).
Elements can be deleted using the erase member function (line 90). Similarly, the pop_back function can be used to delete the
last element (line 95) and the erase function can be used to delete all the elements (line 100).
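
The operations described above can be sketched in a few lines. The listing below is not the code of Example STL_DQ_Advanced; the function name DequeDemo, the modified values, and the exact sequence of calls are assumptions.

```cpp
#include <cassert>
#include <deque>

// Builds a deque with the four values mentioned in the text and then
// applies the member functions discussed above, one after the other.
std::deque<float> DequeDemo()
{
    std::deque<float> dq;
    dq.push_back(1.1f);
    dq.push_back(2.0f);
    dq.push_back(3.3f);
    dq.push_back(4.4f);     // contents: 1.1 2.0 3.3 4.4

    dq[1] = 2.2f;           // operator[]: 1.1 2.2 3.3 4.4

    // insert places the value before the given position and returns an
    // iterator that points to the newly inserted element.
    std::deque<float>::iterator it = dq.insert(dq.begin() + 2, 2.5f);
    assert(*it == 2.5f);    // contents: 1.1 2.2 2.5 3.3 4.4

    dq.erase(dq.begin());   // removes the first element: 2.2 2.5 3.3 4.4
    dq.pop_back();          // removes the last element:  2.2 2.5 3.3
    dq.push_front(0.5f);    // adds at the front:         0.5 2.2 2.5 3.3
    return dq;
}
```

All the elements can be removed either with dq.erase(dq.begin(), dq.end()) or, equivalently, with dq.clear().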

The program output is shown below.


3.0 List
So far, we have seen two sequence containers: vector and deque. Unlike these two, a list container does not allow random access,
but insertions and deletions anywhere in the list are fast because no other elements need to be moved. Here is a list of
functionalities associated with the list container template.

Example STL_List_Example
This example illustrates some of the functionalities of the list container. Note that #include <list> is needed to use this
template. Snippets of the example code are shown and discussed below.

In the example, two integer lists are constructed. They are then merged into a single list and sorted. A copy of the merged list is
created. One of the instances of the list is then checked and all the duplicate values are removed. The main program is shown
below.

The push_back function is used to populate the two lists.
The contents of the two lists are displayed on the screen.
Note that the sorting is done to store the values in descending order and the function greaterthan is used to facilitate the
sorting process.

The function to help sort the values in descending order is shown below.
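
Since the listing is not reproduced here, the following is a minimal sketch of the sequence of operations described above. The body of greaterthan and the wrapper MergeSortUnique are assumptions; the book's code may be organized differently.

```cpp
#include <cassert>
#include <list>

// Comparison used by list::sort: returns true if a should come before b.
bool greaterthan(int a, int b)
{
    return a > b;
}

// Merges two integer lists, sorts the result in descending order and
// removes the duplicate values.
std::list<int> MergeSortUnique(std::list<int> a, std::list<int> b)
{
    a.sort();
    b.sort();              // merge expects both lists to be sorted the same way
    a.merge(b);            // moves every element of b into a
    assert(b.empty());     // b is left empty by merge
    a.sort(greaterthan);   // descending order
    a.unique();            // removes consecutive duplicates, hence the sort first
    return a;
}
```

Note that list has its own sort member function; std::sort cannot be used because it requires random access iterators, which list does not provide.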


4.0 Sorting
Several algorithms are supported in STL, and we will look at one of the most widely used ones: sorting. Sorting
can be done on C++ data types or on user-defined objects. In the example shown below, we look at sorting a C++
data type, namely integers.

Example STL_Sorting
This example shows how the sorting functionality within STL can be used. Note that #include <algorithm> and #include
<functional> are needed.

Integer values are stored in a vector container.
Note how the contents of the vector are accessed using an iterator.
The template function object greater is available in the STL functional library. Similarly, the less function object can
be used to sort the numbers in ascending order.
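
A minimal sketch of sorting integers in either direction is given below. It is not the code of Example STL_Sorting; the function names Sorted and Median are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Returns a copy of x sorted in ascending order (std::less) or in
// descending order (std::greater).
std::vector<int> Sorted(std::vector<int> x, bool bAscending)
{
    if (bAscending)
        std::sort(x.begin(), x.end(), std::less<int>());   // same as std::sort(x.begin(), x.end())
    else
        std::sort(x.begin(), x.end(), std::greater<int>());
    return x;
}

// A simple use of sorting: the middle value of the sorted data
// (the upper middle value when the number of values is even).
int Median(const std::vector<int>& x)
{
    assert(!x.empty());
    std::vector<int> s = Sorted(x, true);
    return s[s.size() / 2];
}
```

The vector is passed by value to Sorted so that the caller's data is left unchanged.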

The output from the program is shown below.


5.0 Map
A map is a sorted associative container in which a unique key is associated with a “value”, much as an index is associated with
an element in an array. For example, in the US, the social security number (SSN) is a 9-digit identification number, and no two
people in the US have the same number. Hence if the problem is to store information such as name, age, date of birth, gender and
so on, associated with each person, the SSN can be used as the key and the associated information as the “value”. Here is a list
of functionalities associated with the map container template.

Example STL_MAP_Example
There are two examples in the program, but we will focus only on Example 1. Note that #include <map> is needed to use the
map container. The data handled in the program deals with storing and manipulating the max. temperature in various cities on
a given day. The interactive program is used to store, delete, find, and list the <city, temperature> data. The main function
associated with this example is shown below.
Line 17 shows the declaration of the map container.
CityData is the object associated with the map container (line 77).
The user-typed command is restricted to 5 commands, with a maximum of 6 characters in any command (lines 78-79).
The user-typed command is read and the designated function implementing the action associated with the command is called.

Initially CityData contains no data. The AddData function is called to obtain the name of the city and the max. temperature for
the data.


Lines 35-36 show how the name of the city and the temperature values are obtained.
Note how the data is stored using the [] operator. The index is the key, i.e., the name of the city.
In line 46 the name of the city whose data is to be deleted is obtained.
The find member function is used to locate the element. If the element exists, then the element is deleted using the erase
member function.
The ListData function shows how the map container is traversed and how the key and the value are accessed. Each element
of the map is a pair with two members: first stores the key and second stores the value associated with the key.
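
The three operations can be sketched as follows. The function names AddData and ListData and the idea of a city-to-temperature map come from the text; the typedef CityMap, the function DeleteData, and all the signatures are assumptions. In particular, these versions take their data as parameters, whereas the functions in the example program read it interactively.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

typedef std::map<std::string, double> CityMap;   // key: city, value: max. temperature

// Stores the temperature for a city; an existing entry is overwritten.
void AddData(CityMap& data, const std::string& city, double temperature)
{
    assert(!city.empty());
    data[city] = temperature;   // the index is the key
}

// Deletes the entry for a city; returns false if the city is not found.
bool DeleteData(CityMap& data, const std::string& city)
{
    CityMap::iterator it = data.find(city);
    if (it == data.end())
        return false;
    data.erase(it);
    return true;
}

// Lists all the entries, one per line, in sorted key order.
std::string ListData(const CityMap& data)
{
    std::ostringstream os;
    for (CityMap::const_iterator it = data.begin(); it != data.end(); ++it)
        os << it->first << ": " << it->second << '\n';   // first is the key, second the value
    return os.str();
}
```

Note that reading data[city] for a key that does not exist inserts a new element with a default value; find is the appropriate member function when the map must not be modified.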

A sample program execution is shown below.

The reader is urged to look at Example 2 in the program for an enlightening usage of the map container in the context of a
finite element program.


6.0 Multimap
The multimap is an associative container where duplicate keys are allowed. The elements are formed by a combination of a key
value and a mapped value. For example, one could store the names of all the members in a family tree where the last name is
the key, and the first name is the value. Here is a list of functionalities associated with the multimap container template.

Example STL_MultiMap_Example
Note that #include <map> is needed to use the multimap container (there is no separate <multimap> header). In this example
program, we will expand on the map example discussed in the previous section by storing the temperature at various locations in
a given city. The underlying data is shown below.
The struct tl_data is used to store the temperature at a specified location (intersection of two roads).
The multimap is defined with the city as the key (string) and the (temperature, location) pair as the mapped value.

The main program is shown next.


The basic objects are defined in lines 74-75.
Data is created involving two cities, Tempe and Chandler, and 4 different locations.
Line 86 shows the call to display the current data.
The member function count returns the number of elements associated with Chandler while equal_range is used to locate all
the Chandler data.
Line 102 shows the call to the clear member function that erases all the data.


The display_map function essentially loops through all the multimap data and displays the contents: the key, which is the city,
and the values (temperature and location). Note the similarity with the map container where first and second refer to the key
and the mapped value.
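
The use of duplicate keys, count and equal_range can be sketched as follows. The struct name tl_data comes from the text, but its member names, the typedef CityTempMap, and the functions AddReading and MaxTemperature are assumptions made for this illustration.

```cpp
#include <cassert>
#include <map>       // std::multimap is declared in <map>
#include <string>
#include <utility>

struct tl_data
{
    double temperature;     // assumed member name
    std::string location;   // assumed member name, e.g. an intersection
};

typedef std::multimap<std::string, tl_data> CityTempMap;

// Adds one reading for a city; elements with the same key are all kept.
void AddReading(CityTempMap& data, const std::string& city,
                double temperature, const std::string& location)
{
    tl_data reading = { temperature, location };
    data.insert(std::make_pair(city, reading));
}

// Returns the highest temperature recorded for a city. equal_range yields
// the pair of iterators that brackets all the elements with the given key.
double MaxTemperature(const CityTempMap& data, const std::string& city)
{
    assert(data.count(city) > 0);
    std::pair<CityTempMap::const_iterator, CityTempMap::const_iterator>
        range = data.equal_range(city);
    double maxT = range.first->second.temperature;
    for (CityTempMap::const_iterator it = range.first; it != range.second; ++it)
        if (it->second.temperature > maxT)
            maxT = it->second.temperature;
    return maxT;
}
```

Unlike map, multimap has no [] operator, since a key no longer identifies a single element; insert is used to add data instead.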


References
Nicolai M. Josuttis, The C++ Standard Library – A Tutorial and Reference, Addison-Wesley, 1999.
Herbert Schildt, STL Programming from the Ground Up, Osborne/McGraw-Hill, 1999.
David Musser, Gillmer Derge and Atul Saini, STL Tutorial and Reference Guide, 2nd Edition, Addison-Wesley, 2001.



C
C + + O D D S A N D E N D S

Appendix

C++ Odds and Ends


“The individual's whole experience is built upon the plan of his language.” Henri Delacroix

The table below shows an extensive, though not complete, list of the major C++ keywords and identifies the chapter in which
each keyword is first introduced and discussed in the book. A blank entry indicates that the keyword is not discussed in this
version of the book because the material in the text does not require it.
Keyword | Remarks | Introduced in Chapter
and | Same as && |
asm | To declare that a block of code is to be passed to the assembler |
auto | Since C++11, deduces the type of a variable from its initializer (previously a storage class specifier for objects defined in a block) | 4
bitand | Same as & |
bitor | Same as | |
bool | Boolean false-true type that can hold either the false or true literals | 2
break | Terminates a switch statement or a loop | 3
case | Used specifically within a switch statement to specify a match for the statement's expression | 3
catch | Specifies actions taken when an exception occurs | 4
char | Fundamental data type that defines character objects | 2
class | To declare a user-defined type that encapsulates data members and operations or member functions | 7
const | To define objects whose value will not change throughout the lifetime of program execution | 2
consteval | Used with a function that must be evaluated at compile time |
constexpr | Used to declare that a function is fit for use in constant expressions |
const_cast | Used to cast away the constness of variables |
constinit | Used with a variable that is initialized at compile time |
continue | Transfers control to the next iteration of a loop | 3
decltype | Asks the compiler to determine the type of a specified expression |
default | Handles expression values in a switch statement that are not handled by case | 3
delete | Memory deallocation operator | 8
do | Indicates the start of a do-while statement in which the sub-statement is executed repeatedly until the value of the expression is logical-false | 3
double | Fundamental data type used to define a floating-point number | 2
dynamic_cast | Casts a datum from one pointer or reference type to another, performing a runtime check to ensure the validity of the cast | 13
else | Used specifically in if-else statement | 3
enum | To declare a user-defined enumeration data type | 3
explicit | To declare an explicit constructor |
export | Allows a template definition to be accessible from another translation unit |
extern | An identifier specified as extern has external linkage to the block | 4
false | Boolean literal of value zero | 2
float | Fundamental data type used to define a floating-point number | 2
for | Indicates the start of a for statement to achieve repetitive control | 3
friend | A class or operation whose implementation can access the private data members of a class | 9
goto | Transfers control to a specified label | 3
if | Indicates the start of an if statement to achieve selective control | 3
inline | A function specifier that indicates to the compiler that inline substitution of the function body is to be preferred to the usual function call implementation | 7
int | Fundamental data type used to define integer objects | 2
long | A data type modifier that defines a longer int (at least 32 bits) or an extended double | 2
main | Name of the function where execution is initiated | 2
mutable | Allows an object member to override constness |
namespace | Defines a scope | 2
new | Memory allocation operator | 8
not | Same as ! |
not_eq | Same as != |
nullptr | Represents a null pointer value |
operator | Overloads a C++ operator with a new declaration | 7
or | Same as || |
or_eq | Same as |= |
private | Declares class members which are not visible outside the class | 7
protected | Declares class members which are private except to derived classes | 9
public | Declares class members which are visible outside the class | 7
reinterpret_cast | Used to convert one pointer to another pointer of a different type | 12
return | Returns an object to a function's caller | 2
short | A data type modifier that defines a short int (typically 16 bits) | 2
signed | A data type modifier that indicates an object's sign is to be stored in the high-order bit | 3
sizeof | Returns the size of an object in bytes | 12
static | An object defined static exists throughout the lifetime of program execution | 4
static_cast | Performs an explicit type conversion at compile time | 3
struct | To declare new types that encapsulate both data and member functions | 7
switch | Switch statement | 3
template | Parametrized or generic type | 4
this | A class pointer which points to an object or instance of the class | 9
throw | Generates an exception | 4
true | Boolean literal of value one | 2
try | Indicates the start of a block of code that is followed by its exception handlers | 4
typedef | Synonym for another integral or user-defined type |
typeid | The typeid () operator returns the type of its operand |
typename | Within a template typename indicates that a qualified name denotes a type |
union | Similar to a structure, struct, in that it can hold different types of data, but a union can hold only one of its members at a given time |
unsigned | A data type modifier that indicates the high-order bit is to be used for the value of an object and not for its sign | 3
using | using declaration and using directive | 2
virtual | A function specifier that declares a member function of a class which will be redefined by a derived class | 13
void | Absence of a type or function parameter list | 4
volatile | Defines an object which may vary in value in a way that is undetectable to the compiler |
wchar_t | Wide character type |
while | Start of a while statement and end of a do-while statement | 3
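
A few of the keywords with a blank chapter entry are easy to illustrate. The sketch below touches constexpr, explicit, mutable, nullptr and decltype; the class CCounter and the functions Square and Length are hypothetical and serve only to show each keyword in context.

```cpp
#include <cassert>
#include <cstddef>

// constexpr: the function may be evaluated at compile time.
constexpr int Square(int n) { return n * n; }

class CCounter
{
public:
    // explicit: prevents an implicit conversion from int to CCounter.
    explicit CCounter(int nStart) : m_nValue(nStart), m_nReads(0)
    {
        assert(nStart >= 0);
    }
    int Value() const
    {
        ++m_nReads;        // allowed in a const member function because m_nReads is mutable
        return m_nValue;
    }
    int Reads() const { return m_nReads; }
private:
    int m_nValue;
    mutable int m_nReads;  // mutable: may change even in a const object
};

// nullptr and decltype
std::size_t Length(const char* s)
{
    if (s == nullptr)               // nullptr: the null pointer value
        return 0;
    decltype(sizeof(0)) n = 0;      // n has the type of the sizeof expression, std::size_t
    while (s[n] != '\0')
        ++n;
    return n;
}

// Checked by the compiler, which shows that Square(4) is a constant expression.
static_assert(Square(4) == 16, "Square is evaluated at compile time");
```

With the explicit constructor, a statement such as CCounter c = 5; does not compile, while CCounter c(5); does.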


Most of the C++ keywords not appearing in this text will be covered in the follow-up text, tentatively titled Software Development
for Engineers using C++, that is currently under preparation.
