Rajan Object-Oriented Numerical Methods Via C++
Object-Oriented Numerical Methods via C++, 2nd Edition
S. D. Rajan
School of Sustainable Engineering & the Built Environment
Arizona State University
This book is a copyrighted document. It is against the law to copy copyrighted material on any medium except as
specifically allowed in a license agreement. No part of this book including computer programs may be reproduced or
transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or information storage
or retrieval systems, without the express written permission of the author.
©2000-24, S. D. Rajan
Chapter 9 has been expanded to introduce and show the move constructor. Chapter 18 now discusses and shows how Fast Light
Toolkit (FLTK) can be used to draw simple graphical objects. Most projects have been moved to MSVS 2022.
S. D. Rajan
Tempe, Arizona
July 2023
Why C++?
There are several choices one could make in selecting a programming language: FORTRAN, Basic, C, Java and C++. There
are several reasons for selecting C++ as the vehicle for implementing the numerical methods. C++ is a mature language, and
mature Integrated Development Environments (IDEs) for it are readily available on a variety of computer platforms. Students
find these environments intuitive and useful in writing and debugging programs. The student versions of these IDEs are
relatively inexpensive and sometimes free (https://www.visualstudio.com/vs/community/).
C++ supports the use of objects and all the advantages that objects have. It is well suited to engineering applications: C++
programs compile to efficient machine code, so accuracy, speed and problem size are rarely limiting issues. Programs written in
ANSI C++ can be readily extended by the addition of (system-dependent!) graphical user interfaces (GUIs) including computer
graphics. Lastly, there are hundreds of books on programming with C++ and perhaps millions of man-hours of experience in
programming with C++. This continuously evolving language has provided, and will likely continue to provide, a large share
of engineering computer programs.
Additional resources connected with the book are available at http://structures.asu.edu/rajan/object-oriented-numerical-analysis/
and they include the following:
(1) The electronic version of the book as an Adobe pdf file.
(2) All the computer programs discussed in the book (with ISO-compliant C++ source code), arranged in separate directories
and combined into a single compressed file (.rar file).
(3) Additional computer programs such as 1DBVP©, weDraw©, ASUTruss©, EDO-GUIWB© and GS-USA Frame©.
S. D. Rajan
Tempe, Arizona
July 2017
Chapters 4, 9 and 13 have been modified to add material on exception handling, which is explained via new example
programs.
S. D. Rajan
Tempe, Arizona
December 2018
Contents
Chapter 1 Introduction
1.1 What is a Computer?
1.2 What is a Computer Program?
1.3 Programming in C++
1.4 What is Numerical Analysis?
1.5 What are Objects?
1.6 Why Object-Oriented Numerical Analysis?
1.7 Tips and Aids
References
Appendix A Using Microsoft Visual Studio 2022
Appendix B Standard Template Library
Appendix C C++ Odds and Ends
Chapter 1
Introduction
“There are only two kinds of languages: the ones people complain about and the ones nobody uses.” Bjarne Stroustrup
“The world hates change, yet it is the only thing that has brought progress.” Charles F. Kettering
“Men take only their needs into consideration, never their abilities.” Napoleon Bonaparte
The steam engine is said to have fueled the Industrial Revolution. In a similar vein, the microprocessor has fueled the
Information Age and affected every single facet of humanity from health to education to work and play. Scientists and engineers
have played a pivotal role in fueling this revolution. Developments in computer languages and in numerical analysis techniques
have made it possible to create computer systems that can perform amazing tasks: allow two people thousands of miles away to
communicate with each other, fly a spacecraft from the earth to Mars, help in the design and manufacture of artificial limbs, and
create virtual environments for the development and testing of aircraft, automobiles, and a host of other products.
Learning a new human language can be a daunting task. But as linguists would tell you, the key to learning a new language is to
read, write, listen, and speak as much as possible. Learning a language involves knowing the “alphabet”, sentence construction
and grammar, and having the ability to read, write and speak. What does one mean when one says, “I am fluent in Spanish”? Does that indicate
a fluency in reading, or writing, or speaking, or all? What does fluency mean? Are there different grades of fluency? There are
no definitive answers to these questions. While the situation with computer languages is similar, there are subtle differences.
Language standards developed by the American National Standards Institute (ANSI) and the International Organization for
Standardization (ISO) have strongly discouraged the proliferation of language dialects. In principle, a program written to the
standard should compile and execute on any hardware-software platform. These standards evolve over time and are agreed upon
by a standards committee, unlike human languages, which evolve on their own. The most important difference is that one can
make mistakes in “writing and speaking” programs and learn to correct them anonymously without peer comments or stranger
criticism!
Tens (if not hundreds) of computer languages have been developed and used by programmers throughout the world over the
last several decades. Some of these include BASIC, FORTRAN (FORmula TRANslation), COBOL (COmmon Business
Oriented Language), Lisp, Ada, Pascal, C, Smalltalk, Java and so on. The C++ language is an extension of C. Bjarne Stroustrup
developed this language in the early eighties at AT&T Bell Laboratories. It would be incorrect to refer to C as a subset of C++
or C++ as a superset of C, since some valid C programs are not valid C++ programs (for example, a C program that uses new as a
variable name). C++ has features that make it a programming language of choice for business, scientific and
engineering applications.
We hope this book will open your minds (and the doors) to making this world a better place to live in.
The constructs of the C++ language are slowly introduced throughout the text. The basics of the language are introduced in
Chapter 2. More useful ideas and constructs are discussed in Chapters 3 and 4. This background is enough to start a serious
study of numerical analysis techniques. We start with basic ideas such as approximations and series expansions in Chapter 5
followed by roots of equations and numerical integration and differentiation in Chapter 6. In Chapter 7 we see a gentle
introduction to object-oriented (OO) ideas. The application of OO ideas, especially to the development of scientific and
engineering software, is shown in Chapters 8 and 9. Having introduced some of the basic building blocks of
numerical analysis – arrays (vectors and matrices), we move on to Chapter 10 where we see a number of matrix operations
including solutions to linear algebraic equations. We follow this with associated numerical analysis ideas – interpolation,
polynomial approximation and curve fitting.
We see more advanced ideas and topics in the second half of the book. In Chapter 12, file handling constructs are discussed.
We follow this in Chapter 13 with more advanced ideas dealing with classes and objects such as inheritance and polymorphism.
In Chapters 14 through 17, we see important numerical analysis ideas dealing with ordinary differential equations, partial
differential equations, eigenproblems and numerical optimization. Finally, in Chapter 18, we see an introduction to computer
graphics.
C
C --- CONVERT TO NEWTONS
      NEWTONS = 4.448*POUNDS
C
C --- DISPLAY THE CONVERSION
      WRITE (*,101) POUNDS, NEWTONS
  101 FORMAT (1X, 1PE15.8, ' POUNDS IS EQUAL TO ', 1PE15.8,
     1        ' NEWTONS')
C
C --- ALL DONE.
      STOP
      END
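For comparison, the same conversion can be sketched in C++. This sketch is not taken from the book: the names PoundsToNewtons and ConversionMessage are hypothetical, and it assumes, as the fragment does, that 1 pound of force is about 4.448 newtons. Reading the input value is left out, just as it is not shown in the fragment above.

```cpp
#include <cassert>   // assert
#include <cmath>     // std::isfinite
#include <iomanip>   // std::setprecision
#include <sstream>   // std::ostringstream
#include <string>    // std::string

// Conversion factor used in the FORTRAN fragment: 1 lbf is about 4.448 N.
constexpr double kNewtonsPerPound = 4.448;

// Hypothetical helper: converts a force in pounds to newtons
// (the C++ counterpart of NEWTONS = 4.448*POUNDS).
double PoundsToNewtons(double pounds)
{
    assert(std::isfinite(pounds));  // reject NaN or infinity in debug builds
    return kNewtonsPerPound * pounds;
}

// Hypothetical helper: builds the output line in scientific notation with
// eight digits after the decimal point, similar in spirit to FORMAT 101.
std::string ConversionMessage(double pounds)
{
    std::ostringstream os;
    os << std::scientific << std::setprecision(8)
       << pounds << " POUNDS IS EQUAL TO " << PoundsToNewtons(pounds)
       << " NEWTONS";
    return os.str();
}
```

With #include <iostream> added, the statement std::cout << ConversionMessage(1.0) << '\n'; displays 1.00000000e+00 POUNDS IS EQUAL TO 4.44800000e+00 NEWTONS. Unlike 1PE15.8, C++ uses a lowercase e and does not pad the numbers to a fixed width.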
Typically, one would create the source code using an editor. An editor is a computer program that allows the creation and
subsequent editing of the contents of a text file. Notepad is an example of a plain-text editor; a word processor such as
Microsoft Word© is less suitable because it stores formatting information along with the text. Programmers use a custom-made
environment called an Integrated Development Environment (IDE) to develop, write, edit, debug and execute computer programs.
The source statements are stored in one or more files, and these files have a special file extension. C/C++ source files have
extensions such as .cpp, .c and .h.
Compile: Once the source statements are ready, they need to be compiled. A compiler (another computer program!) takes the
source statements (in one or more files) and creates object files. The compiler issues warnings and error messages if the source
statements do not follow the C++ syntax. One can look at object files as intermediate files that cannot be executed by themselves
but are needed to create the executable image. Object files have file extensions such as .obj and .o.
Link (or Build): Once all the source statements are compiled and the object files are created, the linker (another computer
program!) is used to “tie” the object files to other C++-enabled components (or libraries) so as to produce the executable image.
On Windows, executable files have the extension .exe; on Unix systems, the executable image created by the linker is named
a.out by default. If the linker is not able to find all the components, it issues error messages and the executable image is
usually not created.
Execute: Once the executable image or the program is created, it can be executed on the hardware-software platform for which
it was developed. Programs may not execute correctly because of either logical errors or run-time errors. A program that runs from
start to finish without any errors but does not produce the correct output is said to have logical errors. On the other hand, if
the program “crashes” during execution, then it has encountered a run-time error. Examples include illegal operations (division by
zero, overflow, underflow, etc.) and illegal memory access (access violation).
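The difference between the two kinds of errors can be illustrated with a small C++ sketch. The functions below are hypothetical and not from the book. The first one compiles and runs to completion but returns the wrong answer because of integer division, which is a logical error. The last one checks the denominator before dividing, since integer division by zero is undefined behavior in C++ and typically crashes the program at run time.

```cpp
#include <cassert>  // assert (for quick checks of these functions)

// Logical error: (a + b) / 2 uses integer division, so the fractional part
// is discarded before the result is converted to double.
double AverageWithLogicalError(int a, int b)
{
    return (a + b) / 2;
}

// Corrected version: dividing by 2.0 forces floating-point division.
double Average(int a, int b)
{
    return (a + b) / 2.0;
}

// Avoiding a run-time error: check the denominator before dividing.
// Returns false (and leaves quotient unchanged) when denominator is zero.
bool SafeDivide(int numerator, int denominator, int& quotient)
{
    if (denominator == 0) {
        return false;
    }
    quotient = numerator / denominator;
    return true;
}
```

For example, AverageWithLogicalError(1, 2) returns 1.0 while Average(1, 2) returns 1.5; no error message points to the first mistake, which is why logical errors are often harder to find than crashes.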
Debug: Programs rarely work correctly the first time. Finding and correcting both logical errors and run-time errors is
challenging. However, there are systematic approaches and debugging tools that can be used to find these errors in programs
both small and large.
Writing excellent computer programs is an art whereas the skills necessary for writing good computer programs can be learnt
through good programming habits (much as learning good scientific or engineering practices). There are three distinct
components that we must deal with in our quest to program effectively. First, we have the language itself. Second, there is the
environment in which the computer programs are developed, written, debugged and executed. A good IDE certainly helps.
Learning to use IDE tools (editor, debugger, etc.) effectively can be a lifesaver. Last, we have the programmer with his or her
thought processes, ability to visualize, plan and execute, idiosyncrasies, troubleshooting capabilities, and hopefully, a whole lot
of patience.
As we will repeatedly see in this book, skills can be mastered through practice and hard work. Note that there is nothing more
satisfying than a completed, running program however little or much it may do!
An interesting site that can be used as a starting point for answers to questions about C++ is
https://isocpp.org/faq
I will quote four (philosophical) questions and answers from that site.
Question: Is C++ a practical language?
Answer: Yes. C++ is a practical tool. It's not perfect, but it's useful.
In the world of industrial software, C++ is viewed as a solid, mature, mainstream tool. It has widespread industry support which
makes it "good" from an overall business perspective.
Question: Is C++ a perfect language?
Answer: Nope. C++ wasn't designed to demonstrate what a perfect OO language looks like. It was designed to be a practical
tool for solving real world problems. It has a few warts, but the only place where it's appropriate to keep fiddling with something
until it's perfect is in a pure academic setting. That wasn't C++'s goal.
Question: What is the big deal with OO?
Answer: Object-oriented techniques using classes and virtual functions are an important way to develop large, complex software
applications and systems. So are generic programming techniques using templates. Both are important ways to express
polymorphism – at run time and at compile time, respectively. And they work great together in C++.
There are lots of definitions of “object oriented”, “object-oriented programming”, and “object-oriented programming
languages”. For a longish explanation of what Stroustrup thinks of as “object oriented”, read Why C++ isn’t just an object-oriented
programming language1. That said, object-oriented programming is a style of programming originating with Simula (about 40 years
ago!) relying on encapsulation, inheritance, and polymorphism. In the context of C++ (and of many other languages with their
roots in Simula), it means programming using class hierarchies and virtual functions to allow manipulation of objects of a variety
of types through well-defined interfaces and to allow a program to be extended incrementally through derivation.
Question: Is C++ better than Java? (or C#, C, Objective-C, JavaScript, Ruby, Perl, PHP, Haskell, FORTRAN,
Pascal, Ada, Smalltalk, or any other language?)
Answer: Stop. This question generates much much more heat than light. Please read the following before posting some variant
of this question.
In 99% of the cases, programming language selection is dominated by business considerations, not by technical considerations.
Things that really end up mattering are things like availability of a programming environment for the development machine,
availability of runtime environment(s) for the deployment machine(s), licensing/legal issues of the runtime and/or development
environments, availability of trained developers, availability of consulting services, and corporate culture/politics. These business
considerations generally play a much greater role than compile time performance, runtime performance, static vs. dynamic
typing, static vs. dynamic binding, etc.
Those who ignore the (dominant!) business criteria when evaluating programming language tradeoffs expose themselves to
criticism for having poor judgment. Be technical, but don’t be a techie weenie. Business issues really do dominate technical
issues, and those who don’t realize that are destined to make decisions that have terrible business consequences; they are
dangerous to their employer.
The most widely circulated comparisons tend to be those written by proponents of some language, Z, to prove that Z is better
than other languages. Given its wide use, C++ is often top of the list of languages that the proponents of Z want to prove
inferior. Often, such papers are “published” or distributed by a company that sells Z as part of a marketing campaign.
Surprisingly, many seem to take an unreviewed paper written by people working for a company selling Z “proving” that Z is
best seriously. One problem is that there are always grains of truth in such comparisons. After all, no language is better than
every other in all possible ways. C++ certainly isn’t perfect, but selective truth can be most seductive and occasionally completely
misleading. When looking at a language comparison consider who wrote it, consider carefully if the descriptions are factual and
fair, and also if the comparison criteria are themselves fair for all languages considered. This is not easy.
It should be noted that a programming language is a vehicle; the programmer is the driver and must chart the course.
1 https://www.stroustrup.com/oopsla.pdf
Algorithm
Solutions to problems usually follow a specific path – examination of the problem statement, identification of the problem
parameters by differentiating between the input and the output parameters, recognizing what theoretical details (methods,
techniques etc.) are applicable, and finally, development of the algorithm that bridges the gap between the input and the output
parameters.
For the problems discussed in this text we will define the solution process in terms of algorithms. An algorithm is a sequence
of detailed steps that is general enough to be applicable for most situations in solving a problem. These steps involve the input
variable(s) to the procedure, and lead to the determination of the output variable(s).
Steps in a typical algorithm involve input, output, assignment, and control structures.
Input and Output
Every algorithm involves either input or output or, more often than not, both. For example, the general procedure for the
analysis of beam deflections using the solution of an ordinary differential equation uses the beam cross-sectional and material
properties, the different span lengths, the loading on the different spans, and the manner in which the beam is supported as
input to generate the output – the rotations, displacements, and the internal forces along the beam.
Assignment
An assignment in the form of an equation or expression is made up of one or more algebraic operators, variables, constants,
and mathematical functions. Examples of algebraic operations include addition, subtraction, multiplication and division.
Examples of mathematical functions include trigonometric functions such as sin. We will be introduced to specific C++ examples
of assignments in Chapter 2.
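Ahead of the formal treatment in Chapter 2, a short hypothetical example shows the pieces named above. In each assignment, the expression on the right of = combines operators, variables, constants and a library function, and the result is stored in the variable on the left. The function name EvaluateExpression and the expression y = a sin(x) + b x² are illustrative assumptions, not taken from the book.

```cpp
#include <cassert>  // assert
#include <cmath>    // std::sin, std::isfinite

// Hypothetical example: evaluates y = a*sin(x) + b*x*x using assignments.
double EvaluateExpression(double a, double b, double x)
{
    assert(std::isfinite(x));
    double y = 0.0;          // declaration with an initial value
    y = a * std::sin(x);     // assignment: function call and multiplication
    y = y + b * x * x;       // the old value of y is used on the right-hand side
    return y;
}
```

The statement y = y + b * x * x is not an algebraic equation: it means "evaluate the right-hand side, then replace the value stored in y", which is why y may appear on both sides.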
Control structures
While an algorithm may have several steps, the computations are executed in a specific order: some steps are executed once, some
may be skipped, and others may be executed repeatedly. Control structures help the programmer in sequencing the execution of the
steps in an algorithm. Research in program development (notably the structured program theorem of Böhm and Jacopini) has
shown that three types of control structures are sufficient: the sequence structure, the selection structure, and the repetition
structure. The sequence structure means that statements are executed in order; this is the manner in which an algorithm is
developed in terms of ordered steps. As the name suggests, the selection structure involves execution or skipping of specific steps
in an algorithm. Finally, the repetition structure involves repeated execution of a sequence of steps. We will be introduced to
specific C++ examples of control structures in Chapter 3.
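As a preview of Chapter 3, the hypothetical function below uses all three control structures to add up the positive entries in a list of numbers. The example is illustrative and not from the book.

```cpp
#include <cassert>  // assert
#include <vector>   // std::vector

// Hypothetical example: returns the sum of the positive entries of values.
double SumOfPositives(const std::vector<double>& values)
{
    double sum = 0.0;              // sequence: statements run in order
    for (double v : values) {      // repetition: visit every entry
        if (v > 0.0) {             // selection: add only when v is positive
            sum += v;
        }
    }
    return sum;                    // sequence continues after the loop
}
```

For the list {1.0, -2.0, 3.5} the loop body runs three times, the selection skips -2.0, and the function returns 4.5.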
We will illustrate these ideas through an example.
1.4.1 A Sample Algorithm
Consider the problem of finding a root, $\hat{x}$, of a nonlinear equation given as $f(x) = 0$. A well-known solution technique is the
Newton-Raphson method that we will see in detail in Chapter 6. The basic idea is to start with an initial guess, $x_0$, for the root.
A better estimate, $x_{k+1}$, for the root is generated as

$$x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}, \qquad k = 0, 1, 2, \ldots \qquad (1.4.1.1)$$

The iterative process of finding a better estimate continues until an appropriate termination criterion is reached. For example,
one could establish the maximum number of iterations, $k_{max}$, or one could compare the change in the estimate against a
predefined tolerance, $\varepsilon$, as $|x_{k+1} - x_k| \le \varepsilon$, or compare the change in the function value against a predefined tolerance, $\delta$, as
$|f(x_{k+1}) - f(x_k)| \le \delta$, or even $|f(x_{k+1})| \le \delta$.
Before a computer program is written, one must translate the theory and process discussed above into an algorithm. A good
algorithm is a detailed set of instructions that a computer programmer can translate into appropriate computer statements.
Algorithm for Newton-Raphson Method
(1) Input: Establish the values for $k_{max}$, $\varepsilon$, $\delta$, $\zeta$ (see Step 3) and the initial guess $x_0$.
(2) Set $k = 0$.
(3) Compute $f(x_k)$ and $f'(x_k)$.
(4) If $|f'(x_k)| \le \zeta$, note that the solution did not converge and go to Step 8.
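The steps above can be sketched in C++ as follows. This is a hypothetical implementation, not the book's Chapter 6 code. The names NewtonRaphson and NewtonResult are illustrative; eps and delta are assumed names for the tolerances on the change in x and on |f|, zeta is an assumed lower limit on |f'(x_k)| standing in for the derivative test in Step 4, and the steps not listed above are assumed to apply the update in Eqn. (1.4.1.1) and stop as soon as either tolerance test (using the |f(x_{k+1})| variant) is met or kmax iterations have been performed.

```cpp
#include <cassert>     // assert
#include <cmath>       // std::fabs
#include <functional>  // std::function

// Outcome of the hypothetical NewtonRaphson helper below.
struct NewtonResult {
    double root;      // last estimate of the root
    int iterations;   // number of Newton-Raphson updates performed
    bool converged;   // true if a tolerance test was satisfied
};

// Sketch of the Newton-Raphson iteration x_{k+1} = x_k - f(x_k) / f'(x_k).
// kmax: maximum number of iterations; eps: tolerance on |x_{k+1} - x_k|;
// delta: tolerance on |f(x_{k+1})|; zeta: smallest acceptable |f'(x_k)|.
NewtonResult NewtonRaphson(const std::function<double(double)>& f,
                           const std::function<double(double)>& fprime,
                           double x0, int kmax, double eps, double delta,
                           double zeta)
{
    assert(kmax > 0 && eps > 0.0 && delta > 0.0 && zeta > 0.0);
    double x = x0;
    for (int k = 0; k < kmax; ++k) {
        const double fx = f(x);
        const double dfx = fprime(x);
        if (std::fabs(dfx) <= zeta) {
            return {x, k, false};           // derivative (nearly) zero: give up
        }
        const double xNew = x - fx / dfx;   // Eqn. (1.4.1.1)
        const bool smallStep = std::fabs(xNew - x) <= eps;
        const bool smallResidual = std::fabs(f(xNew)) <= delta;
        x = xNew;
        if (smallStep || smallResidual) {
            return {x, k + 1, true};        // a termination test was met
        }
    }
    return {x, kmax, false};                // kmax reached without convergence
}
```

For example, with f(x) = x² - 2, f'(x) = 2x and x_0 = 1, the estimates approach √2 within a few iterations, whereas starting at x_0 = 0 stops immediately because f'(0) = 0.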
Figure: Problem description (a truss with joints A, B, C and D, loads of 2k, 4k and 2k, and dimensions of 10', 10' and 15'), its abstraction, and the resulting model (define a truss, read the input data, analyze, generate the output).
The proficiency of a higher-level OO model should provide the software designer with real-world, programmable
components, thereby reducing software development costs.
Its capability to share and reuse code with OO techniques will reduce the time to develop an application.
Its capability to localize and minimize the effects of modifications through programming abstraction mechanisms will
allow for faster enhancement development and will provide more reliable and more robust software.
Its capability to manage complexity allows developers to address more difficult applications.
Object technologies lead to reuse, and reuse of program components leads to faster software development and higher-quality
programs. Object-oriented software is easier to maintain because its structure is inherently decoupled. This leads to fewer side
effects when changes must be made and less frustration for the software engineer and the customer. In addition, object-oriented
systems are easier to adapt and easier to scale – large systems can be created by assembling reusable subsystems [Pressman,
2001].
Summary
In this first chapter of the book, we saw the basic ideas associated with computer programming using C++ and its links to
numerical solution techniques. It is important that readers be aware that the basic quest in the book is to develop and use reliable
tools to solve scientific and engineering problems.
Exercises
Appetizers
Problem 1.1
Search the World Wide Web to find answers to the following questions.
(a) What other computer languages not mentioned in this chapter were or are being used by programmers? Write a short
paragraph on each of these languages.
(b) What hardware advances have taken place in the last 5 years that are now available in computer systems?
(c) What are the most commonly used operating systems? What hardware platforms do these operating systems require?
Problem 1.2
Write an algorithm for finding the maximum of a set of numbers.
Problem 1.3
Write an algorithm for finding the minimum of a set of numbers.
Problem 1.4
Consider an airplane as an object. Identify its attributes. List the behaviors that one may want to incorporate in a computer
program.
Problem 1.5
Are the following hardware, software, firmware or none of the above? (a) Microsoft Excel© (b) CPU (c) Adobe Acrobat© (d)
Cache (e) BIOS (Basic Input/Output System) (f) Personal Digital Assistant (PDA).
Main Course
Problem 1.6
Consider a point in space defined in terms of its (x, y, z) coordinates. Write an algorithm (including error detection) for each
of the following tasks.
(a) Compute the distance between two points.
(b) Compute the unit vector pointing from one point to another.
(c) Compute the shortest distance of a point from a straight line. Assume that the line is defined by two points.
Problem 1.7
Consider each of the following as an object. Identify their attributes. List their behavior that one may want to incorporate in a
computer program. (a) Time (b) Bank account (c) Employee (d) Fuel sources.
Problem 1.8
Make a list of frequently asked questions (FAQs) on how to use your favorite IDE. Get the answers to these questions.
C++ Concepts
Problem 1.9
Write an algorithm for the tic-tac-toe game. Make the following assumptions. The game will be played exactly once. A coin toss
determines whether the user or the computer plays first. The player who plays first marks an X and the other player uses
an O. The cells in the grid are identified by the numbers 1 through 9, and the user selects a cell by entering a valid number from
1 through 9. You have to develop strategies for the computer (the aim being to win the game if possible).
Problem 1.10
Describe the steps that you would take to convert the ideas from Problem 1.9 into a computer program.
Chapter 2
Programming in C++
“Man invented language to satisfy his deep need to complain.” Lily Tomlin
“A man of great memory without learning hath a rock and a spindle and no staff to spin.” George Herbert
“Real programmers can write assembly code in any language.” Larry Wall
In this chapter we will see the power of the C++ language through simple yet powerful programs. We will start by examining
a short and complete program to show the different components that are found in most C++ programs. Learning to use C++
is in some ways like learning a human language. You will have to learn the syntax of the language. You will have to practice
using the language correctly. You will have to seek the help of more experienced people when you encounter a problem.
Sometimes, you will have to put logic and common sense aside and accept the idiosyncrasies of the language. And finally, you
will have to be very patient, organized and determined.
Use a hands-on approach to programming. Try to build and execute the program. There is simply no substitute for practice,
practice, and practice in learning how to develop computer programs. It is highly recommended that you use an Integrated
Development Environment (IDE) to develop, debug and refine your programs. What is an IDE? An IDE is an environment
(a visual computer program) that gives a programmer a variety of tools to make the development and maintenance of programs
easier to do. A typical IDE provides (a) an editor to create and edit the source code, (b) a program “make” capability – a tool
that instructs the compiler and the linker on what and how to compile and link a program, and (c) an interactive debugger that
helps the programmer step through and debug the program. Microsoft Visual Studio is an example of an IDE that
provides an environment for building applications or programs in a variety of languages, such as Visual Basic, C# and C++
(FORTRAN is supported through third-party compilers such as Intel Fortran).
This chapter is just the beginning. However, it will provide a quick jump start and rapidly introduce several useful features. A
full understanding of the features will come as we use a feature repeatedly throughout the text.
C++ is an extremely powerful language. We will learn more about the language features and capabilities throughout this text.
Finally, it is recommended that you go through the entire chapter, perhaps more than once, in order to get a firm grasp of the
basics – creating, editing, compiling, linking, executing and debugging a program.
Objectives
To understand the basic syntax of C++ programs.
To understand the concepts associated with data types, variables, arithmetic expressions, assignment statements and simple
input and output.
To understand and practice writing complete C++ programs.
To compile, link and execute programs.
To learn the art of troubleshooting.
SUPPLEMENTAL MATERIAL
All sample programs shown in this text are available on the book web site. The programs have been compiled and executed
using Microsoft Visual Studio 2022 compiler running under Windows 10/11. Unless otherwise noted, the programs should
compile and execute on all ISO-compliant systems.
2.1 Introduction
Every language, whether human or computer based, has its syntax. We must recognize and use the syntax. Unlike human
languages and their usage, the computer language syntax is unforgiving and hence must be followed correctly. If we don’t, we
may not be able to build the program at all, let alone execute it correctly. Below we present a small but complete C++
program. This program is designed to display the string Welcome to
Object‐Oriented Numerical Analysis. on the screen.
The source code to a typical computer program is made up of several lines of input contained in a text file. A text file is a file
that can be viewed in an editor or viewer such as Microsoft Windows Notepad. Often, the file extension associated with a C++
source file is cpp. For example, naming a file main.cpp would imply that the file contains C++ source code.
Example Program 2.1.1 A Simple C++ Program
In the example shown below, the contents of the actual text file (also called source code) are shown with the line numbers on
the left so that references to individual lines can be made later in the text. The color coding is via Microsoft Visual Studio editor.
main.cpp
The example shows some of the many program components that you are likely to find in a C++ program. C++ programs are
written in free format, meaning that one can type the input starting at any location in a line and can also break the input into
several lines. There are exceptions to this rule as we will see later.
How are the statements executed? In C++ as in most other languages, the statements are executed sequentially as they appear
in the program. In other words, the statement in line 1 is executed first, followed by the execution of the statement in line 2 and
so on. Sometimes, some lines are not executed because they are comment lines and sometimes the program logic requires that
these statements be skipped.
Lines 1 through 9 are comment lines. These lines are ignored by the compiler. It is a good programming habit to add comments
to all programs, not only to explain to ourselves what is being done in the program but also to help others who may have
to use our computer programs. A comment section starts with the pair /* and ends with the pair */. Line 10 is a blank line that
has been used to improve the readability of the program as are lines 12 and 16. In line 11 a different style of adding comments
is shown – anything that follows (to the right of) // on a line is treated as a comment and is ignored.
A typical C++ program will not be totally self-contained. It will leverage components created by others. Those new to C++
may find it confusing to tell what is a part of the language, what is not, and how the language can be extended by the
programmer to meet the programmer’s requirements. C++ has a set of reserved keywords and symbols (or tokens) that are a
part of the C++ language. In other words, as a programmer, these keywords and symbols should be used in a specific manner,
following the syntax associated with the keyword or symbol. For example, the reserved keywords used in the program are
using, int and return; include is the name of a preprocessor directive, and main is the name of a special function, not a
keyword. Each of these has a specific syntax. For example, the #include directive has either of the following forms.
#include <filename>
#include "filename"
The reserved symbols used in the program are # ; { } " \ / * : <<. C++ provides additional functionality through library
functions, classes and objects. If we wish to use them, then we must explicitly state in our program what functionalities are
being used. The cardinal rule in C++ is that declarations must precede usage; otherwise, the compiler cannot make sense of the
statements that use a name. We will introduce and discuss these reserved symbols throughout the text.
Going back to the example, we need to find a way to display the string on the computer screen. The correct terminology is to
output the string to the screen. C++ provides several classes for input and output. The class that is used here is called iostream.
The information about the iostream class is contained in the file called iostream (such files are called header files1). The specific
object associated with this class that is most commonly used for outputting streams of characters is called cout (std::cout is
an ostream class object taken from C++’s standard library). Typically, cout is followed by the insertion operator <<, which is
then followed by the information to be displayed in a proper form. We will see more about this in Section 2.5.
Since cout is used later in the program, the include directive is used to tell the compiler that we wish to use the
iostream class (line 11). The # sign before include must be the first nonblank character in the line; by convention it is
placed in the first column. The declaration in line 11 makes it syntactically correct when cout is used in
line 15.
Every C++ program must have one and only one special function called main. A typical function is a program component that
is called from other parts of the program; functions are used to simplify the development of computer programs. main is a
function! However, main is not called from any other part of the program. Instead, program execution starts in main.
We will study functions in greater detail in Chapter 4. The main function starts on line 13. The int keyword signifies that main
returns an integer value to the environment (typically the operating system) that started the program. The body of main is contained
within the symbols { and }. These symbols are known as curly braces or brackets. Hence, lines 15 through 17 form the body of
main. Line 15 carries out the only task that this program is designed for. In this example, the literal contents of the character
string between the " " symbols are output to the screen except for \n. \n is a special formatting symbol (an escape sequence) that
sends a new line character to the screen so that the cursor rests on the next line. The generated output is shown in Fig. 2.1.1(a).
Fig. 2.1.1(b) shows the program output if \n is not used. Finally, since we declared main to return an integer value, the
statement in line 17 returns a zero value. Note that the ; (semicolon) symbol is used to terminate a statement.
The syntax associated with a particular feature indicates whether a semicolon is required. As we can see from this
example, the output via cout and the return statement both require a semicolon to terminate the statement.
Go ahead and compile, link and execute Example 2.1.1.
1 A header file contains declarations, such as function prototypes, that give the compiler information about the functions (and classes) used in a program.
Expressions: Expressions are created using constants, variables, operators, functions etc. Expressions are evaluated and the result
is a value.
Statements: A statement is made up of one or more of the following - variables, expressions, C++ keywords and tokens.
Input and Output: Information is acquired by the program as input from the keyboard, mouse, file, etc. and information is sent
by the program to the monitor, file, printer etc.
Functions: A function is a program component that can be called from other parts of the program. We will see the syntax and
functionalities of functions, and when and where functions should be used later in Chapter 4.
double: 64 bits long. Floating-point value in the range $\pm 1.7 \times 10^{308}$ (15 digits precision). Example: 1.45
An integer value, unlike a floating-point value, has no fractional component. The values -345 and 12000 are examples of integer
values. The values 1.34 and -0.0045 are examples of floating-point values. By default, all floating-point numbers are of type
double. Hence, 1.34 is a double constant whereas 1.34f is a float constant. Note that an L at the end of an integer value
signifies that the constant should be treated as a long data type. Similarly, an f at the end of a float number signifies that the
number is a float data type. As we will see later, there are other types of data including ones that the user or programmer can
define and store.
To store values that may change over the course of execution in a program, we use variables. In other words, variables can be
manipulated to store values. A variable is identified with a name. A variable name can have many characters (typically between
1 and 31), starts with a letter, and typically is made up of the following characters: a through z, A through Z, 0 through 9,
and _. No blank spaces are allowed. Examples of valid variable names:
a345, z_helper, Alpha
The following variable names are invalid.
1Alpha, a Temperature, B$65
Note that each variable can store only one value. Such a variable is called a scalar variable. When several values need to be stored,
we can use a vector variable. We will look at vector variables later in the chapter.
To understand the usage of the variables in a computer and the type of data that is stored in them, we will use the following
convention. You may find that other books or software firms have different conventions. Some of the data types have not been
discussed as yet and can be ignored for now.
Here are a few examples illustrating how we can declare variables for different data types in a computer program.
int nV;                    // one integer variable
int nA=1, nB=3, nC=5;      // three integer variables with initialization
int nA(1), nB(3), nC(5);   // alternative to the previous line (use one form or the other, not both)
double dPrecision;         // double precision variable
float fX, fY, fZ;          // float variables
bool bStatus=true;         // boolean variable with initialization
char cOperation='+';       // character with initialization
Note the different styles in declaring the variables. For example, the statement
int nV;
declares an integer variable whose value is currently unknown. Such variables are called uninitialized variables. Good
programming practice requires that variables be used only after they are initialized. In other words, the declaration for
nV should be followed by a statement later in the program where an integer value is assigned to nV. Now consider the
following declaration.
bool bStatus=true; // boolean variable with initialization
Here the boolean variable bStatus is declared and initialized with the true value. A variable must be declared once and
only once before it is used in any program segment.
2.3 Expressions
An expression is made up of a sequence of tokens or basic elements that can be evaluated. For example,
nValA + nValB
is an expression that is made up of two variables nValA and nValB and the addition operator, +.
Mathematical Operators
The following table lists the mathematical operators you can use in constructing an expression.
Operator   Meaning
+, -       Unary positive, negative
+          Addition
-          Subtraction
*          Multiplication
/          Division
%          Modulus (or remainder)
Operator Precedence
The operators have a default precedence when used in combination. The default precedence can be overridden by using
parentheses. The operators are listed below in order of ascending precedence.
precedence
0. =      Assignment
1. +      Addition
1. -      Subtraction
2. *      Multiplication
2. /      Division
2. %      Modulus
3. +, -   Unary positive, negative
4. ()     Parentheses (the enclosed expression is evaluated first)
Consider the following examples where integer values and arithmetic are used.
Expression   Order of evaluation          Evaluates to
5+3-2        5+3=8; 8-2=6                 6
5+3*2-1      3*2=6; 5+6=11; 11-1=10       10
(5+3)*2-1    5+3=8; 8*2=16; 16-1=15       15
5*6/3        5*6=30; 30/3=10              10
5*(6/4)      6/4=1; 5*1=5                 5
5*6/4        5*6=30; 30/4=7               7
5%2*3        5%2=1; 1*3=3                 3
In the case of operators with the same precedence, the expressions are evaluated left to right. Note that integer division
discards the fractional part: 6/4 evaluates to 1, as does 7/4. However, with floating-point arithmetic 6.0/4.0 evaluates to
1.5. We will discuss more about this issue in Chapter 3.
Mathematical Functions
The following table lists the commonly used mathematical functions. The header file <cmath> needs to be included before we
can use these functions. The function parameters and computed value are of double data type, unless otherwise noted.
Name       Function                                                        Examples
sin(x)     Computes the sine of an angle x expressed in radians.           sin(dX)
           Similarly, cos(x) and tan(x) compute the cosine and             cos(dX*dY/2.0)
           tangent values respectively.                                    tan(2.3+4.5/dZ)
asin(x)    Arc sine. The function returns an angle expressed in radians.   asin(dB)
           Similarly, acos and atan are functions for arc cosine and
           arc tangent respectively.
cosh(x)    Hyperbolic cosine. Similarly, sinh and tanh are the             cosh(dFill)
           hyperbolic sine and tangent respectively.
sqrt(x)    Square root of x.                                               sqrt(25.0+dAlpha)
pow(x,y)   x raised to the power y, i.e., $x^y$.                           pow(fX, 3.5f)
If nA, nB and nC are all declared to be integers and fP, fQ and fR are declared to be floats, then the following assignment
statements have the same data types on both sides of the assignment operator.
nA = 10 + nB*nC;
fP = 10.2f*fQ - fR;
What does the compiler do when it encounters an assignment with different data types? For example, how is
dX = 10 + 2.5*nA;
evaluated where the left-hand side is a double variable (dX) and the expression on the right is made up of an integer constant
(10), a double constant (2.5) and an integer variable nA? We will look at this situation in greater detail in the later chapters.
Streams
A stream is a sequence of bytes. In an input stream, the bytes flow from a device such as a keyboard or disk drive to main memory.
In an output stream, the bytes flow from main memory to a device such as a display screen, printer, or disk drive.
The iostream header file contains basic information required for all stream I/O operations. This header file contains objects
such as cin, cout, cerr, and clog, some of which will be discussed here.
The iostream library contains many classes that handle I/O operations. The istream class supports stream input operations.
The ostream class supports stream output operations. The iostream class supports both the input and output operations. The
iostream class is derived from both the istream and ostream classes.
Stream Input
Stream input may be performed with the right shift operator (>>), also referred to as the stream-extraction operator. This
operator normally skips whitespace characters such as blanks, tabs, and newlines. When end of file is encountered, or the input
cannot be converted to the requested type, the extraction fails and the stream evaluates as false (zero).
cin is an object of the istream class and corresponds to the standard input device. In the following, the cin object used with the
stream-extraction operator causes a value for integer variable nScore to be input from cin to memory. Assume that nScore has
previously been declared an integer. Then the statement
cin >> nScore;
is used to read in the value of nScore.
Stream Output
The ostream class offers several output capabilities. These include output of standard data types with the stream-insertion
operator, characters with the put member function2, unformatted output with the write member function, and various
formatted output.
Stream output may be performed with the left shift operator (<<) which is also referred to as the stream-insertion operator.
cout is an object of the ostream class and it corresponds to the standard output device. Note that cout is an object, not a
keyword. It is used with the following syntax.
cout << …;
where the insertion operator << is used to output standard types represented above as …. Consider the following example.
int nA=1, nB=10, nC;
nC = nA + nB;
cout << "The sum of " << nA << " and " << nB << " is " << nC
<< "." << endl;
Note the use of the endl (end line) stream manipulator. The endl manipulator produces the same result as the \n escape sequence
and also flushes the output buffer, i.e., sends its contents to the device immediately even if the buffer is not full. The output
generated by the above statement is shown below.
The sum of 1 and 10 is 11.
For example, to read an integer value we could use the following statements.
#include <iostream> // iostream class
using std::cin; // standard input
using std::cout; // standard output
…..
….
int nScore;
….
cout << "What is the score? "; // ask user for the score
cin >> nScore; // read the user input
…
Note that in this example, because of the using std:: declarations, neither cout nor cin needs the std:: prefix. Alternately,
the names can be qualified directly with the standard namespace std using the scope resolution operator ::. In other words,
#include <iostream>
using std::cout;
…
cout …
is equivalent to
#include <iostream>
…
std::cout …
2 Member functions will be discussed with objects and classes starting in Chapter 7.
Field Width
width()   with one integer argument, sets the field width and returns the previous width; with no argument,
          returns the current setting
setw()    sets the field width (a value wider than the field width is not truncated, and the width setting applies
          only to the next insertion or extraction)
The precision member function and the setprecision manipulator control the number of digits that are output. If the display
format is scientific or fixed, then the precision indicates the number of digits after the decimal point. If the format is automatic
(neither scientific nor fixed), then the precision indicates the total number of significant digits. This setting remains in effect
until it is changed. The width member function and the setw manipulator control how many character spaces are reserved for
outputting a value.
Here are a few example statements showing how the precision and field width can be controlled. We will examine these features
in greater detail in the following chapters.
cout.precision (10);
cout << setprecision(10); // same effect as previous line
cout.width (20);
cout << setw(20); // same effect as previous line
main.cpp
As before, the header files iostream and iomanip are included (lines 11-12) so that the declarations of the standard library
components we need are available. The former makes it possible to use std::cout and std::endl. The latter supports the use of
std::setprecision and std::setw.
The output generated by the program is shown in Fig. 2.5.1. Note the difference in the output between the float and the double
outputs. The float value beyond 6 significant digits is unreliable.
Fig. 2.5.1 Output generated by Microsoft Visual Studio for Example 2.5.1
A few points to note. Multiple statements can be typed on a single line if they are separated by semicolons – see lines 19, 24,
and 31. A complete statement can appear on more than one line – see lines 28 and 29, and 35 and 36.
main.cpp
The const qualifier before a variable signifies that the value of the variable cannot change over the course of the program. If an
attempt is made to change the value of the variable, a compiler error results. Consider the following statements.
const int NVALUES = 3; // NVALUES is declared as a const int
NVALUES = NVALUES + 3; // invalid statement. will not compile
This is a defensive programming mechanism, and one should use such a programming style. A sample output generated by the
program is shown in Fig. 2.5.2.
Fig. 2.5.2 Output generated by Microsoft Visual Studio for Example 2.5.2
As a matter of programming style, it is desirable to declare constant integer variables rather than use an integer constant
throughout the program. First, it is easier to understand the significance of MAXINPUT than the number 3. Second, if the value of the constant
needs to be modified, then only one statement needs to be changed in the program as opposed to changing the integer constant
wherever it is used in the program.
A sample output generated by the execution of the program is shown in Fig. 2.6.1.
Fig. 2.6.1 Output generated by Microsoft Visual Studio for Example 2.6.1
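Tip: Here is a common programming error. What will happen if we use nVNumbers[3] to access the last value in the above
computer program? Array indices start at 0, so the last of three values is nVNumbers[2], and nVNumbers[3] refers to memory
beyond the end of the array. The result is unpredictable. We will see how to trap and correct such errors with arrays in Chapter 9.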
We can also store a string of characters in a char vector.
char szHeader[] = "Welcome to my 4-function calculator program";
Note that the above statement defines the variable szHeader as a character string and initializes the value of the variable to
the specified string. The compiler automatically computes the length of the vector needed to store the string. Character strings
require an additional storage space to store the string delimiter, a special (null) character that signals the end of the character
vector. This character is '\0'. Hence the other term for such strings: a zero-terminated string. Consider the following example.
char szFirstName[4] = "John"; // incorrect
This is invalid since the string John needs five characters, not four. The fifth character is required to store the string delimiter.
String manipulation can become very cumbersome. In the following statements, while the variable declaration is legal, the
assignment is not.
char szMovieTitle[18];
szMovieTitle = "Lord of the Rings"; // will not compile
However, the following statements are valid.
szMovieTitle[0]='L';   // a single character uses single quotes
szMovieTitle[1]='o';
We will use the standard string class whenever possible. The std::string class is gradually introduced in the following chapters
and is discussed in detail in Chapter 7.
2.7 Troubleshooting
To err is human, and most IDEs provide tools to aid in finding and correcting errors. In this section we will look at how
some of these tools can be used.
Example Program 2.7.1 Compile Errors (Example Program 2.1.1 Revisited)
It is quite frustrating as a beginning programmer to find that seemingly small issues can prevent a program from compiling or
linking or executing. In this section, we will see how to react to error messages.
main.cpp
Because of a couple of simple typing errors, we see several error messages. As the first error message states, the error is
reported in line 17, but it arises from a missing ; at the end of line 15. The correct statement should have been
std::cout << "Welcome to Object-Oriented Numerical Analysis.\n";
The second and the third messages are displayed because the compiler does not understand what rwturn is. Note that the IDE
tags a potential problem with red squiggly characters under the word rwturn. Modern IDEs are extremely powerful and are
designed to help the programmer correct potential problems before formally compiling the program.
Compiling the program yields no errors. However, when we try to link the program we get the following error messages.
The error message states: error LNK2019: unresolved external symbol _main referenced in function "int __cdecl
invoke_main(void)". In plain English, the linker was looking for the main function and could not find it. Once again, we have
a typo (typographic error) in the program. Line 13 should read
int main()
Syntactically there was no error in the program. We could have a function called nain in our program! However, as we saw
earlier in the chapter, every program must have one and only one main function. This is the function where the execution of the
program begins. There is ambiguity as to where the program should start its execution if this function does not exist.
The above program compiles and links correctly. However, it does not produce the correct answer as shown in Fig. 2.7.1.
Set watchpoints: The user can use this feature to find where in the program the value of a certain variable is changed
etc.
Let us set up the strategy to debug Example 2.7.3. First, we will set up a breakpoint at line 22-23. The reason is that we want to
ensure that the program reads our input correctly and stores the correct value in the variable Angle. Second, when the program
execution encounters the breakpoint, we will examine the value of Angle.
Below we present the output generated by Microsoft Visual Studio when the breakpoint is encountered. The program output
window is shown in Fig. 2.7.2. The IDE main screen is shown in Fig. 2.7.3.
cout << "Sine of " << Angle << " degrees is: "
<< SineAngle << '\n';
The second part of the error is a little more difficult to debug. We set a second breakpoint, and the output (at the second
breakpoint) is shown in Fig. 2.7.4. The evaluated value of sin(40.5°) is shown as 0.334151179, whereas the correct value is
approximately 0.649448. What went wrong? At this stage, we should look at the C++ documentation on the sin(x) function. The
help facility for the sin function is shown in Fig. 2.7.5 and we immediately discover that the parameter x should be in radians
(Angle should be expressed in radians).
Fig. 2.7.5 Microsoft Visual Studio on-line help documentation on the use of sin(x)function
Remember that debugging is an art as much as it is a science.
We have also used the fully qualified form (see Example 2.7.3, where there is no using std::cout declaration)
std::cout << "What is n? ";
Both these usages are correct. In fact, there is a third usage style.
using namespace std; // defined at the beginning of the file
….
cout << "What is n? ";
cin >> n;
With the using directive, names such as cout and cin need not be qualified with the std:: prefix. The prefix std::
indicates that names such as cout, endl and setw are defined inside the namespace std. The standard namespace std is defined
in the standard library and its contents become available by including the appropriate header file (e.g., #include <iostream>).
Programmers can also define their own namespaces and refer to their members with the :: operator (scope resolution operator).
This means that two identically named entities with possibly different functionality can be used in the same program segment,
as long as each is defined in its own namespace and is properly qualified in the program that uses it.
As a matter of style, we will usually not employ the using keyword in the rest of the text and instead use the std:: qualifier
when appropriate.
Interested readers are urged to look at Example 2_8_1 for a user-defined namespace example.
Summary
In this chapter we saw how to write, compile, link, execute and debug simple C++ programs. In the following chapters we will
learn more about C++. We will then look at how to get organized and develop the program.
Below we summarize the key facts learnt in this chapter.
C++ is divided into two parts – the core language and the standard library. Elements of the standard library are used
with the syntax std:: or through using std:: declaration at the top of the file.
Definitions and declarations must precede usage. Think of the C++ compiler as having access to a program dictionary that
has three components: (a) C++ keywords and tokens, (b) functions whose prototypes are available in header files,
and (c) user-defined variables and functions. Imagine that this is a dynamic dictionary with respect to (c). In other
words, items may be added to the dictionary while the compiler is interpreting the program. If something is used in
the program that cannot be found in the dictionary, then the compiler issues an error message.
Every program has one and only one main function. This is the location from where the execution of the program
starts.
Comments in programs start with the pair /* and end with the pair */. Comments on a single line occur to the right
of the pair //.
A block of statements occurs within the braces { and }. As an example, the statements in the main function are
found within the braces. A semicolon ; is used to terminate most commonly used C++ statements.
Usually there are two types of files found in C++ programs – .cpp and .h files. Usually, the .cpp files contain the
statements that are executed, and the .h files contain the definitions and declarations. Note that the file extensions
may be different with different compilers and IDEs.
The more commonly used standard data types are short, int, long, float, double, bool and char.
Variables are used to store values of the standard data types. Variables are identified by variable names.
Scalar variables store one value. Vector variables store several values.
Functions are independent program segments that can be called from different locations in a program. Functions are
discussed later starting in Chapter 4.
C++ provides several functions including mathematical functions that the programmer can use in his or her program.
Expressions can be created using constants, variables, functions and operators. Expressions with mathematical
operators follow certain evaluation rules.
There are several types of C++ statements such as assignment, input/output, etc.
C++ provides streams (sequence of bytes) for obtaining input from devices such as a keyboard or disk, and for
outputting data to a display device, printer, or disk.
Exercises
One of the common errors in writing the computer programs suitable for the topics discussed in this chapter is
deciding what data type should be used for the variables. Spend some time to think over the problem before deciding
the data type. For example, should the sides of a rectangle be represented by an integer or a floating-point variable?
Appetizers
Problem 2.1
Write a program to output the following pattern on the screen.
***
****
***
Problem 2.2
Write a program to interactively obtain an integer value and display (a) the negative of that number, (b) its absolute value, and
(c) its square.
Problem 2.3
Write a program to interactively obtain a floating-point value x for each of the three following cases and display (a) the square
root of that number, (b) its absolute value, and (c) $x^{4.5}$.
Problem 2.4
For the following values: a 1.2 , b 35.6 , c 1056.78 , d 22.5 , e 153.4 , write a program to compute and display
a b a b c b2 c
the results of the following expressions: (a) , (b) , (c) a 0.4 , (d) sin( a ) cos , (e)
c c 33.3 a a b 2
tan( d )
.
tan( d ) sin( e )
Main Course
Problem 2.5
Write a program to obtain the length and width of a rectangle, and compute and display the perimeter and the area of the
rectangle.
Problem 2.6
Write a program to carry out the following conversions (a) obtain the length in inches (in) and convert it to meters (m), (b) obtain the
temperature in °F and convert it to °C, and (c) obtain the mass in slugs (slg) and convert it to kg.
Problem 2.7
Write a program to obtain the (x, y, z) coordinates of two points. Now compute and display (a) the distance between the two
points, and (b) a unit vector from point 1 to point 2 (display the unit vector as $a\hat{i} + b\hat{j} + c\hat{k}$).
Problem 2.8
Write a program to obtain the following material values expressed in in, °F, slg, lb and convert them to m, °C, kg, N: (a)
modulus of elasticity, (b) coefficient of thermal expansion, and (c) mass density.
C++ Concepts
Problem 2.9
Write a program to display the values of the function $y(x) = ax^3 + bx^2 + cx + d$ for the range $-10 \le x \le 15$ using an
increment of 5. Obtain the values of the coefficients of the cubic polynomial from the user. Display the values as follows (one
per line).
(x, y(x)) = (x value, y value)
For example,
(x, y(x)) = (-5.0, -120.3)
Problem 2.10
Summarize all the facts about C++ that (a) you have learnt from this chapter, and (b) from reference material outside this book.
Chapter
Control Structures
“When you get to the fork in the road, take it.” Yogi Berra
“If you do not know where you are going, you'll wind up somewhere else.” Yogi Berra
“Here are three great questions which in life we have over and over again to answer: Is it right or wrong? Is it true or false?
Is it beautiful or ugly? Our education ought to help us to answer these questions.” John Lubbock
“When I turned two I was really anxious, because I'd doubled my age in a year. I thought, if this keeps up, by the time I'm
six I'll be ninety.” Steven Wright
Now that we know how to write simple C++ programs, it is but natural to ask, “How can I write useful programs?” As we saw
in Chapter 1, computer programs manipulate information. The algorithms that are used in manipulating the information (a) test
conditions that are to be met for certain actions to take place, and (b) repeatedly execute several steps for a specified number of
times or until certain conditions are met.
Consider a simple example of a vector that contains integer values, e.g., -20, 15, 20, 55, -130. We need to develop an algorithm
that will search this vector looking for a known target value and report whether the vector contains the target or not, and if the
target is located, where in the vector the target is located. We could use the following algorithm to solve this problem.
Input: Target value, $t$, and the vector of integer numbers, $V$.
Output: The location, $l$, where the target $t$ exists in $V$.
Step 1: Find how many elements, $n$, are in the vector $V$.
Step 2: Loop through all the elements starting at the first location, i.e., $i = 1, 2, \ldots, n$.
Step 3: Compare the number $V(i)$ against the target $t$. If they are equal, set $l = i$. Exit.
Step 4: Increment $i \leftarrow i + 1$.
Step 5: If $i \le n$, go to Step 2. Else set $l = 0$ indicating that the target was not found. Exit.
An examination of the algorithm shows that repetition takes place in Steps 2 through 5 and that conditional tests occur in Steps
3 and 5. Conditional tests are necessary not only to find the target but also to terminate the repeated execution of the Steps 2
through 5. In the rest of the chapter, we will see how to use C++ control structures.
Objectives
To understand the concept of control structures.
To understand and practice selection concept.
To understand and practice repetition concept.
3.1 Selection
Quite often, decisions must be made in any algorithm. Some steps in an algorithm may be needed only if certain conditions are
met. The selection concept can be used to handle such a situation. C++ provides two constructs where selection is possible –
through the use of if .. else and the use of switch statements. We will look at the if .. else usage first.
if .. else syntax
The general syntax can take on several forms as shown below.
Form 1
if (expression)
statement(s) if expression is true;
Form 2
if (expression)
statement(s) if expression is true;
else
statement(s) if expression is false;
Form 3
if (expression1)
statement(s) if expression1 is true;
else if (expression2)
statement(s) if expression2 is true;
else if (expression3)
statement(s) if expression3 is true;
…
else
statement(s);
Form 4
if (expression1)
statement(s) if expression1 is true;
else if (expression2)
statement(s) if expression2 is true;
else if (expression3)
statement(s) if expression3 is true;
Each expression captures a condition that needs to be met, and the statements that follow specify the action that needs to be
carried out if the condition is true. The expressions are tested in order, and only the statements associated with the first
expression that evaluates as true are executed, so at most one branch is executed. As the different forms show, the else part of
the statement is optional.
When more than one statement is associated with any part of the construct, these statements must be enclosed within the {}
braces. The expression used in a selection statement must evaluate as true or false and can be made up of logical or relational
operators. C++ considers 0 to be false and a non-zero value to be true.
Relational Operators: The most commonly used relational operators are < (less than), <= (less than or equal to), > (greater
than), >= (greater than or equal to), != (not equal to), and == (equal to). Let us look at some examples.
The statements “If the score in the exam is greater than or equal to 60 then the student has passed the exam. Otherwise, the
student has failed the exam.” can be implemented as
if (nScore >= 60)
cout << "Passed the exam.";
else
cout << "Failed the exam.";
The statement “If the number of entries is not equal to zero, then the average of all the entries is the sum over the number of
entries” can be implemented as
if (nEntries != 0)
{
fAvg = fSum/nEntries;
cout << "Average of the " << nEntries <<
" numbers is " << fAvg;
}
The relational operators are used to compare two quantities that must be of the same data type or must be such that a suitable
conversion is available to facilitate the comparison of the two quantities. The previous example can also be written as
if (nEntries == 0)
cout << "Average cannot be computed.";
else
{
fAvg = fSum/nEntries;
cout << "Average of the " << nEntries <<
" numbers is " << fAvg;
}
Tip: It is a common programming error to use the assignment operator = instead of using equality operator == in a selection
statement.
For example,
if (nA == 5)
is not the same as
if (nA = 5)
However, both statements will compile. The danger is that the second form assigns 5 to nA and then tests the assigned value;
since 5 is non-zero, the condition always evaluates as true, an unintended consequence.
Logical Operators: More complex selections can be made by using logical operators. The most commonly used operators are
|| (OR) and && (AND). In a typical usage we have a compound expression in the following forms
if (expression1 || expression2) …
if (expression1 && expression2) …
where the selection is made up of two expressions joined by either the OR or the AND operator. The OR expression is true if at
least one of the two expressions is true; the AND expression is true only if both expressions are true.
The statement “The product of two numbers is positive if both numbers are positive or both the numbers are negative” can be
implemented as
if ((nA > 0 && nB > 0) || (nA < 0 && nB < 0))
cout << "Product is positive.";
else
cout << "Product is zero or negative.";
The statement “A legal value of student GPA is between 0.0 and 4.0 both inclusive” can be implemented as
if (fGPA >= 0.0 && fGPA <= 4.0)
cout << "Valid GPA value.";
else
cout << "Invalid GPA value.";
The statement “If the score in the math portion of the exam is greater than 90 or if the score in the language portion of the
exam is greater than 90, then the student is a gifted student” can be implemented as
if (nScoreMath > 90 || nScoreLanguage > 90)
cout << "Student is a gifted student.";
main.cpp
In the example, all possible paths arising from the if … else construct are enclosed within the braces {}. We encourage this
usage since it makes the program easier to read and makes adding new statements less error-prone. After the age variable nAge
is initialized in statement 17, the program executes sequentially. Assume that the value of nAge is 53. Execution starts at line 19,
and the expression is evaluated as false. The next statement that is executed is 23 and the expression is evaluated as false. The
same situation arises with statement 27. Finally, statement 31 evaluates as true and the associated action in line 33 is executed.
The last statement that is executed in the program is line 41.
As we will see throughout the text, there is no unique way of structuring a program. We can rewrite the previous example in
the following two forms.
Alternate Form 1
if (nAge >= 60)
cout << "Person " << nAge << " years old is a senior citizen.\n";
else if (nAge >= 19 && nAge <= 59)
cout << "Person " << nAge << " years old is an adult.\n";
else if (nAge >= 11 && nAge <= 18)
cout << "Person " << nAge << " years old is a juvenile.\n";
else if (nAge >= 0 && nAge <= 10)
cout << "Person " << nAge << " years old is a child.\n";
else if (nAge < 0)
cout << "Invalid age " << nAge << ".\n";
Alternate Form 2
if (nAge < 0)
cout << "Invalid age " << nAge << ".\n";
if (nAge >= 0 && nAge <= 10)
cout << "Person " << nAge << " years old is a child.\n";
if (nAge >= 11 && nAge <= 18)
cout << "Person " << nAge << " years old is a juvenile.\n";
if (nAge >= 19 && nAge <= 59)
cout << "Person " << nAge << " years old is an adult.\n";
if (nAge >= 60)
cout << "Person " << nAge << " years old is a senior citizen.\n";
Alternate Form 1 is easy to read (however, it does not follow the {} rule!). Alternate Form 2 may also be easy to read, but it is
inefficient, since every condition is tested even after a match has been found, and it is error-prone.
if (nA) …
is valid, we will avoid such usage and explicitly state our objective, e.g.
if (nA != 0) …
Nested if statements
Finally, a word about nested if statements. Consider the following statements that do not produce the desired result.
float fX = 4.15f, fY = 2.14f;
….
// not written correctly
if (fX <= fY)
if (fX < sqrt(20.0))
cout << "X is less than square root of 20.";
else
cout << "X is greater than Y.";
Note that every else is associated with the nearest preceding if that does not already have an else. The compiler ignores
indentation, so indenting the statements does not create the intended association. The correct way to write the statements is as follows.
float fX = 4.15f, fY = 2.14f;
….
// corrected version
if (fX <= fY)
{
if (fX < sqrt(20.0))
{
cout << "X is less than square root of 20.";
}
}
else
{
cout << "X is greater than Y.";
}
We strongly recommend that you use the {} braces to identify the block of statement(s) associated with the if and the else
parts of the statement.
Conditional operator ?:
C++ provides a succinct way to take care of a specialized form of selection. For example, consider the following statements.
if (fX > fY)
fA = 2.1*fB;
else
fA = 0.5*fA;
The C++ conditional operator ?: can be used instead of the above statements as follows.
fA = (fX > fY? 2.1*fB : 0.5*fA);
Note that three operands are involved. The first operand is the selection condition. If this condition is true, the second operand
is evaluated; otherwise, the third operand is evaluated. Here are a couple more examples.
fX > fY? fA=2.1*fB : fA=0.5*fA;
std::cout << (fX > fY? fA : fB);
3.2 Repetition
As we saw in the introductory section of this chapter, sometimes one or more statements need to be executed repeatedly until
a certain condition is satisfied. The block of statements is in a loop. C++ provides three commonly used loop constructs –
while, do ..while and for statements.
while statement
The general syntax of the while statement is as follows.
while (test condition)
{
statement(s)
}
The statements within the {} are executed repeatedly as long as the test condition evaluates to true. The test condition (or
expression) is evaluated at the beginning of each pass through the block. Hence it is possible that the statements in the block do
not execute even once, if the expression is false the first time it is tested.
Example Program 3.2.1 Repetition using the while Statement
We will illustrate the usage of the while statement with an example.
Problem Statement: Write a program to compute the sum of the first N integers, i.e. $\sum_{i=1}^{N} i$.
main.cpp
Let’s look at the strategy used to drive the while loop. A loop counter or index, i, is used to keep track of how many times the
statements in the block need to be executed. This loop counter is initialized to 1 in line 21. If the loop counter is not initialized,
the test condition cannot be evaluated correctly. The variable to store the sum, nSum, is initialized to zero in line 20. The test
condition compares the value of i to nN. The two statements in the block, lines 24 and 25, execute as long as i is less than or
equal to nN. Line 25 is used to increase the value of the loop counter. If this statement is omitted, the statements in the block
will continue to execute forever – an infinite loop. The test condition can be complex – we saw how complex expressions can
be constructed using relational and logical operators in the previous section.
do .. while statement
The general syntax of the do..while statement is as follows.
do
{
statement(s)
} while (test condition);
The statements within the {} are executed as long as the test condition expression remains true. However, unlike the while
statement, the test condition is evaluated at the end of the block of statements not at the beginning of the block of statements.
Hence the statements in the block will be executed at least once. We will illustrate the usage of the do ..while statement with
an example.
Example Program 3.2.2 Repetition using the do ..while Statement
Problem Statement: Write a program to compute the sum of the first N integers, i.e. $\sum_{i=1}^{N} i$.
main.cpp
The loop counter is declared and initialized in line 21. In line 26, the loop counter is used in the test condition. Similarly, the
sum is defined and initialized in line 20 and is updated in line 24.
When the operators are used before the operand, as in --nP, they are known as prefix operators; when they are used after the
operand, as in nP++, they are known as postfix operators. One must be careful in using these operators. Consider the case where
we have an array nVA of length 3 that is to store the three values 11, 65 and 70. The correct statements are as follows.
int nVA[3];
int nP = 0; // nP is 0
nVA[nP++] = 11; // nP is 1. nVA[0] = 11
nVA[nP++] = 65; // nP is 2. nVA[1] = 65
nVA[nP] = 70; // nP is still 2. nVA[2] = 70
The difference is that when nP++ is executed, nP is incremented after the value of nP is used in the expression whereas ++nP
means increment nP first and then use nP in the expression.
In the above examples, the value of the variable associated with the increment and decrement is changed by one. Finally, it
should be noted that the unary increment and decrement operators can be used in more sophisticated contexts as we will see
later in the text.
for statement
The general syntax of the for loop statement is as follows.
for (initialization; test condition; update)
{
statement(s)
}
The basic idea is to execute the statement(s) repeatedly as long as the test condition is true. The initialization part is executed
once, before the loop starts, and is used to initialize the loop control variable and, if necessary, other variables. The test condition
is evaluated before each pass and must evaluate to true for the statements to execute. The update part is executed at the end of
each pass and is typically used to change the value of the loop control variable.
Example Program 3.2.3 Repetition using the for Statement
Problem Statement: Write a program to compute the sum of the first N integers, i.e. $\sum_{i=1}^{N} i$.
main.cpp
Note the initialization that takes place in line 20 and the body of the for loop between lines 21 and 24. As we will see next, it is
possible to carry out the initialization of the sum in the initialization part of the for loop.
Comma Operator
The comma operator , is often used in conjunction with some of the statements that we have seen before. The operator is used
to separate a list of expressions and returns the value of the last expression that is evaluated. Consider the following statement.
nSum = (nX = 2, nY = nX + 3);
After the statement is executed, the value of nX is 2, the value of nY is 5, and the value of nSum is 5, since the value of a comma
expression is the value of its last operand, here the assignment nY = nX + 3.
An appropriate location to use the comma operator is in the initialization section of the for statement. For example, we could
rewrite the for loop in Example 3.2.3 as follows.
for (nSum = 0, i=1; i <= nN; i++)
{
nSum += i;
}
In the initialization part, nSum is set to zero and i is set to 1. The execution of the for loop then begins.
The two loops controlled by the loop indices i and j execute the sole statement nCount++. If n=10 and m=5, then the value of
nCount after the two loops is 50. One should be careful in making sure that the test condition is met at some stage of the loop
execution; otherwise, the loops will be stuck in an infinite loop. Consider the following statements.
n=50; fSum = 0.0f;
for (i=1; i <= n; i++)
{
fSum += static_cast<float>(pow(i,2.0));
if (i > n/2) i=1; // dangerous
}
The execution will be stuck in the for loop since i is reset to 1 every time it exceeds n/2, so it can never exceed n. Consider the following statements.
n=m=10; fSum = 0.0f;
for (i=1; i <= n; i++)
{
for (j=1; j <= m; i++) // dangerous
{
fSum += static_cast<float>(i+j);
}
}
Once again, the execution is stuck in the inner for loop since the value of j remains at 1!
Tip: It is a common programming error to get stuck in infinite loops because the termination condition is never met or because
of typing the incorrect loop index variable name or because of misplaced semicolons!
break syntax
The general syntax of the statement is
break;
Placing this statement at an appropriate location ensures that the control (or program flow) is immediately shifted to outside
the innermost loop that the break statement is associated with.
Consider the following example. We wish to compute the sum of the ages of all the students in a class. We will assume that we
do not know (or have not been told) how many students are in the class. The user is expected to input the ages of the students
one at a time. When there are no more students, the user will input the age as zero or a negative number. We will first write the
relevant statements using the do ..while statement.
int nAge; // to store the student’s age
int nSum = 0; // to store the sum of the ages
do
{
cout << "Enter the age (zero or negative to end): ";
cin >> nAge;
if (nAge > 0)
nSum += nAge;
} while (nAge > 0);
cout << "The sum of the ages is " << nSum << ".\n";
A for statement with all three components left empty, for (;;), is an infinite loop. The break statement makes it possible to exit
such a loop.
continue syntax
The continue statement is closely associated with the break statement. The general syntax of the statement is
continue;
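Placing this statement at an appropriate location ensures that the remaining statements in the body of the innermost enclosing
loop are skipped for the current iteration. Control passes to the test condition (in a while or do..while loop) or to the update part
(in a for loop).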
We will rewrite the example shown earlier with the break statement now using the continue statement.
int nAge; // to store the student’s age
int nSum = 0; // to store the sum of the ages
do
{
cout << "Enter the age (zero or negative to end): ";
cin >> nAge;
if (nAge <= 0)
continue;
nSum += nAge;
} while (nAge > 0);
cout << "The sum of the ages is " << nSum << ".\n";
switch syntax
One could look at the switch statement as a specialized form of the if … else statement. The statement provides different execution
paths based on the value of an expression of integral (or enumeration) type. The syntax of the statement is as follows.
switch (integral expression)
{
case ConstantExpression1:
statement(s)
break;
case ConstantExpression2:
statement(s)
break;
…
default:
statement(s)
break;
}
The entire block of statements is enclosed between the braces { … }. Each of the different execution paths begins with the
keyword case, followed by a unique constant integral expression and the colon symbol. The break keyword at the end of an
execution path is optional. If the break keyword is used, control is transferred to the first statement after the switch block. If it
is omitted, execution falls through to the statements of the next case label. The keyword default is optional, and the statements in
that subblock are executed only if the integral expression is not equal to any one of the constant expressions associated with the
different case labels.
A few things to note about the switch statement. The switch expression and all the expressions associated with the case labels
must be of an integral (or enumeration) type; each case value is converted to the type of the switch expression. Recall that the
primitive data types int, short, long, and char are all integral data types. In addition, the case label expressions must be
constants, and no two case labels may have the same value.
Let us now look at an example. We will write a program segment to compute the grade point average (GPA) of a student’s
grades in a semester. We will assume that we have a 4-point grading system with all the courses carrying equal credits.
char cGrade; // to store the student’s grade
int nCourses = 0; // to store the total number of courses
float fGPA = 0.0f; // to store the GPA
do
{
cout << "Enter the Grade (Type S to end): ";
cin >> cGrade;
switch (cGrade)
{
case 'A': nCourses++;
fGPA += 4.0;
break;
case 'B': nCourses++;
fGPA += 3.0;
break;
case 'C': nCourses++;
fGPA += 2.0;
break;
case 'D': nCourses++;
fGPA += 1.0;
break;
case 'E': nCourses++;
fGPA += 0.0;
break;
case 'S':
break;
default:
cout << "Invalid grade.\n";
break;
}
} while (cGrade != 'S');
goto syntax
The general syntax of the goto statement is as follows.
goto label;
…
label: statement(s);
Programmers have a love-hate relationship with the goto statement. Unrestricted use of the goto statement can lead to programs
that are very difficult to read and maintain. In C++, a goto can transfer control only to a label within the same function (we will
see what functions are in the next chapter). Execution control is transferred from the goto statement to the statement marked by
the target label. The label is an identifier that must be unique within the function and is followed by a colon. We will now
rewrite the program segment used earlier in illustrating the break and continue statements using the goto statement.
int nAge; // to store the student’s age
int nSum = 0; // to store the sum of the ages
begin:
cout << "Enter the age (zero or negative to end): ";
cin >> nAge;
if (nAge > 0)
{
nSum += nAge;
goto begin;
}
cout << "The sum of the ages is " << nSum << ".\n";
Note that the label can appear anywhere in the same function, before or after the goto statement that references it.
For the remainder of this chapter, we will look at several examples illustrating more selection and repetition concepts,
programming styles, algorithm development, etc.
Example Program 3.3.1 Using Control Statements for Computing Student GPA
The next example completes the problem of computing the student GPA discussed earlier in this chapter.
Program Statement: Develop a program to compute the semester GPA and the cumulative GPA of a student taking courses in a
four-point grading system. The program should ask the user to input the (a) current GPA and the number of semester hours
taken so far, and (b) the grade and the semester hours for each course taken this semester. It should display as a final report the
GPA and the semester hours before the current semester and for the current semester, and the cumulative GPA and semester
hours.
Solution: We will first develop the terminology and then the algorithm to solve the given problem. We will use the term
cumulative to signify the time period prior to the current semester, semester to signify the current semester and new to signify
the state at the end of the current semester. The equation to compute the GPA is
$$GPA = \frac{\sum_{i=1}^{n} g_i s_i}{\sum_{i=1}^{n} s_i} \tag{3.3.1}$$
where $n$ is the total number of courses, $s_i$ is the number of semester hours (usually between 1 and 4) for the $i$th course and $g_i$
is the numerical grade (A=4, B=3, C=2, D=1, E=0) for the $i$th course. The above equation can also be written as
$$GPA = \frac{\sum_{i=1}^{n_{cum}} g_i s_i + \sum_{j=1}^{n_{sem}} g_j s_j}{\sum_{i=1}^{n_{cum}} s_i + \sum_{j=1}^{n_{sem}} s_j} \tag{3.3.2}$$
where $n_{cum}$ is the (cumulative) number of courses taken so far and $n_{sem}$ is the number of courses taken this semester. Note that
$$\sum_{i=1}^{n_{cum}} g_i s_i = GPA_{cum} \sum_{i=1}^{n_{cum}} s_i \tag{3.3.3}$$
3. If the grade is S, exit this loop. Is the grade valid? If yes, ask the user for the number of semester hours for the course.
Is the value valid?
4. Track or update $\sum_{j=1}^{n_{sem}} s_j$ and $\sum_{j=1}^{n_{sem}} g_j s_j$.
5. End loop.
6. Use Eqns. (3.3.2) and (3.3.3) to compute the required quantities.
7. Display the results.
The developed program closely (but not entirely faithfully) follows the above algorithm and is shown below.
main.cpp
A few things to note about the program. First, the error checks are minimal. The TODO section identifies what needs to be
done. This is left as an exercise for the reader. Second, the program shows how different data types can be handled appropriately
through a process of type casting or explicit data conversion. While GPA can have a fractional component, the number of
semester hours is an integer. As we saw in Chapter 2, the result of purely integer arithmetic contains no fractional component:
4/3 = 1, as is 5/3 = 1. We look at this issue in more detail in the last section of the chapter.
nonlinear. Display a table showing the load-deflection pairs and the slope of each segment. Also display the load value at which
the response becomes nonlinear.
Figure: Load-Deflection Diagram (Load vs. Deflection)
A total of no more than 11 data points is assumed, and the memory to store the values is allocated at compile time. Memory is
wasted if only 5 data points are defined, and the program cannot handle more than 11 data points. This is one of the problems
with static memory allocation that we will address later in the book. Lines 37 through 62 are used to obtain the input from the
user.
(4) Logical expressions are used to check the validity of the user input in line 47 and line 50.
(5) The break statement is used to exit the loop in line 45 if the user input for the load value is 0 and in line 60 if ten values are
defined by the user.
(6) Eqn. (3.3.4) is implemented in lines 68 and 69. No safety check is implemented to guard against division by zero. This can
be easily done and is left as an exercise.
(7) Step 11 of the algorithm is implemented in the loop in lines 73 through 82. A tolerance value of $10^{-3}$ is used in checking
whether the slopes of adjacent segments are equal. The fabs math function requires the <cmath> header file. Once again, the
break statement (line 80) is used to exit the loop.
(8) Finally, let us look at the statements to format and display a table: a number of columns with a header and a fixed (column)
width, where data are displayed in each row. The three column headers (“Load”, “Deflection” and “Slope”) are defined in lines
87-88. We left-justify the entries in each column via the std::setiosflags(std::ios::left) manipulator. Next, we assume that a
field width of 7 is adequate to represent the load values. The column width is chosen as the larger of two numbers: the number
of characters required to represent the value and the number of characters in the column header. By default, the << operator
writes a floating-point value with at most six significant digits and without trailing zeros, so the display is compact. Hence with a
field width of 7 we should be able to display (positive) values between 0 and 999999, including values such as 0.15, 0.16667 and
3000.12. A field width of 10 is used with the deflection values since the column header Deflection contains 10 characters. The
same logic applies in formatting the last column containing slope values. We will see more about formatted output in the last
section of this chapter.
The following example is a precursor to programs involving numerical analysis and solution techniques.
Example Program 3.3.3 Exhaustive Search or Trial-and-error
Program Statement: The coefficient of restitution, $c$, is a measure of the elasticity of the collision between two objects, one of
which is usually at rest. For example, if a ball is dropped from a height $H$ onto a floor and is observed to bounce to a height $h$,
the coefficient of restitution can be computed as
$$c = \sqrt{\frac{h}{H}} \tag{3.3.5}$$
Clearly, if the principle of conservation of energy is followed, $0 \le c \le 1$.
A ball is dropped from a height of 3.5 m onto a floor. The coefficient of restitution between the ball and floor is 0.9. Compute
how many bounces occur before the ball bounces to a height as close to 1 m as possible.
Solution: A numerical solution is an attractive approach if the analytical solution is difficult to compute. However, in this problem,
even though we know the analytical solution, we will use a trial-and-error approach (or exhaustive search) to find the solution.
The motivation is to gain confidence in the development of the trial-and-error approach by comparing the trial-and-error
solution to the analytical solution.
The basic idea is to increment $i$, compute the new height $h_i$ and compare the new height to the target height, $h_{target}$, which is
1 m. We will keep track of the difference between the computed height and the target height
$$h_{diff} = \left| h_i - h_{target} \right|$$
The difference should decrease with increasing $i$ and then start increasing. Only for certain sets of values will this difference be
zero or nearly zero; it is unlikely that the bounced height will be exactly the target height.
The analytical solution to this problem can be found as follows. Let the height to which the ball bounces after every subsequent
bounce be denoted as $h_1, h_2, h_3, \ldots$. Note that
$$h_1 = c^2 H, \quad h_2 = c^2 h_1 = c^4 H, \quad \ldots, \quad h_i = c^{2i} H \tag{3.3.5a}$$
from which one could solve for $i$ as
$$2i \log c = \log\left(\frac{h_i}{H}\right) \quad \text{or,} \quad i = \frac{\log\left(h_i / H\right)}{2 \log c} \tag{3.3.6}$$
Algorithm: Here is the developed algorithm. As an added safety, we keep track of the number of iterations (or bounces). If the
number exceeds a predefined maximum number, we exit the loop. This is one more way we can avoid getting stuck in an infinite
loop.
1. Obtain user input for the initial height, coefficient of restitution and the target height. Set $h_i^{new} = h_i^{old} = H$,
$h_{diff}^{new} = h_{diff}^{old} = H$, and $i = 0$.
2. Loop to compute the new height.
3. Increment $i$. If $i >$ Max iterations, exit the loop and print an error message.
4. Compute $h_i^{new} = c^2 h_i^{new}$, and $h_{diff}^{new} = \left| h_i^{new} - h_{target} \right|$.
5. If $h_{diff}^{new} > h_{diff}^{old}$, we have found the solution. Set $h_i^{new} = h_i^{old}$, $i = i - 1$ and exit the loop.
7. Go to step 2.
8. Print the results. The number of bounces is $i$ and the bounced height closest to the target height is $h_i^{new}$.
main.cpp
Line 15 establishes the upper bound on the iterative algorithm. It is a good idea to use an upper bound so that in case the
program is unable to find a solution in a reasonable number of iterations, the iterative loop can be exited. The user input is
obtained in lines 28 through 33. No error checking is done, and the reader is encouraged to modify the program and carry out
checks to ensure that the input values are physically possible, e.g. initial height is positive, target height is less than the initial
height, etc.
Steps 2 through 7 in the algorithm are implemented in lines 40 through 71. Initialization of the float variables that store the
old and the new heights is made in lines 36 and 37. The math library functions pow and fabs are overloaded functions. In other
words, the argument to the functions can have different data types and the C++ compiler at compile time is able to ascertain
the data type of the argument and call the appropriate version of the pow and fabs functions. Otherwise type casting will have
to be used to convert double values to float values via the static_cast<float> expression. We will learn more about overloaded
functions in Chapter 4.
Two checks are made to exit the infinite loop – in line 49 and in line 65. There are several challenges to writing a general-purpose
computer program. One of the challenges is error detection so that the program is not carrying out calculations that are never
ending. The reader is encouraged to specify the input that would lead to the maximum number of iterations (currently 1000)
being exceeded.
The program follows the algorithm except for statements 73 through 85 where we compute the analytical solution. Eqn. (3.3.6)
is implemented in lines 75-76. However, since the number of bounces is an integer, we use the ceil function to find the next
higher integer (nUpper). To find the bounce whose height is closest to the target height, we also use the next lower integer (nLower).
We use these two integer values to compute the height as per Eqn. (3.3.5a), and then find the bounce that is closest to the target
height.
Let us consider the example of the different types of polygons. We could define the enumeration type as follows.
enum Polygons {TRIANGLE=1, QUADRILATERAL, PENTAGON, HEXAGON, HEPTAGON,
OCTAGON, NONAGON, DECAGON};
Polygons is now treated as a data type just as int, float etc. The compiler assigns integer values such that TRIANGLE has a value
of 1, QUADRILATERAL has a value of 2, PENTAGON has a value of 3, and so on. Note that by default, if TRIANGLE is not assigned a
value, it is given the value zero.
Here is an example usage of this enumeration type in a program following the above definition.
Polygons PolyType; // variable to store the polygon type
std::string strPolyType;
if (strPolyType == "triangle")
PolyType = TRIANGLE;
else if (strPolyType == "square" || strPolyType == "rectangle")
PolyType = QUADRILATERAL;
else
std::cout << "Unsupported polygon type.\n";
Because of the usage of the enumeration type, the program is more readable than using simple integer constants to differentiate
the types of polygons.
Type Coercion and Casting
How do we handle expressions where different data types are involved? In line 46, the variable fSemGPA on the left side of the
assignment (symbol) is a floating value. On the right side of the assignment, we have a floating point constant, 4.0, and an
integer variable, nCourseSemHrs. In order to ensure that the computations are done correctly so that fractional components are
preserved, we need to convert (or promote) the integer variable nCourseSemHrs. This can be done in three different ways. The type
name float can be used in a function-style cast,
float (expression)
or in a C-style cast,
(float) expression
The preferred style is to use the static_cast operator
static_cast<float>(expression)
The value of the expression in each case is stored temporarily as a floating point number that is then used in the subsequent
computations. Note that we could have written statement 46 as
fSemGPA += static_cast<float>(4*nCourseSemHrs);
Here are some rules governing type coercion when dealing with arithmetic and relational expressions and assignment operations.
Note that promotion or widening of a data type involves conversion of a value from a “lower” type to a “higher” type according
to a programming language’s precedence of data types.
(1) Step 1: Each char, short, bool, or enumeration value is promoted to int. If both operands are now int, the result
is an int expression.
(2) Step 2: If Step 1 still leaves a mixed type expression, the following precedence of types (from lowest to highest) is used:
int, unsigned int, long, unsigned long, float, double, long double
The value of the operand of the lower type is promoted to that of the higher type, and the result is an expression of
that type.
Consider the expression (nValue + 4.0), where nValue is defined as an int. After Step 1 this is still a mixed expression (int and
double). Hence, by Step 2, nValue is temporarily coerced to a double, and the entire expression is evaluated as a double.
Similarly, in relational expressions of the form nValue1 > fValue1, the value of nValue1 is temporarily coerced to a float before
the comparison takes place.
Formatted Output
C++ provides the programmer with sufficient controls to tailor or format the output to a file or the screen. In Section 2.5 we
looked at the very basics of formatted output using precision, setprecision, width and setw commands. In this section, we
will see a few more ways of controlling the output format. Every output stream has a member function called setf that can be
used to specify the type of desired output. In conjunction with the member constants in the ios class, various operations can
be carried out. Note that the ios class is available in any program that includes the <iostream> or <fstream> header file.
Table 3.4.1 Formatting with setf
Flag | Effect of setting the flag | Default
ios::showpos | Shows a plus sign before positive numbers. | Not active
ios::showpoint | For floating-point numbers, the decimal point and the trailing zeros are shown. If this flag is not set, a floating-point number without a fractional component may be displayed without the decimal point and the trailing zeros. | Not active
ios::fixed | Floating-point numbers are written in fixed-point notation (without e), and the precision gives the number of digits after the decimal point. Setting it with setf(ios::fixed, ios::floatfield) also unsets the ios::scientific flag. | Not active
ios::uppercase | An uppercase E is used instead of the lowercase e in scientific notation. | Not active
ios::scientific | Floating-point numbers are written in scientific notation using the e symbol. | Not active
ios::right | If the field width is specified (using the width function or setw manipulator), the next item output is right justified. | Default
ios::left | If the field width is specified (using the width function or setw manipulator), the next item output is left justified. | Not active
resetiosflags (manipulator in <iomanip>) | Clears the specified flags so that a new setting can be specified. For example, if the current output is left justified, use resetiosflags(ios::adjustfield) before using ios::right to have right justification apply. | (none)
The flags invoked using the setf function can be removed using the unsetf function as we will see in the following example.
Example Program 3.4.1 C++ Stream Input/Output
Program Statement: Develop a simple program to understand how formatting statements work with floating point numbers.
Solution: We will develop an interactive program in which the user will be asked to input a floating-point number and the output
precision. The number will be read in and then displayed on the screen in various output styles. The program will run in a
continuous loop with the program termination possible by typing in CTRL-C.
The resulting program is shown below.
main.cpp
The program enters an infinite loop in line 16. The user input is obtained in line 20. The input number is displayed in four
different styles. First, the default style is displayed; this style may be compiler dependent. The user-specified precision sets the
number of significant digits in the default style and the number of digits after the decimal point in the fixed and scientific styles.
In the first user-defined style, the value is not shown in exponent (scientific) form, and the decimal point and the trailing zeros are
always shown. This is carried out in lines 27-29. Using the bitwise | operator, we can carry out in one statement what would
otherwise take two. In other words, the single setf statement that combines the two flags with | is equivalent to the following two statements
std::cout.setf (std::ios::fixed);
std::cout.setf (std::ios::showpoint);
The bitwise operator can be used to combine as many output flags as required. Once a style (or flag) is set, it remains in effect
until it is reset or unset. In the second user-defined style, in addition to the existing style, we specify that the sign (+ or -) always
be displayed before the number. This is specified in line 32. In the third user-defined style, we display the numbers in scientific
notation (line 37). Line 36 is necessary to unset the current fixed (non-exponent) style of displaying float values. Finally, the default
settings are restored in lines 41-42 by unsetting the existing flags, and the default floating-point settings are restored in line 43.
A sample output from the program is shown in Fig. 3.4.1.
Lines 19-23 show the code to obtain a valid value for the number of user-defined values. These values are then obtained and
stored in lines 26-30. To display the float values in the scientific style, the setf function is used in line 33. Next, the column
headings are displayed in lines 35-36. The number of blank spaces displayed before both the index and the value depends on
the field width used to display these values. The field width is set to 5 for displaying the index. Unlike the formatting flags, the
field width set with the setw manipulator applies only to the next item that is output. This is set in line 39. Similarly, in line 41,
the field width is set to 10 to display the value. Is this field width adequate to display any floating-point value?
Summary
In this chapter we saw most of the important C++ control structures. At this stage with the knowledge gathered from Chapters
2 and 3, we should be able to write moderately complex programs.
Below we summarize the important facts learnt in this chapter.
Declarations must precede usage. A declaration typically introduces the name and type associated with a symbol; a definition
supplies what the symbol actually is (the storage for a variable, the body of a function). A symbol (or, an identifier) can be
declared more than once, but must be defined exactly once. Think of the C++ compiler and linker as having access to a C++
dictionary and a thesaurus that are augmented by a user-defined dictionary and thesaurus. The compiler consults these documents
when it is compiling a program, and the linker collects all the object files to create the executable. A compiler or linker error
message is issued if information is missing.
Selection is achieved through (a) if … else if … else statement, and (b) switch statement. The selection conditions
can be formed through expressions involving relational and logical operators.
Repetition is achieved through (a) while statement, (b) do … while statement, and (c) for statement.
In addition, the break, continue and goto statements provide other forms of control. Later in the book we will see
the return statement and the exit function.
We saw examples where the size of a vector-related variable needed to be known in advance for memory allocation
to take place, e.g., the use of MAXVALUES in Example 3.4.2. This is a hindrance to writing a general-purpose program,
and later in the book we will see various solutions to this problem.
At the end of the chapter, we saw (a) the enumerated type that helps in making programs easier to read, and (b) the
dangers in mixed-type arithmetic (type coercion).
C++ provides a rich set of formatting controls for both screen as well as file outputs. We looked at a few of them in
this chapter. In later chapters we will see the rest.
Exercises
When writing programs, simple or complex, it is good programming practice to check for errors in input and to inform
the user via clear error messages why the input has an invalid value. When a problem specifies exhaustive search (or
trial and error), assume that you do not know the analytical solution. You may use the analytical solution to check
the solution generated by the computer program.
Appetizers
Problem 3.1
Write a program to display as a table the values of the function $y(x) = ax^3 + bx^2 + cx + d$ for the range $10 \le x \le 15$ using
an increment supplied by the user. Obtain the values of the coefficients of the cubic polynomial and the increment from the
user.
Problem 3.2
Write a program to obtain a set of up to a maximum of 10 positive float numbers. For this set of numbers, compute its (a)
average, (b) minimum, (c) maximum, and (d) standard deviation. Prompt the user to enter the numbers one at a time. A negative
value signals the end of the user input. Check for some basic input errors such as the first number being negative etc.
Problem 3.3
Write a program to obtain a set of $(x, y)$ coordinates for 5 points. Find the two points that are (a) closest to each other, and
(b) farthest apart from each other.
Problem 3.4
Write a program to accept (a) an integer input and print out the number with the digits reversed (for example, if the input is
-18080, the output should be 08081-), and (b) a string input and print out the string with the characters reversed (for example, if
the input is arizona, the output is anozira).
Problem 3.5
What is the output from the following statements?
int nV=0;
for (int i=1; i <= 50; i = 3*i)
nV++;
cout << "nV is " << nV << "." << endl;
Main Course
Problem 3.6
Write a program to compute and display the value of the sine of an angle using the following formula.
$$\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots + \frac{(-1)^{i-1}\,x^{2i-1}}{(2i-1)!} + \cdots$$
Terminate the series when the difference between two consecutive approximations (partial sums) is less than $10^{-4}$. Compare
your results with the value provided by the sin function.
Problem 3.7
Rewrite Example Program 3.2.4 with the following changes. The user input should be as follows - (a) the initial input should be
the student last name, first name, current GPA and the current number of semester hours, and (b) for the current semester, the
input should be the course number, the course semester hours, and the raw score on the course (between 0 and 100). No more
course input is required if the course number is STOP. The raw score should be translated to a letter grade as follows – A : 91-
100, B : 81-90, C : 71-80, D : 61-70, and E : 0-60. Assume that a student can take at most 6 courses per semester. Print a Grade
Report showing the student’s name, details of the current semester, and the new (cumulative) semester hours and GPA.
Problem 3.8
The differential equation for the transverse deflection of a beam is given by
$$\frac{d^4 y}{dx^4} = \frac{w(x)}{EI}$$
The solution for a simply-supported beam of length $L$ subjected to a uniform loading $w$ is given as
$$y(x) = \frac{w x}{24 EI}\left(x^3 - 2Lx^2 + L^3\right)$$
Write a program to compute the largest deflection and its location using trial and error. Obtain the values of $L$, $E$, $I$, $w$
and the units for force and length from the user. Assume that the user input is in consistent units and no unit conversion is to
be carried out.
Problem 3.9
What is the output from the following statements?
int nV=0;
for (int i=1; i <= 50; i = 3*i);
nV++;
cout << "nV is " << nV << "." << endl;
Problem 3.10
What is the output from the following statements?
int nV=0;
for (int i=0; i <= 50; i = 3*i);
nV++;
cout << "nV is " << nV << "." << endl;
C++ Concepts
Problem 3.11
Newton’s Law of Cooling is given as
$$\frac{dT}{dt} = -k\left(T - T_\infty\right)$$
where $T$ is the temperature that is a function of time $t$, $k$ is a positive constant, $T_0$ is the temperature at time $t = 0$,
and $T_\infty$ is the ambient temperature. The solution of the above equation is given as
$$T(t) = T_\infty + \left(T_0 - T_\infty\right)e^{-kt}$$
A coroner is called to a crime site (a warehouse) at 1 am on Jan 15, 2009. She finds a corpse whose temperature then is 80°F.
The temperature in the warehouse is maintained at 62°F. After two hours the temperature of the corpse drops to 72°F.
Write a program that uses trial and error to find the date and time of death. (Hint: The coroner when questioned says that for
most situations $0.1\ \text{hr}^{-1} \le k \le 0.5\ \text{hr}^{-1}$.)
Problem 3.12
A cable suspended between two posts hangs in the form of a catenary (Fig. P3.12) whose equation is given as
$$y(x) = \frac{T}{w}\cosh\left(\frac{w}{T}x\right) + c$$
where $y(x)$ is the height of the cable above the ground, $T$ is the tension in the cable, $w$ is the weight of the cable per unit
length, and $c$ is a constant. By measuring the height of the cable, the following conditions are known: $y(-100) = 750$,
$y(0) = 75$ and $y(100) = 750$. Write a program that uses trial and error to find the values of the parameters in the
catenary equation so as to satisfy the given conditions as closely as possible. (Hint: Take $450 \le T \le 500$, $10 \le w \le 20$, and
$40 \le c \le 60$.)
Fig. P3.12 Cable hanging as a catenary: height 750 at x = -100 and x = 100, height 75 at x = 0
Problem 3.13
Write a program to obtain 7 values from the user. Store and display (a) these seven original values, (b) the seven sorted values,
and (c) their mean, median and standard deviation. Your approach should be general enough so that it is applicable for any set
of numbers. (Hint: Use Selection Sort. The basic idea is to determine the minimum (or maximum) of the list and swap it with the
element at the index where it is supposed to be. The process is repeated such that the nth minimum (or maximum) element is
swapped with the element at the (n-1)th index of the list. Here is an example involving 8 integer numbers and sorting in ascending
order.)
Initial 8 6 10 3 1 2 5 4
Pass 1 1 6 10 3 8 2 5 4
Pass 2 1 2 10 3 8 6 5 4
Pass 3 1 2 3 10 8 6 5 4
Pass 4 1 2 3 4 8 6 5 10
Pass 5 1 2 3 4 5 6 8 10
Pass 6 1 2 3 4 5 6 8 10
Pass 7 1 2 3 4 5 6 8 10
Chapter 4
“Build a better mousetrap and the world will beat a path to your door.” Ralph Waldo Emerson
“It is not the greatness of a man's means that makes him independent, so much as the smallness of his wants.” William Cobbett
We saw several C++ constructs in Chapters 2 and 3. Using these constructs we can write moderately complex programs.
However, as the complexity of the tasks increases, developing a single long program contained entirely in the main program is
neither efficient nor practical. In this chapter, we will learn about the elements of modular program development starting with
functions, scope of variables and finally, program development with multiple source files.
Objectives
To understand and practice the concept of functions.
To understand the concept of scope of variables.
To understand the concept of modular program development.
To understand and practice good programming styles.
4.1 Functions
We have in the previous chapters used functions without being formally introduced to them. In Example 2.5.2, we used the sin
function to compute the sine of an angle. As the documentation shows in Fig. 2.7.5, the function expects as its argument the
angle in radians as a double precision value and returns the sine of the angle as a double precision value. One can look at a
function as a component of a program (an independent unit that can be compiled separately) that is written either because it
performs a very specific task, or because that task is needed in several places in the program, or both. Let us look at
the sin example again.
dAngle = 0.2;
double dValue = sin(dAngle);
The sin function expects to see a single parameter – a double precision value. In the above example, the variable dAngle provides
that value in radians. The sin function uses that value as the angle and evaluates the sine of the angle. The computed value is
returned to the calling program. The returned value can be stored and used via a variable. As shown above, the returned value
is stored in the variable dValue.
A function can take no arguments or one or more arguments, and can return either nothing or a single value. The
general syntax is as follows.
returnvalue functionname (argument 1, argument 2, …)
// argument list is optional
{
…
return somevalue; // optional. required only if
// returnvalue is not void.
}
In fact, the main program is a special function whose return value is an int. Function names, like variable names, must be
unique. The only exception is when a function is overloaded; we will see overloaded functions in Section 4.2. Here are some
examples of functions.
Example 1
int IntSquare (int n) // computes the square of an integer number
{
int nValue = n*n;
return nValue;
}
Example 2
float SumSquares (float x, float y) // computes the sum of the squares
{
return (x*x + y*y);
}
Example 3
bool IsEven (int n) // determines if a number is even or not
{
if ((n % 2) == 0)
return true; // even number
else
return false; // odd number
}
In the three examples, the number of function arguments is either one or two, and one value is returned to the calling program.
The variable name in the argument list as used in the calling program need not be the same as the corresponding variable name
as used in the function (see example below where the calling program uses nNumber and the function uses n). Note that each of
the function definitions states that the argument must evaluate to an integer (IntSquare and IsEven) or a float (SumSquares). For
example, in the case of an integer argument, in the calling program the argument can be an integer constant, an integer variable
or an expression that evaluates to an integer. In the IsEven function we have two return statements. Sometimes the return
statement returns no value but merely terminates the execution of a function and returns control to the calling program. Before
functions are used anywhere in a program, they must be declared.
The function IsEven is declared in line 10 before being used in line 19. This (line 10) is called a function prototype. The prototype
establishes the return data type, the number of arguments and the data type associated with each argument. The names of the
variables used in the argument list can be provided but are not necessary. In other words, one could have written the prototype
as
bool IsEven (int n);
Since the function needs to be declared before being used, we could have written the program differently as follows.
#include <iostream>
bool IsEven (int n) // defined before main, so no prototype is needed
{
return ((n % 2) == 0);
}
int main ()
{
….
return 0;
}
In this version, the function IsEven is defined at the top of the file, before the main program, so the function prototype is
not necessary. This solution does not work when the program is spread over several source files. As we will see in the next
section, prototyping is the general solution when functions are used.
How do we define functions where there is no need to return a value or if there are no arguments to be passed? C++ has a
keyword called void that can be used. Here is an example of a function that neither returns a value nor has a function argument.
Example 4
void PrintInputError (void)
{
std::cout << "Your input is invalid.\n";
}
OR
void PrintInputError ()
{
std::cout << "Your input is invalid.\n";
}
Example 5
void DisplayCoordinates (float x, float y)
{
std::cout << "X Coordinate: " << x << ". Y Coordinate: " << y << '\n';
}
To use a function with multiple parameters correctly, one must be careful in ordering the arguments. Consider the following
example that uses the DisplayCoordinates function.
fXC = 1.1; fYC = -3.02;
….
DisplayCoordinates (fYC, fXC); // incorrect usage
While the coordinates will be displayed, the x and y values are interchanged. There is a one-to-one, positional correspondence
between the arguments in the calling program and the parameters in the defined function: the first argument initializes the first
parameter, the second argument the second parameter, and so on.
Function arguments can be passed by value, by reference, or by pointer. In this chapter we will discuss the first two.
Calling by Value: Consider line 19 in Example 4.1.1 where the function IsEven is invoked as IsEven(nNumber). When
control is passed to the IsEven function, a copy of the variable nNumber (initialized with its current value) is created on the stack
before the statements in the function are executed. The stack is a special area of memory that the compiler uses for storing
local variables and function parameters. The programmer is relieved of the responsibility of managing this space. Once all the
statements in the function are executed and control is passed back to the calling function, the variables stored on the stack for
that function are destroyed automatically. What is the implication of such a behavior? Consider the following program segments.
void AnalyzeData ()
{
….
double dXCoor = 1.2, dYCoor = 2.4, dMaxCoor;
dMaxCoor = MaxCoor (dXCoor, dYCoor);
std::cout << "Max of " << dXCoor << " and " << dYCoor << " is "
<< dMaxCoor;
…
}
….
double MaxCoor (double d1, double d2) // badly written function
{
if (d2 > d1) d1=d2;
return d1;
}
We will see the correct result despite the badly written MaxCoor function. This is because in C++, by default, arguments are
passed by value. In this example, copies of dXCoor and dYCoor are created on the stack as d1 and d2 just before the MaxCoor
function is executed. These copies are then used in the MaxCoor function, and their final values (whether they are changed or not)
are discarded before control is passed back to the AnalyzeData function. In other words, dXCoor itself is never modified; it still
holds 1.2 when control returns to the AnalyzeData function and the std::cout statement is executed. How can we encourage better
programming practices in situations like this? A better way of defining and writing the MaxCoor function is to use the const qualifier.
double MaxCoor (const double d1, const double d2)
{
if (d2 > d1)
return (d2);
else
return d1;
}
If the function is defined with the const qualifier for both d1 and d2, the values of these two variables cannot be changed
anywhere within the function. In other words, the statement
if (d2 > d1) d1=d2;
would be flagged by the compiler as an error.
Calling by Reference: A reference is an alias for an existing variable; compilers typically implement it by passing the memory
address of that variable. When a function is called by reference, the function works directly on the caller's variable rather than
on a copy of its value. Consider a function swap that should swap the values of two variables and the following program segment.
The outputs before and after the call to the swap function are identical since the usage is call-by-value. To ensure that the values
are swapped in the function, we need to change the program as follows.
The ampersand character, &, placed after the type in a parameter declaration, denotes a reference and hence that the function
usage is call-by-reference. The reference symbol is used in the function declaration (prototype) and in the function definition
but NOT in the function call. In other words, to figure out whether the function usage is call-by-value or call-by-reference, one
must look at the function prototype or function definition. The swap function can be written as follows.
void swap (int& n1, int& n2) // (int &n1, int &n2) is also correct
{
int nTemp = n1; // save the first value before it is overwritten
n1 = n2;
n2 = nTemp;
}
Passing Vectors: So far, we have seen how to pass scalar variables to a function. How do we pass a vector? C++ treats arrays
differently. When an array is used as an argument, only the address of its first element is passed; the elements are not copied,
so the function works directly on the caller's array. Let’s look at an example of a function that can be used to print the elements
of an integer vector.
void PrintVector (int nV[], int nSize)
{
for (int i=0; i < nSize; i++)
std::cout << "Element " << i << " : " << nV[i] << "\n";
}
The notation nV[] (without a constant integer within the square brackets) signifies a vector. To use this function, we can
define and use a vector as follows.
int nVBlocks[5], nVHeights[10];
….
PrintVector (nVBlocks, 5);
…
PrintVector (nVHeights, 10);
Note that with this example, it is possible to modify the values of the vector within the function. Since this function merely
prints the values, a better function definition is
void PrintVector (const int nV[], int nSize)
The six function prototypes are declared in lines 13-20. The return type is clearly defined, as are the function arguments, if any.
The const qualifier tells the compiler that the argument should not be modified in the function. The main program calls four
functions (ShowBanner, GetValues, AvgValues and MinMaxValues); the argument names in these calls are the variable names
defined in the main program and need not match the names used in the prototypes. The other two functions (MaxValue and
MinValue) are called from the MinMaxValues function.
The ShowBanner function merely displays a banner when the program execution starts. We have used special characters in the
program as shown in lines 57-59. Their meanings along with other special characters are shown in Table 4.1.1.
Table 4.1.1 Special character sequences
Character sequence   Remarks
\n Newline. Positions the cursor to the beginning of the next line.
\t Tab character. Positions the cursor to the next tab location.
\" Double quote. To output the " character verbatim.
\\ Backslash. To output the \ character verbatim.
\r Carriage return. Positions the cursor to the beginning of the current line.
\a Alert. Sounds the bell using the computer’s speakers.
The GetValues function has two arguments. The first argument is a floating-point vector; since an array is passed by the address
of its first element, the values that the function stores in it are available to the calling program. The second argument is an
integer (passed by value) that is the size of the vector. The const qualifier is used since it would be invalid to change the size of
the vector in the function.
In the AvgValues function, the input arguments are the vector containing the values and the size of the vector, and the output
from the function is the average value. The first two (input) arguments are declared with the const qualifier. Hence, the values
of the elements of the vector fVEntries and the size in nSize cannot be modified in the function. However, the last argument is
passed by reference since the average value is computed in the function and must be available to the caller. In line 34, the
corresponding argument in the call is fAvgValue, while in the function the parameter is named fAvg.
The MinMaxValues function has as input arguments the vector of entries (first argument) and the size of the vector (second
argument), and as output arguments the minimum and the maximum values. Since the minimum and maximum values are set in
the MinValue and MaxValue functions, they are passed as references. The MinMaxValues function calls these two functions,
MinValue and MaxValue, to obtain the actual computed min and max values.
One can also pass partial contents of a vector to a function. Consider the following example.
int nVBlocks[5] = {1,2,3,4,5};
….
PrintVector (nVBlocks, 5); // prints all the five values
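On the other hand, if one wishes to print only the last three values, the corresponding call would be as follows.
PrintVector (&nVBlocks[2], 3); // prints the last three values
Note the & before nVBlocks[2]: &nVBlocks[2] is the address of the third element, so the function treats the vector as starting
there. There is a difference between the above call and
PrintVector (nVBlocks[2], 3);
which passes the value of a single element (an int) rather than an address; it does not match the int[] parameter and will be
rejected by the compiler.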
If a specific element of the vector is used, e.g., nVBlocks[2], then the implication is that just the third element of that vector is
to be used. For example, if we wished to swap the contents of the third and the fifth elements, then the swap function can be
used as follows.
void swap (int& n1, int& n2); // prototype - call-by-reference
…
swap (nVBlocks[2], nVBlocks[4]);
Default arguments: A function prototype can supply default values for its parameters; the function definition does not repeat
them. As an added restriction, the default values must be on the rightmost parameters. In other words
float BeamWeight (float fLength, float fHeight, float fWidth=0.03f,
float fDensity=28000.0f);
is valid but
float BeamWeight (float fLength, float fHeight=0.03f, float fWidth,
float fDensity=28000.0f);
is not. Use of default arguments must be done with care.
Overloading functions: It is possible to have two or more different definitions of functions with the same name. In the next
example, we write overloaded versions of the functions that compute the minimum value of all the elements of a vector (as we
saw in Example 4.1.2) and of the functions that obtain the values of the elements of the vector via keyboard input. Here are the
function prototypes for the GetValues function.
void GetValues (int nVV[], const int nSize);
void GetValues (float fVV[], const int nSize);
Note that while the function names are the same, the argument list is not identical. If two functions have the same name, the
compiler is able to differentiate between their usage in a program based solely on the argument list (not the return value) – either
the data types of the arguments must be different, or the number of arguments must be different.
The primary advantage of function overloading is program readability: we do not have to concoct new names for functions
that perform essentially the same task. However, the disadvantage is that we need to maintain several versions of the same
function. Later, we will see how to overcome this disadvantage using templates. We illustrate these new ideas about
functions in the following examples.
As we have seen before, the function prototypes (lines 12-15) need to be declared before they are referenced in the main
program. Each GetValues function is designed to obtain no more than 5 user inputs from the keyboard. These functions are
called in lines 24 and 31. These calls are followed by calls to the MinValue functions to obtain the minimum value input by the
user.
Note that both forms of the function GetValues are essentially the same except for the use of different data types. The same
is true of the MinValue functions.
There are other issues that we need to be aware of concerning function overloading. For example, what will happen if the
following statements are used in the program to invoke the MinValue as
float fValues[NUMELEMENTS], fMinV;
fMinV = MinValue (fValues, NUMELEMENTS);
We will get a compilation error since a version of the function that supports float values does not exist.
With this new overloaded function in the program, we will most likely obtain an incorrect value!
Let us look at another example involving exhaustive search that we have seen before.
Example Program 4.2.2 Exhaustive Search (Revisited)
Problem Statement: The fluid flow through a trapezoidal channel is given by the following equation:
$$P = \frac{A}{H} - \frac{H}{\tan\theta} + \frac{2H}{\sin\theta} \qquad (4.2.1)$$
where $P$ is the wetted perimeter, $A$ is the cross-sectional area of the fluid, and the rest of the parameters ($H$ and $\theta$) are
shown in Fig. 4.2.1.
Fig. 4.2.1 Flow through a trapezoidal channel
Given the values of the wetted perimeter and the height, develop a computer program to find the values of the cross-sectional
area of the fluid, $A$, and the angle $\theta$.
Solution: The approach is not much different from what we used in Chapter 3. However, in this example, we will be using
functions. We can rewrite Eqn. (4.2.1) as
$$\frac{A}{H} - \frac{H}{\tan\theta} + \frac{2H}{\sin\theta} - P = 0 \qquad (4.2.2)$$
The basic idea is to try different values (combinations) of the unknown parameters, $A$ and $\theta$, and find the combination that
gives the value of the left-hand side of Eqn. (4.2.2) closest to zero. We call the absolute value of the left-hand side of Eqn. (4.2.2)
the error, $\varepsilon$.
Algorithm: Here is the developed algorithm.
1. Set the values of (or obtain user input for) $P$ and $H$. Initialize the error, $\varepsilon$, to a large value.
2. Loop through $5^\circ \le \theta \le 85^\circ$ in increments of $0.01^\circ$.
3. Loop through $0.1\ \text{m}^2 \le A \le 2.0\ \text{m}^2$ in increments of $0.001\ \text{m}^2$.
4. Compute the left-hand side of Eqn. (4.2.2). If its absolute value is smaller than $\varepsilon$, then set $\varepsilon$ to this value and
save the values of $A$ and $\theta$.
5. End loop for $A$.
6. End loop for $\theta$.
7. Print the results.
One could make the lower and upper bounds for the values of $A$ and $\theta$, and their increments, user inputs. In the above
algorithm, the increments are deliberately chosen to be very small. The algorithm is implemented below.
main.cpp
In lines 39-40 we initialize the value of the smallest error to the largest double value. This constant is available as DBL_MAX
from <cfloat> or as std::numeric_limits<double>::max() from <limits>; <climits> provides only the integer limits. Nested for loops
are used to vary the values of $\theta$ and $A$ in lines 48 through 58. Inside the inner loop, the function UserFunction is called to
compute the current error in the solution. In the next line, the function UpdateError is called to check for the smallest error (so
far) and to save the solution, the values of $\theta$ and $A$.
The final computed values are displayed in lines 61-63. The timing information is obtained first in line 27 (when the program
computations start) and then again in line 66 when all the computations have taken place. The program uses the functions from
C++’s time library (#include <ctime>) to obtain the clock time. While elapsed wall-clock time is not a precise measure of the
processor time used, it is a good indicator of the computational effort expended in the program. The data type time_t is an
arithmetic type (typically holding a count of seconds) defined in <ctime>. The call to the function time takes place as
time (&time_tVariable);
since this call involves call-by-pointer (discussed in Chapter 8). The difftime function is called in line 68 to compute the difference,
in seconds, between two time_t variables.
Recursive Functions: A function that calls itself is termed a recursive function. It should be noted that any problem that can be
solved using recursion can also be solved using iteration. Recursion should be chosen over iteration with care; it is the preferred
approach when the algorithm to solve the problem lends itself more naturally to recursive calls. Recursive functions can be
computationally more expensive than their iterative counterparts since every recursive call incurs the overhead of a function call.
As an example of recursion, let us assume that we are interested in computing the factorial of a number, $n$. From the definition
of a factorial, we have
$$n! = (1)(2)(3)\cdots(n) \qquad (4.2.3)$$
with $0! = 1$. Once we rewrite the above equation as
$$n! = n\,(n-1)! \qquad (4.2.4)$$
we can immediately see why recursion can be used to compute the factorial. The resulting code is simple.
long Factorial (int n)
{
if (n == 0) // recursion ends here
return 1L;
else
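return (n*Factorial (n-1));
}
The same function can also be written using iteration, following Eqn. (4.2.3) directly.
long Factorial (int n)
{
long lFact = 1L;
for (int i=2; i <= n; i++)
lFact *= i;
return (lFact);
}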
In this example, both the function forms are equally elegant. However, there are some situations where recursion is a clear
choice.
Function calls come with execution-time overhead. A stack frame is created for every function call to hold the function's
arguments and local variables and to keep track of where control must return. Hence, one should be careful when a few small
functions are called a very large number of times in a program.
Will the compiler issue an error message because it is unable to differentiate between the variable fLength declared and used
in the main program and the variable fLength declared and used in the function ComputeAreaRectangle? The answer is “No”.
Let’s review what we have learnt so far with regard to variables. A variable’s name needs to be unique within a scope, such as
the main program or a function. In other words, two variables in the same scope, even if they have different data types, cannot
have the same name. The scope of the variable fLength in the main program extends from the point where it is defined in the
main program to the end of the main program, signified by its closing brace }. Similarly, the scope of fLength in
ComputeAreaRectangle extends from its definition float fLength = f1; to the closing brace } of the function. Such variables are
called local variables. For example, fWidth is a local variable in the function ComputeAreaRectangle. Let’s look at a
modified form of the function.
float ComputeAreaRectangle (float fLength, float fWidth)
{
return (fLength*fWidth);
}
Now is the function parameter fLength a local variable in the ComputeAreaRectangle function? Yes, since a function parameter
behaves as a local variable of that function. Once again, the compiler will treat the two fLength variables (in the main program
and the ComputeAreaRectangle function) differently since their scopes are different.
Now consider a modified form of the previous program. Assume that the following statements appear in a single source file
and compile correctly.
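#include <iostream>
using namespace std;

float ComputeAreaRectangle (float f1, float f2);   // prototype

float fLength;           // global variable

int main ()
{
    cin >> fLength;      // refers to the global fLength
    float fArea = ComputeAreaRectangle (fLength, fLength);
    // …
    return 0;
}

float ComputeAreaRectangle (float f1, float f2)
{
    float fLength = f1;  // local variable; hides the global fLength
    float fWidth = f2;
    return (fLength*fWidth);
}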
Note that the variable fLength is declared outside of both the main program and the function ComputeAreaRectangle. In the
main program, it appears that the variable fLength is used without being declared. However, according to the scope rules, a
variable that is declared outside the body of the main program and all the functions, and precedes these functions, has a global
scope within the file. Hence, the variable fLength in the main program is the global variable declared at the top of the source
file. However, the variable with the same name declared in the ComputeAreaRectangle function is a local variable whose scope
is restricted to the function. A more subtle scope rule deals with the use of variables in a compound statement. A compound
statement (or block) contains one or more C++ statements within the {} braces. Consider the following statements.
int main ()
{
    float fA, fB;
    int n, m;
    // ….
    if (fA > fB)
    {
        int i;
        i = 2*abs(n-m);
        // …
    }
    i += 2;      // error: i is not defined outside the if block
    // ….
    return 0;
}
The program will not compile: the compiler flags the statement i += 2; as invalid because the variable i is undefined at that point. The reason is that the scope of the variable i is confined to the if (fA > fB) block. Once again, it is important to remember that the scope of a variable declared in a block extends from the point where it is defined until the end of the block, marked by the closing brace }.
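A minimal sketch of one way to repair the fragment, assuming the intent is to keep using i after the if block: declare i before the block. The function name BlockScopeFixed and the variable nDiff are hypothetical, and the value is returned only so that the result can be checked.

```cpp
#include <cstdlib>   // std::abs for int

int BlockScopeFixed (float fA, float fB, int n, int m)
{
    int i = 0;                   // scope: from here to the end of the function
    if (fA > fB)
    {
        int nDiff = n - m;       // scope: this block only
        i = 2*std::abs (nDiff);
    }
    // nDiff cannot be used here; it went out of scope at the brace above
    i += 2;                      // legal: i is still in scope
    return i;
}
```

With fA > fB, n = 5 and m = 8 the function returns 2*3 + 2 = 8; when the if block is skipped it returns 2.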
We now present an example program to illustrate the scope rules discussed so far.
Example Program 4.3.1 Understanding Scope Rules
Problem Statement: Obtain the perimeter and area of a segment of a circle.
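Solution: The length of the arc, $a$, of a circle is given by
$$a = r\theta \qquad (4.3.1)$$
where $r$ is the radius of the circle and $\theta$ is the arc angle in radians. Similarly, the area of a segment of a circle is given by
$$A_s = \frac{\pi r^2 \theta}{360} \qquad (4.3.2)$$
where, in this expression, the angle $\theta$ is measured in degrees.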
The program is shown below.
main.cpp
There are two declarations with initialization of the variable fRadius – one on line 12 and the other on line 19. The one on line 12 has a global scope, meaning that the initial value of –123.4 is available throughout the file. However, the local variable declared on line 19 hides the global variable in the main program. Hence in the main program, fRadius has an initial value of 10.0, not –123.4. The variable fRadius is also declared as a function argument in the two functions. Once again, these are local variables that take precedence over the global definition. The actual value that the fRadius variable assumes when the function executes comes from the corresponding argument in the function call (lines 30, 31, 40 and 41).
Both the variables fArcLength and fArea are declared twice in the main program. The variables declared in lines 28 and 29 have their scopes limited to the block between lines 27 and 35. However, the scopes of the variables with the same names declared on lines 38 and 39 start at the lines where they are declared and extend through the end of the main program.
Let’s look at a slightly revised program. What will happen if line 19 is moved after line 35? The expression on line 26 evaluates
as false and the if block (lines 27 through 35) does not execute. Moreover, when the function calls are made in statements 32
and 33, the results are incorrect since fRadius has a negative value!
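The listing for Example Program 4.3.1 is not reproduced here. The following hypothetical sketch isolates the same scope rules; the function names ArcLength, GlobalRadius and LocalRadius are assumptions, not the book's code.

```cpp
// A global fRadius exists, but a function parameter or a local variable
// with the same name hides it inside that function.
float fRadius = -123.4f;                 // global: visible throughout the file

float ArcLength (float fRadius, float fTheta)
{
    return fRadius*fTheta;               // Eq. (4.3.1); uses the parameter
}

float GlobalRadius ()
{
    return fRadius;                      // no local fRadius here: uses the global
}

float LocalRadius ()
{
    float fRadius = 10.0f;               // local variable hides the global
    return fRadius;
}
```

GlobalRadius returns –123.4, LocalRadius returns 10.0, and ArcLength uses whatever radius is passed to it, regardless of the global value.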
#undef identifier
Once an identifier is defined using the #define directive, the identifier can be undefined using the #undef directive. In other words, a #define … #undef pair marks the zone of the source file in which the identifier has a special meaning.
For example,
#define TOLERANCE 0.0001
…..
#undef TOLERANCE
We will now examine a set of directives that are closely linked to each other.
#ifndef identifier
#ifdef identifier
#if constant_expression
#else
#elif
#endif
The #ifndef (if not defined), #ifdef (if defined) and #if are all paired with the #endif directive. For example,
#ifndef HAPPY
#define HAPPY
….
#endif
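A small sketch of #define, #ifdef, #undef and #ifndef working together, using the TOLERANCE identifier from above. The two function names are hypothetical; each returns a string so that the branch selected by the preprocessor can be checked.

```cpp
#include <string>

#define TOLERANCE 0.0001

// TOLERANCE is defined at this point, so #ifdef keeps the first branch.
std::string ToleranceBefore ()
{
#ifdef TOLERANCE
    return "TOLERANCE is defined";
#else
    return "TOLERANCE is not defined";
#endif
}

#undef TOLERANCE     // from here on TOLERANCE has no special meaning

// After #undef, #ifndef TOLERANCE is true and the first branch is kept.
std::string ToleranceAfter ()
{
#ifndef TOLERANCE
    return "TOLERANCE is not defined";
#else
    return "TOLERANCE is defined";
#endif
}
```

The selection happens before compilation, so each function body contains only one return statement by the time the compiler sees it.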
The constant_expression is an integer constant expression and is made up of integer constants, character constants and the
defined operator. For example,
#if DEBUGLEVEL == 0
cout << "The value of x is " << fX << endl;
#endif
The #else and #elif (else if) directives go with matching #if and #endif directives. For example,
#if DEBUGLEVEL == 1
cout << "The value of x is " << fX << endl;
#elif DEBUGLEVEL == 2
cout << "The value of x is " << fX << endl;
cout << "The value of y is " << fY << endl;
cout << "The value of z is " << fZ << endl;
#else
cout << "Safe execution so far." << endl;
#endif
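The fragments above assume that DEBUGLEVEL is defined somewhere. The following sketch defines it in the source file; compilers also let it be supplied on the command line (for example -DDEBUGLEVEL=2 with GCC or Clang, /DDEBUGLEVEL=2 with Microsoft Visual C++). The messages are collected in a string instead of being sent to cout so the selected branch can be checked; the function DebugReport is hypothetical.

```cpp
#include <sstream>
#include <string>

// Use level 2 unless DEBUGLEVEL was already defined (e.g. on the command line).
#ifndef DEBUGLEVEL
#define DEBUGLEVEL 2
#endif

std::string DebugReport (float fX, float fY, float fZ)
{
    std::ostringstream oss;
#if DEBUGLEVEL == 1
    oss << "The value of x is " << fX << "\n";
#elif DEBUGLEVEL == 2
    oss << "The value of x is " << fX << "\n";
    oss << "The value of y is " << fY << "\n";
    oss << "The value of z is " << fZ << "\n";
#else
    oss << "Safe execution so far.\n";
#endif
    return oss.str ();
}
```

Changing DEBUGLEVEL and recompiling switches between the three branches without editing the function body.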
What better way to illustrate all the ideas that we have discussed so far than through an example?
Example Program 4.4.1 A Simple Statistics Library
Problem Statement: Develop a statistics library containing functions that support the following measures: (a) arithmetic mean, $\bar{X}$, (b) median, $X_m$, (c) standard deviation, $\sigma$, (d) variance, $\sigma^2$, and (e) covariance, $\sigma_{xy}$. We will assume that real numbers are used in all the computations.
Solution: Any book on statistics will provide the following formulae for the abovementioned measures.
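$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad (4.4.1)$$
$$X_m = \frac{x_{n/2} + x_{n/2+1}}{2} \quad \text{if } n \text{ is even} \qquad (4.4.2a)$$
$$X_m = x_{(n+1)/2} \quad \text{if } n \text{ is odd} \qquad (4.4.2b)$$
$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(x_i - \bar{X}\right)^2} \qquad (4.4.3)$$
$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} \left(x_i - \bar{X}\right)^2 \qquad (4.4.4)$$
$$\sigma_{xy} = \frac{1}{n}\sum_{i=1}^{n} \left(x_i - \bar{X}\right)\left(y_i - \bar{Y}\right) \qquad (4.4.5)$$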
This statistical library (of functions) is similar to the C++ math library that supports a host of functions such as sqrt, fabs, sin etc. These math functions can be used in any program if the function prototypes are included in the program via the <cmath> header file. In this example, we will create the following source files.
statpak.cpp program to use and check the functions in the statistics library
stat.h header file containing the function prototypes for the library
stat.cpp file containing the statistical functions
Splitting the C++ source statements into three files has several advantages. First, we capture the entire functionality of the statistical library in two physical files – stat.h and stat.cpp. There are no extraneous issues to deal with. Second, these functions can be used in any program. One needs to include the function prototypes, typically, as
#include "stat.h"
at the top of the source file(s) in which the functions are used. In fact, we can share these functions with other programmers. Third, the process of testing and debugging becomes simpler since we have separated the library functions from the code that uses them. In this example, we will embed the test functionality in the source file statpak.cpp. Fourth, imagine that we have a large program containing several thousand lines of source code distributed across several physical files. If a few corrections are made in a single file, is it necessary to compile all the source files? With modular program development, only the files whose source statements have changed need to be recompiled (provided no shared header file has changed) before the program is linked again. Finally, the set of actual values used to test the functions will be hardcoded into the test program. This is inefficient since it involves editing in new values, recompiling and relinking every time we wish to try a new set of test values. We will overcome this drawback once we learn how to deal with external files (Chapter 12).
The source files are presented and discussed below.
stat.h
Potential conflicts (such as duplicate definitions) may be created when the same header file (.h) is included more than once while a single source file (.cpp) is compiled, for example when a source file includes two header files that each include stat.h. The #pragma once directive tells the compiler to include the header file only once. #pragma once is widely supported but is not part of the C++ standard, and older compilers that do not support it need a different approach. To avoid including a header file more than once when a file is compiled, C++ provides the following mechanism. Define a unique identifier to be associated with the header file. The first time the header file is processed, the identifier is not yet defined, so the contents of the file are processed and the identifier is defined; on any later inclusion, the identifier is already defined and the contents are skipped. For example, the following statements will essentially do what #pragma once has done in this example.
#ifndef __STATLIB_H__
#define __STATLIB_H__
// function prototypes
float StatMean (const float fV[], int nSize);
float StatMedian (const float fV[], int nSize);
float StatStandardDeviation (const float fV[], int nSize);
float StatVariance (const float fV[], int nSize);
float StatCoVariance (const float fVX[], const float fVY[],
int nSize);
#endif
In line 1, the preprocessor directive #ifndef is used. We have already seen and used the #include preprocessor directive. The #ifndef signifies if not defined. In other words, if the identifier following #ifndef has not been defined so far, then the compiler is directed to process the statements that follow until the #endif directive is encountered; otherwise, those statements are skipped. In line 2, the identifier associated with the stat.h header file is defined using the #define preprocessor directive. It is a good idea to make the identifier unique to avoid conflicts with other identifiers defined in other source files. When appropriate, in this book the following naming convention will be used in defining the identifier – the first two characters are underscores "__", and the last three characters are H__. (Strictly speaking, the C++ standard reserves identifiers that contain two consecutive underscores, or that begin with an underscore followed by an uppercase letter, for the compiler and its library; a name such as STATLIB_H avoids any possible clash.)
stat.cpp
The functions in the library are defined in this file. Note that in line 7, the #include "stat.h" directive is used so that the function prototypes are available to the compiler. Compare this to the way we have seen include directives before. When the file name is enclosed in angle brackets <..>, the compiler looks for the file in the standard (system) include directories; these are typically files that declare elements of the C++ library. When the file name is enclosed in double quotes "..", the compiler typically looks first in the directory containing the source file, where programmer-defined header files usually reside, and then in the standard directories.
The rest of the file is a strict implementation of Eqns. (4.4.1) through (4.4.5). The functions are reused as much as possible to avoid replicating the code – the standard deviation function (StatStandardDeviation) calls the function that computes the mean (StatMean). Some error checks are carried out. For example, every function checks whether the size of the vector is positive (non-zero). We also assume that the vectors are already sorted for the StatMedian function to work correctly.
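The stat.cpp listing is not reproduced in this section. Here is a minimal sketch of how the five prototypes shown for stat.h might be implemented from Eqns. (4.4.1) through (4.4.5). Returning 0.0f when nSize is not positive is an assumption about the error handling, and StatStandardDeviation is written here in terms of StatVariance, which in turn calls StatMean.

```cpp
#include <cmath>

// Eq. (4.4.1): arithmetic mean
float StatMean (const float fV[], int nSize)
{
    if (nSize <= 0)
        return 0.0f;                     // assumed error value
    float fSum = 0.0f;
    for (int i = 0; i < nSize; i++)
        fSum += fV[i];
    return fSum/nSize;
}

// Eq. (4.4.2): median; fV must already be sorted.
// C++ arrays start at index 0, so x_k in the equations is fV[k-1].
float StatMedian (const float fV[], int nSize)
{
    if (nSize <= 0)
        return 0.0f;
    if (nSize % 2 == 0)                  // n even, Eq. (4.4.2a)
        return 0.5f*(fV[nSize/2 - 1] + fV[nSize/2]);
    return fV[nSize/2];                  // n odd, Eq. (4.4.2b)
}

// Eq. (4.4.4): variance (divides by n, as in the equation)
float StatVariance (const float fV[], int nSize)
{
    if (nSize <= 0)
        return 0.0f;
    float fMean = StatMean (fV, nSize);
    float fSum = 0.0f;
    for (int i = 0; i < nSize; i++)
        fSum += (fV[i] - fMean)*(fV[i] - fMean);
    return fSum/nSize;
}

// Eq. (4.4.3): standard deviation is the square root of the variance
float StatStandardDeviation (const float fV[], int nSize)
{
    return std::sqrt (StatVariance (fV, nSize));
}

// Eq. (4.4.5): covariance of two vectors of equal length
float StatCoVariance (const float fVX[], const float fVY[], int nSize)
{
    if (nSize <= 0)
        return 0.0f;
    float fMeanX = StatMean (fVX, nSize);
    float fMeanY = StatMean (fVY, nSize);
    float fSum = 0.0f;
    for (int i = 0; i < nSize; i++)
        fSum += (fVX[i] - fMeanX)*(fVY[i] - fMeanY);
    return fSum/nSize;
}
```

For the vector {1, 2, 3, 4} these give a mean and median of 2.5 and a variance of 1.25, values that are easy to confirm by hand before running any test program.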
Creating a library is the first step in using it. The second, equally important step is to develop a program to test the functions in the library. This process is called unit testing, which we will examine in more detail in Chapters 9 and 10. Listed below is the main program containing code to test the statistical functions.
statpak.cpp
The prototypes of the functions from the statistical library are included in line 11. In line 13 we see the prototype of a local function. Three vectors are declared and defined in lines 17 through 24. The statistical measures are computed for each vector via a call to the ComputeStats function, where the mean, median, standard deviation and variance are computed and displayed. Finally, in line 42 the covariance function is tested using vectors A and B.
Programs invariably contain bugs, and unit testing is where they are most likely to be detected. Readers are encouraged to use their calculators to compute the statistical values before running the program.
[Table: statistical measures computed by the program compared with those computed by Microsoft Excel]
Finally, we will look at one more form of scope, one that cuts across different files through the extern keyword. Consider the following problem. A program is being developed, and the main program and all the other functions are contained in two source files. Suppose we have a variable nPoints in the first file (called main.cpp), and that we wish to use this variable (access and/or modify its value) in the second file (called draw.cpp). Fig. 4.4.1 shows schematically how this is achieved: nPoints is defined in main.cpp, and draw.cpp declares it with the extern qualifier.
main.cpp
....
int nPoints;
....
int main ()
{
    ....
    cin >> nPoints;
    ....
    return 0;
}

void MaxCoordinate (...)
{
    ....
}

draw.cpp
....
extern int nPoints;
...
int Scale (...)
{
    for (i=1; i <= nPoints; i++)
    {
        ....
    }

    return 0;
}

void Display (...)
{
    ....
}

Fig. 4.4.1 Sharing the global variable nPoints between main.cpp and draw.cpp
The number of floating point operations is $n$. The length, $l$, of a vector $\mathbf{a}_{n \times 1}$ is defined as $l = \sqrt{\sum_{i=1}^{n} a_i^2}$. To count the floating point operations, we will use a global variable nFLOPS that we will initialize to zero in the main program. In the two functions, we will update the value of nFLOPS.
main.cpp
Note the declaration of the global variable in line 14. The variable is initialized in line 23. To use the vector functions, we will
define and declare vectors fX and fY on lines 19 and 20. The length of these two vectors and the dot product between these
vectors are computed in the for loop (lines 26 through 31). These operations are executed 10000 times.
VectorOps.h
To ensure that the header file containing the prototypes is included during the program compilation only once, the #pragma
once statement is used in line 7. Note that the vectors are passed by reference and their size is the last argument in the argument
list. The reader should reflect on what would happen if an incorrect value of the size of the vector is passed to the two functions.
Later in the book (Chapter 10) we will see a more robust way of handling these operations.
VectorOps.cpp
The global variable nFLOPS is declared in line 9, but note that the extern qualifier is used. The nFLOPS value is updated in line 20 in function VectorDotProduct and in line 36 in function VectorLength. As a note of caution, we will minimize (if not eliminate) the use of global variables (see Programming Style Tip 4.12).
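The following sketch shows the extern mechanism with the two vector functions. It is a hypothetical single-file version: the extern declaration stands in for what VectorOps.cpp contains and the definition at the bottom for what main.cpp contains. Counting nSize operations per call is an assumption made only for illustration.

```cpp
#include <cmath>

extern int nFLOPS;        // declaration only: "nFLOPS is defined elsewhere"

// Dot product of two vectors of length nSize.
float VectorDotProduct (const float fVA[], const float fVB[], int nSize)
{
    float fSum = 0.0f;
    for (int i = 0; i < nSize; i++)
        fSum += fVA[i]*fVB[i];
    nFLOPS += nSize;      // update the shared counter (assumed count)
    return fSum;
}

// Length l = sqrt(a1^2 + ... + an^2).
float VectorLength (const float fVA[], int nSize)
{
    float fSum = 0.0f;
    for (int i = 0; i < nSize; i++)
        fSum += fVA[i]*fVA[i];
    nFLOPS += nSize;      // same counting assumption as above
    return std::sqrt (fSum);
}

int nFLOPS = 0;           // definition (the book places this in main.cpp)
```

A declaration with extern may appear any number of times, but the definition that actually creates the variable must appear exactly once in the whole program.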
Storage Classes
C++ has several storage classes (not to be confused with scope). When a variable is declared and used, memory needs to be allocated and subsequently deallocated. In other words, the storage class determines the memory life (lifetime) of a variable or a function. We will discuss two of the storage classes (automatic and static) here; extern was used above, and others such as mutable will be discussed later.
automatic variables
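A local variable’s life is defined by its scope – the variable is created when and where it is declared in a block and is destroyed when execution exits the block. C++ automatically assumes the responsibility of allocating and deallocating the memory for the variable. Hence the name for the storage class – automatic. In code written before C++11, the keyword auto could be used to state this storage class explicitly, as in
auto int nPopulation;
Since C++11, however, auto no longer specifies a storage class: it asks the compiler to deduce the variable's type from its initializer (for example, auto nPopulation = 0; declares an int), and the declaration above no longer compiles. Local variables are automatic by default, so no keyword is needed.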
static variables
Global and static variables and functions exist from the time a program starts execution till the time the program finishes
execution. We saw the use of the extern keyword before in connection with global variables. The keyword can also be used
with functions. Similarly, the keyword static can be used with variables and functions. Here is an example of a static variable
used in a function.
void DoNothing ()
{
static int nCount = 0;
nCount++;
std::cout << "Count is : " << nCount << "\n";
}
The output that is generated when the program is executed is shown in Fig. 4.4.2.
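A sketch that makes the behavior of a static local variable checkable: CountCalls (a hypothetical name) returns the count as well as printing it, and the hypothetical CountCallsAuto shows the automatic counterpart for contrast.

```cpp
#include <iostream>

// nCount is static: it is initialized only once and keeps its value
// between calls for the whole run of the program.
int CountCalls ()
{
    static int nCount = 0;
    nCount++;
    std::cout << "Count is : " << nCount << "\n";
    return nCount;
}

// For contrast: an automatic local variable is created afresh on every call.
int CountCallsAuto ()
{
    int nCount = 0;
    nCount++;
    return nCount;
}
```

Calling CountCalls three times returns 1, 2 and 3, whereas CountCallsAuto returns 1 every time.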
The line template <class T> is called the template prefix. The keywords template and class must appear as shown. The parameter T is the type parameter. The compiler will substitute the appropriate data type for T based on the function call. The parameter name can be any legal identifier; we have chosen T for convenience’s sake. Also, one could have more than one parameter separated by commas. The general syntax is
template < [typelist] [, [ arglist ]] > declaration
and we will see this declaration and usage later when template classes are introduced in Chapter 9. Let’s go back to Example Program 4.2.1 and the int version of the function MinValue.
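int MinValue (const int nVV[], int nSize)
{
    // set minimum value to the first entry
    int nMinV = nVV[0];
    // compare with the remaining entries
    for (int i = 1; i < nSize; i++)
        if (nVV[i] < nMinV)
            nMinV = nVV[i];
    return nMinV;
}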
We could easily rewrite this function by substituting a generic data type T for the int data type associated with the vector of numbers, as follows.
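template <class T>
T MinValue (const T TVV[], int nSize)
{
    T TMinV = TVV[0];             // start with the first entry
    for (int i = 1; i < nSize; i++)
        if (TVV[i] < TMinV)
            TMinV = TVV[i];
    return TMinV;
}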
Compare the differences between the two versions. In the template version, we have the special first line. The function definition
for
int MinValue (const int nVV[], int nSize)
now reads
T MinValue (const T TVV[], int nSize)
Beyond this, the appropriate int declarations are replaced with T! All the other changes are style changes (writing TVV instead of
nVV etc.).
The source statements for the main program are identical to Example Program 4.2.1 except for line 10 where the #include
statement is used to declare the two functions used in the program.
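Tip: If the template functions are defined in a separate file, then their complete definitions (not just the prototypes) need to be placed in the header file, not in a C++ source (.cpp) file. The compiler needs the full definition to generate the version of the function for each data type with which it is called. This arrangement is shown below.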
templates.h
The template functions are almost identical to the non-template functions except for the differences pointed out earlier in
addition to being defined in a header file. The entire program is now more compact and perhaps, easier to maintain.
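templates.h itself is not reproduced in this section. A minimal sketch of such a header, assuming an include guard named TEMPLATES_H, could look like this; MaxValue is a hypothetical second function, since the book's header is not shown here.

```cpp
#ifndef TEMPLATES_H
#define TEMPLATES_H

// Full definitions live in the header so the compiler can generate
// MinValue<int>, MinValue<double>, ... wherever the header is included.
// Both functions assume nSize >= 1.
template <class T>
T MinValue (const T TVV[], int nSize)
{
    T TMinV = TVV[0];               // start with the first entry
    for (int i = 1; i < nSize; i++)
        if (TVV[i] < TMinV)
            TMinV = TVV[i];
    return TMinV;
}

// Hypothetical companion function.
template <class T>
T MaxValue (const T TVV[], int nSize)
{
    T TMaxV = TVV[0];
    for (int i = 1; i < nSize; i++)
        if (TVV[i] > TMaxV)
            TMaxV = TVV[i];
    return TMaxV;
}

#endif
```

The same two definitions serve int, float and double arrays; the compiler deduces T from the array argument at each call.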
where ??? is the current value that one would see on the calculator. The four functions are add, subtract, multiply and divide, as signified by the symbols +, -, *, /. To clear the display (reset the value to zero), we will use C or c. To stop the program (similar to switching off the calculator), we will use the symbol S or s. Let us look at an example of using the calculator. To compute 12 × (5.1 + 6.3), the user of the program would do the following.
Enter +-*/CS or value [0] 5.1
Enter +-*/CS or value [5.1] +
Enter +-*/CS or value [5.1] 6.3
Enter +-*/CS or value [11.4] *
Enter +-*/CS or value [11.4] 12
Enter +-*/CS or value [136.8] S
We will try to resolve an important problem before starting to develop the algorithm and program structure. One of the biggest
problems that new C++ programmers have with the standard input class is the way input is read and interpreted. For example,
what happens with the following code:
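#include <iostream>
int main ()
{
    int nPoints;
    std::cout << "Input number of points: ";
    std::cin >> nPoints;      // what if the user types 1w3?
    std::cout << "Number of points: " << nPoints << "\n";
    return 0;
}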
if the user types an invalid value such as 1w3 instead of 123? The output generated by the program compiled with the Microsoft Visual C++ compiler is shown in Fig. 4.6.1.
For example, we could rewrite the previous program as follows. The main program is shown below, and the complete program also includes the source code contained in getinteractive.cpp.
#include "getinteractive.h"
int main ()
{
    int nPoints;
    GetInteractive ("Input number of points: ", nPoints);
    return 0;
}
The overloaded versions can be used if range checking is to be performed. For example, suppose the value of nPoints must lie between 1 and 100. Then we would rewrite the call to GetInteractive as
GetInteractive ("Input number of points: ", nPoints, 1, 100);
The details of the GetInteractive function are not discussed here. Interested readers can explore the source code
(getinteractive.cpp).
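Since the details of GetInteractive are not discussed here, the following is only a hypothetical sketch of how such a function might reject input like 1w3. It reads a whole line and accepts it only if the line holds a single integer within range. The bool return value and the extra stream parameters (which default to cin and cout and make the function testable) are assumptions, not the book's signature.

```cpp
#include <iostream>
#include <limits>
#include <sstream>
#include <string>

// Prompt, read a whole line, and accept it only if it holds one integer
// in [nMin, nMax]. Returns false if the input runs out.
bool GetInteractive (const std::string& strPrompt, int& nValue,
                     int nMin, int nMax,
                     std::istream& in = std::cin, std::ostream& out = std::cout)
{
    std::string strLine;
    for (;;)
    {
        out << strPrompt;
        if (!std::getline (in, strLine))
            return false;                        // no more input
        std::istringstream issLine (strLine);
        int nTemp;
        char cExtra;
        // valid only if an integer is read and nothing but blanks follows it,
        // so "1w3" is rejected even though it starts with a digit
        if ((issLine >> nTemp) && !(issLine >> cExtra)
            && nTemp >= nMin && nTemp <= nMax)
        {
            nValue = nTemp;
            return true;
        }
        out << "Invalid input. Please try again.\n";
    }
}

// Overload without range checking: any int is accepted.
bool GetInteractive (const std::string& strPrompt, int& nValue,
                     std::istream& in = std::cin, std::ostream& out = std::cout)
{
    return GetInteractive (strPrompt, nValue,
                           std::numeric_limits<int>::min (),
                           std::numeric_limits<int>::max (), in, out);
}
```

Reading a full line first means a rejected entry is discarded as a whole, so the stream never gets stuck on the leftover characters of an invalid input.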
And now on to the development of the algorithm. We will store the displayed value in a variable called dMemory and the next
input value in a variable called dNext. We will track the operation to be performed using a variable called nOper that will have a
value of 1, 2, 3 and 4 if binary addition, subtraction, multiplication and division is to be performed; otherwise, the value will be
set to 0. To obtain and act on the user command, we will use a variable called nCommand that will have the following values – 0
to exit the program, 1 for addition, 2 for subtraction, 3 for multiplication, 4 for division, 5 to clear the display to zero, and 6 if
the user has input a number.
Algorithm
1. Initialize dMemory, dNext and nOper to zero.
2. Loop until user terminates the program.
3. Get user command. Limit number of input characters to 10.
4. Is input one of +-*/cCsS or is it a number? Try to read the number as a double precision value. If there is an invalid input, ask the user for a valid input.
5. If the input is to stop the calculations, then exit the program.
6. If the input is to clear the memory, then set dMemory and nOper to zero.
7. If the input is one of the four operators, update nOper.
8. Else the input is a number. Check nOper. If nOper has been defined (nOper is not zero), then carry out the operation, update dMemory and set nOper to zero. Else store this number in dNext.
9. End loop.
10. Terminate program.
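The steps above can be sketched as a function that processes a list of already validated tokens instead of reading the keyboard, so Steps 3 and 4 are left out. RunCalculator is a hypothetical name, std::stod is used for the number conversion, and one behavior is assumed: a number entered with no pending operator replaces the displayed value, which reproduces the sample session shown earlier. Division by zero returns zero, in line with the Divide function described later.

```cpp
#include <string>
#include <vector>

// nOper: 0 = none, 1 = add, 2 = subtract, 3 = multiply, 4 = divide
double RunCalculator (const std::vector<std::string>& vTokens)
{
    double dMemory = 0.0;                            // Step 1
    int nOper = 0;
    for (const std::string& strToken : vTokens)      // Step 2
    {
        if (strToken == "S" || strToken == "s")      // Step 5: stop
            break;
        if (strToken == "C" || strToken == "c")      // Step 6: clear
        {
            dMemory = 0.0;
            nOper = 0;
        }
        else if (strToken == "+") nOper = 1;         // Step 7
        else if (strToken == "-") nOper = 2;
        else if (strToken == "*") nOper = 3;
        else if (strToken == "/") nOper = 4;
        else                                         // Step 8: a number
        {
            double dNext = std::stod (strToken);     // throws if not a number
            switch (nOper)
            {
                case 1: dMemory += dNext; break;
                case 2: dMemory -= dNext; break;
                case 3: dMemory *= dNext; break;
                case 4: dMemory = (dNext == 0.0) ? 0.0 : dMemory/dNext; break;
                default: dMemory = dNext; break;     // assumed: replace display
            }
            nOper = 0;
        }
    }
    return dMemory;                                  // Step 10
}
```

Feeding it the tokens 5.1, +, 6.3, *, 12 and S reproduces the session above and returns 136.8 (to within floating point round-off).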
The program is developed in a modular form with all the steps except Steps 3 and 4 implemented in the main program
(main.cpp). Steps 3 and 4 are implemented in several functions (utility.h and utility.cpp) and the details are presented
below.
Example Program 4.6.1 A Four-Function Calculator
main.cpp
There are seven utility functions. The ShowBanner and ShowGoodBye functions are called once each, at the beginning and at the end of the program, respectively. The UserCommand function parses the user input and returns only if the input is valid. The binary operations are carried out in functions Add, Subtract, Multiply and Divide. The function prototype file is shown below.
utility.h
utility.cpp
An interesting part of the implementation deals with formatting the user prompt. Recall that the user prompt must be of the
form:
Enter +-*/CS or value [???]
The first part of the string is straightforward. The only sticky part is how to insert the current value of dMemory in place of ??? and store the entire prompt as a standard string. We will study strings in more detail in Chapter 7; here we explain only what is necessary to format strings. First, we need to include the header that supports formatting strings. Recall that we have been using iostream for input and output. Similarly, the <sstream> header provides string stream classes that link strings with streams. In other words, it is possible to read from or write to a string using the formatting capabilities that we have seen with the stream classes. In lines 7 and 8, we make the string stream classes available and indicate that we wish to use the standard output string stream class (ostringstream) in the program. The actual variable strPrompt associated with this class is declared on line 29. Note how the variable (or object) is used to format the string on lines 35 and 36. If we had wanted to write the string to the standard output, we would have used the following statement:
std::cout << "\nEnter +-*/CS or value [" << dMemory
<< "] ";
The first parameter of the GetInteractive function is declared as const string&. Converting the ostringstream object into a const string& requires the str member function of the ostringstream class, as we see in line 39. As we have mentioned before, sometimes it is easier to use a feature first and understand why it works later. We have used these advanced features here only because they offer the simplest solution at this point. The "why" will be tackled in Chapter 7.
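A minimal sketch of the prompt formatting, with the current value inserted where ??? appears. BuildPrompt is a hypothetical helper; the ostringstream insertion operator and its str member function are standard.

```cpp
#include <sstream>
#include <string>

// Format the calculator prompt with the current value of dMemory.
std::string BuildPrompt (double dMemory)
{
    std::ostringstream strPrompt;
    strPrompt << "\nEnter +-*/CS or value [" << dMemory << "] ";
    return strPrompt.str ();      // copy the formatted text into a std::string
}
```

The returned string can be passed directly to a function whose parameter is a const string&, such as GetInteractive. With the default stream formatting, a value of 11.4 appears as [11.4] and zero appears as [0], matching the sample session.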
Second, we have used an effective but not elegant approach to taking care of the divide by zero problem in the Divide function.
In line 95, if the denominator is zero, we merely return a zero value as the result instead of flagging the error. What is an
alternative? We could issue an error message and terminate the program. We will see more about error handling in the next
section and throughout the rest of the book.
try/catch Block
Exception handling is made up of the try/catch block. The try block contains C++ statements that may potentially generate
an exception as well as statements that should be executed if no exception occurs. The process of generating an exception is
called throwing an exception. One or more catch blocks immediately follow the try block. The general format is as follows.
try
{
    // statements including at least one of the following
    throw expression;
    …
}
catch (datatype_1 identifier)
{
    // exception handler for datatype_1
}
// … additional catch blocks, one per data type
catch (...)
{
    // exception handler for any other type of exception
}
Each catch block specifies a unique data type that it can catch and contains an exception handler. The last catch block contains three dots (and no data type) and can be used to catch any type of exception not caught by the preceding catch block(s). The identifier associated with a catch block is the catch block parameter and receives the exception thrown in the try block.
Example Program 4.7.1 Exception Handling
The following program is a simple one. The user inputs a floating-point number, and the program computes its reciprocal and
its square root. In this example we will see how to catch three types of errors and handle them. The first is catching an invalid
input – user typing a number that is not a floating-point number. The second error is if the user inputs a zero value since it is
not possible to compute its reciprocal. The last error is if the user inputs a negative number since we are interested in computing
a real square root. The try keyword is in line 19 and the try block is contained in lines 20 through 32. The user input is read in
line 22. The validity of the input is checked in line 24 through the fail() member function. If the statement is true (user entered
an invalid input), an exception with a std::string data type is thrown in line 25. The other two errors are detected in line 26
and the associated exception is thrown in line 27 with a double data type. If no error is detected, the reciprocal and square root
values are computed and displayed on the console in lines 30 and 31.
The first catch block with a double data type is contained in lines 33 through 39. Appropriate statements are output to the
console indicating the type of error (zero value or a negative value). The second catch block with a std::string data type is
contained in lines 40 through 43. The exception handler outputs an error message that the user input is invalid. Finally, the
catch-all block is contained in lines 44 through 47 and should never be executed. It is merely shown in this program to illustrate
how a catch-all block may be used in a program.
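The listing for Example Program 4.7.1 is not reproduced here. The following hypothetical sketch follows the same plan (a std::string exception for invalid input, a double exception for a zero or negative value, and a catch-all block) but reads from any input stream and returns the message, so the behavior can be checked without a keyboard. The function name ReciprocalAndRoot is an assumption.

```cpp
#include <cmath>
#include <iostream>
#include <sstream>
#include <string>

std::string ReciprocalAndRoot (std::istream& in)
{
    std::ostringstream out;
    try
    {
        double dX;
        in >> dX;
        if (in.fail ())                          // not a floating-point number
            throw std::string ("Invalid input.");
        if (dX <= 0.0)                           // zero or negative value
            throw dX;
        out << "Reciprocal: " << 1.0/dX
            << " Square root: " << std::sqrt (dX);
    }
    catch (double dValue)
    {
        if (dValue == 0.0)
            out << "Error: cannot compute the reciprocal of zero.";
        else
            out << "Error: cannot compute the real square root of "
                << dValue << ".";
    }
    catch (const std::string& strError)
    {
        out << "Error: " << strError;
    }
    catch (...)                                  // should never be reached
    {
        out << "Unexpected error.";
    }
    return out.str ();
}
```

An input of 4 produces "Reciprocal: 0.25 Square root: 2", while 0, -9 and abc are each routed to the matching catch block.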
We will take a deeper look at the exception classes provided by C++, and at how to handle errors in a typical program, in later chapters.
Summary
Material from the first four chapters should enable a programmer to write complete, moderately complex programs. However, most programmers do not write complete programs on their own. Object-oriented programming makes it possible to use and modify components written by others. We will start the study of numerical analysis in the next chapter.
While there is no universal programming style, we will set out guidelines for some good programming practices.
Programming Style Tip 4.1: Naming variables, constants and functions
Variable names in C++ must start with a letter or an underscore and may contain only the characters 0-9, a-z, A-Z and _ (some compilers also accept $ as an extension). In this book, we will maintain the following convention. The objective is to make the naming of the variables uniform and predictable. C++ provides the following basic scalar types – integer, float, double, boolean and character. In addition, C++ supports arrays as a built-in type, and vectors through the STL (Standard Template Library). However, we will use our own vector and matrix templates, as we will see in Chapter 9. Whenever appropriate, we will use the string class to store character strings.
| Data Type | Variable Name Prefix | Examples |
| --- | --- | --- |
| Integer scalar | n? | nX, nIterations, nJoints |
| Float scalar | f? | fY, fTolerance |
| Double scalar | d? | dArea, dP123 |
| Boolean scalar | b? | bDone, bConverged |
| Integer vector | nV? | nVScores, nVSSN |
| Integer matrix | nM? | nMVertices, nMShapes |
| Float vector | fV? | fVRHS, fVForces |
| Float matrix | fM? | fMCoef, fMElementForces |
| Double vector | dV? | dVRHS, dVGPA |
| Double matrix | dM? | dMCoef, dMCoordinates |
| String | str? | strNames, strStates |
| Class name | C? | CNode, CPoint |
| Pointer | p? | pVCoor |
| Member of class | m_? | m_nRows, m_fVCoordinates |
A blank line before and after a body of statements makes the body easier to read and understand.
// initialize all parameters to default values
Initialize ();

for (i = 1; i <= nLines; i++)   // loop header assumed; nLines is hypothetical
{
    GetLine (i);
    DrawLine (i);
}
Though we have not learnt about classes – how to define and use them, whenever possible, we will use the string class to store
a string of characters. We will see a formal introduction to the standard string class in Chapter 7.
Programming Style Tip 4.10: Create easy to read tabular output
As we have seen with the simple and moderately complex examples so far in the book, it is much easier for one to digest the
information if presented in a visually attractive form. Tabular output is one such form (see Example Program 3.2.5). Graphical
output is another attractive form and we will see more about this in Chapter 19.
Programming Style Tip 4.11: Preprocessor directives and macros
We strongly discourage the use of both manifest constants and macros. For example, instead of defining a constant as
#define PI 3.1415926
define it as
const double PI = 3.1415926;
Not only will this declaration generate compiler errors if an attempt is made to modify the constant, but it will also allow debugger access to the constant.
Macros are cumbersome to maintain and can lead to subtle errors. We will see that using templates (Chapter 9) provides a much
better alternative.
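A short illustration of the kind of subtle error this tip refers to. SQUARE and Square are hypothetical names: the macro is expanded by plain text substitution before compilation, whereas the function template evaluates its argument first and is type-checked.

```cpp
// Text substitution: SQUARE(1+2) becomes 1+2*1+2, which is 5, not 9.
#define SQUARE(x) x*x

// Function template: the argument 1+2 is evaluated first, so the result is 9.
template <class T>
T Square (T x)
{
    return x*x;
}
```

Adding parentheses to the macro fixes this particular case, but the template avoids the whole class of problems and works for any type that supports multiplication.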
Programming Style Tip 4.12: Global variables
Once again, we strongly discourage the use of global variables with the extern qualifier. Once we understand how to define and use classes, we will see how to organize the program to minimize, if not eliminate, the need for global variables.
Programming Style Tip 4.13: Templates
We strongly urge the use of templates whenever appropriate. Templates not only reduce the size of the source code in a program and make program maintenance easier, but they also make software reuse through the development of libraries possible. As we have mentioned before, one of the strengths of C++ is the ease with which libraries have been developed and made available to programmers. The Standard Template Library (STL) is one such example and is discussed in Appendix B.
Exercises
Most of the problems below involve the development of one or more functions. In each case (a) develop a plan to test the function(s), and (b) implement the plan in a main program. The functions should not use cin or cout unless specified. Put the main program in a separate file and the function(s) in separate file(s).
Appetizers
Problem 4.1
Write a function IsOdd to determine whether an integer n is an odd number as
bool IsOdd (int n);
Problem 4.2
The thermal efficiency, e of a heat engine is defined as the ratio of the net work done to the thermal energy absorbed at the
highest temperature during one cycle as
$$e = 1 - \frac{Q_c}{Q_h}$$
where Qh is the amount of heat absorbed by the engine and Qc is the amount of heat given up. The function prototype is
given as follows.
float Efficiency (float fHeatAbsorbed, float fHeatLost);
Problem 4.3
The half-life of radon is 3.8 days. Write a function to compute the concentration of radon given the initial concentration (in
mol/L) and the elapsed time (in days). The function prototype is given as
float RadonConc (float fInitialConc, int nDays);
Problem 4.4
The Arrhenius Equation relates the rate at which a reaction proceeds with its temperature and is given as
$$k = A e^{-E_a/(RT)}$$
where $k$ is the rate coefficient, $A$ is a constant, $E_a$ is the activation energy, $R$ is the universal gas constant, and $T$ is the absolute temperature in kelvins. Write a function to compute the rate coefficients (stored in a vector) for a given number of temperature values (stored in a vector). nPoints is the number of temperature values.
void Arrhenius (float fA, float fEa, const float fVT[], float fVK[],
int nPoints);
Problem 4.5
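Write a function to compute the state of stress $(\sigma_{x'}, \tau_{x'y'})$ on a plane inclined at an angle $\theta$, given $(\sigma_x, \sigma_y, \tau_{xy})$, where
$$\sigma_{x'} = \frac{\sigma_x + \sigma_y}{2} + \frac{\sigma_x - \sigma_y}{2}\cos 2\theta + \tau_{xy}\sin 2\theta$$
$$\tau_{x'y'} = -\frac{\sigma_x - \sigma_y}{2}\sin 2\theta + \tau_{xy}\cos 2\theta$$
[Fig. P4.5: stress element with the original axes x, y and the rotated axes x', y']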
Main Course
Problem 4.6
Develop a library of functions to compute the surface area and volume of the following three-dimensional objects – (1) Cube,
(2) Tetrahedron, (3) Right pyramid, (4) Right circular cylinder, (5) Right circular cone, (6) Sphere. The function prototypes are
presented below.
void CubeProp (float& fSurfArea, float& fVolume, float fSide);
void TetrahedronProp (float& fSurfArea, float& fVolume, const float fVX[4],
const float fVY[4], const float fVZ[4]);
void PyramidProp (float& fSurfArea, float& fVolume, float fSide,
float fHeight);
void CylinderProp (float& fSurfArea, float& fVolume, float fRadius,
float fHeight);
void ConeProp (float& fSurfArea, float& fVolume, float fRadius,
float fHeight);
void SphereProp (float& fSurfArea, float& fVolume, float fRadius);
Problem 4.7
Conduction of heat through solid objects of different shapes. The total heat transfer rate Q through the body is related to the temperature difference ΔT and a quantity called the conduction shape factor S by
$$Q = S k \Delta T = S k (T_1 - T_2)$$
where k is the thermal conductivity of the body and $T_1$ and $T_2$ are the boundary surface temperatures across which heat flow takes place. The shape factor S is related to the thermal resistance of the body and is given in the following table.
Shape | S
Slab of thickness t and cross-sectional area of heat flow A (see Fig. P4.7(a)) | $A/t$
Long hollow cylinder of length L, inner radius $r_1$ at temperature $T_1$, outer radius $r_2$ at temperature $T_2$ | $\dfrac{2\pi L}{\ln(r_2/r_1)}$
A sphere of radius R maintained at temperature $T_1$ placed in a semi-infinite medium at a distance z from a surface maintained at temperature $T_2$ (see Fig. P4.7(b)) | $\dfrac{4\pi R}{1 - R/(2z)}$
A sphere of radius R maintained at temperature $T_1$ placed in a semi-infinite medium maintained at temperature $T_2$ and at a distance z from an insulated surface | $\dfrac{4\pi R}{1 + R/(2z)}$
A cylinder of radius R and length L maintained at temperature $T_1$ placed horizontally in a semi-infinite medium at a distance z from a surface maintained at temperature $T_2$ | $\dfrac{2\pi L}{\cosh^{-1}(z/R)}$
Circular hole of radius R centered in a square solid of side a and length L | $\dfrac{2\pi L}{\ln(0.54\,a/R)}$
Fig. P4.7 (a) Slab of thickness t and area A with faces at $T_1$ and $T_2$; (b) sphere of radius R at depth z below a surface at $T_2$
Develop a program to obtain the values of different parameters and compute the value of Q . Construct a separate function
for each of the shapes shown in the table.
Problem 4.8
Write a function to compute the area under a curve that is approximated by straight lines between adjacent points as shown in Fig. P4.8. With this scheme, the shape under the curve between adjacent points can be a trapezoid, a rectangle or a triangle (note: the area can be positive, negative or zero).
Fig. P4.8 Curve through the points $(x_1, y_1), \ldots, (x_i, y_i), \ldots, (x_n, y_n)$ approximated by straight-line segments
The prototype of this function is as follows.
float AreaUnderCurve (const float fVX[],const float fVY[], int nPoints,
int& nTriangles, int& nRectangles, int& nTrapezoids);
The inputs to the function are the vectors of x and y values of the points. The outputs are the return value, which is the area under the curve, and the last three arguments, which are the numbers of triangles, rectangles and trapezoids detected during the computation of the area. Develop three other functions:
float AreaRightTriangle (float fHeight, float fBase);
float AreaRectangle (float fHeight, float fBase);
float AreaTrapezoid (float fBase, float fHeightLeft, float fHeightRight);
to compute the area of a right triangle, rectangle and a trapezoid. Call these functions from the AreaUnderCurve function.
Problem 4.9
Fibonacci numbers are defined as the sequence of integers 0, 1, 1, 2, 3, 5, 8, … In other words, $F_1 = 0$, $F_2 = 1$ and $F_i = F_{i-1} + F_{i-2}$, $i = 3, 4, \ldots$ Write two functions to compute the Fibonacci numbers, the first using iteration and the other using recursion.
Problem 4.10
Develop a library of functions to operate on points in (x, y, z) space that are stored as double vectors of length 3. The following functions are needed.
Distance: The straight-line distance between points 1 and 2.
DistanceFromOrigin: The straight-line distance between the point and the origin of the coordinate system.
UnitVector: Unit vector between points 1 and 2.
DistanceFromLine: Shortest distance from the point to the straight line connecting points 1 and 2.
The function prototypes are given below.
double Distance (const double dV1[3], const double dV2[3]);
double DistanceFromOrigin (const double dVP[3]);
void UnitVector (const double dV1[3], const double dV2[3],
double dVUnitV[3]);
double DistanceFromLine (const double dVP[3], const double dV1[3],
const double dV2[3]);
Once you have tested the functions, convert them to template functions so that they can be used with either float or double data types.
C++ Concepts
Problem 4.11
Enhance the library of statistical functions shown in Example Program 4.4.1. Develop these functions as template functions.
The functions should compute the following statistical measures – (1) Arithmetic mean, (2) Geometric Mean, (3) Median, (4)
Mean Deviation, (5) Standard Deviation, (6) Variance, and (7) Covariance.
Problem 4.12 (See Problems 1.9 and 1.10)
Several engineering problems are solved using heuristics. One can loosely define heuristics as solution techniques that are based on experience rather than on rigorous theory. The use of rule-based procedures is an example of heuristics. We all use heuristics on a regular basis, especially when playing games. Write a computer program for playing tic-tac-toe. Start with an initial screen that looks like the grids shown in Fig. P4.12.
1 2 3 1 2 3
4 5 6 4 5 6
7 8 9 X 8 9
User plays first Computer plays first
Fig. P4.12
Chapter
“You can use an eraser on the drafting table or a sledgehammer on the construction site.” Frank Lloyd Wright
In the preceding chapters we saw how to write simple yet useful programs. As we have repeatedly seen, good computer programs can be used, among other things, to automate mundane, repetitive tasks and make them as error-free as possible. In this chapter we will start to look at the basics of numerical analysis. We will see how integer and floating-point numbers are represented and stored. We will look at accuracy, sources of errors and how best to deal with computer arithmetic.
Objectives
To understand the basics of numerical analysis starting with numerical representation.
To understand what is meant by numerical approximation and numerical errors.
To understand Taylor series expansion and function approximation.
Fig. 5.1.1 Bit representation for a 4-bit integer storage (a sign bit followed by three value bits)
A 1 in the sign bit usually signifies a negative number. The other three bits are then used to store the value of the integer. The largest value occurs when all three bits are 1s. In other words, the largest value that can be stored is
$$111_2 = 1 \times 2^0 + 1 \times 2^1 + 1 \times 2^2 = 7_{10}$$
Hence the range of integers n that can be represented in a 4-bit representation is $-7 \le n \le 7$. More generally, the range of (decimal) integers n that can be represented in a p-bit representation is
$$-(2^{p-1} - 1) \le n \le (2^{p-1} - 1)$$
Going back to Section 2.2, we can now see why a short number using 16 bits can store values in the range
$$-(2^{16-1} - 1) \le n \le (2^{16-1} - 1) \quad \Rightarrow \quad -32767 \le n \le 32767$$
(With the 2’s complement representation used on most machines, see Problem 5.9, the lower limit extends to −32768.)
When a floating-point number is represented in the decimal system, we can continue to think of the number as we did with integers. For example,
$$4.203_{10} = 4 \times 10^0 + 2 \times 10^{-1} + 0 \times 10^{-2} + 3 \times 10^{-3}$$
Similarly, for the binary representation of floating-point numbers, we have the following example.
$$1.101_2 = 1 \times 2^0 + 1 \times 2^{-1} + 0 \times 2^{-2} + 1 \times 2^{-3} = 1.625_{10}$$
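The computer representation of a floating-point number is a little more complex. Typically, a floating-point number has three components (Fig. 5.1.2): a sign bit S, an exponent E and a fraction (mantissa) F.
Fig. 5.1.2 Single-precision (32-bit) layout: sign S in bit 0, exponent E in bits 1 to 8, fraction F in bits 9 to 31
The value V of the stored number is obtained as follows.
If 0 < E < 255, then $V = (-1)^S \times 2^{E-127} \times (1.F)$, where "1.F" represents the binary number created by prefixing F with an implicit leading 1 and a binary point.
If E = 0 and F is nonzero, then $V = (-1)^S \times 2^{-126} \times (0.F)$. These are "unnormalized" (also called denormalized or subnormal) values.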
The exponent can be either negative or positive. A bias is subtracted from the stored exponent to obtain the actual exponent. This bias is 127 for single-precision floats. For example, a stored exponent E of 134 means that the actual exponent is 134 − 127 = 7.
The mantissa represents the precision bits of the number. It is composed of an implicit leading bit and the fraction bits. To maximize the quantity of representable numbers, floating-point numbers are typically stored in normalized form, which places the radix point after the first nonzero digit. In binary that digit is always 1, so it can be assumed and does not need to be stored explicitly. As a result, the mantissa effectively has 24 bits of resolution although only 23 fraction bits are stored.
For example,
1 10000001 10100000000000000000000 $= -1 \times 2^{(129-127)} \times (1.101)_2 = -(6.5)_{10}$
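The bit fields can also be inspected from a program. The sketch below is not from the book; DecodeFloat and FieldsToValue are hypothetical names. It copies the 32 bits of a float into an unsigned integer with std::memcpy, extracts S, E and F, and rebuilds V with the two rules given above. It assumes float is a 32-bit IEEE 754 type, which the static_assert checks.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Bit fields of an IEEE 754 single-precision number.
struct FloatFields {
    unsigned sign;            // S: 1 bit
    unsigned exponent;        // E: 8 bits, biased by 127
    std::uint32_t fraction;   // F: 23 bits
};

FloatFields DecodeFloat(float fValue)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes a 32-bit float");
    static_assert(std::numeric_limits<float>::is_iec559, "assumes IEEE 754 floats");
    std::uint32_t nBits;
    std::memcpy(&nBits, &fValue, sizeof nBits);   // well-defined way to read the raw bits
    FloatFields ff;
    ff.sign = nBits >> 31;
    ff.exponent = (nBits >> 23) & 0xFFu;
    ff.fraction = nBits & 0x7FFFFFu;
    return ff;
}

// Rebuild V for finite values (E < 255) from the normalized and unnormalized rules.
double FieldsToValue(const FloatFields& ff)
{
    assert(ff.exponent < 255u && "E = 255 encodes infinity or NaN");
    const double dTwoTo23 = 8388608.0;            // 2^23
    double dValue;
    if (ff.exponent == 0u)                        // unnormalized: (0.F) * 2^(-126)
        dValue = std::ldexp(ff.fraction / dTwoTo23, -126);
    else                                          // normalized: (1.F) * 2^(E-127)
        dValue = std::ldexp(1.0 + ff.fraction / dTwoTo23,
                            static_cast<int>(ff.exponent) - 127);
    return ff.sign ? -dValue : dValue;
}
```

For instance, DecodeFloat(-6.5f) gives S = 1, E = 129 and F = 0x500000 (the bits 101 followed by twenty zeros), and FieldsToValue turns these back into −6.5, matching the example above.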
Since the number of bits for the exponent is 8, the approximate range of numbers that can be represented is $2^{(2^8 - 1 - 127)} = 2^{128} \approx 3.4 \times 10^{38}$. There are five distinct numerical ranges that single-precision floating-point numbers are not able to represent.
(1) Negative overflow: Negative numbers less than $-(2 - 2^{-23}) \times 2^{127}$.
(2) Negative underflow: Negative numbers greater than $-2^{-149}$.
(3) Zero (see below).
(4) Positive underflow: Positive numbers less than $2^{-149}$.
(5) Positive overflow: Positive numbers greater than $(2 - 2^{-23}) \times 2^{127}$.
Overflow means that values have grown too large for the representation, much in the same way that you can overflow integers. Underflow is a less serious problem because it just denotes a loss of precision: the result is guaranteed to be closely approximated by zero.
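The procedure to store and interpret double-precision numbers is very similar to that for single-precision numbers (Fig. 5.1.3); the exponent has 11 bits with a bias of 1023, and the fraction has 52 bits.
Fig. 5.1.3 Double-precision (64-bit) layout: sign in bit 0, exponent in bits 1 to 11, fraction in bits 12 to 63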
Not A Number
The value NaN (Not a Number) is used to represent a value that is not a real number. NaNs are represented by a bit pattern with an exponent of all 1s and a nonzero fraction. There are two categories of NaN: QNaN (Quiet NaN) and SNaN (Signalling NaN).
A QNaN is a NaN with the most significant fraction bit set. QNaNs propagate freely through most arithmetic operations. These values result from an operation whose result is not mathematically defined.
An SNaN is a NaN with the most significant fraction bit clear. It is used to signal an exception when used in operations. SNaNs can be handy to assign to uninitialized variables to trap premature usage.
Semantically, QNaNs denote indeterminate operations, while SNaNs denote invalid operations.
Converting a Decimal Number to a Non-Decimal Number (Base b)
To generalize the procedure for both integers and floating-point numbers, we split the given number into its integral and fractional parts. Note that an integer has no fractional part.
Integral Part
(1) Divide the integral part by the base b. This yields a quotient and a remainder. The remainder is the rightmost digit of
the integral part of the new number.
(2) Divide the quotient again by b. The remainder is the next digit of the integral part.
(3) Repeat step (2) until the quotient is zero. The (last) remainder is the leftmost digit of the new number.
Fractional Part
(1) Multiply the fractional part of the decimal number by base b. The integral part of the product constitutes the leftmost
digit of the fractional part of the new number.
(2) Multiply the fractional part of the product by base b. The integral part of the product constitutes the next digit of the
fractional part of the new number.
(3) Repeat step (2) until a zero fractional part or a duplicate fractional part occurs. The integer part of the (last) product
is the rightmost digit of the fractional part of the new number. A duplicate fractional part is an indication that the digit
(or sequence) is a repeating one.
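Both procedures above can be sketched in C++. The function names IntegralToBase and FractionToBase are hypothetical, and so is the interface: the fractional part is passed as an integer ratio nNum/nDen (for example 0.45 as 45/100) so that a duplicate fractional part can be detected exactly, which comparing double values would not guarantee. The sketch assumes nDen times the base fits in an unsigned long long.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

static const char kDigits[] = "0123456789ABCDEF";   // digits for bases up to 16

// Integral part: repeated division by the base; each remainder is the next digit to the left.
std::string IntegralToBase(unsigned long long nValue, unsigned nBase)
{
    assert(nBase >= 2 && nBase <= 16);
    if (nValue == 0) return "0";
    std::string szDigits;
    while (nValue > 0) {
        szDigits.insert(szDigits.begin(), kDigits[nValue % nBase]);  // remainder
        nValue /= nBase;                                             // quotient
    }
    return szDigits;
}

// Fractional part nNum/nDen (0 <= nNum < nDen): repeated multiplication by the base.
// When a fractional part repeats, the repeating block is enclosed in parentheses.
std::string FractionToBase(unsigned long long nNum, unsigned long long nDen,
                           unsigned nBase, std::size_t nMaxDigits)
{
    assert(nBase >= 2 && nBase <= 16 && nDen > 0 && nNum < nDen);
    std::map<unsigned long long, std::size_t> seen;  // fractional part -> digit position
    std::string szDigits;
    while (nNum != 0 && szDigits.size() < nMaxDigits) {
        auto it = seen.find(nNum);
        if (it != seen.end()) {                      // duplicate: digits repeat from here
            szDigits.insert(it->second, 1, '(');
            szDigits += ')';
            return szDigits;
        }
        seen[nNum] = szDigits.size();
        nNum *= nBase;
        szDigits += kDigits[nNum / nDen];            // integral part of the product
        nNum %= nDen;                                // fractional part of the product
    }
    return szDigits;
}
```

With these definitions IntegralToBase(12, 2) returns "1100", and FractionToBase(45, 100, 2, 64) returns "01(1100)", that is, 0.45 in binary is 0.01 followed by the block 1100 repeated indefinitely.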
Example 5.1
Problem Statement: Represent each of the following decimal numbers as a binary number.
(a) 12 (b) -24 (c) -1.45
Solution: For each of the numbers we present a table showing the calculations.
(a) $12_{10} = 1100_2$
Division (Quotient, Remainder) Binary Number
12/2 (6,0) 0
6/2 (3,0) 00
3/2 (1,1) 100
1/2 (0,1) 1100
(b) $-24_{10} = -11000_2$ (see footnote 2)
Division (Quotient, Remainder) Binary Number
24/2 (12,0) 0
12/2 (6,0) 00
6/2 (3,0) 000
3/2 (1,1) 1000
1/2 (0,1) 11000
2 Negative numbers are usually represented as 2’s complement. See Problem 5.9.
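(c) $-1.45_{10}$: The integral part is $1_{10} = 1_2$. The fractional part, 0.45, is converted as follows.
Fractional Part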
Multiplication (Product, Integral Part) Binary Number
0.45 x 2 (0.90,0) 0
0.90 x 2 (1.80,1) 01
0.80 x 2 (1.60,1) 011
0.60 x 2 (1.20,1) 0111
0.20 x 2 (0.40,0) 01110
0.40 x 2 (0.80,0) 011100
0.80 x 2 (1.60,1) 011100…
As the last row of the table shows, the fractional part 0.80 has occurred before, so the pattern begins to repeat and the calculations are terminated. Hence $-1.45_{10} = -1.01\overline{1100}_2$, where the overbar marks the repeating block. This simple example shows that a number that can be represented exactly as a decimal number may not be representable exactly as a binary number with a finite number of bits.
Types of Errors
Before we discuss the various types of errors that can result from computer arithmetic, let us first define two very important terms: absolute error and relative error. Let $x_t$ be the true value of a quantity whose computed approximate value is denoted as $x_a$. The absolute error $E_{abs}$ is then given as
$$E_{abs} = |x_t - x_a| \qquad (5.1.3)$$
and the relative error $E_{rel}$ is defined as
$$E_{rel} = \frac{|x_t - x_a|}{|x_t|} \qquad (5.1.4)$$
While the signs are sometimes useful, we are usually more concerned with the magnitude of the error. Hence absolute values are used in both error definitions. Both error measures tell us something about how accurate the approximate value is.
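Eqns. (5.1.3) and (5.1.4) translate directly into two small template functions. This is a minimal sketch; AbsError and RelError are hypothetical names, and the relative error is rejected when the true value is zero because Eqn. (5.1.4) is then undefined.

```cpp
#include <cassert>
#include <cmath>

// Absolute error, Eqn. (5.1.3): |xt - xa|
template <typename T>
T AbsError(T tTrue, T tApprox)
{
    return std::fabs(tTrue - tApprox);
}

// Relative error, Eqn. (5.1.4): |xt - xa| / |xt|, undefined when xt = 0
template <typename T>
T RelError(T tTrue, T tApprox)
{
    assert(tTrue != T(0) && "relative error is undefined for a zero true value");
    return std::fabs(tTrue - tApprox) / std::fabs(tTrue);
}
```

Applied to the data of Example 5.2 below, the two measurements have very different absolute errors (0.5 N and 500,000 miles) but the same relative error, 1/30 ≈ 0.0333.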
Example 5.2
Problem Statement: (a) The weight of a certain object is 15.0 N. A store clerk weighs the object and reports the weight as 15.5 N.
What are the absolute and relative errors in the clerk’s measurement?
(b) A student astronomer using a telescope estimates the distance to a celestial object as 15,500,000 miles. It is known that the
celestial object is in fact 15,000,000 miles away. What are the absolute and relative errors in the student’s measurement?
Solution:
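(a) From the problem data we have $x_t = 15.0$ and $x_a = 15.5$. Using Eqn. (5.1.3) we have
$$E_{abs} = |x_t - x_a| = |15.0 - 15.5| = 0.5 \text{ N}$$
Using Eqn. (5.1.4) we have
$$E_{rel} = \frac{|x_t - x_a|}{|x_t|} = \frac{|15.0 - 15.5|}{15.0} = \frac{0.5}{15.0} = 0.0333$$
(b) From the problem data we have $x_t = 15{,}000{,}000$ and $x_a = 15{,}500{,}000$. Using Eqn. (5.1.3) we have
$$E_{abs} = |15{,}000{,}000 - 15{,}500{,}000| = 500{,}000 \text{ miles}$$
and, using Eqn. (5.1.4), $E_{rel} = 500{,}000/15{,}000{,}000 = 0.0333$.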
Consider the case where sin(0.5) is evaluated. When the first three terms are used, we have
$$\sin(0.5) \approx 0.5 - \frac{0.5^3}{3!} + \frac{0.5^5}{5!} = 0.47942708$$
and when the first four terms are used we have
$$\sin(0.5) \approx 0.5 - \frac{0.5^3}{3!} + \frac{0.5^5}{5!} - \frac{0.5^7}{7!} = 0.47942553$$
A more accurate value computed using extended precision is 0.47942553860420300027328793521557.
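The partial sums above can be reproduced with a short function. In this sketch (SinSeries is a hypothetical name) each term is obtained from the previous one by multiplying by $-x^2/((2k)(2k+1))$, which avoids computing large factorials explicitly.

```cpp
#include <cassert>
#include <cmath>

// Partial sum of sin(x) = x - x^3/3! + x^5/5! - ... using nTerms terms.
double SinSeries(double dX, int nTerms)
{
    assert(nTerms >= 1);
    double dTerm = dX;                 // first term: x
    double dSum = dTerm;
    for (int k = 1; k < nTerms; ++k) {
        // term_k = -term_{k-1} * x^2 / ((2k)(2k+1))
        dTerm *= -dX * dX / ((2.0 * k) * (2.0 * k + 1.0));
        dSum += dTerm;
    }
    return dSum;
}
```

SinSeries(0.5, 3) and SinSeries(0.5, 4) agree with the values 0.47942708 and 0.47942553 to the eight digits shown; the truncation error shrinks as terms are added.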
Machine Epsilon or Precision
Another important quantity is known as machine epsilon. Machine epsilon, ε, is the upper bound on the relative error that occurs when a nonzero real number x is represented as a floating-point number $x_a$. In other words
$$\frac{|x - x_a|}{|x|} \le \varepsilon \qquad (5.1.8)$$
We can customize the above expression for computer systems that use base b with a d-digit mantissa. When truncation (chopping) is used
$$\varepsilon = b^{1-d} \qquad (5.1.9a)$$
and when symmetric rounding is carried out
$$\varepsilon = 0.5\, b^{1-d} \qquad (5.1.9b)$$
There is another way of defining machine epsilon (also known as unit roundoff). Let $x_a$ be the smallest number representable in the machine arithmetic that is greater than 1. The machine epsilon is then defined as
$$\varepsilon = x_a - 1 \qquad (5.1.10)$$
We will use this definition to estimate the machine epsilon. Eqn. (5.1.10) gives the same value as Eqn. (5.1.9a); with symmetric rounding, the bound in Eqn. (5.1.9b) is half of this value.
Example Program 5.1.1 Computing Machine Precision
In the example shown below, the machine epsilon is estimated using Eqn. (5.1.10). In other words, we add a floating-point number to 1.0 and check whether the sum is 1.0. If not, we divide the number by 2 and repeat until the number becomes so small that adding it to 1.0 yields 1.0.
main.cpp
In line 14, the floating-point number (that is added to 1.0) is itself initialized to 1.0. In line 26, this value is halved. The process
is repeated until adding the number to 1.0 (line 20) does not change the result from 1.0. The program output is shown in Fig.
5.1.4.
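As a stand-in for the listing, which is not reproduced here, the same idea can be written as a small template. This is a sketch, not the book's main.cpp; MachineEpsilon is a hypothetical name. The volatile variable forces each sum to be rounded to type T, so that extended-precision registers on some processors do not hide the rounding.

```cpp
#include <cassert>
#include <limits>

// Estimate machine epsilon, Eqn. (5.1.10): start from 1 and keep halving
// while 1 + eps/2 is still distinguishable from 1.
template <typename T>
T MachineEpsilon()
{
    T tEps = T(1);
    for (;;) {
        volatile T tSum = T(1) + tEps / T(2);   // rounded to T
        if (tSum == T(1)) break;
        tEps /= T(2);
    }
    assert(T(1) + tEps != T(1));               // tEps is still visible when added to 1
    return tEps;
}
```

On IEEE 754 hardware this yields $2^{-23} \approx 1.19 \times 10^{-7}$ for float and $2^{-52} \approx 2.22 \times 10^{-16}$ for double, i.e. $b^{1-d}$ with b = 2 and d = 24 or 53, which are also the values reported by std::numeric_limits<T>::epsilon().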
Computer Arithmetic
As we have seen earlier, floating-point numbers can be represented only to a finite precision. In the following examples, we illustrate errors resulting from computer arithmetic.
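The program computes the sum of the first n terms of the infinite series $\sum 1/k^2$ using both single and double precision after obtaining the value of n from the user. A sample output is shown in Fig. 5.1.5 for a relatively large value of n, 100 million. For both single and double precision, the sum of the series is computed in two different ways as follows
$$S_{forward} = \frac{1}{1^2} + \frac{1}{2^2} + \frac{1}{3^2} + \cdots + \frac{1}{n^2} \qquad (5.1.11a)$$
$$S_{reverse} = \frac{1}{n^2} + \frac{1}{(n-1)^2} + \frac{1}{(n-2)^2} + \cdots + \frac{1}{2^2} + \frac{1}{1^2} \qquad (5.1.11b)$$
Mathematically, there is no difference between the two procedures. In Eqn. (5.1.11b), however, the sum is obtained by adding from the smallest to the largest number, and the sample output shows that $S_{reverse}$ is more accurate in the single-precision version and about the same in the double-precision version. When the large terms are added first, the running sum soon becomes so large that each small term falls below its precision and is partly or entirely lost; adding the small terms first lets them accumulate before they are combined with the large ones.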
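The effect of summation order can be reproduced with a few lines of code. This sketch is not the book's program; SumForward and SumReverse are hypothetical names, and a smaller n is used so that it runs quickly. A double-precision reverse sum serves as the reference value.

```cpp
#include <cassert>
#include <cmath>

// Sum 1/k^2 for k = 1..n, largest term first, Eqn. (5.1.11a).
template <typename T>
T SumForward(long nTerms)
{
    assert(nTerms >= 1);
    T tSum = T(0);
    for (long k = 1; k <= nTerms; ++k)
        tSum += T(1) / (T(k) * T(k));
    return tSum;
}

// Same terms, smallest term first, Eqn. (5.1.11b).
template <typename T>
T SumReverse(long nTerms)
{
    assert(nTerms >= 1);
    T tSum = T(0);
    for (long k = nTerms; k >= 1; --k)
        tSum += T(1) / (T(k) * T(k));
    return tSum;
}
```

With n = 100,000, the single-precision forward sum stops growing once $1/k^2$ drops below half a unit in the last place of the running sum (roughly k > 4096), so its error is on the order of $10^{-4}$, whereas the single-precision reverse sum is accurate to about $10^{-6}$ or better.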
Fig. 5.1.6 shows the output generated by the program. In line 35, the program uses the Microsoft-specific function _isnan to check whether the result of the division (from line 34) stored in fE is a valid number. Standard C++ provides the portable equivalent std::isnan in <cmath>.
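A portable alternative to _isnan is sketched below. Classify and IsNaNBySelfCompare are hypothetical helper names. std::fpclassify from <cmath> distinguishes NaN, infinities, zero, subnormal ("unnormalized") and normal values, and a NaN is the only value that compares unequal to itself. The sketch assumes IEEE 754 arithmetic and that the code is not compiled with fast-math options, which allow the compiler to assume NaNs never occur.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <string>

// Classify a floating-point value using standard <cmath> facilities.
std::string Classify(double dValue)
{
    switch (std::fpclassify(dValue)) {
        case FP_NAN:       return "NaN";
        case FP_INFINITE:  return dValue > 0.0 ? "+Inf" : "-Inf";
        case FP_ZERO:      return "zero";
        case FP_SUBNORMAL: return "subnormal";   // the "unnormalized" values
        default:           return "normal";
    }
}

// A NaN is the only value that compares unequal to itself.
bool IsNaNBySelfCompare(double dValue)
{
    return dValue != dValue;
}
```

For example, Classify(std::sqrt(-1.0)) reports "NaN" and Classify(std::log(0.0)) reports "-Inf".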
These constants are machine dependent, and C++ provides a convenient mechanism for finding them (for example, the std::numeric_limits class template declared in <limits>). A sample output is shown in Fig. 5.1.7.
Fig. 5.1.7 Machine-dependent values for Windows 10 using MSVS 2019 C++ compiler
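A sketch of how such values can be queried with std::numeric_limits follows. LimitsReport is a hypothetical name, and the set of quantities printed is an illustrative choice, not necessarily the one used to produce Fig. 5.1.7.

```cpp
#include <cassert>
#include <limits>
#include <sstream>
#include <string>

// Build a report of selected machine-dependent constants for a floating-point type T.
template <typename T>
std::string LimitsReport(const char* szName)
{
    static_assert(std::numeric_limits<T>::is_specialized &&
                  !std::numeric_limits<T>::is_integer,
                  "intended for floating-point types");
    assert(szName != nullptr);
    using L = std::numeric_limits<T>;
    std::ostringstream os;
    os << szName << '\n'
       << "  digits = " << L::digits << "  (mantissa bits, including the implicit bit)\n"
       << "  epsilon = " << L::epsilon() << '\n'
       << "  min = " << L::min() << "  (smallest positive normalized value)\n"
       << "  denorm_min = " << L::denorm_min() << '\n'
       << "  max = " << L::max() << '\n'
       << "  max_exponent = " << L::max_exponent << '\n'
       << "  has_quiet_NaN = " << std::boolalpha << L::has_quiet_NaN << '\n';
    return os.str();
}
```

A call such as `std::cout << LimitsReport<float>("float") << LimitsReport<double>("double");` (with <iostream> included) prints the values; for IEEE 754 types digits is 24 for float and 53 for double.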
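$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$$
$$\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots$$
$$\cos(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots$$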
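Sometimes a Taylor series approximation is not particularly efficient. Consider a fourth-degree approximation of $e^x$ in the interval [−1, 1] expanding about x = 0. Then
$$P_4(x) = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \frac{x^4}{24}$$
and the error estimate is given as
$$\frac{x^5}{120} \le \left| e^x - P_4(x) \right| \le \frac{e\, x^5}{120}, \qquad 0 \le x \le 1$$
$$\frac{e^{-1} |x|^5}{120} \le \left| e^x - P_4(x) \right| \le \frac{|x|^5}{120}, \qquad -1 \le x \le 0$$
The error increases with increasing |x| and
$$\max_{-1 \le x \le 1} \left| e^x - P_4(x) \right| \le \frac{e}{120}$$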
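The uneven distribution of the error can be checked numerically. The sketch below (P4 and MaxErrorP4 are hypothetical names) evaluates $P_4(x)$ in nested form, $1 + x(1 + \frac{x}{2}(1 + \frac{x}{3}(1 + \frac{x}{4})))$, and samples $|e^x - P_4(x)|$ at equally spaced points in [−1, 1].

```cpp
#include <cassert>
#include <cmath>

// Fourth-degree Taylor polynomial of e^x about x = 0, in nested (Horner) form.
double P4(double dX)
{
    return 1.0 + dX * (1.0 + dX / 2.0 * (1.0 + dX / 3.0 * (1.0 + dX / 4.0)));
}

// Largest |e^x - P4(x)| over nSamples + 1 equally spaced points in [-1, 1];
// dXAtMax receives the sample point where it occurs.
double MaxErrorP4(int nSamples, double& dXAtMax)
{
    assert(nSamples >= 1);
    double dMax = 0.0;
    dXAtMax = -1.0;
    for (int i = 0; i <= nSamples; ++i) {
        double dX = -1.0 + 2.0 * i / nSamples;
        double dErr = std::fabs(std::exp(dX) - P4(dX));
        if (dErr > dMax) { dMax = dErr; dXAtMax = dX; }
    }
    return dMax;
}
```

The largest sampled error is about 0.00995, at x = 1, inside the bound e/120 ≈ 0.0227, while near x = 0 the error is many orders of magnitude smaller. The approximation is very good near the expansion point and much poorer at the ends of the interval.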
Summary
This chapter is an introduction to numerical analysis – the business of finding approximate solutions via a numerical technique
that is implemented as a computer program. As we saw in the first four chapters, C++ provides the tools for implementing
numerical techniques as robust, fast and accurate computer programs. It is important to note that usually numerical solutions
are approximate for several reasons. In this chapter we looked at two such sources of error – truncation and round-off errors.
In the later chapters, we will see how these errors affect different numerical techniques.
Exercises
Most of the problems below involve the development of one or more functions. For each applicable case, the function
prototype is given. In each case (a) develop a plan to test the function(s), and (b) implement the plan in a main
program. The functions should not use cin or cout unless specified. Put the main program in a separate file and
the function(s) in separate files.
Appetizers
Problem 5.1
(a) Compute the binary form of the following decimal numbers. (i) -187 (ii) 3009 (iii) -199 (iv) 5789.
(b) Compute the decimal equivalent for the following binary numbers. (i) (-10011)2 (ii) (101010)2 (iii) (11111100001)2.
Problem 5.2
Fibonacci numbers $F_i$ are defined as $F_0 = F_1 = 1$ and $F_{i+2} = F_{i+1} + F_i$, $i = 0, 1, 2, \ldots$ The ratio $x_n = F_{n+1}/F_n$ approaches the Golden Ratio φ as $n \to \infty$. It is known that $\lim_{n \to \infty} x_n = \varphi = \frac{1 + \sqrt{5}}{2}$. Determine the relative error in approximating φ by $x_n$ for n = 1, 5, 10.
Function prototype
double REGoldenRatio (int n);
Problem 5.3
Expand the function f ( x ) 2 x 4 1.5x 2 33.4 x 10.5 in Taylor’s series about x 0.5 . Use the resulting expression to
estimate the value of f ( x 1) by retaining 1, 2, 3 and 4 terms of the expansion. Determine the absolute and the relative errors
for each case.
Function prototype
void TSExpansion (int nTerms, double& dEst, double& dAbsError,
double& dRelError);
Problem 5.4
Compute the roots of the quadratic polynomial 3.3x 2 40.5x 1.8 0 using (a) float data type and (b) double data
type. Check the absolute error for each case by substituting the root back into the equation.
Main Course
Problem 5.5
Write a function to convert an integer value to its binary representation. Display the binary representation on the screen. Use
data from Problem 5.1 to test your function.
Function prototype
void ToBinary (int nInteger);
Problem 5.6
Write a function to convert a binary number to an integer value. Display the integer value on the screen. Use data from Problem
5.1 to test your function.
Function prototype
void ToInteger (int nBinary);
Problem 5.7
Consider the problem of estimating the area of a circle of radius R. One approach is to split the circle into a collection of identical triangles as shown in Fig. P5.7(a). A typical triangle that subtends an angle θ at the center of the circle is shown in Fig. P5.7(b).
Fig. P5.7 (a) Circle divided into n triangles; (b) typical triangle with half-base b, height h, radius R and half-angle θ/2
Since $b = R\sin(\theta/2)$, $h = R\cos(\theta/2)$ and $\theta = 2\pi/n$, the area of one triangle and the estimate of the area of the circle are given by
$$a_e = \frac{R^2}{2}\sin\frac{2\pi}{n}$$
$$A_A(n) = \sum_{e=1}^{n} a_e = \frac{nR^2}{2}\sin\frac{2\pi}{n}$$
Function prototype
double AreaTriangleA (int n);
Problem 5.8
Another approach to solving the problem discussed in Problem 5.7 is shown in Fig. P5.8. Derive the expression for the estimate of the area $A_B(n)$.
Problem 5.10
Consider the area estimate problem discussed in Problems 5.7 and 5.8. Develop a procedure by which you can estimate the area
of a circle using either Mesh A, or Mesh B or both. The input to the procedure is (a) the radius of the circle and (b) the desired
accuracy. Assume that you do NOT know the exact area of the circle. The procedure must compute the estimate for the area
(within the prescribed accuracy) using the least computational effort. The computational effort, E, is defined as
$$E = 100 + \sum_{i=1}^{q} n_i^2$$
where q is the number of times the procedure uses the formula from either Mesh A or Mesh B, and $n_i$ is the number of triangles in the mesh. For example, if your procedure uses Mesh A twice with n = 10 and n = 20, and Mesh B three times with n = 10, n = 25 and n = 50, the total effort would be 3825.
The output from the procedure is the estimate of the area and the computational effort.
Function prototype
double TriangleAreaEstimate (double dR, double dError,
double& dComputeEffort);
Problem 5.11
In this problem we will investigate what happens with floating-point operations where finite representation and rounding come into play in what appear to be unpredictable ways. Let us assume that you are working with decimal arithmetic with finite precision and 4 significant digits. Compute the following expressions and show why the evaluated expressions are not equal.
(a) 1/3 + 2/3 + 2/3
(b) 2/3 + 2/3 + 1/3
Chapter
“Mathematical reasoning may be regarded rather schematically as the exercise of a combination of two faculties, which we may call intuition and ingenuity.” Alan Turing
We will look at three numerical problems commonly encountered by engineers and scientists. First, we will examine various techniques to compute the roots of nonlinear functions. Note that the roots are obtained by solving the problem f(x) = 0. Next, we will look at numerical techniques to compute the (first) derivative of functions, i.e. $\frac{dy(x)}{dx}$, which is particularly useful when analytical derivatives are difficult to compute. Finally, we will learn how to carry out numerical integration, once again useful when analytical integrations are difficult to compute. We will look at methods to compute single integrals, i.e. $\int_a^b f(x)\,dx$, and double integrals, i.e. $\int_c^d \int_a^b f(x, y)\,dx\,dy$.
Objectives
To understand how to find the roots of nonlinear functions.
To understand the concepts associated with numerical differentiation.
To understand the concepts associated with numerical integration.
Step 1: Pick a value c in [a, b] and construct two intervals, [a, c] and [c, b]. At least one root exists in one of these intervals.
Step 2: To determine which interval, use the Intermediate Value Theorem. A root exists in [a, c] if $f(a) f(c) < 0$, and in [c, b] if $f(c) f(b) < 0$. Because $f(a) f(b) < 0$, one and only one of these conditions will be satisfied (unless f(c) = 0, in which case c is itself the root).
Step 3: We have reduced the problem of finding a root in the interval [a, b] to the problem of finding a root in a smaller interval. Therefore, depending on the result of Step 2, we replace b by c (root in [a, c]) or a by c (root in [c, b]) and iterate. Essentially, we ‘bracket’ the root in smaller and smaller intervals, hence the name of the method.
Step 4: Repeat Steps 1-3 until the desired precision is reached. Clearly, if the final interval is [a, b], then the root obtained by bracketing cannot differ from the actual root by more than |b − a|. The convergence criterion can be either the size of the interval [a, b] or the magnitude of the function at c.
The bisection and false position methods both use the pseudo-algorithm of Steps 1-4 but differ in their choice of c. The bisection method picks c to be the midpoint of [a, b]; therefore, the search interval is halved at every iteration. Consequently, of all methods using Steps 1-4, the bisection method has the best worst-case behavior.
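Steps 1-4 with the midpoint choice of c can be sketched as follows. Bisection is a hypothetical name and the std::function interface is an illustrative choice. The sketch assumes f is continuous on [a, b] with f(a) f(b) < 0, and stops when the half-width of the bracket falls below dTol, which guarantees |c − root| < dTol.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Bisection method: Steps 1-4 with c chosen as the midpoint of [a, b].
double Bisection(const std::function<double(double)>& f, double dA, double dB,
                 double dTol, int nMaxIter, int& nIter)
{
    double dFA = f(dA);
    assert(dFA * f(dB) < 0.0 && "the interval must bracket a root");
    double dC = 0.5 * (dA + dB);
    for (nIter = 0; nIter < nMaxIter; ++nIter) {
        dC = 0.5 * (dA + dB);                        // Step 1: midpoint
        double dFC = f(dC);
        if (dFC == 0.0 || 0.5 * std::fabs(dB - dA) < dTol)
            break;                                   // Step 4: converged
        if (dFA * dFC < 0.0)                         // Step 2: root in [a, c]
            dB = dC;                                 // Step 3
        else {                                       // root in [c, b]
            dA = dC;
            dFA = dFC;
        }
    }
    return dC;
}
```

For $f(x) = x^2 - 3$ on [1, 2] (Example 6.1.1), about 26 halvings are needed to reach a tolerance of $10^{-8}$, since the half-width starts at 0.5 and is halved every iteration.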
The false position method uses the more promising idea of constructing a linear approximation to the function and picking c to be the root of the line. The line connecting (a, f(a)) and (b, f(b)) is
$$y = f(a) + \frac{f(b) - f(a)}{b - a}(x - a) \qquad (6.1.1)$$
Setting y to zero and solving yields
$$c = a - f(a)\frac{b - a}{f(b) - f(a)} \qquad (6.1.2)$$
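Replacing the midpoint with Eqn. (6.1.2) gives the false position method. In this sketch (FalsePosition is a hypothetical name) convergence is declared when |f(c)| < dTol, because one end of the bracket often stays fixed and the interval width need not shrink to zero.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// False position (regula falsi): Steps 1-4 with c from Eqn. (6.1.2).
double FalsePosition(const std::function<double(double)>& f, double dA, double dB,
                     double dTol, int nMaxIter, int& nIter)
{
    double dFA = f(dA), dFB = f(dB);
    assert(dFA * dFB < 0.0 && "the interval must bracket a root");
    double dC = dA;
    for (nIter = 0; nIter < nMaxIter; ++nIter) {
        dC = dA - dFA * (dB - dA) / (dFB - dFA);    // root of the line, Eqn. (6.1.2)
        double dFC = f(dC);
        if (std::fabs(dFC) < dTol)
            break;                                  // |f(c)| small enough
        if (dFA * dFC < 0.0) { dB = dC; dFB = dFC; } // root in [a, c]
        else                 { dA = dC; dFA = dFC; } // root in [c, b]
    }
    return dC;
}
```

For $f(x) = x^2 - 3$ on [1, 2] the first estimate is c = 1 + 2/3 ≈ 1.6667; for this convex function the right end b = 2 never moves, and the estimates approach 1.73205081 from the left.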
Example 6.1.1
We will find the roots of the quadratic equation $x^2 - 3 = 0$. We can use [1, 2] as the interval, as f(x = 1) is negative and f(x = 2) is positive. The true root lying in the given interval is 1.73205081.
The root of the tangent line is $x = x_1 - \dfrac{f(x_1)}{f'(x_1)}$, so the recursive formula is
$$x_n = g(x_{n-1}) = x_{n-1} - \frac{f(x_{n-1})}{f'(x_{n-1})} \qquad (6.1.6)$$
Note that the Secant Method can be considered a special case of the Newton-Raphson Method in which the derivative $f'(x_{n-1})$ is replaced by the approximation $\dfrac{f(x_{n-1}) - f(x_{n-2})}{x_{n-1} - x_{n-2}}$.
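The substitution described above gives the following sketch of the Secant Method (Secant is a hypothetical name). Because the root is not bracketed, the loop simply stops after nMaxIter steps if it does not converge; it also stops if the secant line becomes flat or the two points coincide.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Secant method: Eqn. (6.1.6) with f'(x_{n-1}) replaced by the slope through
// the two most recent iterates. Stops when |f(x)| < dTol.
double Secant(const std::function<double(double)>& f, double dX0, double dX1,
              double dTol, int nMaxIter, int& nIter)
{
    assert(dX0 != dX1 && "two distinct starting points are required");
    double dF0 = f(dX0), dF1 = f(dX1);
    for (nIter = 0; nIter < nMaxIter; ++nIter) {
        if (std::fabs(dF1) < dTol) break;           // converged
        double dDX = dX1 - dX0, dDF = dF1 - dF0;
        if (dDX == 0.0 || dDF == 0.0) break;        // slope undefined or zero
        double dX2 = dX1 - dF1 * dDX / dDF;
        dX0 = dX1; dF0 = dF1;                       // keep the two most recent points
        dX1 = dX2; dF1 = f(dX1);
    }
    return dX1;
}
```

Starting from $x_0 = 1$ and $x_1 = 2$ for $f(x) = x^2 - 3$, the next iterate is 2 − 1/3 ≈ 1.6667, and the sequence then converges rapidly to 1.73205081.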
The advantage of the approximation methods is that, if they converge, they generally converge faster than the bracketing methods. This is because as $x_{n-m}, \ldots, x_{n-1}$ approach the true root, the approximation matches the function more and more closely in the neighborhood of the true root. In fact, both the Secant Method and the Newton-Raphson Method have better than linear convergence in most cases.
The drawback of the approximation methods is that, unlike the bracketing methods, they do not guarantee convergence because the root is not confined to an interval. Hence a poor initial guess or a badly behaved function may cause the methods to fail to converge to a root. Additionally, convergence is poorer if the root is repeated (i.e., if the function can be written as $f(x) = (x - \alpha)^p g(x)$, $p > 1$, where α is the root in question). The order of convergence of the Newton-Raphson Method is quadratic (i.e., 2), whereas the order of convergence of the Secant Method is the golden ratio $\frac{1 + \sqrt{5}}{2} \approx 1.618$. Calculation of the derivative for the Newton-Raphson Method, either analytically or numerically, requires extra computation.
Example 6.1.2
We will find the roots of the quadratic equation $x^2 - 3 = 0$ using the Newton-Raphson Method. Note that $f(x) = x^2 - 3$ and $f'(x) = 2x$. We will start with the initial guess for the root as $x_0 = 1$.
n | $x_n$ | $f'(x_n)$ | $f(x_n)$ | $x_{n+1} = x_n - f(x_n)/f'(x_n)$
0 | 1 | 2 | −2 | 2
1 | 2 | 4 | 1 | 1.75
2 | 1.75 | 3.5 | 0.0625 | 1.73214
3 | 1.73214 | 3.46428 | 0.00030898 | 1.73205
4 | 1.73205 | 3.4641 | $-2.7975 \times 10^{-6}$ | 1.73205
We terminate the iterations when the magnitude of the function is close to zero. It should be noted that the process can fail if
at any time the value of the derivative of the function is very close to zero.
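The iteration of Eqn. (6.1.6) and the failure case just mentioned can be sketched as follows. NewtonRaphson is a hypothetical name, and the threshold dDerivTol used to decide that the derivative is "very close to zero" is an assumed value, not one given in the text.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Newton-Raphson iteration, Eqn. (6.1.6). Stops when |f(x)| < dTol.
// bOk is set to false if the derivative becomes (nearly) zero.
double NewtonRaphson(const std::function<double(double)>& f,
                     const std::function<double(double)>& df,
                     double dX, double dTol, int nMaxIter, int& nIter, bool& bOk)
{
    const double dDerivTol = 1.0e-14;       // assumed "nearly zero" threshold
    assert(nMaxIter >= 0);
    bOk = true;
    for (nIter = 0; nIter < nMaxIter; ++nIter) {
        double dF = f(dX);
        if (std::fabs(dF) < dTol) break;    // converged
        double dDF = df(dX);
        if (std::fabs(dDF) < dDerivTol) {   // tangent nearly horizontal: method fails
            bOk = false;
            break;
        }
        dX -= dF / dDF;
    }
    return dX;
}
```

With $x_0 = 1$, one step gives 2 and two steps give 1.75, matching the $x_{n+1}$ column of the table; starting at $x_0 = 0$, where $f'(x) = 0$, sets bOk to false.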
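$$b'' = \begin{cases} i & \text{if } i \text{ lies between } b \text{ and } b + m \\ b + m & \text{otherwise} \end{cases}$$
$$b' = \begin{cases} b'' & \text{if } |b - b''| > t \\ b + \operatorname{sign}(m)\, t & \text{if } |b - b''| \le t \end{cases}$$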
Step 5: Set $b_{new} = b'$ and $a = b$. If $f(b_{new})$ and $f(b)$ have the same sign, c is unchanged. Otherwise, $c = b$. Set $b = b_{new}$. Go to Step 2.
Example 6.1.3
We will find the root of the equation f ( x ) x 1 1 x 1 using the Brent Method taking a tolerance value of
2
t 10 5 . We will take a , b 3, 0 .
main.cpp
We will write a simplified version of Brent’s Method. The function containing the equations whose roots will be evaluated is in
lines 24 through 36. There are four sample equations with the focus on the first one.
f ( x ) x 1 1 x 1
2
f (x ) x 2 1
f ( x ) 1 x 3 x x 3
1
2
( x 1)
f ( x ) ( x 1)e
The initial bracket is set in line 48 and checked in line 55. The convergence tolerance is set in line 49. The iterative loop starts in
line 67. Convergence check takes place in line 84. Steps 3, 4 and 5 are implemented in lines 94 through 117. Execution is
terminated with the iteration limit check in line 74. The number of function evaluations is tracked as an indicator of the
computational effort. The results from the execution of the program are shown in Fig. 6.1.1.
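$$P_n(x) = f(x_0) + f'(x_0)(x - x_0) + \frac{f''(x_0)}{2!}(x - x_0)^2 + \cdots + \frac{f^{(n)}(x_0)}{n!}(x - x_0)^n \qquad (6.2.1b)$$
$$R_{n+1}(x) = \frac{f^{(n+1)}(\xi)}{(n+1)!}(x - x_0)^{n+1} \qquad (6.2.1c)$$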
where $x, x_0 \in [a, b]$ and ξ lies between $x_0$ and x. The polynomial $P_n(x)$ approximates f(x) more and more accurately in the neighborhood of $x_0$ as n increases, assuming the higher-order derivatives are negligible compared to (n+1)!. This is evident when considering the remainder term: if $f^{(n+1)}(\xi)/(n+1)!$ is bounded by a small number, and if x is close to $x_0$, then the error in the approximation is bounded by a small number as well. Consequently, we can use a Taylor polynomial to approximate a function in the neighborhood of a point at which the values of the derivatives are known and for which the term $f^{(n+1)}/(n+1)!$ can be ignored.
The connection of Taylor’s polynomial to numerical differentiation is that, if higher-order derivatives are negligible compared to (n+1)!, we can approximate derivatives of any order using Taylor series. For instance, expanding $f(x_0 + h)$ and $f(x_0 - h)$ in terms of a Taylor’s polynomial of degree 2, we obtain
$$f(x_0 + h) = f(x_0) + f'(x_0)h + f''(x_0)\frac{h^2}{2} + f'''(\xi_1)\frac{h^3}{6}, \quad \xi_1 \in (x_0, x_0 + h)$$
$$f(x_0 - h) = f(x_0) - f'(x_0)h + f''(x_0)\frac{h^2}{2} - f'''(\xi_2)\frac{h^3}{6}, \quad \xi_2 \in (x_0 - h, x_0) \qquad (6.2.2)$$
Subtracting one equation from the other, we get
$$f'(x_0) = \frac{f(x_0 + h) - f(x_0 - h)}{2h} - \frac{h^2}{12}f'''(\xi_1) - \frac{h^2}{12}f'''(\xi_2), \quad \xi_1 \in (x_0, x_0 + h),\ \xi_2 \in (x_0 - h, x_0) \qquad (6.2.3)$$
This is known as the central difference formula, because it uses two points, $f(x_0 + h)$ and $f(x_0 - h)$, that are centered about $x_0$ to calculate the derivative there. We can also derive forward difference and backward difference formulas by expanding $f(x_0 + h)$ and $f(x_0 - h)$ in terms of a Taylor’s polynomial of degree 1. We have
$$f'(x_0) = \frac{f(x_0 + h) - f(x_0)}{h} - \frac{h}{2}f''(\xi), \quad \xi \in (x_0, x_0 + h)$$
$$f'(x_0) = \frac{f(x_0) - f(x_0 - h)}{h} + \frac{h}{2}f''(\xi), \quad \xi \in (x_0 - h, x_0) \qquad (6.2.4)$$
These formulas can be used to approximate the numerical derivative by assuming that second-order derivatives are negligible for the forward and backward difference formulas, and that third-order derivatives are negligible for the central difference formula.
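Similarly, using the third-degree Taylor’s polynomials
$$f(x_0 + h) = f(x_0) + f'(x_0)h + f''(x_0)\frac{h^2}{2} + f'''(x_0)\frac{h^3}{6} + f^{(4)}(\xi_1)\frac{h^4}{24}, \quad \xi_1 \in (x_0, x_0 + h)$$
$$f(x_0 - h) = f(x_0) - f'(x_0)h + f''(x_0)\frac{h^2}{2} - f'''(x_0)\frac{h^3}{6} + f^{(4)}(\xi_2)\frac{h^4}{24}, \quad \xi_2 \in (x_0 - h, x_0) \qquad (6.2.5)$$
and adding these equations gives
$$f''(x_0) = \frac{f(x_0 + h) - 2f(x_0) + f(x_0 - h)}{h^2} - \frac{h^2}{24}f^{(4)}(\xi_1) - \frac{h^2}{24}f^{(4)}(\xi_2), \quad \xi_1 \in (x_0, x_0 + h),\ \xi_2 \in (x_0 - h, x_0) \qquad (6.2.6)$$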
which is the central difference formula for the second derivative. Using the second-degree Taylor’s polynomials
$$f(x_0 + h) = f(x_0) + f'(x_0)h + f''(x_0)\frac{h^2}{2} + f'''(\xi_1)\frac{h^3}{6}, \quad \xi_1 \in (x_0, x_0 + h)$$
$$f(x_0 + 2h) = f(x_0) + f'(x_0)(2h) + f''(x_0)\frac{(2h)^2}{2} + f'''(\xi_2)\frac{(2h)^3}{6}, \quad \xi_2 \in (x_0, x_0 + 2h) \qquad (6.2.7)$$
and subtracting twice the first formula from the second, we get:
$$f''(x_0) = \frac{f(x_0 + 2h) - 2f(x_0 + h) + f(x_0)}{h^2} + \frac{h}{3}f'''(\xi_1) - \frac{4h}{3}f'''(\xi_2), \quad \xi_1 \in (x_0, x_0 + h),\ \xi_2 \in (x_0, x_0 + 2h) \qquad (6.2.8)$$
The general tactic to find the forward-finite-difference approximation for $f^{(n)}(x_0)$ is to manipulate the Taylor’s polynomials of $f(x_0 + h), \ldots, f(x_0 + nh)$ so that the terms $f'(x_0), \ldots, f^{(n-1)}(x_0)$ cancel out. Similarly, manipulation of the Taylor’s polynomials for $f(x_0 - h), \ldots, f(x_0 - nh)$ and the cancellation of $f'(x_0), \ldots, f^{(n-1)}(x_0)$ will result in the backward-finite-difference approximation formulas. Finally, for the central-finite-difference approximation for $f^{(n)}(x_0)$, $f(x_0 - mh), \ldots, f(x_0 + mh)$ should be manipulated so that the lower-order derivatives disappear, where $m = n/2$ for n even and $m = (n+1)/2$ for n odd. Clearly, these manipulations become increasingly tedious as n increases.
Difference Formulas
Using difference operators provides a much easier approach to constructing the same formulas as above. The drawback is that
the formulas are approximations, and no explicit error term is present. The main idea is to use difference operators to
approximate the derivative operator, $d/dx$. The three approximations are:
$$\text{forward: } \frac{d}{dx} \approx \frac{\Delta}{\Delta x}, \qquad \text{central: } \frac{d}{dx} \approx \frac{\delta}{\delta x}, \qquad \text{backward: } \frac{d}{dx} \approx \frac{\nabla}{\nabla x} \qquad (6.2.9)$$
where the operators themselves mean:
$$\Delta f(x_i) = f(x_{i+1}) - f(x_i), \qquad \delta f(x_i) = f(x_{i+1/2}) - f(x_{i-1/2}), \qquad \nabla f(x_i) = f(x_i) - f(x_{i-1}) \qquad (6.2.10)$$
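We will assume that the distance between successive $x$ values (the spacing) is constant, so $\Delta x_i = \delta x_i = \nabla x_i = h$. Using these
operators, we can easily obtain the approximations to the first derivative
$$\text{Forward: } \frac{df}{dx} \approx \frac{\Delta f}{\Delta x} = \frac{f(x_{i+1}) - f(x_i)}{\Delta x} = \frac{f(x_i+h) - f(x_i)}{h} \qquad (6.2.11)$$
$$\text{Central: } \frac{df}{dx} \approx \frac{\delta f}{\delta x} = \frac{f(x_{i+1/2}) - f(x_{i-1/2})}{\delta x} \approx \frac{f(x_i+h) - f(x_i-h)}{2h} \qquad (6.2.12)$$
$$\text{Backward: } \frac{df}{dx} \approx \frac{\nabla f}{\nabla x} = \frac{f(x_i) - f(x_{i-1})}{\nabla x} = \frac{f(x_i) - f(x_i-h)}{h} \qquad (6.2.13)$$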
The step size is doubled in the central-finite-difference approximation because the function values $f(x_i+h/2)$ and
$f(x_i-h/2)$ may not be known; using $x_i \pm h$ instead spans a distance of $2h$.
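The three first-derivative formulas, Eqns. (6.2.11)-(6.2.13), can be sketched in C++ as below. This is an illustration, not the book's NumDerivative.cpp listing. The names ForwardDiff, CentralDiff, BackwardDiff and ExampleF are hypothetical, and for simplicity the function being differentiated is passed as a plain `double(*)(double)` pointer.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical test function (Example 6.2.1): f(x) = x^3 - 2x^2 + 10x - 5, f'(2) = 14
double ExampleF(double x) { return x * x * x - 2.0 * x * x + 10.0 * x - 5.0; }

// Eqn. (6.2.11): forward difference, error O(h)
double ForwardDiff(double (*f)(double), double x, double h) {
    assert(h > 0.0);
    return (f(x + h) - f(x)) / h;
}

// Eqn. (6.2.12): central difference over [x-h, x+h], error O(h^2)
double CentralDiff(double (*f)(double), double x, double h) {
    assert(h > 0.0);
    return (f(x + h) - f(x - h)) / (2.0 * h);
}

// Eqn. (6.2.13): backward difference, error O(h)
double BackwardDiff(double (*f)(double), double x, double h) {
    assert(h > 0.0);
    return (f(x) - f(x - h)) / h;
}
```

With h = 0.01 at x = 2, the three calls return approximately 14.0401, 14.0001 and 13.9601. For this cubic, the forward and backward estimates are off by about ±(h/2)f''(2) = ±0.04, while the central estimate is off only by h²f'''/6 = 0.0001.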
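Similarly, for the second derivative approximations,
$$\text{Forward: } \frac{d^2f}{dx^2} \approx \frac{\Delta^2 f}{\Delta x^2} = \frac{f(x_{i+2}) - 2f(x_{i+1}) + f(x_i)}{\Delta x^2} = \frac{f(x_i+2h) - 2f(x_i+h) + f(x_i)}{h^2} \qquad (6.2.14)$$
$$\text{Central: } \frac{d^2f}{dx^2} \approx \frac{\delta^2 f}{\delta x^2} = \frac{f(x_{i+1}) - 2f(x_i) + f(x_{i-1})}{\delta x^2} = \frac{f(x_i+h) - 2f(x_i) + f(x_i-h)}{h^2} \qquad (6.2.15)$$
$$\text{Backward: } \frac{d^2f}{dx^2} \approx \frac{\nabla^2 f}{\nabla x^2} = \frac{f(x_i) - 2f(x_{i-1}) + f(x_{i-2})}{\nabla x^2} = \frac{f(x_i) - 2f(x_i-h) + f(x_i-2h)}{h^2} \qquad (6.2.16)$$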
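A sketch of the second-derivative formulas (6.2.14)-(6.2.16) follows, using the same hypothetical test function as Example 6.2.1, for which f''(x) = 6x - 4. The function names are illustrative only.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical test function (Example 6.2.1): f(x) = x^3 - 2x^2 + 10x - 5, f''(2) = 8
double ExampleF(double x) { return x * x * x - 2.0 * x * x + 10.0 * x - 5.0; }

// Eqn. (6.2.14): forward second difference
double ForwardDiff2(double (*f)(double), double x, double h) {
    assert(h > 0.0);
    return (f(x + 2.0 * h) - 2.0 * f(x + h) + f(x)) / (h * h);
}

// Eqn. (6.2.15): central second difference
double CentralDiff2(double (*f)(double), double x, double h) {
    assert(h > 0.0);
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h);
}

// Eqn. (6.2.16): backward second difference
double BackwardDiff2(double (*f)(double), double x, double h) {
    assert(h > 0.0);
    return (f(x) - 2.0 * f(x - h) + f(x - 2.0 * h)) / (h * h);
}
```

For a cubic, the forward and backward formulas give f''(x) ± h f'''(x), that is 8 ± 0.06 at h = 0.01. The central formula is exact apart from round-off, because its error term involves f⁽⁴⁾, which is zero here. This agrees with Eqn. (6.2.6).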
In general, the error for the forward difference and backward difference formulae is of the order $O(h)$, while the error for the central
difference formula is of the order $O(h^2)$. Consequently, the central difference formula is more accurate than the other two
techniques, as we will see in the following example. The accuracy can be increased further by using additional sampling
points around the point of interest.
Example 6.2.1 Forward Difference
Compute the derivative of the function $f(x) = x^3 - 2x^2 + 10x - 5$ at $x = 2$.
Note that the analytical derivative is $f'(x) = 3x^2 - 4x + 10$ and $f'(2) = 14$. The table below shows the calculations
using Eqn. (6.2.11). The relative error is defined as $(f'_{FD} - f'_{exact})/f'_{exact}$, where $f'_{exact} = 14$.
h f(x+h) f(x) f'(x) Rel Error
1.000000E-15 1.5000000000000E+01 1.5000000000000E+01 1.065814E+01 -2.387042E-01
1.000000E-10 1.5000000001400E+01 1.5000000000000E+01 1.400000E+01 8.274037E-08
1.000000E-08 1.5000000140000E+01 1.5000000000000E+01 1.400000E+01 6.610792E-09
1.000000E-05 1.5000140000400E+01 1.5000000000000E+01 1.400004E+01 2.857156E-06
1.000000E-02 1.5140401000000E+01 1.5000000000000E+01 1.404010E+01 2.864286E-03
1.000000E-01 1.6441000000000E+01 1.5000000000000E+01 1.441000E+01 2.928571E-02
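The following table compares the relative errors of the forward (FD), backward (BD) and central (CD) difference techniques. For
moderate spacings ($h = 10^{-5}$ to $10^{-1}$), the central difference technique is by far the most accurate. For very small spacings
($h \le 10^{-8}$), round-off error dominates and the accuracy of all three techniques deteriorates.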
h FD BD CD Best
1.000000E-15 -2.39E-01 1.51E-02 -1.12E-01 BD
1.000000E-10 8.27E-08 8.27E-08 8.27E-08 All
1.000000E-08 6.61E-09 6.61E-09 6.61E-09 All
1.000000E-05 2.86E-06 -2.86E-06 1.29E-11 CD
1.000000E-02 2.86E-03 -2.85E-03 7.14E-06 CD
1.000000E-01 2.93E-02 -2.79E-02 7.14E-04 CD
The first step is to create the function prototypes for the three methods.
NumDerivative.h
A new concept is used here – function pointers. We will discuss details of this approach at the end of this chapter. Each function
has 4 arguments – the location at which the derivative is computed, the selected function (number) from a list of functions, the
spacing, and a pointer to a user-defined function. The last argument, double(*userfunc)(int fnc, double dX), denotes a
function whose return type is double and that accepts two arguments (the selected function from a list of functions whose derivative
is to be computed, and the location at which the derivative is to be computed). The syntax (*userfunc) indicates a pointer to a
function. We will see more about pointers in Chapter 8. The next step is to define the three difference methods.
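The four-argument interface described above can be sketched as follows. The argument order (location, function number, spacing, callback) follows the text, and only the callback type `double(*userfunc)(int fnc, double dX)` comes from the text. The name CentralDifference and the second test function (sin x) are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Central difference, Eqn. (6.2.12); the function is evaluated only through userfunc
double CentralDifference(double dX, int fnc, double dh,
                         double (*userfunc)(int fnc, double dX)) {
    assert(dh > 0.0);
    return (userfunc(fnc, dX + dh) - userfunc(fnc, dX - dh)) / (2.0 * dh);
}

// User-defined function: fnc selects one of the supported functions
double MyFunction(int fnc, double dX) {
    switch (fnc) {
        case 1:  return dX * dX * dX - 2.0 * dX * dX + 10.0 * dX - 5.0; // Example 6.2.1
        case 2:  return std::sin(dX);                                  // hypothetical second function
        default: assert(false && "unsupported function number"); return 0.0;
    }
}
```

In a call such as `CentralDifference(2.0, 1, 1.0e-5, MyFunction)`, the name `MyFunction` converts automatically to a pointer of the required type. The difference routine never needs to know which formula the user has coded.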
NumDerivative.cpp
Eqn. (6.2.11) is coded in lines 7-15, Eqn. (6.2.12) in lines 27-35, and Eqn. (6.2.13) in lines 17-25. Finally, the main program in
which the problem is defined and the solution techniques are called, is developed.
main.cpp
Lines 16-25 are used to define the two functions supported in the program. The function, MyFunction, will be called in the main
program when the evaluation f ( x ) needs to take place. The two arguments are the function to be used and the location, x ,
at which the function is being evaluated.
Examples 6.2.1-6.2.3 are replicated in the program. Lines 29-32 are used to help track the method being used. Line 34 shows
the different values of the spacing variable, h used in the program. The location at which the function is evaluated is specified
in line 35. The exact solution that is used later in the program to assess the accuracy, is defined in line 36.
The method that yields the best estimate of the derivative and the associated values are tracked via variables dBest, dBestDeriv,
dBesth and BestMethod (lines 45, 46, and 48). The number of different spacing values stored in the array dhTrial is found simply
by dividing the storage size of the array (found by using the sizeof operator) by the storage size of one of its elements
in line 47. Alternatively, we could have defined the number of elements simply as const int nElements = 6. Note how
the evaluation of the derivative takes place in lines 52, 61 and 70 via the calls to the three different techniques. The fourth
argument is the name of the function in which the evaluation f ( x ) needs to take place.
Details of each evaluation are displayed (lines 79-91). Finally, in lines 94-97, the best estimate is displayed. The program output
is shown in Fig. 6.2.1.
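The element count and the best-estimate bookkeeping can be sketched as below. This is our own illustration, not the book's main.cpp. The struct BestEstimate, the function FindBest and the method codes (0 = forward, 1 = backward, 2 = central) are hypothetical. The spacing values are the h column of the tables in Example 6.2.1.

```cpp
#include <cassert>
#include <cmath>

// Spacing values from the tables of Example 6.2.1
const double dhTrial[] = {1.0e-15, 1.0e-10, 1.0e-8, 1.0e-5, 1.0e-2, 1.0e-1};
// Number of elements = storage size of the whole array / storage size of one element
const int nElements = sizeof(dhTrial) / sizeof(dhTrial[0]);

// Hypothetical record of the best estimate (mirrors dBest, dBestDeriv, dBesth, BestMethod)
struct BestEstimate {
    double dBest = HUGE_VAL;   // smallest relative error found so far
    double dBestDeriv = 0.0;   // derivative estimate with that error
    double dBesth = 0.0;       // spacing that produced it
    int nBestMethod = -1;      // 0 = forward, 1 = backward, 2 = central
};

// Example 6.2.1: f(x) = x^3 - 2x^2 + 10x - 5
double F(double x) { return x * x * x - 2.0 * x * x + 10.0 * x - 5.0; }

BestEstimate FindBest(double dX, double dExact) {
    assert(dExact != 0.0);     // relative error is undefined otherwise
    BestEstimate best;
    for (int i = 0; i < nElements; ++i) {
        const double h = dhTrial[i];
        const double dEst[3] = {
            (F(dX + h) - F(dX)) / h,              // forward,  Eqn. (6.2.11)
            (F(dX) - F(dX - h)) / h,              // backward, Eqn. (6.2.13)
            (F(dX + h) - F(dX - h)) / (2.0 * h)   // central,  Eqn. (6.2.12)
        };
        for (int m = 0; m < 3; ++m) {
            const double dRelErr = std::fabs((dEst[m] - dExact) / dExact);
            if (dRelErr < best.dBest) {           // keep the most accurate estimate
                best.dBest = dRelErr;
                best.dBestDeriv = dEst[m];
                best.dBesth = h;
                best.nBestMethod = m;
            }
        }
    }
    return best;
}
```

The `sizeof` trick works only where `dhTrial` is a true array. Once an array is passed to a function, it becomes a pointer and `sizeof` no longer gives the element count.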
We will assume that we either know the function F ( x ) that is difficult to integrate exactly or that we have a set of discrete data
(points where the function has been evaluated numerically). In either case, we need to develop a numerical technique to evaluate
the integral.
The basic idea in numerical integration is to construct another function $P(x)$ (usually a polynomial) that is a suitable
approximation of $F(x)$ and is simple to integrate. The interpolating polynomial of degree $n$, denoted $P_n$, interpolates
the integrand at $(n+1)$ points in the interval $[a, b]$. Although there are pointwise errors, $E = F(x) - P_n(x)$, these errors are
not always of the same sign. They may therefore partially cancel when integrated, so the overall error in the integral can be small.
[Figure: F(x) and Pn(x) plotted against x, with sample points a = x0, x1, x2, x3, x4, b = x5]
Fig. 6.3.1 Original function and its polynomial approximation
Hence the integral is approximated by
$$I = \int_a^b F(x)\,dx \approx \int_a^b P_n(x)\,dx \qquad (6.3.2)$$
Fig. 6.3.1 shows the situation where the discrete values are known at 6 points and the approximate function Pn ( x ) is made to
pass through those six points. In more general terms, we could have a number of scenarios.
(1) We know the data at exactly (n+1) points and we fit a polynomial of degree n that passes through those points (as
shown in Fig. 6.3.1).
(2) We know the data at more than (n+1) points and we fit a polynomial of degree n using a concept such as least-squares
fit. We will look at least-squares fit in Chapter 11.
(3) If, on the other hand, we know the function F ( x ) , then we can evaluate the function at (n+1) points and use the
approach associated with scenario (1).
Newton-Cotes
When the function to be integrated is known at equally spaced points, we can use the forward difference polynomial (see Section
6.2) and fit the data. Recall that
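$$P_n(x_0 + sh) = f_0 + s\,\Delta f_0 + \frac{s(s-1)}{2!}\Delta^2 f_0 + \cdots + \frac{s(s-1)(s-2)\cdots(s-n+1)}{n!}\Delta^n f_0 + \text{error} \qquad (6.3.3)$$
where
$$x = x_0 + sh \qquad (6.3.4a)$$
$$\text{error} = \binom{s}{n+1} h^{n+1} f^{(n+1)}(\xi), \quad x_0 < \xi < x_n \qquad (6.3.4b)$$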
Hence,
$$I = \int_a^b F(x)\,dx \approx \int_a^b P_n(x)\,dx = h\int_{s(a)}^{s(b)} P_n(s)\,ds \qquad (6.3.5)$$
We can match the limits of integration by recognizing that the point $x = a$ corresponds to $s = 0$ and $x = b$ corresponds to
$s = \bar{s} = (b-a)/h$. Hence, using Eqn. (6.3.5), we have
$$I \approx h\int_0^{\bar{s}} P_n(x_0 + sh)\,ds \qquad (6.3.6)$$
The value of $n$ determines the particular Newton-Cotes scheme and hence its precision.
Trapezoidal Rule
We obtain this rule by fitting a linear polynomial to two discrete points. From Fig. 6.3.2, we need to compute the shaded area.
The upper limit of integration $x_1$ corresponds to $s = 1$. We can rewrite Eqn. (6.3.6) as
$$I_1 = h\int_0^1 (f_0 + s\,\Delta f_0)\,ds = h\left[s f_0 + \frac{s^2}{2}\Delta f_0\right]_0^1 \qquad (6.3.7)$$
where the left-hand side represents the integral over the first interval only. Denoting $\Delta f_0 = f_1 - f_0$, we can rewrite the above equation as
$$I_1 = \frac{h}{2}(f_0 + f_1) \qquad (6.3.8)$$
and represent the entire integral as
$$I = \sum_{i=1}^{n} I_i = \sum_{i=1}^{n} \frac{h_i}{2}(f_{i-1} + f_i) \qquad (6.3.9)$$
where
$$h_i = x_i - x_{i-1} \qquad (6.3.10)$$
We can further simplify the formula if we assume that the points are equally spaced.
$$I = \frac{h}{2}\left(f_0 + 2f_1 + 2f_2 + \cdots + 2f_{n-1} + f_n\right) \qquad (6.3.11)$$
[Figure: F(x) and the piecewise-linear approximation P1(x) over x0, x1, x2, x3, ..., xn-1, xn]
Fig. 6.3.2 Trapezoidal Rule
We can compute the error by integrating Eqn. (6.3.4b) as follows.
$$\text{Error} = h\int_0^1 \frac{s(s-1)}{2} h^2 f''(\xi)\,ds = -\frac{1}{12}h^3 f''(\xi) = O(h^3) \qquad (6.3.12)$$
The total error for equally spaced data is given by
$$\text{Error} = \sum_{i=1}^{n} -\frac{1}{12}h^3 f''(\xi_i) = -\frac{n}{12}h^3 f''(\xi) \qquad (6.3.13)$$
where $x_0 < \xi < x_n$. The number of increments is $n = (x_n - x_0)/h$. Therefore
$$\text{Total Error} = -\frac{1}{12}(x_n - x_0)h^2 f''(\xi) = O(h^2) \qquad (6.3.14)$$
Evaluate $\int_{-1}^{1} x^4\,dx$. The exact answer is 0.4. We will compute the integral for different numbers of intervals, $n \ge 1$. Note
that $h = (b-a)/n$.
(a) $n = 1$, $h = (1-(-1))/1 = 2$. $I(2) = \frac{2}{2}(f_0 + f_1) = (-1)^4 + (1)^4 = 2$
(b) $n = 2$, $h = 1$. $I(1) = \frac{1}{2}(f_0 + 2f_1 + f_2) = \frac{1}{2}\left[(-1)^4 + 2(0)^4 + (1)^4\right] = 1$
(c) $n = 4$, $h = 0.5$. $I(0.5) = \frac{0.5}{2}(f_0 + 2f_1 + 2f_2 + 2f_3 + f_4) = \frac{1}{4}\left[(-1)^4 + 2(-0.5)^4 + 2(0)^4 + 2(0.5)^4 + (1)^4\right] = 0.5625$
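The composite rule of Eqn. (6.3.11) translates directly into a short C++ function, and the hand calculations above serve as a check. The names Trapezoidal and Quartic are hypothetical; this is not the book's main.cpp.

```cpp
#include <cassert>
#include <cmath>

// Composite Trapezoidal Rule, Eqn. (6.3.11): n equal intervals on [a, b]
double Trapezoidal(double (*F)(double), double a, double b, int n) {
    assert(n >= 1);
    const double h = (b - a) / n;
    double dSum = 0.5 * (F(a) + F(b));   // end points carry weight 1/2
    for (int i = 1; i < n; ++i)          // interior points carry weight 1
        dSum += F(a + i * h);
    return h * dSum;                     // = (h/2)(f0 + 2f1 + ... + 2f_{n-1} + fn)
}

// Integrand of the example above
double Quartic(double x) { return x * x * x * x; }
```

Doubling n reduces the error by about a factor of four, consistent with the $O(h^2)$ total error of Eqn. (6.3.14).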
Simpson’s Rule
We obtain this rule by fitting a quadratic polynomial through three equally spaced points. Fig. 6.3.3 shows the shaded area arising
from this computation. We can write the integral as
$$I_1 = h\int_0^2 \left(f_0 + s\,\Delta f_0 + \frac{s(s-1)}{2}\Delta^2 f_0\right)ds = \frac{h}{3}(f_0 + 4f_1 + f_2) \qquad (6.3.12)$$
where the left-hand side represents the integral over the first pair of intervals, $[x_0, x_2]$, only. As before, we can represent the entire integral as
$$I = \frac{h}{3}\left(f_0 + 4f_1 + 2f_2 + 4f_3 + \cdots + 2f_{n-2} + 4f_{n-1} + f_n\right) \qquad (6.3.13)$$
where the number of intervals, $n$, must be even.
[Figure 6.3.3: F(x) and the quadratic approximation P2(x) over x0, x1, x2, x3, ..., xn-1, xn]
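The composite Simpson's Rule can be sketched in C++ as follows. The function name Simpson is hypothetical. The assertion enforces the requirement that the number of intervals be even.

```cpp
#include <cassert>
#include <cmath>

// Composite Simpson's Rule: n equal intervals on [a, b], n even
double Simpson(double (*F)(double), double a, double b, int n) {
    assert(n >= 2 && n % 2 == 0);
    const double h = (b - a) / n;
    double dSum = F(a) + F(b);                       // end points carry weight 1
    for (int i = 1; i < n; ++i)                      // odd points: 4, even interior points: 2
        dSum += (i % 2 == 1 ? 4.0 : 2.0) * F(a + i * h);
    return h * dSum / 3.0;
}

// Integrand used in the examples: x^4 on [-1, 1], exact value 0.4
double Quartic(double x) { return x * x * x * x; }
```

With n = 2 this gives 2/3, and with n = 4 it gives 5/12 ≈ 0.41667. Compare these with 1 and 0.5625 from the Trapezoidal Rule at the same n.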
Evaluate $\int_{-1}^{1} x^4\,dx$. We will write a computer program to evaluate the integral for different values of sampling points, $n$,
$1 \le n \le 1024$. Note that $\int_{-1}^{1} x^4\,dx = \left[\frac{x^5}{5}\right]_{-1}^{1} = 0.4$.
main.cpp
We will write a slightly more general program to integrate two functions that can then be extended by the reader with support
for other functions. The function enhancement is shown in lines 16 through 24. The second integral is
$$\int_1^2 \frac{2x^5 - x + 3}{x^2}\,dx = 9 - \ln 2 \approx 8.30685$$
Initialization of the program and the two methods takes place in lines 29 through 46. Line 29 can be edited to select the function
for integration. The number of supported functions is in line 30. Line 31 is used to specify the number of points used in
evaluating the integral. The limits of integration are stored in dVLow and dVHigh, and used in lines 45-46.
The main loop where the integral is computed starts in line 47 and extends to line 71 with the evaluation of the function taking
place first at the left end (line 51), then at the interior points (line 57 for Trapezoidal Rule and line 59 for Simpson’s Rule), and
finally at the right end (line 62 for Trapezoidal Rule and line 65 for Simpson’s Rule).
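The loop structure just described (left end, then interior points, then right end) can be sketched so that both rules are accumulated in a single pass. The selector UserF, the function Integrate and the use of an even number of intervals in place of a number of points are our assumptions; the book's own listing is not reproduced here.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical function selector: 1 -> x^4 on [-1, 1] (exact 0.4),
//                                 2 -> (2x^5 - x + 3)/x^2 on [1, 2] (exact 9 - ln 2)
double UserF(int nFunction, double x) {
    switch (nFunction) {
        case 1:  return x * x * x * x;
        case 2:  return (2.0 * std::pow(x, 5) - x + 3.0) / (x * x);
        default: assert(false && "unsupported function"); return 0.0;
    }
}

// One pass over the points; nIntervals must be even for Simpson's Rule
void Integrate(int nFunction, double dVLow, double dVHigh, int nIntervals,
               double& dTrap, double& dSimp) {
    assert(nIntervals >= 2 && nIntervals % 2 == 0);
    const double h = (dVHigh - dVLow) / nIntervals;
    const double fLeft = UserF(nFunction, dVLow);     // left end
    dTrap = fLeft;
    dSimp = fLeft;
    for (int i = 1; i < nIntervals; ++i) {            // interior points
        const double f = UserF(nFunction, dVLow + i * h);
        dTrap += 2.0 * f;
        dSimp += (i % 2 == 1 ? 4.0 : 2.0) * f;
    }
    const double fRight = UserF(nFunction, dVHigh);   // right end
    dTrap = 0.5 * h * (dTrap + fRight);               // Trapezoidal Rule, Eqn. (6.3.11)
    dSimp = h / 3.0 * (dSimp + fRight);               // Simpson's Rule
}
```

With 64 intervals on the second integral, Simpson's Rule is within about 1e-7 of 9 - ln 2. The Trapezoidal Rule is off by roughly 5e-4.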
Gauss-Legendre Quadrature
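Newton-Cotes techniques such as the Trapezoidal Rule or Simpson's Rule are generally not as efficient or accurate as Gauss-Legendre
Quadrature (G-LQ). For the same number of function evaluations, G-LQ integrates polynomials of a much higher degree exactly.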
In the G-LQ technique, the base points $\xi_i$ and the weights $w_i$ are chosen so that the sum of the $(n+1)$ appropriately weighted
values of the function yields the integral exactly when $F(x)$ is a polynomial of degree $(2n+1)$ or less.
$$\int_a^b F(x)\,dx = \int_{-1}^{1} \hat{F}(\xi)\,d\xi \approx \sum_{i=1}^{n+1} w_i \hat{F}(\xi_i) \qquad (6.3.14)$$
where $\xi_i$ are the base points (the roots of the Legendre polynomial $P_{n+1}(\xi)$), and
$$F(x)\,dx = F(x(\xi))\frac{dx}{d\xi}\,d\xi = F(x(\xi))\,J(\xi)\,d\xi = \hat{F}(\xi)\,d\xi \qquad (6.3.15)$$
where $J$ is the Jacobian. The following two points should be noted.
(a) Gauss-Legendre is more efficient because it requires fewer base points than the Newton-Cotes methods to achieve the same
level of accuracy, and
(b) the error is zero if the $(2n+2)$th derivative of the integrand vanishes. Equivalently, a polynomial of degree $n$ is integrated
exactly by employing $(n+1)/2$ Gauss points, rounded up to the next integer.
To understand how Gauss-Legendre works, consider a rewrite of Eqn. (6.3.14)
$$I = \int_{-1}^{1} f(\xi)\,d\xi \approx w_1 f(\xi_1) + w_2 f(\xi_2) + \cdots + w_n f(\xi_n) \qquad (6.3.16)$$
One-Point Formula
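$$I = \int_{-1}^{1} f(\xi)\,d\xi \approx w_1 f(\xi_1) \qquad (6.3.17)$$
The integration is exact if $f$ is a linear polynomial, i.e. $f = a_0 + a_1\xi$. Hence the error, $e$, is given by
$$e = \int_{-1}^{1}(a_0 + a_1\xi)\,d\xi - w_1 f(\xi_1) = 2a_0 - w_1(a_0 + a_1\xi_1) = 0 \qquad (6.3.18a)$$
For this to hold for all $a_0$ and $a_1$, we need $w_1 = 2$ and $\xi_1 = 0$.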
Two-Point Formula
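$$\int_{-1}^{1} f(\xi)\,d\xi \approx w_1 f(\xi_1) + w_2 f(\xi_2) \qquad (6.3.19)$$
The integration is exact if $f = a_0 + a_1\xi + a_2\xi^2 + a_3\xi^3$ is a cubic polynomial. Hence the error, $e$, is given by
$$e = \int_{-1}^{1}(a_0 + a_1\xi + a_2\xi^2 + a_3\xi^3)\,d\xi - w_1 f(\xi_1) - w_2 f(\xi_2) = 0 \qquad (6.3.20)$$
or, $e = 2a_0 + \frac{2}{3}a_2 - w_1(a_0 + a_1\xi_1 + a_2\xi_1^2 + a_3\xi_1^3) - w_2(a_0 + a_1\xi_2 + a_2\xi_2^2 + a_3\xi_2^3) = 0$
or, $e = a_0(2 - w_1 - w_2) - a_1(w_1\xi_1 + w_2\xi_2) + a_2\left(\frac{2}{3} - w_1\xi_1^2 - w_2\xi_2^2\right) - a_3(w_1\xi_1^3 + w_2\xi_2^3) = 0$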
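The error is zero if
$$w_1 + w_2 = 2 \qquad (6.3.21a)$$
$$w_1\xi_1 + w_2\xi_2 = 0 \qquad (6.3.21b)$$
$$w_1\xi_1^2 + w_2\xi_2^2 = \frac{2}{3} \qquad (6.3.21c)$$
$$w_1\xi_1^3 + w_2\xi_2^3 = 0 \qquad (6.3.21d)$$
Solving, $w_1 = w_2 = 1$, $\xi_1 = -1/\sqrt{3}$ and $\xi_2 = 1/\sqrt{3}$. The results of the derivation and more are summarized in Table 6.3.1.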
Evaluate $\int_{-1}^{1} x^4\,dx$. The exact answer is 0.4.
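The one-, two- and three-point rules can be collected into a small C++ function and checked on the example above. The one- and two-point values come from the derivation above. The three-point base points 0, ±√(3/5) and weights 8/9, 5/9 are the standard Gauss-Legendre values, which we assume agree with Table 6.3.1 (not reproduced here). The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Gauss-Legendre quadrature on [-1, 1] with n = 1, 2 or 3 points
double GaussLegendre(double (*f)(double), int n) {
    assert(n >= 1 && n <= 3);
    const double g2 = 1.0 / std::sqrt(3.0);  // two-point base points: +/- 1/sqrt(3)
    const double g3 = std::sqrt(0.6);        // three-point base points: 0, +/- sqrt(3/5)
    const double xi[3][3] = {{0.0, 0.0, 0.0}, {-g2, g2, 0.0}, {-g3, 0.0, g3}};
    const double w[3][3]  = {{2.0, 0.0, 0.0}, {1.0, 1.0, 0.0},
                             {5.0 / 9.0, 8.0 / 9.0, 5.0 / 9.0}};
    double dSum = 0.0;
    for (int i = 0; i < n; ++i)
        dSum += w[n - 1][i] * f(xi[n - 1][i]);
    return dSum;
}

double Quartic(double x) { return x * x * x * x; }   // the example's integrand
double Cubic(double x) { return x * x * x + x * x; } // hypothetical cubic check, exact 2/3
```

Three points integrate the quartic exactly, as point (b) above predicts for a polynomial of degree 4. Two points give only 2/9, although they are exact for any cubic.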
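Example 6.3.5
Evaluate $\int_1^7 \frac{1}{x}\,dx$. Exact: $I = \int_1^7 \frac{1}{x}\,dx = \ln(7) = 1.94591$.
To use the G-LQ rule we must first construct a mapping function to map the given domain $[1, 7]$ to the required domain $[-1, 1]$.
The mapping function is $x = 4 + 3\xi$, so that $dx = 3\,d\xi$ and $J = 3$.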
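The mapping $x = (a+b)/2 + (b-a)\xi/2$ with $J = (b-a)/2$ reduces to $x = 4 + 3\xi$, $J = 3$ for Example 6.3.5. A sketch under those assumptions follows. The function names are hypothetical, and the base points and weights are the standard values used above.

```cpp
#include <cassert>
#include <cmath>

// Gauss-Legendre on a general interval [a, b] with n = 1, 2 or 3 points
double GaussLegendreAB(double (*F)(double), double a, double b, int n) {
    assert(n >= 1 && n <= 3);
    const double g2 = 1.0 / std::sqrt(3.0), g3 = std::sqrt(0.6);
    const double xi[3][3] = {{0.0, 0.0, 0.0}, {-g2, g2, 0.0}, {-g3, 0.0, g3}};
    const double w[3][3]  = {{2.0, 0.0, 0.0}, {1.0, 1.0, 0.0},
                             {5.0 / 9.0, 8.0 / 9.0, 5.0 / 9.0}};
    const double J = 0.5 * (b - a);                          // dx/dxi (3 for [1, 7])
    double dSum = 0.0;
    for (int i = 0; i < n; ++i) {
        const double x = 0.5 * (a + b) + J * xi[n - 1][i];  // x = 4 + 3 xi for [1, 7]
        dSum += w[n - 1][i] * F(x) * J;
    }
    return dSum;
}

double Reciprocal(double x) { return 1.0 / x; }
```

For this example, the two- and three-point rules give 24/13 ≈ 1.84615 and 102/53 ≈ 1.92453, compared with ln 7 ≈ 1.94591. Since 1/x is not a polynomial, no finite rule is exact, and the error shrinks only gradually as points are added.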
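Two-Dimensional Functions: Functions that involve two independent natural coordinates are handled in a manner similar
to one-dimensional functions.
$$\int_c^d\int_a^b F(x,y)\,dx\,dy = \int_{-1}^{1}\int_{-1}^{1} F(x(\xi,\eta), y(\xi,\eta))\,|J|\,d\xi\,d\eta \approx \sum_{j=1}^{n}\sum_{i=1}^{n} w_i w_j f(\xi_i, \eta_j) \qquad (6.3.16a)$$
where
$$f(\xi_i,\eta_j) = F(x(\xi_i,\eta_j), y(\xi_i,\eta_j))\,|J| \qquad (6.3.16b)$$
$$|J| = \det(\mathbf{J}), \quad \text{where } \mathbf{J}_{2\times2} = \begin{bmatrix} \partial x/\partial\xi & \partial y/\partial\xi \\ \partial x/\partial\eta & \partial y/\partial\eta \end{bmatrix} \qquad (6.3.16c)$$
The values of the weights and natural coordinates are the same as shown in Table 6.3.1, except that $\xi$ in the table refers to both
$\xi$ and $\eta$.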
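For the simplest case, a rectangle $[a, b] \times [c, d]$, the mapping is linear in each direction and $|J| = (b-a)(d-c)/4$ is constant. A sketch of the $n \times n$ product rule of Eqn. (6.3.16a) under that assumption follows. The function names and the test integrand $x^2y^2$ are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// n x n Gauss-Legendre rule on the rectangle [a, b] x [c, d], n = 1, 2 or 3
double GaussLegendre2D(double (*F)(double, double),
                       double a, double b, double c, double d, int n) {
    assert(n >= 1 && n <= 3);
    const double g2 = 1.0 / std::sqrt(3.0), g3 = std::sqrt(0.6);
    const double pts[3][3] = {{0.0, 0.0, 0.0}, {-g2, g2, 0.0}, {-g3, 0.0, g3}};
    const double wts[3][3] = {{2.0, 0.0, 0.0}, {1.0, 1.0, 0.0},
                              {5.0 / 9.0, 8.0 / 9.0, 5.0 / 9.0}};
    const double detJ = 0.25 * (b - a) * (d - c);   // constant for a rectangle
    double dSum = 0.0;
    for (int j = 0; j < n; ++j)                     // eta direction
        for (int i = 0; i < n; ++i) {               // xi direction
            const double x = 0.5 * (a + b) + 0.5 * (b - a) * pts[n - 1][i];
            const double y = 0.5 * (c + d) + 0.5 * (d - c) * pts[n - 1][j];
            dSum += wts[n - 1][i] * wts[n - 1][j] * F(x, y);
        }
    return dSum * detJ;
}

double XSqYSq(double x, double y) { return x * x * y * y; }  // exact integral on [0,1]^2 is 1/9
```

Because $x^2y^2$ is quadratic in each direction, the $2 \times 2$ rule already reproduces the exact value 1/9, whereas the one-point rule gives 0.0625.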
Example 6.3.6
The answer obtained is the exact answer. Once again, the appropriate quadrature order is $(n+1)/2 = (2+1)/2$, rounded up to 2, and using the
$2 \times 2$ rule leads to the exact answer.
Example 6.3.7
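Evaluate $I = \iint_{R_{xy}} (x+y)^3\,dx\,dy$, where $R_{xy}$ is the quadrilateral region with vertices $(1, 0)$, $(3, 1)$, $(2, 2)$ and $(0, 1)$.
[Figure: region $R_{xy}$ in the x-y plane]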
Analytical Approach: The sides of the domain are not parallel to the axes, making it difficult to set the limits of integration.
However, the sides are such that
$$x + y = c_1 \quad \text{and} \quad x - 2y = c_2$$
Therefore, we can introduce two new variables and set up a mapping such that
$$u = x + y \quad \text{and} \quad v = x - 2y$$
The transformed domain is shown below.
[Figure: transformed region $R_{uv}$: $1 \le u \le 4$, $-2 \le v \le 1$]
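Hence,
$$I = \iint_{R_{xy}} (x+y)^3\,dx\,dy = \iint_{R_{uv}} u^3\,|\det(\mathbf{J})|\,du\,dv$$
where
$$\det(\mathbf{J}) = \frac{\partial(x,y)}{\partial(u,v)} = \frac{1}{\partial(u,v)/\partial(x,y)} = \frac{1}{\begin{vmatrix}1 & 1\\ 1 & -2\end{vmatrix}} = -\frac{1}{3}, \quad \text{so } |\det(\mathbf{J})| = \frac{1}{3}$$
Substituting,
$$I = \int_{-2}^{1}\int_{1}^{4} \frac{u^3}{3}\,du\,dv = \frac{765}{12} = 63.75$$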
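Numerical Approach: We will construct the appropriate mapping functions to map the given integration domain into the square
$\xi \in [-1,1]$, $\eta \in [-1,1]$, using the following mapping functions
$$x = \sum_{i=1}^{4} \phi_i x_i, \qquad y = \sum_{i=1}^{4} \phi_i y_i$$
where
$$\phi_1 = \tfrac{1}{4}(1-\xi)(1-\eta), \quad \phi_2 = \tfrac{1}{4}(1+\xi)(1-\eta), \quad \phi_3 = \tfrac{1}{4}(1+\xi)(1+\eta), \quad \phi_4 = \tfrac{1}{4}(1-\xi)(1+\eta)$$
The given domain is such that $(x_1, y_1) = (1, 0)$, $(x_2, y_2) = (3, 1)$, $(x_3, y_3) = (2, 2)$ and $(x_4, y_4) = (0, 1)$. The Jacobian
can be constructed as
$$\mathbf{J} = \begin{bmatrix} 1 & \tfrac{1}{2} \\ -\tfrac{1}{2} & \tfrac{1}{2} \end{bmatrix}, \qquad \det(\mathbf{J}) = \frac{1}{2} + \frac{1}{4} = \frac{3}{4}$$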
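(a) We will use the one-point rule first: $\xi_i = \eta_j = 0$ and $w_i = w_j = 2.0$.
$$x + y = \sum_{i=1}^{4}\phi_i x_i + \sum_{i=1}^{4}\phi_i y_i = \tfrac{1}{4}(1 + 3 + 2 + 0) + \tfrac{1}{4}(0 + 1 + 2 + 1) = \frac{5}{2}$$
$$I \approx (2)(2)\left(\frac{5}{2}\right)^3\left(\frac{3}{4}\right) = 46.875$$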
(b) Now the two-point rule. The details of the calculations are shown below.
ξi η j wi wj φ1 φ2 φ3 φ4 x y (x+y)^3
-0.57735 -0.57735 1.0 1.0 0.622008 0.166667 0.044658 0.166667 1.21133 0.42265 4.362508
0.57735 -0.57735 1.0 1.0 0.166667 0.622008 0.166667 0.044658 2.36603 1 38.13748
0.57735 0.57735 1.0 1.0 0.044658 0.166667 0.622008 0.166667 1.78868 1.57735 38.13748
-0.57735 0.57735 1.0 1.0 0.166667 0.044658 0.166667 0.622008 0.63398 1 4.362508
$$I \approx \det(\mathbf{J})\sum_{i=1}^{2}\sum_{j=1}^{2} w_i w_j (x+y)^3 = 63.75$$
which is the exact answer.
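Example 6.3.7 can be reproduced with a short C++ routine. At each Gauss point it evaluates the mapping functions φi, the Jacobian of Eqn. (6.3.16c) and the weighted sum. The name QuadIntegrate and the node-ordering convention (counterclockwise, with node 1 at ξ = η = -1, matching the φi above) are our assumptions. |det J| is used so that a clockwise node ordering would not flip the sign.

```cpp
#include <cassert>
#include <cmath>

// Integrate F(x, y) over a four-node quadrilateral with an n x n Gauss-Legendre rule (n = 1 or 2)
double QuadIntegrate(double (*F)(double, double), const double xn[4], const double yn[4], int n) {
    assert(n == 1 || n == 2);
    const double g = 1.0 / std::sqrt(3.0);
    const double pts1[1] = {0.0}, wts1[1] = {2.0};
    const double pts2[2] = {-g, g}, wts2[2] = {1.0, 1.0};
    const double* pts = (n == 1) ? pts1 : pts2;
    const double* wts = (n == 1) ? wts1 : wts2;
    const double xiN[4] = {-1.0, 1.0, 1.0, -1.0};   // natural coordinates of nodes 1-4
    const double etN[4] = {-1.0, -1.0, 1.0, 1.0};
    double dSum = 0.0;
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < n; ++i) {
            const double xi = pts[i], eta = pts[j];
            double x = 0.0, y = 0.0, dxdxi = 0.0, dydxi = 0.0, dxdeta = 0.0, dydeta = 0.0;
            for (int k = 0; k < 4; ++k) {
                const double phi = 0.25 * (1.0 + xiN[k] * xi) * (1.0 + etN[k] * eta);
                const double dphidxi = 0.25 * xiN[k] * (1.0 + etN[k] * eta);
                const double dphideta = 0.25 * etN[k] * (1.0 + xiN[k] * xi);
                x += phi * xn[k];
                y += phi * yn[k];
                dxdxi += dphidxi * xn[k];
                dydxi += dphidxi * yn[k];
                dxdeta += dphideta * xn[k];
                dydeta += dphideta * yn[k];
            }
            const double detJ = dxdxi * dydeta - dydxi * dxdeta;  // det of Eqn. (6.3.16c)
            dSum += wts[i] * wts[j] * F(x, y) * std::fabs(detJ);
        }
    }
    return dSum;
}

double CubeOfSum(double x, double y) { const double s = x + y; return s * s * s; }
```

With the vertices (1, 0), (3, 1), (2, 2) and (0, 1), the routine returns 46.875 for n = 1 and 63.75 for n = 2, matching the hand calculations above.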
The solution is in the use of function pointers and callback functions. We will see pointers in Chapter 8 and understand how
best to use them. However, at this stage it should not prevent us from using the concept in developing and writing effective,
general-purpose source code. We will illustrate the idea using the Newton-Raphson technique and an example.
Example 6.4.1
We will illustrate the ideas for a general-purpose numerical analysis interface for obtaining values from user-defined code
through the use of the Newton-Raphson technique. The specific example is to compute a root of the function
f ( x ) ( x 2.3)( x 4.56)( x 3.7)
Step 1: Construct the gateway to Newton-Raphson technique. By this we mean, we will develop the interface or function
prototype.
void NewtonRaphson (double& dRoot, int& nMaxIter,
const double dConvTol, int fnc,
void(*userfunc)(double dX, double& dFX, double& dDX));
The function arguments are as follows.
dRoot Root. Input is the initial guess, and the returned value is the estimate of the root.
nMaxIter Input: Maximum number of iterations; Output: Actual number of iterations taken.
dConvTol Convergence tolerance.
fnc Function from a list of functions to solve.
*userfunc Pointer to the function that will be called to compute the function value (dFX) and derivative value (dDX) at
the current point (dX). One must pay particular attention to this function – (a) the function prototype is like
any other function, and (b) it helps to have the minimum number of arguments in the function call. In this
case, the function is passed the value of the current point and the function returns the function and the first
derivative values at the current point.
The function prototype is as follows (newtonraphson.h):
Note the enumerated type in line 6 that will be used to handle exceptional situations connected with various aspects of the
technique – invalid user input, encountering a zero (or numerically very small) derivative during one of the evaluations of the
derivative, and not being able to converge to the root within the specified number of iterations. As we will see, adding the class
keyword with the enumerated type declaration will make it easier to refer to a specific exception especially in large programs.
Step 2: Create the function that implements the Newton-Raphson technique (newtonraphson.cpp):
The first check for correctness of input is in lines 17-18, where an exception is thrown if the maximum number of iterations is
less than or equal to zero. The iterative loop is between lines 21 and 44. If the root is not found within the maximum number of
iterations, an exception (non-convergence) is thrown in line 47. The user-defined function is called in line 24. It uses
the current value of the root, dRoot, as $x$ and computes $f(x)$ and $df(x)/dx$. The convergence check is carried out in lines 34-35. If
one of the conditions is met, the results are stored in dRoot and nMaxIterations, and control is returned to the calling program
(main).
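A minimal sketch of Step 2, written against the prototype from Step 1, follows. It is not the book's newtonraphson.cpp. Several details are assumptions:

- the enumeration name NRError and its values;
- the two convergence tests (step size or |f(x)| no larger than dConvTol);
- the small-derivative threshold of 1e-12.

The callback in the prototype does not receive fnc, so this sketch accepts fnc but does not use it. MyFunction assumes that the factors of f(x) are (x - 2.3)(x - 4.56)(x - 3.7).

```cpp
#include <cassert>
#include <cmath>

// Hypothetical scoped enumeration for the exceptional situations
enum class NRError { InvalidInput, ZeroDerivative, NoConvergence };

void NewtonRaphson(double& dRoot, int& nMaxIter, const double dConvTol, int fnc,
                   void (*userfunc)(double dX, double& dFX, double& dDX)) {
    assert(userfunc != nullptr);
    (void)fnc;  // not forwarded: the callback signature has no fnc argument
    if (nMaxIter <= 0 || dConvTol <= 0.0) throw NRError::InvalidInput;
    const double dSmall = 1.0e-12;  // assumed "numerically very small" derivative
    double dX = dRoot;
    for (int i = 1; i <= nMaxIter; ++i) {
        double dFX = 0.0, dDX = 0.0;
        userfunc(dX, dFX, dDX);                     // f(x) and df/dx at the current point
        if (std::fabs(dDX) < dSmall) throw NRError::ZeroDerivative;
        const double dXNew = dX - dFX / dDX;        // Newton-Raphson update
        if (std::fabs(dXNew - dX) <= dConvTol || std::fabs(dFX) <= dConvTol) {
            dRoot = dXNew;                          // converged: return root and iteration count
            nMaxIter = i;
            return;
        }
        dX = dXNew;
    }
    throw NRError::NoConvergence;
}

// Assumed form: f(x) = (x - 2.3)(x - 4.56)(x - 3.7)
void MyFunction(double dX, double& dFX, double& dDX) {
    const double a = dX - 2.3, b = dX - 4.56, c = dX - 3.7;
    dFX = a * b * c;
    dDX = b * c + a * c + a * b;  // product rule
}
```

Starting from an initial guess of 2.0, the iterates increase steadily toward the nearest root, 2.3.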
Step 3: Create the code to test the technique (main.cpp). The first sub-step is to create the user-defined function, MyFunction.
Recall that the user-defined function is used as the fifth argument in the call to the NewtonRaphson function. We have selected
the name of the function as MyFunction.
The main program is divided into three parts. The first part is the initialization part contained in lines 34-37. These lines should
be edited as required. The second part is the try block. In this block, first the user input is obtained in lines 42-47. Next the
NewtonRaphson function is called. If the execution is successful, the results are shown on the screen (lines 52-54). The third part
contains the catch blocks – a block to handle system-generated errors (logical errors such as invalid argument, out-of-range
when using containers, etc., run time errors such as overflow, underflow, etc., and many more).
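The try/catch structure of the main program can be illustrated in isolation. In the hypothetical driver RunChecked below, a user-defined scoped enumeration is thrown for invalid input, and `std::vector::at` throws `std::out_of_range` (a system-generated logic error) for a bad index. The catch blocks handle the user-defined exceptions first, then anything derived from `std::exception`, then everything else. All names are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Scoped enumeration: values are always written with their scope (NRError::NoConvergence),
// so they cannot clash with identically named values of other enumerations
enum class NRError { InvalidInput, ZeroDerivative, NoConvergence };

std::string Describe(NRError e) {
    switch (e) {
        case NRError::InvalidInput:   return "invalid input";
        case NRError::ZeroDerivative: return "zero derivative";
        case NRError::NoConvergence:  return "no convergence";
    }
    return "unknown error";
}

std::string RunChecked(int nMaxIter, const std::vector<double>& dVData, std::size_t nIndex) {
    try {
        if (nMaxIter <= 0) throw NRError::InvalidInput;  // user-input check
        const double d = dVData.at(nIndex);              // throws std::out_of_range for a bad index
        (void)d;
        return "ok";
    }
    catch (NRError e) {                                  // user-defined exceptions
        return Describe(e);
    }
    catch (const std::exception& e) {                    // system-generated errors
        return std::string("system error: ") + e.what();
    }
    catch (...) {                                        // anything else
        return "unknown exception";
    }
}
```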
Here is a brief list of the improvements that can be made to the program:
(1) Are there other user input checks that can be carried out in the NewtonRaphson function? Implement them by throwing
an appropriate exception.
(2) Are there checks that can be carried out in the MyFunction function? Implement them by throwing an appropriate
exception.
(3) Are there checks that can be carried out in the main function? Implement them by throwing an appropriate exception.
Section 13.4 has more advanced scenarios involving object-oriented design that we will see and use at the appropriate time.
Summary
We looked at three very useful numerical techniques – finding roots of nonlinear equations, numerical differentiation, and
numerical integration. Note that the techniques are quite general with minimal restrictions. The Bisection and the False Position
methods require that the function be continuous in the interval containing the root. The Newton-Raphson method requires
that the function and its first derivative be continuous in the interval containing the root. The Brent Method is a derivative-free
method and only requires that the function be continuous in the interval containing the root. Similarly, all the numerical
derivative techniques and integration techniques require that the function be continuous. This generality in terms of handling
any function requires that the program be written in such a manner that the user of the technique has little difficulty in providing
the value(s) needed by the technique. While some of the more advanced C++ concepts that facilitate this usage have not been
discussed so far, Section 6.4 shows how this can be done now.
Exercises
Most of the problems below involve the development of one or more functions. In each case (a) develop a plan to
test the function(s), and (b) implement the plan in a main program. The functions should not use cin or cout
unless specified. Put the main program in a separate file and the function(s) in separate files.
Appetizers
Problem 6.1
Solve for the roots of the following functions using Bisection Method.
(a) Solve for all the positive roots of the function f ( x ) x 3 2 x 2 6x x 6 .
(b) Solve for the negative root of the function f ( x ) ( x 1)( x 2)( x 3) .
(c) Solve for the roots of the function f ( x ) 0.3 cos( x ) .
Problem 6.2
Solve for the roots of the functions given in Problem 6.1 using the Newton-Raphson Method.
Problem 6.3
Compare the first derivatives computed using forward difference, central difference and backward difference for the following
functions.
(a) f ( x ) x 3 2 x 2 10 x 5 at x 4 .
(b) f ( x ) x 3 2 x 2 6x x 6 at x 2 .
2x 3 4 x 10
(c) f (x ) at x 5 .
x 1 x 2
Problem 6.4
Compute the following integrals using the Trapezoidal Rule with $n = 2, 4, 8, 16, \ldots, 256$.
2 4
x 2x 10 dx
1
x
2
(a) (b) dx
0 2
2
2 x 10
Problem 6.5
Compute the integrals given in Problem 6.4 using Simpson’s Rule.
Main Course
Problem 6.6
Program the Bisection Method. Use the problems in Problem 6.1 as test cases.
Problem 6.7
Program the forward, backward and central difference techniques. Compare their performances by using the problems in
Problem 6.3 as test cases.
Problem 6.8
Modify the program Example6_3_2 by adding the Gauss-Legendre Quadrature technique. Use the program to compare the
performances of Trapezoidal Rule, Simpson’s Rule and the Gauss-Legendre Quadrature by using the functions in Problem 6.4
as test cases.
Chapter
“Real learning comes about when the competitive spirit has ceased.” J. Krishnamurti
“Computer science is no more about computers than astronomy is about telescopes.” Edsger Dijkstra
Classes and objects were introduced in Chapter 1. It cannot be overemphasized that proper definition and use of classes can
lead to increased productivity in all aspects of software engineering. In this chapter, we will begin the long, systematic process
of understanding what classes are and how to use them effectively in simple and complex computer programs. In other words,
we will try to understand why object-oriented programming (OOP) is the choice for developing useful programs. More
advanced concepts will be covered in Chapters 8, 9 and 13.
Objectives
To understand what objects and classes are.
To understand what data abstraction and encapsulation are.
To learn how to define and use classes.
To leverage the functionalities provided by the functions in the book’s library directory to write more robust computer
programs.
To learn more about the standard string class.
To understand and begin to use exception handling in OOP.
[Figure: (a) truss geometry in the x-y plane with points 1-4 and dimensions 3 m, 3 m and 4 m; graphing canvas of size a × a with axes X, Y]
Solution: We will now develop the general algorithm to meet our objectives.
Variable Dictionary
nPoints Number of points
nLines Number of lines
Algorithm
(1) Obtain the total number of points, nPoints, and total number of (straight) lines, nLines.
(2) Obtain the $(x, y)$ coordinates of each point. Obtain the start point and end point numbers for each line. Compute the
smallest and the largest $x$ and $y$ values, $x_{min}$, $x_{max}$ and $y_{min}$, $y_{max}$, using all the points.
(3) Compute the scale for the graph based on the $x_{min}$, $x_{max}$ and $y_{min}$, $y_{max}$ values and the dimensions of the graphing
canvas, $a \times a$, so that the truss fits completely on the graphing canvas. For an anisotropic scaling, the scaling values
are different in the $x$ and the $y$ directions. For an isotropic scaling, there is one value that is the smaller of the two scaling
values.
(4) Loop through all the lines, $i$.
(5) For the current line, obtain the start point number. Compute the graph coordinates $(x_{gs}, y_{gs})$. Move to $(x_{gs}, y_{gs})$.
(6) Obtain the end point number. Compute the graph coordinates $(x_{ge}, y_{ge})$. Draw the line to $(x_{ge}, y_{ge})$.
(7) End loop $i$.
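Steps 3, 5 and 6 can be sketched for the isotropic case. ComputeScale and GraphCoordinates are named in the text but not defined there, so the signatures below are our assumptions. So is the choice to place the canvas origin at $(x_{min}, y_{min})$, with no margin and no flip of the y axis.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>

// Step 3 (isotropic scaling): one scale factor so that the whole truss fits on an a x a canvas
double ComputeScale(double xmin, double xmax, double ymin, double ymax, double a) {
    assert(a > 0.0 && (xmax > xmin || ymax > ymin));
    const double big = std::numeric_limits<double>::max();
    const double sx = (xmax > xmin) ? a / (xmax - xmin) : big;  // scale that fills the canvas in x
    const double sy = (ymax > ymin) ? a / (ymax - ymin) : big;  // scale that fills the canvas in y
    return std::min(sx, sy);  // the smaller value keeps both directions on the canvas
}

// Steps 5-6: convert point coordinates to graph coordinates
void GraphCoordinates(double x, double y, double xmin, double ymin, double scale,
                      double& xg, double& yg) {
    xg = (x - xmin) * scale;
    yg = (y - ymin) * scale;
}
```

For an anisotropic scaling, `sx` and `sy` would be returned separately and applied to the x and y coordinates respectively.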
We will now tackle the problem of translating this algorithm into a computer program. An examination of the algorithm shows
that there are two major pieces of information (entities) that we must handle – data associated with points, and data associated
with lines. Several questions arise naturally. The most obvious one is “What data structure should be used?”, or “How should
the problem data be stored?”. Let us assume that we store the point and line data in arrays (vectors to be specific) as shown
below. We will name this approach Array-based Solution. The index of the vector will provide the data access mechanism.
Array-based Solution
Array Size Remarks
fVX nPoints Vector containing the global x-coordinates of all the points
fVY nPoints Vector containing the global y-coordinates of all the points
nVSP nLines Vector containing the start point number of all the lines
nVEP nLines Vector containing the end point number of all the lines
Since we have not defined the details of the drawing-related functions (scaling, computing the graph coordinates, move and
draw), we will simply provide their prototypes as shown in lines 18-23. Recall how C++ treats variables and functions: they
must be declared before they can be used. We will store the arrays in C++'s vector container. A container, as the name suggests,
not only provides storage space to store different data types but also provides operations that make data manipulation easy to
implement. Consider the array declarations that we have seen earlier, e.g. double x[N];. There, x is statically allocated with a size
that must be known when the program is written, not when it is executed. The primary advantage of the vector container is that its
size can be set when the program is executed. Hence, for the array, N must be an integer constant. This is not desirable when writing
general-purpose programs where the size of the model is an unknown quantity. The vector class is used to store the four vectors as
shown in lines 41 and 59. Note that vector indexing starts at 0 and goes up to N-1, e.g. see lines 49 and 54.
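A minimal illustration of the difference follows. MakeCoordinates is a hypothetical helper, not part of the book's program. The size of the returned vector comes from an argument that could be read at run time, which a fixed-size array such as `double x[N];` cannot do.

```cpp
#include <cassert>
#include <vector>

// A std::vector can be sized with a value known only at run time (e.g. nPoints read from the user)
std::vector<float> MakeCoordinates(int nPoints, float fSpacing) {
    assert(nPoints > 0);
    std::vector<float> fVX(nPoints);   // nPoints elements, each initialized to 0.0f
    for (int i = 0; i < nPoints; ++i)  // valid indices run from 0 to nPoints-1
        fVX[i] = i * fSpacing;
    return fVX;
}
```

For example, `MakeCoordinates(5, 1.5f)` returns five values, 0.0 through 6.0. Accessing index 5 would be out of range.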
As we have seen before, the values of the point coordinates and the start and end points of lines are obtained interactively using
the GetInteractive function. Some amount of error checking is carried out, but by no means is the error checking
comprehensive.
Once the input data is in place, the scale factor is computed (Step 3 of the algorithm) as seen in line 81 using a call to (as yet
undeveloped) function ComputeScale.
Steps 4 through 7 are implemented in lines 84 through 98. Three other undeveloped functions GraphCoordinates, Move and
Draw are used to compute the graph coordinates given the point coordinates, move to a location on the graph, and draw to a
location on the graph. One can imagine the Move function as moving to a point on the graph with the pen up (or, raised) and
the Draw function as moving to a point on the graph with the pen down.
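Steps 4 through 7 can be sketched with stand-in Move and Draw functions that record pen commands instead of drawing. This makes the loop easy to check. The PenCommand record, the function DrawAllLines and the assumption that point numbers in nVSP and nVEP start at 1 are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pen command: 'M' = move (pen up), 'D' = draw (pen down)
struct PenCommand { char cType; double xg, yg; };
std::vector<PenCommand> gvCommands;

void Move(double xg, double yg) { gvCommands.push_back({'M', xg, yg}); }
void Draw(double xg, double yg) { gvCommands.push_back({'D', xg, yg}); }

// Steps 4-7: for each line, move to its start point and draw to its end point.
// Assumes point numbers start at 1, as a user would enter them.
void DrawAllLines(const std::vector<float>& fVX, const std::vector<float>& fVY,
                  const std::vector<int>& nVSP, const std::vector<int>& nVEP,
                  double xmin, double ymin, double scale) {
    assert(nVSP.size() == nVEP.size());
    for (std::size_t i = 0; i < nVSP.size(); ++i) {            // Step 4: loop over lines
        const int s = nVSP[i] - 1;                               // start point index (0-based)
        const int e = nVEP[i] - 1;                               // end point index (0-based)
        Move((fVX[s] - xmin) * scale, (fVY[s] - ymin) * scale);  // Step 5
        Draw((fVX[e] - xmin) * scale, (fVY[e] - ymin) * scale);  // Step 6
    }                                                            // Step 7: end loop
}
```

The subtraction of 1 from each point number is where the zero-based indexing of the vector class shows up in the code.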
Strengths of the Array-Based Solution
Easy to understand, visualize and code.
Vector data structure requires minimal execution time and storage.
Weaknesses of the Array-Based Solution
Typical engineering programs have tens if not hundreds of arrays. The management of all the arrays can be problematic.
Extension of the program from two to three-dimensions by the simple addition of z coordinate values is likely to cause
ripple effects throughout the program. Line 41 needs to be changed. The block of three lines to obtain the z-coordinate
needs to be added. The z coordinate values need to be considered in the scaling factor function. Similarly, the
GraphCoordinates function needs to be changed. The process of transforming a three-dimensional object to an equivalent
view in a two-dimensional plane involves several more steps than just scaling and translation.
Similarly, if newer objects such as curved beams (described by three points not two), are brought into the program, the
programmed drawing logic needs drastic changes.
Finally, for engineering applications, it is, to put it mildly, awkward to count starting at zero. This is one of the drawbacks of
the vector class.
Data Abstraction
As program logic becomes more complex, it is certainly helpful to break the logic into smaller pieces. These smaller pieces are
typically implemented in well-defined functions. Modularization in program development also has other beneficial effects –
testing and debugging of smaller components is much easier, several different programmers can work simultaneously on a
project with minimal interaction or overlap, and program reuse is possible. However, as the complexity of data flow in a program
increases, the task of program development and maintenance can become expensive and fraught with dangers. It is now
necessary to think about program development in a completely different way. This does not imply that whatever we have learnt
in the preceding chapters is incorrect or useless. As we will learn in this chapter, it is easy to build on what we have learnt by
simply reorganizing our thought process and learning new language constructs.
Ideally, programmers would like to define their own data types depending on the type of application program. These data types
would be built using C++ built-in data types. The thought process to create these user-defined types is known as data
abstraction. Data abstraction is defined as separating the overall properties of a data type from its implementation. The
mechanism to implement the data abstraction is called encapsulation.
Let us look at a simple example having to store and manipulate information dealing with points defined in a two-dimensional
space. Let this space be defined as the usual Cartesian x-y space. Each point will be defined in terms of its x and y coordinates.
To store the two coordinates, we will use the standard float type. Recall from Chapter 1 that objects are identified by a name,
have attributes (defined as having properties) and behavior (capabilities to do something). The attributes of the point object are its
(x,y) coordinates. What are some of the data manipulations that we may want to carry out with points in a two-dimensional
space? For example, we may want to store and retrieve these values. We may also want to check whether the point is at the
origin of the coordinate system, to compute what is the distance from this point to another point, etc. These are the type of
information that a programmer needs to know to write a program that uses the point-related information. Such a code (that
uses the point information) is called the client code. The client code needs to know how to use the information but not how
the information is stored or how the functionalities (behavior) are implemented. Encapsulation also referred to information
hiding, refers to the technique by which data attributes and behavior-related operations are linked together such that the data
can be manipulated only through these operations not directly. The program or code that stores the data and implements the
behavior is called the server code.
In C++ terminology, point (or whatever name we assign) is a class. This class is a user-defined data type similar to the built-in
data types that C++ provides such as int, float, etc. We associate objects with a class similar to the way we have been
associating variables to the built-in data types. The client code declares objects associated with classes and uses the operations
permitted by the class definition and implemented in the server code, to manipulate the information stored in the objects. One
can now appreciate why information hiding is useful. First, the programming task can be very nicely divided. Programmers who
have the knowledge and expertise in writing the client code can concentrate on getting their job done without having to worry
about the implementation details. These are left to the experts who know more about the class attributes and its behavior.
Second, the client code cannot inadvertently or otherwise make errors in setting or changing the values of the data. For example,
if the point class is designed for points in the positive (x,y) space, then the error detection can be easily implemented in the
server code and an appropriate action can be taken. Third, by separating the behavior from the implementation, we minimize the
impact that changes to the class would have on the maintenance of a program. Let’s go back to the point example. Let us assume
that users of the class now request that they would like to look at points in a cylindrical coordinate system through two attributes
r and θ. Does this imply that we would have to rewrite the existing client code, or can we make changes to the server code (point
class) such that this new functionality is available without compromising the existing functionalities?
In the rest of the chapter, we explain the “how and why” of object-oriented programming using C++.
// point.h
#include <string> // std::string is used by Display
class CPoint
{
public:
// constructors
CPoint (); // default
CPoint (float, float); // overloaded
// helper function
void Display (const std::string&);
// modifier function
void SetValues (float, float);
// accessor function
void GetValues (float&, float&);
private:
float m_fXCoor; // stores x‐coordinate
float m_fYCoor; // stores y‐coordinate
};
Constructors are special member functions that have no return type (not even void). They have the same name as the class. They
are automatically called when an object associated with the class is declared and created. They can be overloaded just as regular
functions. They are optional. If you do not define any constructor in your class definition, then C++ provides a default constructor that
essentially does nothing as far as your specifications are concerned. Constructors are typically used to initialize the member
variables of the class. There is another special member function called the destructor (dtor for short). The destructor has the
same name as the class, does not have a return type, cannot be overloaded, is not required to be declared or defined, and is
automatically invoked when the object associated with the class goes out of scope. As an example, the destructor for the CPoint
class can be declared as follows.
public:
// constructors
CPoint (); // default
CPoint (float, float); // overloaded
// destructor
~CPoint ();
The ~ (tilde) symbol is used before the class name to denote the destructor. The default access for the member variables and
member functions of a class is private. In other words, if the keyword public or private is not used, the compiler assumes that the member function
or member variable is private. In the CPoint class, two constructors are used. The first (without any parameters) is called the
default constructor. We will use this constructor to set both the coordinate values to zero. In addition to the two constructors,
there are three public member functions. The Display function is designed to display the coordinates using standard output.
The SetValues and the GetValues functions are designed to redefine the coordinates and to obtain the coordinates, respectively.
There are no public member variables. As we saw with data abstraction, class declarations typically do not have public member
variables. The two variables that store the coordinates are declared as private variables. They cannot be accessed in any program
component outside of the five member functions in the CPoint class. We could create exceptions to this rule as we will see in
Chapter 9. There are no private member functions in the CPoint class declaration. The keywords public and private appear
only in the class definition. To implement (or define) these functions, we need to create the appropriate C++ statements. Here
are those statements (usually contained in a source file, say point.cpp).
#include <iostream>
#include <string>
#include "point.h"
// default constructor
CPoint::CPoint ()
{
// coordinates initialized to zero
m_fXCoor = 0.0f;
m_fYCoor = 0.0f;
}
// overloaded constructor
CPoint::CPoint (float fX, float fY)
{
// coordinates set to fX and fY
m_fXCoor = fX;
m_fYCoor = fY;
}
// modifier function
void CPoint::SetValues (float fX, float fY)
{
// coordinates set to fX and fY
m_fXCoor = fX;
m_fYCoor = fY;
}
// accessor function
void CPoint::GetValues (float& fX, float& fY)
{
// coordinates returned in fX and fY
fX = m_fXCoor;
fY = m_fYCoor;
}
// helper function
void CPoint::Display (const std::string& strBanner)
{
// display the current coordinates
std::cout << strBanner
<< "[X,Y] Coordinates = ["
<< m_fXCoor << ","
<< m_fYCoor << "].\n";
}
Note the difference between the definition of a regular function and a member function that belongs to a class. The member
function definition needs a qualifier - the name of the class and the scope operator ::. The statements in each member function
are just as they would appear in any function with the difference that the member variables are declared in the class definition
and hence should not be declared again in the body of the member function. It would be incorrect to write the function SetValues
as follows.
void CPoint::SetValues (float fX, float fY)
{
float m_fXCoor; // local to this function!
float m_fYCoor; // local to this function!
m_fXCoor = fX;
m_fYCoor = fY;
}
With the (incorrectly defined) function shown above, the values of fX and fY are assigned to the local variables m_fXCoor
and m_fYCoor, not to the member variables (with the same names) that are a part of the CPoint class! The local declarations
hide the member variables. Recall the scope rules from Chapter 4.
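To see the effect of the hidden member variables, here is a minimal sketch. It assumes a stripped-down CPoint with the float members m_fXCoor and m_fYCoor from the class declaration; the function SetValuesShadowed and the accessors GetX and GetY are hypothetical names added only for this demonstration.

```cpp
// Stripped-down CPoint (an assumption): only what the demonstration needs
class CPoint
{
public:
    CPoint () : m_fXCoor(0.0f), m_fYCoor(0.0f) {}
    // Correct version: assigns to the member variables
    void SetValues (float fX, float fY)
    {
        m_fXCoor = fX;
        m_fYCoor = fY;
    }
    // Hypothetical faulty version: the local declarations hide the members
    void SetValuesShadowed (float fX, float fY)
    {
        float m_fXCoor = fX;   // local variable, discarded on return
        float m_fYCoor = fY;   // local variable, discarded on return
        (void) m_fXCoor;       // silence unused-variable warnings
        (void) m_fYCoor;
    }
    float GetX () const { return m_fXCoor; }   // hypothetical accessor
    float GetY () const { return m_fYCoor; }   // hypothetical accessor
private:
    float m_fXCoor;
    float m_fYCoor;
};
```

After calling SetValuesShadowed (3.0f, 4.0f) on a default-constructed object, both coordinates are still zero, while SetValues changes them as intended. GCC and Clang can flag such hidden names with the -Wshadow warning option.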
The next obvious question is how the class can be used in an application program. We illustrate the usage with a simple
example.
Example Program 7.2.1 Using the CPoint class
Here is the main program that illustrates the usage of the CPoint class. We will use two objects associated with the CPoint class
– Origin and CarCoords. Origin will be declared using the default constructor and CarCoords will use the overloaded constructor.
The user will be prompted to enter the x, y coordinates and these values will be used with the SetValues member function to
set the user-defined coordinate values. The verification of the coordinate values will take place using the Display member
function. The usage of the GetValues function is left as an exercise.
main.cpp
An object is declared just as any other variable in a program. In line 18, the object Origin is declared, and the coordinate values
are initialized to zero in the default constructor, a convenience that cannot be overlooked. Because CPoint declares another
constructor, the compiler does not supply a default one, so this statement will not compile if a default constructor is not
defined – it is a good practice to define a default constructor for all classes one codes. In line 21, the
object CarCoords is declared, and the coordinate values are initialized using the overloaded constructor. Note how the member
functions are used in the program. In line 24, the Display member function is called. The general syntax for calling a member
function is
object.memberfunction (parameter list);
not
memberfunction (parameter list);
The . is referred to as the member selection operator and is used in accessing the member variables and member functions
outside the class. Every object is closely tied to the variables associated with the class. In other words, unless otherwise specified,
an individualized copy of the member variables is created for every object. However, only one copy of the member function is
created that can then be used by all the objects associated with that class.
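The idea that each object gets its own copy of the member variables can be checked with a small sketch. It assumes a condensed CPoint whose member functions are defined inside the class; GetX and GetY are hypothetical accessors used here instead of GetValues.

```cpp
// Condensed CPoint (an assumption): member functions defined in the class
class CPoint
{
public:
    CPoint () : m_fXCoor(0.0f), m_fYCoor(0.0f) {}
    CPoint (float fX, float fY) : m_fXCoor(fX), m_fYCoor(fY) {}
    void SetValues (float fX, float fY) { m_fXCoor = fX; m_fYCoor = fY; }
    float GetX () const { return m_fXCoor; }   // hypothetical accessor
    float GetY () const { return m_fYCoor; }   // hypothetical accessor
private:
    float m_fXCoor;   // each object has its own copy
    float m_fYCoor;   // each object has its own copy
};
```

Calling CarCoords.SetValues (7.0f, 8.0f) changes only the coordinates stored in CarCoords; an Origin object declared with the default constructor still reports (0, 0), even though both objects share the same SetValues code.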
The use of private variables precludes their access outside the class. For example, we cannot write the following statements in the
main function
CarCoords.m_fXCoor = fV[0]; CarCoords.m_fYCoor = fV[1];
instead of the original statement (line 31)
CarCoords.SetValues (fV[0], fV[1]);
Similarly, private member functions cannot be accessed outside the class. Let us review the important facts about class
definitions and usage.
1. The class definition is usually contained in a header file. A complete class definition requires that the member functions
and variables be declared within the braces {}, followed by a semicolon after the closing brace.
2. Member variables and functions of a class are, by default, private.
3. Definition of the constructor is optional. If no constructor is defined, C++ provides a default constructor. The constructor does not
have a return type and has the same name as the name of the class.
4. It is a good idea to define a default constructor.
5. Declaring a public member variable should be done with care. Unless the design of the class calls for a public member
variable, declare all member variables as private. Private member functions and variables cannot be accessed outside
of the class.
6. Member functions are defined using the scope resolution operator :: and they are referenced outside the class using
the member selection operator (.).
We will now look at another example where (a) public and private member functions are used, and (b) some rudimentary error
trapping is necessary for the program to work correctly.
Internally the time is stored in a 24-hour format. To help in computing the time difference, we have defined a public helper
function TimeDifference that uses two private helper functions to convert the time from the 24-hour format to seconds and
back (ConvertToSeconds and ConvertFromSeconds). In line 17 the output styles for displaying the time, e.g. hh:mm:ss, and hh
Hour(s) mm Minute(s) ss Second(s), are defined. The four public modifier functions have a return type of int. In the program
it is assumed that a return value of 0 denotes no error and a return value of 1 denotes an error. As we will see, this mode of
error checking is not desirable, and there are several improvements that can be made to the program.
time.cpp
The default constructor is listed in lines 13 through 18. The default time is set as midnight. The overloaded constructor (lines
20-26) has three parameters for the hour, minute and seconds. The constructor does not check the input for errors since it is
not clear what to do if the specified time values are not valid. The destructor has no statements and can be deleted both from
the header and the source files. However, it is recommended that the destructor be included with every class just in case the
destructor may be needed for future enhancements to the class.
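The automatic calls to the constructor and destructor can be traced with a small sketch. The global string g_strLog and the function UseTime are hypothetical additions that only record the order of the calls; this CTime is not the class from time.cpp.

```cpp
#include <string>

std::string g_strLog;   // hypothetical log of constructor/destructor calls

class CTime
{
public:
    CTime ()  { g_strLog += "ctor "; }   // runs when the object is created
    ~CTime () { g_strLog += "dtor "; }   // runs when the object goes out of scope
};

// Hypothetical function that creates a local CTime object
void UseTime ()
{
    g_strLog += "enter ";
    CTime T;                 // constructor called here
    g_strLog += "work ";
}                            // T goes out of scope here: destructor called
```

Starting from an empty log, a call to UseTime leaves g_strLog equal to "enter ctor work dtor ", showing that the destructor runs at the closing brace without being called explicitly.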
The Print member function prints the time as hour:minute:second (if Style is USINGCOLONS) or as hour Hour(s) minute
Minute(s) second Second(s) (if Style is USINGDESCRIPTORS).
Time difference between the two time objects is computed in the TimeDifference function where it is assumed that if the time
difference is negative, then the end time has crossed midnight. For example, if the start time is 20:10:10 and the end time is
01:10:10, then the elapsed time is 5 hours. The TimeDifference member function illustrates how private member functions can
be used. In lines 49 and 50, the start time (or from time) and the end time (or to time) are converted from the 24-hour values
into seconds elapsed since midnight. If the two time values span midnight, then the adjustment is made in lines 54-55. Finally,
in line 58, the time difference in seconds is converted back to the 24-hour format. The private member functions
(ConvertToSeconds and ConvertFromSeconds) are defined in lines 61 through 73. An important question is how lines 49 and
50 work, since ConvertToSeconds is a private member function. Access control in C++ applies per class, not per object: a member
function of a class can access the private members of any object of that same class, including objects passed as arguments
(instead of using the public accessor functions). In this example, the objects TFrom and TTo are both CTime objects. Hence it is
legal to use TFrom.ConvertToSeconds() and TTo.ConvertToSeconds().
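The conversion to seconds, the midnight adjustment, and the call of a private member function on another CTime object can be sketched as follows. The member names follow the text, but the bodies are assumptions, not the listing from time.cpp; only the members needed for TimeDifference are shown.

```cpp
// Reduced CTime (an assumption): only the members used by TimeDifference
class CTime
{
public:
    CTime () : m_nHour(0), m_nMinute(0), m_nSecond(0) {}   // midnight
    CTime (int nH, int nM, int nS) : m_nHour(nH), m_nMinute(nM), m_nSecond(nS) {}
    // Stores in this object the elapsed time from TFrom to TTo
    void TimeDifference (const CTime& TFrom, const CTime& TTo)
    {
        // Legal: TFrom and TTo are CTime objects, so a CTime member
        // function may call their private member functions
        int nDiff = TTo.ConvertToSeconds() - TFrom.ConvertToSeconds();
        if (nDiff < 0)
            nDiff += 86400;          // end time is past midnight (24*3600 s)
        ConvertFromSeconds (nDiff, m_nHour, m_nMinute, m_nSecond);
    }
    int GetHour () const   { return m_nHour; }
    int GetMinute () const { return m_nMinute; }
    int GetSecond () const { return m_nSecond; }
private:
    // seconds elapsed since midnight
    int ConvertToSeconds () const
    {
        return m_nHour*3600 + m_nMinute*60 + m_nSecond;
    }
    // splits a number of seconds into hours, minutes and seconds
    void ConvertFromSeconds (int nSec, int& nH, int& nM, int& nS)
    {
        nH = nSec / 3600;
        nM = (nSec % 3600) / 60;
        nS = nSec % 60;
    }
    int m_nHour, m_nMinute, m_nSecond;
};
```

With a start time of 20:10:10 and an end time of 01:10:10, the raw difference is -68400 s; adding 86400 s gives 18000 s, or 5 hours, matching the example in the text.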
Examples of public modifier functions are shown in lines 99-119. The input values are first checked for validity. If they are valid, the
values are used to modify the (private) member variables and a zero value is returned. If they are not, the current values of the
object are left intact, and a nonzero value (value of 1) is returned. As we will see in the main program, the user of the CTime class
(herein referred to as the client code) will have to assume the responsibility of taking the appropriate action upon error detection.
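A modifier function that follows this 0/1 error convention might look like the sketch below. SetHour is one of the modifier functions named in the text, but this body and the valid range 0 to 23 are assumptions; the minute and second members are omitted.

```cpp
// Reduced CTime (an assumption): only the hour is stored here
class CTime
{
public:
    CTime () : m_nHour(0) {}                 // default: midnight
    // Modifier with error checking: returns 0 on success, 1 on error
    int SetHour (int nHour)
    {
        if (nHour < 0 || nHour > 23)
            return 1;                        // invalid: object left unchanged
        m_nHour = nHour;
        return 0;                            // valid: value stored
    }
    int GetHour () const { return m_nHour; }
private:
    int m_nHour;
};
```

The client code then decides what to do with a nonzero return value, for example by printing a message and prompting the user again.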
We will now see a sample main program, contained in file main.cpp, which uses the CTime class.
main.cpp
The objects StartTime and EndTime are declared in lines 19 and 20 and are set to midnight since the default constructor is called.
The overloaded constructor would be used if the following declaration were used instead.
CTime StartTime (2, 15, 30);
Using the GetInteractive function, the start time and end time values are obtained. The modifier function SetTime is used to
set the time values. An error message is displayed if this function returns a nonzero value.
In line 43, a new object TimeDiff is declared. This is the object that will store the time difference between StartTime and
EndTime. The use of the TimeDifference function is illustrated in line 34. Finally, the Print member functions are used in lines
45 through 48 to display the results of the time computations.
Note the semicolon at the end of the in-class definition; it is permitted but optional, and it is not used when the function is
defined in the class source file. We can write the functions GetMinute, GetSecond, SetHour, SetMinute and SetSecond in a similar manner.
Inline Functions
As we saw in Chapter 4, functions make program development easier but at an added price of using additional resources of
storage and execution time. C++ provides a mechanism by which this execution time overhead can be reduced through the
use of inline functions identified by the inline keyword. For example, if the GetHour function is repeatedly used in a program,
then a more efficient way of executing the function would be to declare it as
inline int GetHour () {return m_nHour;};
The inline qualifier is merely a suggestion to the compiler, which decides how best to optimize the function call. A member
function whose body is written inside the class definition is treated as inline even without the keyword. One of the disadvantages
of inlining a function is that the compiler inserts the same code at every location where the function is called, thereby making
the executable code larger. The inline qualifier can also be used with regular (non-member) functions.
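The sketch below contrasts the two ways of obtaining inline member functions and shows inline on a non-member function. The reduced CTime and the helper SecondsInHours are assumptions for illustration; in every case the compiler may still choose not to inline the call.

```cpp
// Reduced CTime (an assumption) showing inline member functions
class CTime
{
public:
    CTime (int nH, int nM, int nS) : m_nHour(nH), m_nMinute(nM), m_nSecond(nS) {}
    int GetHour () const { return m_nHour; }   // defined in the class: implicitly inline
    int GetMinute () const;                    // declared here, defined below
    int GetSecond () const;                    // declared here, defined below
private:
    int m_nHour, m_nMinute, m_nSecond;
};

// Defined outside the class: inline must be requested explicitly
inline int CTime::GetMinute () const { return m_nMinute; }

// Defined outside the class without inline: an ordinary function
int CTime::GetSecond () const { return m_nSecond; }

// inline on a regular (non-member) function; hypothetical helper
inline int SecondsInHours (int nHours) { return nHours * 3600; }
```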
If TimeDifference is declared as void TimeDifference (const CTime& TFrom, const CTime& TTo), the two time objects that appear
as the function parameters are passed as const references and hence cannot be modified within the function. When a member
function is declared const as follows
return_value class_name::function_name (parameter list) const;
then it cannot modify the member variables of the object within the function body. Here is the modified CTime class definition
that uses the const qualifier (not all statements are displayed).
class CTime
{
public:
CTime (); // default constructor
CTime (int, int, int); // constructor
~CTime (); // destructor
// helper functions
void Print (PStyle Style=PStyle::USINGCOLONS) const;
void TimeDifference (const CTime& TFrom, const CTime& TTo);
void TimeDifference (const CTime& TTo);
// modifier functions
int SetTime (const CTime&);
// accessor functions
void GetTime (int&, int&, int&) const;
void GetTime (CTime&) const;
int GetHour () const;
int GetMinute () const;
int GetSecond () const;
// ... (remaining members not shown)
};
We have also made a simple improvement to one of the member functions. An overloaded version of the TimeDifference member
function now takes only one argument. It can be written as follows.
void CTime::TimeDifference (const CTime& TTo)
{
// find the difference between the two times in seconds
int nTFrom = ConvertToSeconds();
int nTTo = TTo.ConvertToSeconds(); // ConvertToSeconds must be const since TTo is const
int nDiff = nTTo - nTFrom;
// adjust if time crosses midnight
if (nDiff < 0)
nDiff = nDiff + 86400;
// now convert from seconds back to hr:min:s
ConvertFromSeconds (nDiff, m_nHour, m_nMinute, m_nSecond);
}
With this definition, the client code that uses this member function can be written as
CTime Time1(10, 55, 0), Time2(12, 45, 10);
Time1.TimeDifference (Time2);
The elapsed time between Time1 and Time2 is computed, and the results are stored in Time1. It should be noted that while C++
does not allow the const qualifier to be used with constructors and destructors, const objects can be initialized using the
constructor.
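The rules for const member functions and const objects can be sketched with a reduced CTime. TotalSeconds and SecondsBetween are hypothetical names used only here, and the commented-out line shows a call that the compiler would reject.

```cpp
// Reduced CTime (an assumption) illustrating const member functions
class CTime
{
public:
    CTime (int nH, int nM, int nS) : m_nHour(nH), m_nMinute(nM), m_nSecond(nS) {}
    // const member functions: they promise not to modify the object
    int GetHour () const { return m_nHour; }
    int TotalSeconds () const { return m_nHour*3600 + m_nMinute*60 + m_nSecond; }
    // non-const member function: modifies the object
    void SetHour (int nH) { m_nHour = nH; }
private:
    int m_nHour, m_nMinute, m_nSecond;
};

// Both parameters are const references, so only const member
// functions such as GetHour and TotalSeconds may be called on them
int SecondsBetween (const CTime& TFrom, const CTime& TTo)
{
    // TFrom.SetHour (0);   // would not compile: SetHour is not const
    return TTo.TotalSeconds() - TFrom.TotalSeconds();
}
```

A const object such as const CTime Noon (12, 0, 0); is initialized by the constructor and can afterwards call only the const member functions.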
The objects m_TimeStamp and m_Location will be contained in the yet undefined class. What is the disadvantage? The
disadvantage is that one needs to be familiar with the CPoint and CTime classes to use them. In other words, instead of developing
and learning about one class that would hold all the information, we now have to deal with three classes.
The process of including objects from other classes into a class is known as class composition and the newly defined class is
known as a composite class. In this problem, we will define a new (composite) class called CSensor. This class will contain the
three member variables listed above. We will also define the usual accessor, modifier and helper functions to manipulate the data.
No additional features are necessary at this stage. The detailed algorithm is not required since the main program will merely get
the sensor reading information from the user and display the information as a confirmation that the overall procedure is good.
The program uses slightly modified versions of previously defined classes – CTime and CPoint. The interested reader is urged to
look at the source code to see what the changes are. The header file that contains the CSensor class declaration is shown next.
sensor.h
There are three private member variables that store the three pieces of information associated with every reading. These variables
are declared in lines 28-30. To manipulate the information, we use the usual modifier and accessor functions as shown in lines
22 and 25. The Display function is used to display the stored information. The class member functions are shown next.
sensor.cpp
The SetData and GetData member functions are straightforward. Both use the publicly available functions from the CPoint and
CTime classes to modify and to access the data.
The Display member function leverages the display functions from the CPoint and the CTime classes to display the entire sensor
data – location, time and temperature.
Now we are ready to look at the main program used to obtain and store the information.
main.cpp
The first thing to notice is that the sensor data is stored as a vector – line 17. For each of the locations where the data is
supposedly acquired, the data is obtained interactively in lines 32-34. The CPoint object is created in line 23 and the CTime object
in line 24. Finally, in line 43, the values of the ith CSensor object data (SensorData) are set. The newly created data is displayed
one location at a time in line 49.
This simple example serves to illustrate what we have achieved. Instead of one big class with many components, we
have three smaller classes that together provide the same functionality and that can be extended as the need arises.
One way to initialize these member objects is through an executable statement in the body of the constructor. For example, to
initialize the m_TimeStamp variable we could modify the CSensor constructor as follows.
CSensor::CSensor ()
{
m_TimeStamp.SetTime (10, 0, 0);
}
Alternatively, if appropriate copy constructors exist for the CTime and CPoint classes, the member objects can be initialized in a
member initializer list as follows.
CSensor::CSensor (const CTime& Time, const CPoint& Point) :
m_TimeStamp (Time), m_Location(Point)
{
}
Note that the constructors of the member objects (CPoint and CTime) are called before the body of the CSensor constructor is
executed. We will see more about composition in Chapter 9.
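The construction order can be observed with a sketch in which each constructor appends its class name to a hypothetical global string g_strTrace. The order of the member declarations in this CSensor is an assumption; in C++, member objects are constructed in the order in which they are declared in the class, regardless of the order used in a member initializer list.

```cpp
#include <string>

std::string g_strTrace;   // hypothetical record of constructor calls

class CPoint
{
public:
    CPoint () { g_strTrace += "CPoint "; }
};

class CTime
{
public:
    CTime () { g_strTrace += "CTime "; }
};

// Composite class: the member objects are built first, in declaration
// order, and only then does the body of the CSensor constructor run
class CSensor
{
public:
    CSensor () { g_strTrace += "CSensor"; }
private:
    CTime  m_TimeStamp;   // declared first, constructed first
    CPoint m_Location;    // constructed second
};
```

Starting from an empty trace, creating a CSensor object leaves g_strTrace equal to "CTime CPoint CSensor".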
Copy Constructors
Earlier we saw two versions of the constructor – the default ctor and the overloaded ctor. A copy constructor is a special case
of the overloaded constructor in which an existing object of the same class is used to initialize a new object. Here is an example
of a copy constructor for the CPoint class.
class CPoint
{
public:
CPoint (const CPoint&);
// ... other members
};
CPoint::CPoint (const CPoint& P) // copy constructor
{
m_fXCoor = P.m_fXCoor;
m_fYCoor = P.m_fYCoor;
}
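A copy constructor runs not only in a declaration such as CPoint P2(P1); but also in copy-initialization (CPoint P2 = P1;) and when an object is passed by value. The sketch below counts these calls with a hypothetical static counter s_nCopies and a hypothetical function XOf that takes a CPoint by value.

```cpp
// CPoint with a copy constructor that counts its calls
// (the counter s_nCopies is a hypothetical addition for illustration)
class CPoint
{
public:
    static int s_nCopies;                 // number of copy-constructor calls
    CPoint (float fX, float fY) : m_fXCoor(fX), m_fYCoor(fY) {}
    CPoint (const CPoint& P)              // copy constructor
        : m_fXCoor(P.m_fXCoor), m_fYCoor(P.m_fYCoor)
    {
        ++s_nCopies;
    }
    float GetX () const { return m_fXCoor; }
private:
    float m_fXCoor;
    float m_fYCoor;
};
int CPoint::s_nCopies = 0;

// Pass-by-value parameter: the argument is copied into P
float XOf (CPoint P) { return P.GetX(); }
```

With CPoint A(1.0f, 2.0f);, the declarations CPoint B(A); and CPoint C = A; each call the copy constructor once, and a call XOf(A) calls it a third time.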
The data type or object to be stored in the vector needs to be specified for a template class object to be created. The elements
of the vector can be initialized as the following example shows.
std::vector<int> a(n, ival); // ival is an int variable
In the above example, all the n elements of vector a are initialized with the value contained in ival.
Accessing vector elements
The operator [] can be used to access elements of the vector. For example, the following code shows how all the elements of
vector a are set to ival.
for (int i=0; i < n; i++)
a[i] = ival;
Notably, range checking is not done when the [] operator is used. In other words, the behavior of the following statements is
undefined (unpredictable).
std::vector<int> a(4);
a[4] = 10; // valid indices are 0 through 3
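The at member function, unlike [], does check the index: it throws a std::out_of_range exception when the index is not smaller than size(). The sketch below uses that to test an index safely; IsOutOfRange is a hypothetical helper name.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: reports whether index i is invalid for vector a.
// at() checks the index and throws std::out_of_range when i >= a.size();
// operator[] performs no such check.
bool IsOutOfRange (const std::vector<int>& a, std::size_t i)
{
    try
    {
        (void) a.at (i);      // range-checked access
        return false;         // i is a valid index
    }
    catch (const std::out_of_range&)
    {
        return true;          // i is not a valid index
    }
}
```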
Other functionalities
There are numerous operations that can be carried out using vectors and some of the more important ones are listed in the
table below.
Operation Remarks
vector<type> a(b) Copies the contents of an existing vector b and creates a new vector a, e.g., vector<CPoint> P1(P2);
a.size() Returns the number of elements in vector a
a.empty() Returns true if a is empty, false otherwise
a = b Assigns the contents of b to a
a == b Returns true if a and b have the same size and equal elements, false otherwise. The other relational operators
(!=, <, <=, >, >=) can also be used.
a.at(i) Returns the element at location i. If location i does not exist, a std::out_of_range exception is thrown,
e.g., P = P1.at(n);
a.push_back (b) Increases the number of elements of a by one by appending a copy of b at the end of a.
a.clear() Removes all the elements in a. The vector is now empty.
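Several of the operations in the table can be combined in a short sketch. BuildNames is a hypothetical function, and the planet names are arbitrary sample data.

```cpp
#include <string>
#include <vector>

// Builds a small vector of names using several std::vector operations
std::vector<std::string> BuildNames ()
{
    std::vector<std::string> a;        // empty: a.empty() is true, a.size() is 0
    a.push_back ("Mercury");           // a.size() is now 1
    a.push_back ("Venus");             // a.size() is now 2
    std::vector<std::string> b (a);    // b is a copy of a
    b.push_back ("Earth");             // b grows; a is unchanged
    a = b;                             // assignment: a now holds 3 elements
    return a;
}
```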
Lines 10 and 11 are needed so that the features of the vector and string classes can be used in the program. A vector of strings
of zero size is created in line 25. In lines 29-31, three strings are created and stored in the vector. Element with index 0 is created
in line 29 and so on. In line 37, the size of the vector is increased to 4 so that an additional string can be stored. That additional
string is stored in line 38 using the [] and = operators. Lines 37 and 38 could have been simply replaced by an additional call
to the push_back function. Additional usage of the [] and = operators is shown in lines 42-44.
The concatenation can also be carried out using the append member function.
strFullName = strFirstName;
strFullName.append (strLastName);
The " " (double quote) pair is required when initializing a string object, even if the string contains a single character. In other
words, the following declaration is invalid
string strGradeA = 'A'; // invalid
whereas the following declaration
string strGradeA = "A"; // valid
is valid.
Accessing the individual components
The string class provides access to individual characters in a string through the use of [] operator similar to accessing an element
of a vector.
strName = "Tony";
strName[3] = 'i'; // name is now Toni
One can also find the length of a string using the length member function.
string strHeader = " Hello ";
std::cout << "Length of " << strHeader << " is "
<< strHeader.length() << "\n";
The output is
Length of  Hello  is 7
The member function size has the same functionality as the length member function.
Using as a function parameter
A string variable can be used as a function parameter just as any other variable. For example, a string that the function only
reads can be passed as a const reference, as in the following prototype.
void Display (const string& strHeader, const float fV[], int nSize);
If a string is to be modified by a function, then it should be passed by (non-const) reference so that the changes are visible
to the caller. An example usage in a program segment, where the function AddExtension modifies its string argument, would be
string strFileName;
AddExtension (strFileName);
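A possible AddExtension, sketched under the assumption that it appends ".txt" to a file name that contains no period (the text does not give its body), shows why the parameter must be a non-const reference.

```cpp
#include <string>

// Hypothetical helper: appends ".txt" when the file name has no period.
// The parameter is a non-const reference, so the caller's string changes.
void AddExtension (std::string& strFileName)
{
    if (strFileName.find ('.') == std::string::npos)
        strFileName += ".txt";
}
```

If the parameter were declared as a plain std::string (pass by value), the function would modify only its local copy and the caller's strFileName would be unchanged.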
Comparing strings
Strings can be compared to each other using the relational operators (==, !=, >, <, >=, <=) as well as the compare member
function. String comparisons take place in a lexicographical sense – as words are arranged in a dictionary – character by
character using the character codes, so that in ASCII, for example, uppercase letters precede lowercase letters. For example,
the word list precedes listing, cooling precedes help etc.
Here are a couple of examples of string comparisons.
if (strName1 < strName2)
cout << strName1 << " occurs before " << strName2 << "\n";
else
cout << strName2 << " occurs before " << strName1 << "\n";
Or
int nResult = strName1.compare(strName2);
if (nResult == 0)
cout << strName1 << " is the same as " << strName2 << "\n";
else if (nResult < 0)
cout << strName1 << " occurs before " << strName2 << "\n";
else if (nResult > 0)
cout << strName2 << " occurs before " << strName1 << "\n";
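The ordering rules can be checked with a small sketch built on the compare member function. The function Order and its return strings are hypothetical, and the comparisons assume an ASCII-based character set, in which uppercase letters precede lowercase ones.

```cpp
#include <string>

// Hypothetical helper: describes the order of two strings using compare,
// which returns a negative value, zero, or a positive value
std::string Order (const std::string& strName1, const std::string& strName2)
{
    int nResult = strName1.compare (strName2);
    if (nResult == 0)
        return "same";
    return (nResult < 0) ? "first" : "second";   // "first": strName1 comes first
}
```

Under these assumptions, Order("list", "listing") returns "first", Order("help", "cooling") returns "second", and Order("Zebra", "apple") returns "first" because 'Z' has a smaller code than 'a'.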
Working with Substrings
The member function substr can be used to work with substrings. Let’s look at the following example.
string strFirstPart, strLastPart;
string strTitle = "Vector A";
strFirstPart = strTitle.substr (0, 6); // extracts Vector
strLastPart = strTitle.substr (7, 1); // extracts A
The first parameter in the function call is the position from which to extract the substring and the second parameter is the
number of characters to extract.
Finding substrings
Several member functions are provided that help find characters or strings within strings. Here are some of those member
functions.
find: This function can be used to find a string within another string. The return value is the starting location where the string
is first found (lowest position). If the string is not found, the returned value is the special constant string::npos. Here is an example.
string strHeadline = "Aliens land on Mars";
string::size_type nPos = strHeadline.find ("land"); // size_type, not int, matches string::npos exactly
if (nPos == string::npos)
cout << "land is not contained in " << strHeadline << "\n";
else
cout << "land occurs at location " << nPos
<< " in string " << strHeadline << "\n";
With the above example, the function looks for the string “land”. The returned value is 7.
find_first_of: This function can be used to find the first occurrence of any character in a given string within another string at
or after a specified location. The default value of this location is 0. The return value is the starting location where the character
is found. Here is an example.
string strHeadline = "Lose weight or loose change";
string::size_type nPos = strHeadline.find_first_of ("os", 5);
if (nPos == string::npos)
cout << "could not find 'os' after location 5.\n";
else
cout << "'os' occurs at location " << nPos << " beyond loc 5.\n";
With the above example, the function looks for the characters o and s beyond location 5. The returned value is 12 corresponding
to the character o.
find_last_of: This function can be used to find the last occurrence of any character in a given string within another string at
or before a specified location. The default value of this location is the end of the string. The return value is the location where
the character is found. Here is an example.
string strHeadline = "Lose weight or loose change";
string::size_type nPos = strHeadline.find_last_of ("se");
if (nPos == string::npos)
cout << "could not find se.";
else
cout << "last se occurs at location " << nPos << ".\n";
With the above example, the function looks for the characters s and e from the end of the string. The returned value is 26
corresponding to the character e.
rfind: This function can be used to find a string within another string backwards. The return value is the location where the
string is found last (highest position). Here is an example.
string strHeadline = "Aliens land on Mars";
string::size_type nPos = strHeadline.rfind ("land");
if (nPos == string::npos)
cout << "land is not contained in " << strHeadline << "\n";
else
cout << "land occurs at location " << nPos <<
" in backward search of string " << strHeadline << "\n";
With the above example, the function looks for the string “land” within “Aliens land on Mars” but starts the search from the
end of the string. The returned value is 7.
find_first_not_of: This function can be used to find the first occurrence (lowest position) at or after a specified location that
matches none of the characters in a given string within another string. The default value of this location is 0. The return value
is the starting location where the character is found. Here is an example.
string strHeadline = "xxxx wwwww xx xxxx change";
string::size_type nPos = strHeadline.find_first_not_of ("wx ", 5);
if (nPos == string::npos)
std::cout << "could not find anything but 'wx ' after location 5.";
else
std::cout << "char not one of 'wx ' occurs at location " << nPos
<< " after loc 5.\n";
With the above example, the function looks for the first character that is not one of w, x and a blank space beyond location 5.
The returned value is 19 corresponding to the character c.
find_last_not_of: This function can be used to find the last occurrence (highest position) at or before a specified location that
matches none of the characters in a given string within another string. The default value of this location is the end of the string.
The return value is the starting location where the character is found. Here is an example.
string strHeadline = "xxxx wwwww xx xxxx change";
string::size_type nPos = strHeadline.find_last_not_of ("change ");
if (nPos == string::npos)
std::cout << "could not find anything but 'change '.";
else
std::cout << "char not one of 'change ' occurs at location " << nPos
<< " string searched backwards.\n";
With the above example, the function looks for the last character that is not one of the characters in 'change ', starting the
search at the end of the string. The returned value is 17 corresponding to the character x.
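A common use of these search functions, together with substr, is to remove leading and trailing blanks. The function Trim below is a hypothetical helper, not a member of the standard string class; treating only blanks and tabs as removable is an assumption.

```cpp
#include <string>

// Hypothetical helper: removes leading and trailing blanks and tabs
std::string Trim (const std::string& str)
{
    const std::string strBlanks = " \t";
    std::string::size_type nFirst = str.find_first_not_of (strBlanks);
    if (nFirst == std::string::npos)
        return "";                              // empty or all blanks
    std::string::size_type nLast = str.find_last_not_of (strBlanks);
    return str.substr (nFirst, nLast - nFirst + 1);   // keep [nFirst, nLast]
}
```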
Inserting strings
The insert member function can be used to insert a string at a specified location in another string. Here is an example.
string strDigits = "123321";
string strAlphabets = "abcd";
strDigits.insert (3, strAlphabets); // insert before location 3; strDigits is now "123abcd321"
There are other functions available in the C++ library and these are listed and described in Appendix D.
Next, we present an example program that uses the standard string class.
Example Program 7.5.1 Four-Function Calculator Using the std::string Class
We will rewrite the 4-function calculator developed in Section 4.6. We will improve the user input by obtaining an expression
to evaluate rather than bits and pieces. The program will evaluate an expression in its simplest form as
leftnumber operator rightnumber
For example, to evaluate 12 × 1.53, the user is expected to enter the input as follows.
Type an expression or stop to terminate the program: 12*1.53
In line 17, the supported operators (the four function-related operators) are defined and stored in the strOpers variable. The
string stored in strHelp is displayed only once, while the string stored in strPrompt is displayed every time the user is
prompted to enter an expression to evaluate (see lines 21 and 31-34). The GetInteractive function is called with the appropriate
prompt to obtain the user input in strUserInput. An input of at most 50 characters is assumed!
The program is terminated if the user input is stop. Otherwise, the location of the operator within the string is found using the
find_first_of member function (line 45). If none of the characters +-*/ is found, an error message is issued (line 84).
The substr member function (lines 49 and 57) is used to obtain the string form of the left and right numbers. To convert
from a string to a floating-point value, the GetDoubleValue function is used. This function returns a nonzero value if the input
is not a valid number. If either the left number or the right number is invalid, the bError variable is set to true, the expression is
not evaluated, and an error message is displayed (line 81). Otherwise, the operation is carried out (lines 65-75) and the result is
displayed (line 78).
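The program listing itself is not reproduced here. The following is a minimal sketch of the parsing step only, under these assumptions: the hypothetical ToDouble stands in for the book's GetDoubleValue (it uses std::stod and returns true on success, rather than a nonzero value on error), blanks are removed before parsing, and the operator search starts at position 1 so that a leading minus sign belongs to the left number.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for GetDoubleValue: true only if the whole string
// is a valid floating-point number.
bool ToDouble (const std::string& strText, double& dValue)
{
    try
    {
        std::size_t nUsed = 0;
        dValue = std::stod (strText, &nUsed);
        return nUsed == strText.size ();      // reject trailing junk, e.g. "1.5x"
    }
    catch (const std::exception&)             // std::invalid_argument or std::out_of_range
    {
        return false;
    }
}

// Evaluates "leftnumber operator rightnumber"; returns false on any error.
bool Evaluate (std::string strExpr, double& dResult)
{
    const std::string strOpers = "+-*/";

    // remove blanks so that "12 * 1.53" and "12*1.53" are treated alike
    strExpr.erase (std::remove (strExpr.begin (), strExpr.end (), ' '), strExpr.end ());

    // start at 1 so that a sign in front of the left number is not taken as the operator
    std::string::size_type nOp = strExpr.find_first_of (strOpers, 1);
    if (nOp == std::string::npos)
        return false;                          // no operator found

    double dLeft = 0.0, dRight = 0.0;
    if (!ToDouble (strExpr.substr (0, nOp), dLeft) ||
        !ToDouble (strExpr.substr (nOp + 1), dRight))
        return false;                          // invalid left or right number

    switch (strExpr[nOp])
    {
        case '+': dResult = dLeft + dRight; break;
        case '-': dResult = dLeft - dRight; break;
        case '*': dResult = dLeft * dRight; break;
        default:                               // '/'
            if (dRight == 0.0)
                return false;                  // division by zero
            dResult = dLeft / dRight;
            break;
    }
    return true;
}
```

Note that exponent notation such as 1e-5+2 is rejected by this sketch, because the minus sign inside the exponent is taken as the operator.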
The above statement defines the (user-defined) type name that can then be used to define variables that have the defined
structure. For example, we can define a point structure as follows.
struct stPoint
{
float fXCoor;
float fYCoor;
};
or, to define the type and declare a variable P1 in the same statement,
struct stPoint
{
float fXCoor;
float fYCoor;
} P1;
Using either definition, we can declare variables CarCoordinates and BikeCoordinates as follows.
stPoint CarCoordinates, BikeCoordinates;
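We can also declare and initialize a variable, either from a list of values or from an existing variable of the same type (use one
form or the other for a given variable).
stPoint CarCoordinates = {-55.1f, 0.0f}; // initialize from a list of values
stPoint CarCoordinates = BikeCoordinates; // or, copy an existing stPoint member by member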
To access the individual components of the structure, we will have to use the . (dot) operator. For example
CarCoordinates.fXCoor = 12.3f;
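Structs provide a mechanism to aggregate data (a struct that holds only data is sometimes referred to as plain old data, POD).
A struct can also have member functions; the main difference from a class is that, by default, the members of a struct are public
whereas the members of a class are private.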
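The struct features above (aggregate initialization, memberwise copy, and the dot operator) can be exercised in a few lines. MidPoint is a hypothetical helper; it can read the members directly because struct members are public by default.

```cpp
#include <cassert>

struct stPoint
{
    float fXCoor;
    float fYCoor;
};

// Hypothetical helper: returns the point midway between two points.
// The members are accessed with the dot operator.
stPoint MidPoint (const stPoint& P1, const stPoint& P2)
{
    stPoint PMid = {0.5f * (P1.fXCoor + P2.fXCoor),
                    0.5f * (P1.fYCoor + P2.fYCoor)};
    return PMid;
}
```

Copying BikeCoordinates into CarCoordinates and then changing CarCoordinates.fXCoor leaves BikeCoordinates unchanged, because the copy is made member by member.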
Behavior
Point: Obtain or define the x-coordinate, or the y-coordinate, or both. Compute the maximum and minimum coordinate values,
and hence their range.
Line: Obtain or define the start point, or the end point, or both. Compute its length so as to enable the drawing process on a
two-dimensional plane.
Similarly, identifying the behavior helps define the methods that implement that behavior. This approach forces the program
developer to think in terms of classes and objects. Abstraction and encapsulation make program organization and development
easier, leading to cleaner data visualization and organization, and there are fewer data management issues than with the
array-based solution. Fig. 7.7.1 shows a pictorial view of the problem solution.
[Fig. 7.7.1: the WireFrame class (attributes: point and line data; methods: (1) read and store point and line data, (2) convert
model coordinates to canvas coordinates, (3) Move, (4) Draw) uses the Line class (attributes: start point, end point) and the
Point class (attributes: x, y); drawing takes place on a graphical canvas with origin (0, 0).]
The main program is extremely short. In line 17, the CWireFrame object TwoDTruss is declared. Its two public member functions
are called in lines 20 and 26 where the truss model details are read via keyboard input and where the pseudo-drawing takes place
on a graphical canvas, respectively. The call to the function Display is not required but is made to inform the user of the model
details.
The details of the CWireFrame class contained in wireframe.h file are shown next.
The dimensions of the canvas and its margin are declared in line 28. These values are for illustrative purposes only and can be
changed. The point and the line data are stored in std::vector objects (lines 32, 33). In order to manipulate the model, member
variables are used to store the model limits in the x and y directions (lines 35, 36), the mid-point of the model (line 37), and the
scaling factor that maps the model to the canvas (line 38). Finally, a number of private member functions that help read,
store and manipulate the information are declared in lines 40-45.
Declarations in the two helper classes, CLine and CPoint are shown next. The CPoint class has very minor updates compared
to the version shown in earlier examples. The default constructor, copy constructor, helper, accessor and modifier functions are
declared as public member functions. While the variables that store the (x, y) coordinates are declared private, there are no
private member functions. Similarly, the CLine class has the same type of constructors, helper, accessor and modifier functions,
and uses two private member variables to store the starting point and the ending point numbers that define a line.
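The listings of point.h and line.h are not reproduced here. Class declarations consistent with the description above might look like the following; apart from SetValues (which the text uses later for CPoint), the accessor and modifier names are hypothetical, and all functions are defined inline for brevity.

```cpp
#include <cassert>

// Hypothetical sketch, not the book's point.h: coordinates are private,
// constructors, accessors and modifiers are public.
class CPoint
{
public:
    CPoint () : m_fXCoor (0.0f), m_fYCoor (0.0f) {}                          // default ctor
    CPoint (const CPoint& P) : m_fXCoor (P.m_fXCoor), m_fYCoor (P.m_fYCoor) {} // copy ctor

    float GetX () const { return m_fXCoor; }                                  // accessors
    float GetY () const { return m_fYCoor; }
    void  SetValues (float fX, float fY) { m_fXCoor = fX; m_fYCoor = fY; }    // modifier

private:
    float m_fXCoor, m_fYCoor;
};

// Hypothetical sketch, not the book's line.h: a line stores the (1-based)
// numbers of its start and end points, not the points themselves.
class CLine
{
public:
    CLine () : m_nStartPoint (0), m_nEndPoint (0) {}
    CLine (const CLine& L) : m_nStartPoint (L.m_nStartPoint), m_nEndPoint (L.m_nEndPoint) {}

    int  GetStartPoint () const { return m_nStartPoint; }
    int  GetEndPoint () const { return m_nEndPoint; }
    void SetValues (int nStart, int nEnd) { m_nStartPoint = nStart; m_nEndPoint = nEnd; }

private:
    int m_nStartPoint, m_nEndPoint;
};
```

Storing point numbers rather than CPoint objects in CLine is what leads to deficiency (2) discussed later: a line cannot tell whether its points still exist.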
point.h
line.h
Finally, we look at the manner in which program execution takes place by examining the CWireFrame member functions. All the
private member variables are initialized in the default constructor. No copy constructor is explicitly defined since, in this
program, it is not anticipated that a copy of the entire model will be made for subsequent use. The main task of obtaining the user
data to create the model takes place in the SetModelSize function. Once valid values of the number of points and lines are
obtained (lines 29 through 38), the sizes of the point and line vectors are set via calls to the resize member function of the
std::vector class. The initial size of each vector is zero. Once the actual sizes are set, the data for the CPoint objects stored in
m_PointData are obtained in the function ReadPointData. This is followed by the call to ReadLineData, where the CLine objects
stored in m_LineData are filled with the start and the end points of each line.
The ReadPointData function obtains the (x, y) coordinate values and stores them as shown in line 65. Since vector indexing
starts at 0, the index is computed as [i-1] rather than i, i.e., data for point i is stored in location [i-1] in the vector.
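The storage pattern described above (resize the vector once the number of points is known, then store point i at location [i-1]) can be sketched as follows. CPointStore and stPt are hypothetical names, not members of the book's CWireFrame; the at() call in GetPoint adds a range check that throws std::out_of_range for an invalid point number.

```cpp
#include <cassert>
#include <vector>

struct stPt { float fX; float fY; };     // hypothetical stand-in for CPoint

// Hypothetical sketch: the user numbers points 1..n, the vector is indexed 0..n-1.
class CPointStore
{
public:
    void SetSize (int nPoints) { m_Points.resize (nPoints); }   // initial size is 0

    void SetPoint (int i, float fX, float fY)                   // i is 1-based
    {
        assert (i >= 1 && i <= static_cast<int> (m_Points.size ()));
        m_Points[i - 1] = {fX, fY};                             // point i lives in [i-1]
    }

    const stPt& GetPoint (int i) const { return m_Points.at (i - 1); }
    int Size () const { return static_cast<int> (m_Points.size ()); }

private:
    std::vector<stPt> m_Points;
};
```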
The theory behind the drawing process is discussed next. With reference to Fig. 7.1.1(b), the entire truss, irrespective of the units
used to define the point coordinates, must not only fit into the drawing canvas without infringing on the margin at the four
edges of the canvas, but also be centered in the canvas. Let the coordinates of a specific point $i$ be denoted as $(x_i, y_i)$ and its
corresponding coordinates on the graphing canvas be denoted as $(x_{gi}, y_{gi})$. Then, to achieve our drawing objectives, we can
map the point coordinates to their corresponding graph coordinates as
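$$
\begin{Bmatrix} x_{gi} \\ y_{gi} \end{Bmatrix} = s \begin{Bmatrix} x_i - X_{mid} \\ y_i - Y_{mid} \end{Bmatrix} + \begin{Bmatrix} A/2 \\ A/2 \end{Bmatrix} \tag{7.6.1}
$$
where
$$
\begin{Bmatrix} X_{mid} \\ Y_{mid} \end{Bmatrix} = \frac{1}{2} \begin{Bmatrix} X_{min} + X_{max} \\ Y_{min} + Y_{max} \end{Bmatrix} \tag{7.6.2}
$$
$$
s = \min\left( \frac{A - 2a}{X_{max} - X_{min}}, \; \frac{A - 2a}{Y_{max} - Y_{min}} \right) \tag{7.6.3}
$$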
Note that (i) $(X_{min}, X_{max})$ are the minimum and maximum x-coordinates from all the defined points, and $(Y_{min}, Y_{max})$ are the
corresponding y-coordinates, (ii) $(X_{mid}, Y_{mid})$ are the (x, y) coordinates of the mid-point of the truss model, (iii) $s$ is the scaling
factor between the drawing canvas and the point coordinate space in which the truss is referenced, and (iv) the effective drawing
space in the canvas measures $(A - 2a) \times (A - 2a)$. Quantities in Eqns. (7.6.2) and (7.6.3) are computed in the ComputeScale
function.
Note that (i) $A$ = DRAWSIZE and $a$ = MARGIN, and (ii) the apparent complexity of computing the scaling factor is due
to the fact that the truss could be one-dimensional, i.e., parallel to the x-axis or the y-axis, in which case one of the
denominators in Eqn. (7.6.3) is zero and only the other ratio can be used.
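Eqns. (7.6.1)-(7.6.3) can be sketched as two free functions. ComputeMapping and MapToCanvas are hypothetical names (the book's ComputeScale and GraphCoordinates are member functions whose listings are not reproduced here). The sketch treats a zero coordinate range as placing no limit on s and, as an assumption, uses s = 1 when all points coincide.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

struct stXY { double dX; double dY; };                   // hypothetical point type

// Mid-point and scale factor of Eqns. (7.6.2) and (7.6.3).
struct stMapping { double dXMid; double dYMid; double dScale; };

// dA = canvas size (DRAWSIZE), da = margin (MARGIN).
stMapping ComputeMapping (const std::vector<stXY>& Points, double dA, double da)
{
    assert (!Points.empty () && dA > 2.0 * da);

    double dXMin = Points[0].dX, dXMax = Points[0].dX;
    double dYMin = Points[0].dY, dYMax = Points[0].dY;
    for (const stXY& P : Points)
    {
        dXMin = std::min (dXMin, P.dX);  dXMax = std::max (dXMax, P.dX);
        dYMin = std::min (dYMin, P.dY);  dYMax = std::max (dYMax, P.dY);
    }

    const double dUsable  = dA - 2.0 * da;               // effective drawing space
    const double dNoLimit = std::numeric_limits<double>::max ();
    double dScale = dNoLimit;
    // A zero range (truss parallel to an axis) places no limit on s.
    if (dXMax > dXMin) dScale = std::min (dScale, dUsable / (dXMax - dXMin));
    if (dYMax > dYMin) dScale = std::min (dScale, dUsable / (dYMax - dYMin));
    if (dScale == dNoLimit) dScale = 1.0;                 // single point: any s works

    return {0.5 * (dXMin + dXMax), 0.5 * (dYMin + dYMax), dScale};
}

// Eqn. (7.6.1): model coordinates -> canvas coordinates.
stXY MapToCanvas (const stXY& P, const stMapping& M, double dA)
{
    return {M.dScale * (P.dX - M.dXMid) + 0.5 * dA,
            M.dScale * (P.dY - M.dYMid) + 0.5 * dA};
}
```

For the points (0, 0), (10, 0) and (10, 5) with A = 15 and a = 1, s = min(13/10, 13/5) = 1.3 and the canvas coordinates are approximately (1, 4.25), (14, 4.25) and (14, 10.75), all within (1, 1) to (14, 14).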
Steps (4)-(7) in the algorithm (see Section 7.1) are implemented in the function DrawModel.
And finally, the (x, y) coordinates on the graphical canvas are computed using Eqn. (7.6.1) in the function GraphCoordinates, and
then the two primitive drawing operations take place in the functions Move and Draw.
A sample execution of the program is shown in Fig. 7.6.1 using the model from Fig. 7.1.1(b). The final drawing operations
involve three (move, draw) pairs since there are three lines (or truss members). A simple check should show that the canvas
(x, y) values lie between $(a, a)$ and $(A - a, A - a)$, or, with the values used in the program, between (1, 1) and (14, 14).
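The (move, draw) pairs can be sketched as a loop over the lines. DrawInstructions is hypothetical: it returns text instead of calling the book's Move and Draw functions, and it assumes the canvas coordinates have already been computed. The usage example uses the canvas coordinates obtained by mapping (0, 0), (10, 0) and (10, 5) with A = 15 and a = 1 through Eqn. (7.6.1).

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

struct stXY { double dX; double dY; };   // canvas coordinates of one point

// Hypothetical sketch of the drawing loop: each line becomes one "Move" to its
// start point followed by one "Draw" to its end point. Point numbers in
// Lines are 1-based, hence the (n - 1) indexing.
std::vector<std::string> DrawInstructions (const std::vector<stXY>& Canvas,
                                           const std::vector<std::pair<int, int>>& Lines)
{
    std::vector<std::string> Instr;
    for (const auto& L : Lines)
    {
        const stXY& S = Canvas.at (L.first - 1);
        const stXY& E = Canvas.at (L.second - 1);
        std::ostringstream Move, Draw;
        Move << "Move (" << S.dX << ", " << S.dY << ")";
        Draw << "Draw (" << E.dX << ", " << E.dY << ")";
        Instr.push_back (Move.str ());
        Instr.push_back (Draw.str ());
    }
    return Instr;
}
```

With three lines, connecting points 1-2, 2-3 and 1-3, the function returns six instructions, starting with "Move (1, 4.25)" and "Draw (14, 4.25)".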
Fig. 7.7.2 A sample truss description and program generated drawing instructions using the program from
Example Program 7.7.2
While this is a much-improved solution, there are some deficiencies that are worth discussing.
(1) This version of the program is for an interactive mode of execution using the keyboard for input and a console window for
display. Creation of data is cumbersome and editing created data is impossible. A more user-friendly program would have a
pure Graphical User Interface (GUI) that would facilitate a graphical view of the truss. Such a program would provide the ability
to interactively add, delete, update, cut, and paste points and lines. It would also support input via other devices such as mouse
and pen, and allow for more advanced graphical operations to take place such as rotation, zooming in and out, etc.
(2) The CLine class does not have direct access to its points' attributes. So, what would happen if a point were deleted? An
invalid line would exist. In addition, the CLine class would be the logical place for the draw function; instead, the CWireFrame
class is used to get both the line and the associated point information.
(3) How much additional effort would be needed to support lines in three-dimensional space?
(4) There is minimal error checking in the program. What checks would you carry out to maintain a consistent model?
(5) Would there be a performance penalty if hundreds, thousands, or millions of lines were created and manipulated?
Even if we try to ensure that only valid objects are created, an invalid object can still be created through the overloaded
constructor or the modifier function. Hence, two questions need to be asked and answered. First, where do we detect that an
invalid object is being created? Second, what do we do to ensure that the program continues to run correctly?
First, we will take a look at the initial part of the CDateandTime header file.
3 https://en.wikipedia.org/wiki/Julian_calendar
Enumerated classes are used to help track months (line 11), days of the week (line 12), the two major errors (line 13), and the
manner in which the date and time can be displayed (line 13). In lines 16-21, integer constants are declared for the attributes
associated with date and time. The keyword static is used as a prefix. A static variable exists for the entire duration of the
program. In addition, when the keyword is applied to a member of a class, only one instance of the variable
exists for that class. Hence, static const int MONTHS_PER_YEAR = 12; declares a single variable MONTHS_PER_YEAR that is shared
by all the CDateandTime objects. Its initial value is 12 and that value cannot be changed (const qualifier). This makes it possible
to define a vector of strings to store the names of the month as
Without the static qualifier, these vectors cannot be defined in the header file. The C++ compiler is able to recognize that the
length of the vector is 12 and is able to allocate the memory to store these character strings.
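A minimal sketch of this idea follows, assuming that the "vector" of month names is a fixed-length array sized by MONTHS_PER_YEAR (the actual listing is not reproduced here). CCalendarSketch and MonthName are hypothetical. Because MONTHS_PER_YEAR is a static const integer initialized in the class, it is a compile-time constant and can be used as an array length; a non-static const member could not, since each object would have its own copy.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified class; not the book's CDateandTime listing.
class CCalendarSketch
{
public:
    // One variable shared by all objects; a compile-time constant, so it can
    // be used where C++ requires a constant, such as an array length.
    static const int MONTHS_PER_YEAR = 12;

    // nMonth runs from 1 (January) to MONTHS_PER_YEAR (December).
    const std::string& MonthName (int nMonth) const
    {
        assert (nMonth >= 1 && nMonth <= MONTHS_PER_YEAR);
        return m_strMonthNames[nMonth - 1];
    }

private:
    std::string m_strMonthNames[MONTHS_PER_YEAR] =
        {"January", "February", "March", "April", "May", "June", "July",
         "August", "September", "October", "November", "December"};
};
```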
We will now look at how we can handle invalid objects and not break the program. In the client code (main program), a try-catch
block is implemented so that if an error is caught, the program displays an error message and execution continues. The following
snippet of code from the main program illustrates how this can be done.
try
{
if (i == 1)
// bad date, will throw an exception
CDateandTime DAT0 (0, CDateandTime::Month::JAN, 1, 0, 0, 0);
else if (i == 2)
// other statements follow
…
}
catch (CDateandTime::Error err)
{
if (err == CDateandTime::Error::INVALIDDATE)
std::cout << "main: Caught invalid date.\n";
else if (err == CDateandTime::Error::INVALIDTIME)
std::cout << "main: Caught invalid time.\n";
}
}
The try block contains calls to the overloaded constructor with an invalid date. The plan is to throw an appropriate error while
the object is being constructed, so that the invalid object never comes into existence, and then to catch the error in the main
program and continue execution. Continuing execution means correcting the error to create a valid object or doing something
else that does not involve the invalid object.
The code that detects the error and throws an exception in CDateandTime (DateandTime.cpp) is shown below.
The call to the overloaded constructor is channeled through the modifier function, SetDateTime, which avoids duplicating
code. The actual checks are carried out for the date in the ValidDate function and for the time in the ValidTime function.
Note the syntax and style. A try-catch block is introduced. In the try block, checks are made to validate the hour, the minute
and the second values. A time specification of 24:00:00 is not allowed, but 00:00:00 is taken as midnight. The catch block
displays the error message. For a large program, this display is probably not recommended, since another throw is executed
immediately afterwards in line 140 so that the error can be caught in the client code (main program). A throw without an
argument rethrows the exact same exception that was just caught. This style keeps the error handling compact; we will look at
the performance implications of exception handling later in the book.
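The validate-then-rethrow pattern can be sketched in a self-contained form. CTimeSketch is a hypothetical, much-simplified stand-in for CDateandTime: the constructor calls ValidTime before setting any member, ValidTime reports the error and rethrows it with throw;, and the client code catches the enumerated error.

```cpp
#include <cassert>
#include <iostream>

// Hypothetical, simplified stand-in for CDateandTime's time handling.
class CTimeSketch
{
public:
    enum class Error { INVALIDDATE, INVALIDTIME };

    CTimeSketch (int nHour, int nMinute, int nSecond)
    {
        ValidTime (nHour, nMinute, nSecond);   // throws before any member is set
        m_nHour = nHour; m_nMinute = nMinute; m_nSecond = nSecond;
    }
    int GetHour () const { return m_nHour; }

private:
    // 00:00:00 (midnight) is valid; 24:00:00 is not.
    static void ValidTime (int nHour, int nMinute, int nSecond)
    {
        try
        {
            if (nHour < 0 || nHour > 23 || nMinute < 0 || nMinute > 59 ||
                nSecond < 0 || nSecond > 59)
                throw Error::INVALIDTIME;
        }
        catch (Error)
        {
            std::cout << "ValidTime: invalid time.\n";
            throw;                              // rethrow the same exception to the caller
        }
    }

    int m_nHour = 0, m_nMinute = 0, m_nSecond = 0;
};
```

In the client code, a try block that constructs CTimeSketch (24, 0, 0) ends up in a catch (CTimeSketch::Error err) block with err equal to CTimeSketch::Error::INVALIDTIME, and execution continues after it.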
We will go back to the use of static keyword in the context of a variable of a class. A member variable is declared as
Summary
In this chapter we were introduced to the concept and usage of a simple yet powerful idea – how to visualize the logic and data
encountered in a typical program through abstraction. This process helps in defining classes in which attributes are distinct
from behavior. By using classes, we eliminate a host of problems. First, we tie data to data access and manipulation very tightly.
We can make program development more systematic by reducing the proliferation of global functions and variables. Second,
we can have more control over the access to the data using public and private functionalities – information hiding is possible.
Third, we encourage the reuse of software. This is possible only if classes are designed with appropriate specifications. As we
will see in Chapter 13, programmers not involved in the initial design of a class can still add functionalities to an existing class
through the process of inheritance.
Programming Style Tip 7.1: There is no substitute for proper class design
It is essential that the class design take place properly. The interface and the implementation should be separated - this is the
idea behind encapsulation. From the client code development viewpoint, the programmer needs to know only the interface for
a successful implementation.
Programming Style Tip 7.2: Practice defensive programming
Be careful to define public and private variables and functions. Functions that can potentially expose the inner workings of a
class should be hidden from the client code by declaring them as private.
Programming Style Tip 7.3: Define a default constructor and a destructor
Even though C++ does not require a constructor (ctor), this is the place where one would initialize all the variables defined in
a class. For similar reasons, it is better to explicitly define a destructor where cleanup operations can take place.
Programming Style Tip 7.4: Define the copy constructor
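Even though C++ does not require a copy constructor (the compiler generates one that copies each member), providing your own
gives you control over how objects are copied and can avoid run-time errors, for example when a class manages dynamically
allocated memory.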
Exercises
Most of the problems below involve the development of one or more classes. In each case (a) develop a plan to test
the class(es), and (b) implement the plan in a main program.
Appetizers
Problem 7.1
Develop a class CRectangle to handle rectangles in a two-dimensional (X, Y) space. In addition to storing the data that describe
the rectangle, this class should be able to (a) compute the area, (b) perimeter, and (c) recognize if the rectangle is a square.
Problem 7.2
Enhance the capabilities of the CPoint class by adding a member variable m_fZCoor (to store the z coordinate) and member
functions to carry out the following tasks. (a) Create the copy constructor as CPoint::CPoint(const CPoint&). (b) A predicate
function bool IsOrigin() to see if the point is at the origin of the coordinate system. (c) The distance to another point as float
DistanceTo (const CPoint&). (d) Unit vector to another point as void UnitVector (const CPoint&, float fVUVector[]).
Problem 7.3
The capabilities of the CTime class discussed in this chapter can be enhanced. Create additional member variables and functions
that will (a) recognize the time zone, and (b) print time in 24-hour format, or in the am or pm format, or with respect to UTC
(coordinated universal time; formerly known as Greenwich Mean Time).
Main Course
Problem 7.4
Develop a CFraction class to store fractions and support the following operations – addition, subtraction, multiplication and
division via Add, Subtract, Multiply, and Divide member functions. The prototype of a typical public member function is as
follows.
void Add (const CFraction&, const CFraction&);
Also, develop a void Display (const std::string& strMessage); public member function that will display the fraction in its
reduced form preceded by the std::string argument.
Problem 7.5
Develop a CTriangle class as a composite class using the CPoint class object. The triangle is described in terms of the (x, y)
coordinates of the three vertices. The class should store information on the triangle such as perimeter, area and the three angles
and have public accessor functions for these attributes. It should also have public predicate functions to test whether the
triangle is an isosceles triangle (bool IsIsosceles()), a right triangle (bool IsRightTriangle()) or an equilateral triangle (bool
IsEquilateral()). Also, develop a void Display (const std::string& strMessage); public member function that
will display all the stored properties of the triangle.
Problem 7.6
Develop a CQuadraticPoly class to find the roots of a quadratic polynomial $ax^2 + bx + c$. Construct the default constructor and an
overloaded constructor that accepts $a$, $b$, $c$. Store $a$, $b$, $c$ as private member variables. Construct a public function bool
ComputeRoot (float&, float&) where the return value is true if the roots are real and false if the roots are complex. The two
parameters are the roots of the polynomial.
C++ Concepts
Problem 7.7
Develop a CNumDiff class to numerically differentiate a function using a difference formula. You should use the technique
discussed in Section 6.4 to support user-defined functions.
Problem 7.8
Develop a CNumIntegrate class to numerically integrate a known function using one of the Newton-Cotes techniques. You
should use the technique discussed in Section 6.4 to support user-defined functions.
Problem 7.9
Develop two blueprints for a solid geometry program – one with and one without classes. The program should have the features
to compute quantities such as length, interior angles, perimeter, area, surface area, and volume for the following objects –
triangle, quadrilateral, tetrahedron, hexahedron, prism, pyramid, cylinder, and cube. Discuss the pros and cons of the two programs.
Chapter 8
Pointers
“A great memory does not make a mind, any more than a dictionary is a piece of literature.” John Henry Newman
“Memory is the second thing to go.”
General-purpose programs handle objects whose size is known only at run time and whose size may change dramatically during
the course of execution. For example, a program that draws an X-Y graph is much less useful if it sets a predefined limit on the
number of points it can handle or if it does not allow addition or deletion of points. In this chapter and the next, we will see
the basics of how to write programs in which memory to store scalars and arrays is allocated dynamically at run time.
Nothing is free! The process of managing resources can be troublesome and can lead to unintended consequences. While the
resources on any computer system are finite, programmers are being asked to create programs that can handle bigger problems,
run faster, and yield more accurate results.
Objectives
To understand the concepts associated with pointers.
To understand more about dynamic memory allocation.
To understand and practice writing C++ programs where memory allocation and deallocation are managed.
Fig. 8.1.1 (a) Simplified Memory Hierarchy: CPU Cache, Memory Bus, RAM, Disk (b) Intel Core i7 Processor Layout
The contents of the RAM can be thought of as being divided into two distinct parts: the part used by the operating system and
the part used by one or more application programs (Fig. 8.1.2).
[Fig. 8.1.2: RAM shared by the Operating System and an Application Program]
[Fig. 8.1.3: RAM (X MB) holding the Operating System and pages 1 to 5; Hard Disk holding the Virtual Memory (Paging File) and
Other Files (Y GB, Z GB); Application Virtual Memory (pages 1 to 7); Page Table]
Fig. 8.1.3 A simplified virtual memory scenario
Let us look at the next case, where the amount of RAM is less than the amount of memory required by the operating system
and a single active application program. This situation is depicted in Fig. 8.1.3. An operating system that handles such a scenario
is called a virtual memory operating system. Examples include Microsoft Windows, Linux, the different flavors of Unix, etc.
Conceptually, the memory is divided into pages. The size of a page (in bytes, KB or MB) depends on the OS. In the figure,
let us assume that after the operating system is loaded into RAM, five pages can be loaded into RAM. We will label these
pages as 1, 2, …, 5. Let us now assume that we wish to execute an application that requires memory equivalent to 7 pages.
Pages 1 and 2 contain program instructions, and the rest of the pages contain program data. In a virtual memory OS, a special
part of the hard disk is set aside for virtual memory related operations. Typically, the paging file is much larger than the RAM.
The purpose of the paging file is to maintain a copy of the program's information, which is brought from the disk into RAM on
demand for use by the CPU. This operation may take a few milliseconds to complete – several orders of magnitude slower than
transferring data from RAM to the CPU. In the example, pages 1, 2, 3, 5 and 7 are called swapped-in pages since they are
currently in RAM, and pages 4 and 6 are called swapped-out pages since they are currently not in RAM.
To maintain coherence between the different copies of program information, several strategies are implemented in a typical OS.
One of the techniques is to use a virtual memory page table. This page table contains the mapping between the pages in RAM
and the locations on the disk (part of the virtual memory paging file system) associated with the application. When an
application issues a request for information, the system translates the memory address into a page number and checks to see if
the page is in RAM. If it is (page hit), it fetches the information for processing. If the page is not in RAM (page miss), the
information must first be read from the hard disk into RAM. One can see from this simple example that, everything else being
the same, a program will run faster if there are more page hits than misses. A program whose memory accesses are clustered
together, and which therefore produces more page hits, is said to have better locality of reference.
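Locality of reference can be illustrated with a matrix stored row by row in one contiguous block of memory. Both hypothetical functions below compute the same sum; the row-wise loop visits consecutive addresses, while the column-wise loop jumps nCols elements at a time. For a matrix large enough to span many pages (or cache lines), the row-wise version typically produces more hits, although the actual timing difference depends on the hardware and the OS, and this sketch does not measure it.

```cpp
#include <cassert>
#include <vector>

// Sums an nRows x nCols matrix stored row by row (row-major) in one vector.
// Good locality: the inner loop touches consecutive addresses.
double SumRowWise (const std::vector<double>& dM, int nRows, int nCols)
{
    double dSum = 0.0;
    for (int i = 0; i < nRows; i++)
        for (int j = 0; j < nCols; j++)
            dSum += dM[i * nCols + j];          // stride of 1 element
    return dSum;
}

// Same sum, poor locality: the inner loop jumps nCols elements each step.
double SumColumnWise (const std::vector<double>& dM, int nRows, int nCols)
{
    double dSum = 0.0;
    for (int j = 0; j < nCols; j++)
        for (int i = 0; i < nRows; i++)
            dSum += dM[i * nCols + j];          // stride of nCols elements
    return dSum;
}
```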
Memory management becomes more complex as we change the assumptions made in the previous examples. How is memory to
be allocated and deallocated if these operations take place during the execution of the program? Most operating systems use a
heap or free store to carry out dynamic memory management. This is the memory area that is affected by the new and delete
operations. A memory leak occurs if new is used without a corresponding delete. Similarly, an access violation occurs if the
program tries to access memory that is not allocated or refers to a non-existent memory location or address.
There are several types of objects and non-objects that a typical C++ program controls in different memory areas – const data,
heap, free store, stack, and global and static memory areas.
const data: This is a special memory area that is protected and cannot be modified (read only). Only primitive data types whose
values are known at compile time can be stored here. Objects cannot be stored. The data here is available for the entire duration
of the program.
Stack: The stack is used to store automatic variables. The objects are created as soon as memory is allocated and destroyed
immediately before memory is deallocated.
Free store: This memory area is used for dynamic memory allocation and is affected by new and delete operators. Memory for
objects is allocated but this memory may not be immediately initialized. This memory may be accessed and manipulated outside
of the object’s lifetime but while the memory is still allocated.
Heap: The heap is the second memory area used for dynamic memory allocation and is affected by the malloc and free
functions, more commonly associated with C.
Global/static: Storage allocation for global or static variables takes place at program startup and the initialization takes place
subsequently. For instance, a static variable in a function is initialized only the first time program execution passes through its
definition.
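The difference between the stack and the global/static area can be seen with a function that has one automatic and one static local variable. CountCalls is a hypothetical example: the static variable is initialized only on the first call and keeps its value between calls, while the automatic variable is created afresh on every call.

```cpp
#include <cassert>

// Returns how many times the function has been called; nLocalOut is always 1
// because the automatic variable starts again from 0 on each call.
int CountCalls (int& nLocalOut)
{
    static int nCalls = 0;   // global/static memory area: initialized once
    int nLocal = 0;          // stack: created on entry, destroyed on exit
    ++nCalls;
    ++nLocal;
    nLocalOut = nLocal;
    return nCalls;
}
```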
Finally, it is helpful to understand what happens when objects are repeatedly created and destroyed on the free store or heap.
Memory used in this fashion can become fragmented. Unlike some other languages, C++ does not carry out garbage collection,
i.e., the automatic recycling of memory space that is no longer needed. Programmers developing programs for numerical
analysis must recognize that memory management is as important an issue as speed of execution and accuracy of results.
In the rest of the chapter, we will look at how to use the free store for dynamic memory allocation.
8.2 Pointers
As people can locate places and things based on their addresses, pointers in C++ can be used to manipulate information based
on memory addresses. A pointer refers to an object or a memory location, and a pointer type is specialized by the type of the
object it refers to (for example, a pointer to int or a pointer to float). In this chapter, pointers will be represented by memory
addresses; however, they can be more complicated, as we will see in later chapters.
Here is a declaration of a pointer variable.
int *pnX;
The variable pnX can hold the memory address of a variable of type int. Note that the declaration requires an asterisk * in front
of the variable name. The above style is preferable to the following:
int* pnX;
The problem occurs when multiple variables are declared with the same statement. For example, what is implied by the following
statement?
int* pnX, pnY;
As it turns out, pnX is a pointer variable but pnY is a regular int. If the intent is to declare both of them as pointer variables, one
must use the following declaration.
int *pnX, *pnY;
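The declaration rule above can be checked at compile time. The sketch below uses static_assert with std::is_same from the standard header type_traits; the function name DeclarationPitfall is hypothetical. The asterisk binds to each variable name, not to the type int.

```cpp
#include <cassert>
#include <type_traits>

void DeclarationPitfall ()
{
    int* pnX, pnY;      // pnX is int*, but pnY is a plain int!
    int *pnP, *pnQ;     // one asterisk per variable: both are int*

    static_assert (std::is_same<decltype (pnX), int*>::value, "pnX is a pointer");
    static_assert (std::is_same<decltype (pnY), int>::value,  "pnY is NOT a pointer");
    static_assert (std::is_same<decltype (pnP), int*>::value &&
                   std::is_same<decltype (pnQ), int*>::value, "both are pointers");

    pnY = 5;            // legal only because pnY is an int
    pnX = &pnY;
    pnP = pnQ = pnX;    // all three pointers now hold the address of pnY
    assert (*pnP == 5 && *pnQ == 5);
}
```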
So how does one use a pointer variable? Suppose the pointer variable pnX is to point to the int variable nX. One could write the
following statements.
int *pnX, nX; // pnX is a pointer (variable) to an integer, nX is an integer variable
nX = 10; // nX now has a value of 10
pnX = &nX; // pnX now contains the memory address of nX
In the first statement, variable pnX is declared as a pointer variable and nX is declared as an int. In the second statement, the
value of nX is set as 10. In the third statement, the value (memory address) of pnX is set by using the address symbol & (recall this
is the same symbol that is used when a function argument is declared as passing-by-reference) along with the variable that the
pointer variable is pointing to. Let us now look at an example to learn more about pointers.
Example Program 8.2.1 Example Showing Pointer Usage
In this example we will see the use of int variables and int pointers. The basic ideas can be extended to other data types. Simple
pointer arithmetic is also illustrated.
main.cpp
The program uses three variables that are declared in lines 22, 23 and 24. Two of these variables are pointer variables. The value
of the int variable, nV1, is assigned in line 26. The memory address of this variable is obtained in line 27 and assigned to the
pointer variable, pnV1. Then the function ShowValues is called to display three lines of output associated with the int variable
and the pointer variable associated with that int variable.
In line 12 we see how pointers can be used as function arguments. As a matter of style, we will use what is shown in line 3 rather
than
void ShowValues (int nV, int* pnV)
just to drive home the point that the second parameter is a pointer to an int. In line 17, we display the int value at the memory
address pointed to by pnV. When the usage (with a pointer variable) involves the asterisk symbol, *, the symbol is referred to as
the dereferencing operator. The pointer variable must be dereferenced when the value stored at the memory location needs
to be accessed. In other words, pnV refers to the memory address and *pnV refers to the value in memory pointed to by pnV.
Look at line 31. The value at the memory location pointed to by pnV1 is set to 100. In the subsequent display of the values (see
Fig. 8.2.1) we see that the value of nV1 is also changed because of this statement. The memory address is expressed in hex
(hexadecimal) and is likely to be different on different machines. In line 35, we see an assignment statement involving pointer
variables. Note that
pnV2 = pnV1; // address assignment
is not the same as
*pnV2 = *pnV1; // value assignment
In line 36, the value at the memory location pointed to by pnV2 is set to 400. In the subsequent display of the values (see Fig.
8.2.1) we see that the value of nV1, and consequently *pnV1, is also changed because of this statement.
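The program listing is not reproduced here, but the steps described above can be sketched as follows. The initial value 10 for nV1, the variables nV3 and pnV3, and the function PointerSteps are assumptions for illustration; ShowValues follows the prototype style recommended in the text.

```cpp
#include <cassert>
#include <iostream>

// Displays an int, the address held by a pointer, and the value stored there.
void ShowValues (int nV, int *pnV)
{
    std::cout << "nV   = " << nV   << "\n"
              << "pnV  = " << pnV  << "  (address; machine dependent)\n"
              << "*pnV = " << *pnV << "\n";
}

// Repeats the steps described above; returns the final value of nV1.
int PointerSteps ()
{
    int nV1 = 10;
    int *pnV1 = &nV1;          // pnV1 holds the address of nV1
    *pnV1 = 100;               // writes through the pointer: nV1 is now 100
    ShowValues (nV1, pnV1);

    int *pnV2 = pnV1;          // address assignment: both point to nV1
    *pnV2 = 400;               // nV1 (and *pnV1) become 400
    ShowValues (nV1, pnV1);

    int nV3 = 7;
    int *pnV3 = &nV3;
    *pnV3 = *pnV1;             // value assignment: nV3 becomes 400,
                               // but pnV3 still points to nV3
    assert (pnV3 == &nV3 && nV3 == 400);
    return nV1;
}
```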
new operator
In its simplest form, the new operator has the following syntax.
new type-name;
where type-name is a valid type. The new operator can be used to allocate single objects and arrays of objects. This allocation
takes place from a program memory area called the “heap” or “free store.” We will see more about memory-related issues in the
next section. When new is used to allocate a single object, it yields a pointer to that object; the resultant type is type-name*.
When new is used to allocate a one-dimensional array of objects, it yields a pointer to the first element of the array, and the
resultant type is also type-name*. Here is an example of declarations and usage.
float *pfX, fY; // pfX is a pointer variable, fY is a float variable
pfX = new float; // allocates memory equal to storage of one float variable
*pfX = 43.5f; // use as any other float variable with dereferencing operator
fY = *pfX; // now fY has the same value as pointed to by pfX
delete operator
It is a good practice to release the memory occupied by an object when that object is no longer needed. This is accomplished
in C++ using the delete operator, which has the following two forms.
delete pointer-object;
delete [] pointer-object;
where the first form is used for a single object allocated with new and the second form is used for an array of objects allocated
with new [ ]. Here is an example illustrating both the new and the delete operators.
float *pfX, *pfVY;
pfX = new float; // allocates memory equal to one float
pfVY = new float[40]; // allocates memory equal to 40 floats
….
delete pfX; // releases memory back to freestore
delete [] pfVY; // releases memory back to freestore
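The fragments above can be assembled into a small, complete sketch. The function name SumOfSquares and the values it uses are hypothetical and chosen only for illustration; the point is that every new is paired with delete and every new [] with delete [].

```cpp
#include <cassert>   // assert, used when checking the result

// Hypothetical helper: allocates one float and an array of n floats on the
// freestore, uses them, and releases both before returning. Assumes n > 0.
float SumOfSquares(int n)
{
    float* pfX = new float;          // single object: release with delete
    float* pfVY = new float[n];      // array of n objects: release with delete []

    *pfX = 0.0f;                     // dereference to use the single float
    for (int i = 0; i < n; i++)
        pfVY[i] = static_cast<float>(i + 1);   // stores 1, 2, ..., n

    for (int i = 0; i < n; i++)
        *pfX += pfVY[i] * pfVY[i];   // accumulate the sum of squares

    float fResult = *pfX;            // copy the value out before releasing memory
    delete pfX;                      // matches new
    delete [] pfVY;                  // matches new []
    return fResult;
}
```

For example, SumOfSquares(4) returns 1 + 4 + 9 + 16 = 30.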
The compiler generates the appropriate statements to obtain the memory required to store the instructions associated with the
CPoint class and the memory locations to store the data associated with the object Point12. In this example, we will see another
approach to obtaining the resources (memory) dynamically.
main.cpp
In line 15, pPoint12 is declared as a pointer to a CPoint object. However, the memory to hold a CPoint object is allocated only in
line 19. The coordinate values are set in line 22. Note the usage of the member function SetValues. When an object is used to
invoke a member function, the member selection operator . is used. When a pointer to an object is used, the member selection
via pointer operator -> (or arrow operator) is used. C++ provides another way to write the statement in line 22 as
(*pPoint12).SetValues (1.2f, -17.65f);
where the parentheses are required because . has a higher precedence than the dereferencing operator *. Finally, note line 28.
The memory allocated in line 19 using new must be deallocated using the delete operator.
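Since the listing itself is not reproduced here, the following is a minimal sketch of the same idea, assuming a pared-down CPoint that has only a SetValues function plus two hypothetical accessors (GetX, GetY) added so the result can be checked. It shows that p->f() and (*p).f() do the same thing, and that the object obtained with new is released with delete.

```cpp
#include <cassert>   // assert, used when checking the result

// Pared-down, hypothetical CPoint: just enough to show . versus ->.
class CPoint
{
public:
    CPoint() : m_fXCoor(0.0f), m_fYCoor(0.0f) {}
    void SetValues(float fX, float fY) { m_fXCoor = fX; m_fYCoor = fY; }
    float GetX() const { return m_fXCoor; }   // hypothetical accessor
    float GetY() const { return m_fYCoor; }   // hypothetical accessor
private:
    float m_fXCoor;
    float m_fYCoor;
};

// Allocates a CPoint on the freestore, sets it two equivalent ways,
// copies the object into Result and releases the memory.
void DemoPointerToObject(CPoint& Result)
{
    CPoint* pPoint12 = new CPoint;          // memory obtained at run time
    pPoint12->SetValues(1.2f, -17.65f);     // member selection via pointer
    (*pPoint12).SetValues(1.2f, -17.65f);   // equivalent: dereference, then .
    Result = *pPoint12;                     // copy the object out
    delete pPoint12;                        // every new needs a matching delete
}
```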
We will see more useful usages of dynamic memory allocations in later examples.
We define two int variables nA and nB in lines 21 and 22. The function CallViaPointers is invoked in line 27. As we can see
from the function definition in lines 12 through 16, the first argument is a pointer to a constant int and the second argument is a
pointer to an int. In the function, the value pointed to by the second argument is set to twice the value pointed to by the first
argument. Note the expression in line 6 and the manner in which the dereferencing operator is used. The following statements
also work, because the compiler reads 2**nA as 2 * (*nA), although the parenthesized form 2*(*nA) is easier to read.
*nB = 2**nA;
*nB = 2* *nA;
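A minimal sketch of such a function, assuming a signature of the form described above (the exact parameter names in the book's listing may differ):

```cpp
#include <cassert>   // assert, used when checking the result

// Sets *pnB to twice *pnA. The first parameter points to a const int,
// so the function cannot modify the value it points to.
void CallViaPointers(const int* pnA, int* pnB)
{
    *pnB = 2 * (*pnA);   // same meaning as 2**pnA or 2* *pnA, but clearer
}
```

A call passes the addresses of the two variables, for example CallViaPointers(&nA, &nB); after the call nB holds twice the value of nA, and nA is unchanged.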
In line 34, a double precision vector dVX of length 4 is declared and initialized. In line 35, a double precision pointer is declared,
and its value is set to the starting address of the dVX vector; when the vector name is used without an index, it stands for the
address of the first element, [0]. Values in the vector can be accessed using pointers as in line 42. With this example, the four
memory locations in the dVX vector can be accessed as *(pdVX), *(pdVX+1), *(pdVX+2) and *(pdVX+3). In other words, adding an
integer i to a pointer advances it by i elements (i times the size of the pointed-to type), so we can access the other elements as
long as the data type does not change and the resulting address stays within the vector.
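A short sketch of pointer arithmetic over a double vector follows. The function name SumViaPointer is hypothetical; the loop reads element i as *(pdVX + i), exactly the form described above.

```cpp
#include <cassert>   // assert, used when checking the result

// Sums the elements of a double vector by offsetting a pointer.
// pdVX + i points i elements (i * sizeof(double) bytes) past the start.
double SumViaPointer(const double dVX[], int nSize)
{
    const double* pdVX = dVX;        // same as &dVX[0]
    double dSum = 0.0;
    for (int i = 0; i < nSize; i++)
        dSum += *(pdVX + i);         // equivalent to dVX[i]
    return dSum;
}
```

The same relationships hold for addresses: &dVX[2] and dVX + 2 are the same pointer.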
nullptr pointer
C++ has a pointer literal called nullptr, of type std::nullptr_t. When a pointer is assigned nullptr, it does not point to any
object. Note that a plain new expression does not return nullptr when memory allocation fails; instead it throws a std::bad_alloc
exception, which can be caught in a try-catch block. Only the nothrow form, new (std::nothrow) type-name, returns nullptr on
failure, and in that case the programmer must check the pointer to see if the memory allocation was unsuccessful and take
appropriate action.
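The two ways of detecting an allocation failure can be sketched side by side. The function names are hypothetical; std::bad_alloc and std::nothrow are standard and come from the <new> header.

```cpp
#include <cassert>   // assert, used when checking the results
#include <cstddef>   // std::size_t
#include <new>       // std::bad_alloc, std::nothrow

// Plain new reports failure by throwing std::bad_alloc.
bool TryAllocateThrowing(std::size_t nCount)
{
    try {
        double* pdV = new double[nCount];
        delete [] pdV;
        return true;
    }
    catch (const std::bad_alloc&) {
        return false;                  // allocation failed
    }
}

// The nothrow form returns nullptr instead of throwing,
// so the pointer must be checked before it is used.
bool TryAllocateNoThrow(std::size_t nCount)
{
    double* pdV = new (std::nothrow) double[nCount];
    if (pdV == nullptr)
        return false;                  // allocation failed
    delete [] pdV;
    return true;
}
```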
Example Program 8.3.3 Vector Data Type
We can now finally show an example where the memory for a vector data type can be allocated and deallocated dynamically.
The example is a modification of the second part of the previous example.
main.cpp
In this program, we have
float *fVX;
int n = 4; // n need not be declared as const int
fVX = new float[n];
Note that the vector is dynamically allocated. We could have obtained, say from the user of the program, the size of the vector
to be used in the program as follows.
int nSize;
GetInteractive ("What is the size of the vector? ", nSize);
float *fVX;
fVX = new float[nSize];
Once past the initial declaration and allocation, we have exactly the same usage for fVX in the program. Access to the elements
of the vector is via the [] operator. In other words
*(fVX+i)
is equivalent to
fVX[i]
At the end of the program when fVX is no longer needed, memory is deallocated in line 43.
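To make the run-time sizing testable without a keyboard, the sketch below reads the size from a stream instead of calling the book's GetInteractive function; the name ReadAndSum and the input format (a size followed by that many values) are assumptions for illustration.

```cpp
#include <cassert>   // assert, used when checking the results
#include <istream>
#include <sstream>   // std::istringstream, handy for supplying input

// Reads a vector size followed by that many values from a stream,
// stores them in a dynamically allocated vector and returns their sum.
// Returns 0 if the size is missing or not positive.
float ReadAndSum(std::istream& in)
{
    int nSize = 0;
    if (!(in >> nSize) || nSize <= 0)
        return 0.0f;

    float* fVX = new float[nSize]();   // size known only at run time; () zero-fills
    for (int i = 0; i < nSize; i++)
        in >> fVX[i];                  // fVX[i] is the same as *(fVX + i)

    float fSum = 0.0f;
    for (int i = 0; i < nSize; i++)
        fSum += *(fVX + i);

    delete [] fVX;                     // array form of delete
    return fSum;
}
```

In an interactive program the stream would simply be std::cin.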
Tip: Vectors and pointers are intimately related. For example, if we have
float fVX[4];
float* pfVX = fVX;
then
pfVX = &(fVX[0]) is the same as pfVX = fVX
fVX[i] is the same as *(pfVX + i)
&(fVX[i]) is the same as (pfVX+i)
pfVX[i] is the same as fVX[i]
We will now take these ideas and develop a class to handle vector data type.
(d) Finally, we will illustrate how one can start building vector operations that are a part of the class’s publicly available
member functions.
The default constructor is declared in line 11. The overloaded constructor in line 12 has a single argument, the number of rows
or elements in the vector. The copy constructor in line 13 also has a single argument. For the first time, as we will see below, the
destructor declared in line 14 will not be empty. There are four helper functions. The GetSize function returns the number of
rows in the vector. The At function used to access the elements of the vector is overloaded. In line 20, we declare the version
used to obtain the floating-point value of the ith element. In line 21, the second version of the At function returns a float reference
to the ith element of the vector. The Display function is designed to display the elements of the vector. Finally, we show the genesis
of a library of numerical vector operations in the form of the DotProduct function. We will leave the development of
other useful vector-related functions as an exercise.
The attributes of the class, namely the pointer that holds the memory address and the number of rows in the vector, should be
declared as private. However, we will declare the pointer m_pData as a public variable to illustrate how the vector elements can (not
should) be accessed.
Here is the implementation of the CMyVector class.
CMyVector.cpp
The default constructor is used for initializing the member variables: the number of rows to zero and the pointer variable to
nullptr; no memory is allocated. However, the overloaded constructor uses the value of the argument (nRows) to dynamically
allocate the memory. In both the overloaded constructor and the copy constructor, the try-catch block is used to detect if
memory allocation did not take place. The destructor is defined in lines 52 through 59. Memory deallocation takes place here
automatically, in the sense that the destructor is called automatically when the CMyVector object goes out of scope. We could
have also written the check (line 54) as
if (m_pData != nullptr)
Strictly speaking, the check is optional, since applying delete [] to a null pointer does nothing.
The value-based vector access function is defined in lines 62 through 76. A check is made in line 67 to see if the vector index
has a valid value. The reference-based vector access function is defined in lines 79 through 93. In this function, a reference to
the element is returned. What is the difference between lines 70 and 87, which appear to be identical? The value-based
function can be used only when the vector element is not being modified. In other words, if we had only the value-based
function, the following statement
fVA.At(i) = 12.06f;
would not compile, since a value returned by a function cannot appear as an l-value (left of the assignment operator). On the other
hand, references can appear as both l-values and r-values. When a function returns a reference, the return statement does not need
the address (or reference) operator; it simply names the element, and the compiler binds the returned reference to it. What is one
of the advantages of the reference return type? With the reference version of the At function, we can have cascading statements
of the form
fVX.At(i) = fVX.At(i+1) = 3.1415926f;
Finally, a note about the development of vector-related functions. The DotProduct function is defined in lines 110 through 130.
Checks are made in the function to ensure that the operation is valid (see line 116).
The vector elements are accessed in line 121 and the At function is used as a safety precaution (exception handling). We could
have rewritten the statement as
fDP += m_pData[i] * fV.m_pData[i];
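Because the CMyVector listing is not reproduced here, the following is a minimal sketch in the same spirit, not the book's code. Assumptions: errors are reported with the standard std::out_of_range and std::invalid_argument exceptions, m_pData is kept private, Display is omitted, and copy assignment is disabled (deleted) so that two objects can never share one buffer; overloading operator= is covered in the next chapter.

```cpp
#include <cassert>     // assert, used when checking the results
#include <stdexcept>   // std::out_of_range, std::invalid_argument

class CMyVector
{
public:
    CMyVector() : m_nRows(0), m_pData(nullptr) {}            // no memory allocated
    explicit CMyVector(int nRows) : m_nRows(nRows), m_pData(nullptr)
    {
        if (nRows <= 0)
            throw std::invalid_argument("CMyVector: size must be positive");
        m_pData = new float[nRows]();     // zero-filled; throws std::bad_alloc on failure
    }
    CMyVector(const CMyVector& fV) : m_nRows(fV.m_nRows), m_pData(nullptr)
    {
        if (m_nRows > 0) {
            m_pData = new float[m_nRows];        // deep copy: new storage ...
            for (int i = 0; i < m_nRows; i++)
                m_pData[i] = fV.m_pData[i];      // ... then copy the values
        }
    }
    ~CMyVector() { delete [] m_pData; }          // delete [] on nullptr is harmless
    CMyVector& operator=(const CMyVector&) = delete;   // see the lead-in

    int GetSize() const { return m_nRows; }

    float At(int i) const                        // value version: r-value only
    {
        CheckIndex(i);
        return m_pData[i];
    }
    float& At(int i)                             // reference version: usable as an l-value
    {
        CheckIndex(i);
        return m_pData[i];                       // no & needed in the return statement
    }

    float DotProduct(const CMyVector& fV) const
    {
        if (fV.m_nRows != m_nRows)
            throw std::invalid_argument("DotProduct: size mismatch");
        float fDP = 0.0f;
        for (int i = 0; i < m_nRows; i++)
            fDP += At(i) * fV.At(i);             // checked access via At
        return fDP;
    }

private:
    void CheckIndex(int i) const
    {
        if (i < 0 || i >= m_nRows)
            throw std::out_of_range("CMyVector: index out of range");
    }
    int m_nRows;      // number of rows (elements)
    float* m_pData;   // address of the dynamically allocated storage
};
```

Usage: with CMyVector fVX(3), the cascading statement fVX.At(0) = fVX.At(1) = 2.0f; compiles because the non-const At returns a float&.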
Finally, we will see how to use the CMyVector class in a program. In this program we will dynamically allocate and populate two
vectors and then compute their dot product.
main.cpp
The two vectors, both of the same size, are declared in line 22. We show two different ways of populating the vectors in lines 28
and 31. The disadvantage of the usage in line 28 is that the vector index is not checked for correctness, whereas the check is
carried out in the At function (the index could also be checked if the [] operator were overloaded, as we will see in Chapter 9).
The dot product function is called in line 35. One could also have written the statement as
float fDotP = fVY.DotProduct (fVX);
since $\mathbf{X}\cdot\mathbf{Y} = \mathbf{Y}\cdot\mathbf{X}$. The dot product result is displayed in lines 39 through 41, along with
the contents of the two vectors shown using the Display member function. The copy constructor usage is briefly illustrated in line 43.
There are two catch blocks – one for memory allocation error (new fails) and one for the errors emanating from the CMyVector
class.
A retrospective look at what we have done in this example is necessary to lay the foundation of what we can do to improve the
program, some of which we will see in the next chapter.
(1) There is a deficiency in the class definition. What if we declared a vector as
CMyVector fVA;
The default constructor would be called setting the number of rows in the vector to zero. How do we then set the
size of the vector later in the program? The solution to this problem is to define a member function, say void SetRows
(int nRows) that would behave like the overloaded constructor. However, it would first check to see if memory for
the vector has been allocated before. If memory has been allocated, it would then deallocate the memory and allocate
new memory as specified by the function parameter.
(2) As we mentioned in Chapter 4, functions can be made more useful if they can be converted to a function template.
With this example, we can declare and manipulate only vectors containing float values. We will learn how to declare
and define template classes in the next chapter.
(3) Ideally, the pointer variable m_pData should be a private member variable since it would be disastrous to change the
address either inadvertently or willfully. Furthermore, it is awkward to use the At function to access the elements of
the vector. Ideally one would like to access elements of the vector using () or [] such as
fVX(i) = static_cast<float>(log(i+1));
Both these issues can be addressed by using operator overloading as we will see in the next chapter.
Summary
In this chapter we saw more about memory management and especially dynamic memory management. We learnt about
pointers, pointer arithmetic, the new and delete operators, and the development of classes where memory allocation and
deallocation can take place in a consistent and safe manner. There are more advanced techniques in C++ to safely use
(and misuse) pointers, such as the smart pointers in the C++ standard library (e.g., unique_ptr, shared_ptr, weak_ptr),
but that discussion is outside the scope of this book.
Exercises
Most of the problems below involve the development of one or more classes. In each case (a) develop a plan to test
the class(es), and (b) implement the plan in a main program.
Appetizers
Problem 8.1
What is the output that is generated by the following statements?
int *pnA;
int nA = 5;
pnA = &nA;
std::cout << "nA is " << nA << " and *pnA is " << *pnA << '\n';
*pnA = 10;
std::cout << "nA is " << nA << " and *pnA is " << *pnA << '\n';
Problem 8.2
What is the output that is generated by the following statements? Is there an error in this program segment?
int *pnA, *pnB;
pnA = new int;
pnB = new int;
*pnA = 100;
*pnB = 110;
std::cout << *pnA << " " << *pnB << '\n';
pnA = pnB;
std::cout << *pnA << " " << *pnB << '\n';
*pnA = 300;
std::cout << *pnA << " " << *pnB << '\n';
delete pnA;
delete pnB;
Problem 8.3
Will the following statements compile? If not, correct the statements. What is the output that is generated?
int *pnVA;
int nVA[4] = {0, 1, 2, 3};
pnVA = nVA;
std::cout << *pnVA << '\n';
std::cout << pnVA[2] << '\n';
++pnVA;
std::cout << *pnVA << '\n';
std::cout << pnVA[2] << '\n';
Main Course
Problem 8.4
Extend the capabilities of the CMyVector class by adding a member function
void SetRows (int nRows);
that would dynamically either set the size or reset the size of a vector.
Problem 8.5
Extend the capabilities of the CMyVector class by adding other vector operations as public member functions.
Function Prototype    Remarks
float MaxNorm ();    Computes the maximum absolute value, $\|\mathbf{x}\|_\infty = \max_{i} |x_i|$.
float TwoNorm ();    Computes the length of the vector as $\|\mathbf{x}\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2}$.
void UnitVector (const CMyVector& fVB, CMyVector& fVUnitVector);    Computes the unit vector from the current point to point fVB and stores the result in fVUnitVector.
void CrossProduct (const CMyVector& fVB, CMyVector& fVR);    Computes the cross product between the current vector and fVB and stores the result in fVR.
C++ Concepts
Problem 8.6
It is possible to have a pointer to a pointer. For example, the following
int **pMA;
defines a pointer pMA that points to a pointer. Consider the following code segment.
int *pVA[3]; // a vector of int pointers
int **pMA; // pointer to an int pointer
int nV1[2] = {11, 12};
int nV2[2] = {21, 22};
int nV3[2] = {31, 32};
// helper functions
int GetRows();
int GetColumns();
float At (int, int) const; // for use as rvalue
float& At (int, int); // for use as lvalue
void Display (const std::string&) const;
private:
Chapter
“Knowledge is not achieved until shared.” Anon
Ideas dealing with classes and associated objects were introduced in Chapter 7. In Chapter 8, we looked at pointers and resource
(memory) allocation issues. We will build on those ideas in this chapter by learning about operator overloading, static member
functions and variables, template classes, and arrays where memory is dynamically allocated. We will use exception handling to
ensure that potential problems are detected and handled appropriately.
Objectives
To build on the earlier concepts associated with classes and objects.
To understand the concepts associated with operator overloading, template classes, and dynamically managed arrays.
To learn more about software engineering and the development of a matrix toolbox necessary for numerical analysis.
To understand how to routinely use C++ exception handling mechanism.
Unless the operators + and << are overloaded, the above statements will not compile.
Before we learn the details of operator overloading, we need to understand (a) a special pointer called the this pointer, (b) what
is meant by friend classes and functions, (c) how the const qualifier should be used, and (d) the reference operator &.
this pointer
There is a special pointer called this that gives every object access to itself. This pointer can be used to refer to both the
member variables and the member functions. Here is an example that uses the this pointer in the (rewritten version of the) Display
member function in the CPoint class.
// helper function
void CPoint::Display (const std::string& strBanner)
{
// display the current coordinates
std::cout << strBanner
<< "[X,Y] Coordinates = ["
<< this->m_fXCoor << ","
<< (*this).m_fYCoor << "].\n";
}
In this function, using the this pointer is unnecessary, since m_fXCoor is equivalent to this->m_fXCoor, and so on.
Also note that this-> is equivalent to (*this). since we obtain the values of the two coordinates in two different ways. We will see
a much better use of the this pointer with operator overloading.
class CPoint
{
friend class CTriangle; // CTriangle may access the private members of CPoint
public:
// constructor
CPoint (); // default
….
private:
float m_fXCoor;
float m_fYCoor;
};
The CTriangle class is declared to be a friend of the CPoint class in the CPoint class header file. A friend declaration is not
affected by the public, protected, or private access specifiers, so it can appear anywhere in the class body; placing it at the top, as
shown in the above example, makes it easy to find. With this declaration, we could have the following statements in the CTriangle
class to access the two private member variables.
#include "point.h"
class CTriangle
{
public:
// constructor
CTriangle (); // default
….
private:
CPoint m_Vertex[3];
};
…..
void CTriangle::ComputeSide ()
{
float fSide1 = Distance (m_Vertex[0].m_fXCoor, m_Vertex[0].m_fYCoor,
m_Vertex[1].m_fXCoor, m_Vertex[1].m_fYCoor);
….
}
A few things to note about the “friend” concept. If CPoint declares CTriangle as a friend, then CPoint does not automatically
gain the friend status of CTriangle. This declaration must be explicitly made in the CTriangle class. In other words, a class must
be explicitly identified as a friend class; the reciprocity idea does not apply here. Also, if CTriangle is a friend of CPoint class
and CPrism is a friend of CTriangle class, then one cannot infer that CPrism is a friend of CPoint class. Proper use of the friend
class concept can enhance the readability and performance of a program.
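A compilable sketch of the friend relationship described above follows. It is not the book's listing: the CTriangle constructor and the Side function are hypothetical, and the side length is computed directly instead of through the Distance function used in the fragment above.

```cpp
#include <cassert>   // assert, used when checking the results
#include <cmath>     // std::sqrt

class CPoint
{
    friend class CTriangle;    // CTriangle may access CPoint's private members
public:
    CPoint() : m_fXCoor(0.0f), m_fYCoor(0.0f) {}
    CPoint(float fX, float fY) : m_fXCoor(fX), m_fYCoor(fY) {}
private:
    float m_fXCoor;
    float m_fYCoor;
};

class CTriangle
{
public:
    CTriangle(const CPoint& P1, const CPoint& P2, const CPoint& P3)
    {
        m_Vertex[0] = P1; m_Vertex[1] = P2; m_Vertex[2] = P3;
    }
    // Length of the side joining vertices i and j (0-based indices).
    float Side(int i, int j) const
    {
        // Direct access to CPoint's private data is allowed because of the friend declaration.
        float fDX = m_Vertex[j].m_fXCoor - m_Vertex[i].m_fXCoor;
        float fDY = m_Vertex[j].m_fYCoor - m_Vertex[i].m_fYCoor;
        return std::sqrt(fDX * fDX + fDY * fDY);
    }
private:
    CPoint m_Vertex[3];
};
```

Note that the friendship is one-way: CPoint gains no access to the private members of CTriangle.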
Finally, one can also declare and define friend functions of a class. A friend function is not a member of the class but has
access to all the members of the class. We will see examples of friend functions with the overloaded << and >> operators.
Both these declarations will work fine since the arguments are passed by value. However, the second version adds an additional
check so that if an attempt is made in the Add function to change the values of either n1 or n2, the compiler will issue an error
message.
The three versions are quite different. The first version has a pass-by-value argument: the copy constructor is called and a copy of
the variable strMessage is created and used in the function. If strMessage is modified in the function, the changes do not
propagate back to the calling function; however, additional instructions are executed to create the local copy of the
variable. The second version has a pass-by-reference argument, and no local copy of the variable is made. If the intent is only to
display the message, this version still allows the caller's message to be modified inadvertently. That cannot happen in the
third version because of the const qualifier.
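The three argument-passing styles can be compared in a small sketch. The function names are hypothetical; each function appends "!" (where it is allowed to) and returns the resulting length, so the effect on the caller's string can be observed.

```cpp
#include <cassert>   // assert, used when checking the results
#include <cstddef>   // std::size_t
#include <string>

// By value: works on a copy, so changes do not reach the caller.
std::size_t LengthByValue(std::string strMessage)
{
    strMessage += "!";              // modifies only the local copy
    return strMessage.size();
}

// By reference: no copy is made, but the caller's string can be changed.
std::size_t LengthByReference(std::string& strMessage)
{
    strMessage += "!";              // modifies the caller's string
    return strMessage.size();
}

// By const reference: no copy, and no modification allowed.
std::size_t LengthByConstReference(const std::string& strMessage)
{
    // strMessage += "!";           // would not compile: strMessage is const
    return strMessage.size();
}
```

A const reference parameter can also bind to a temporary, so LengthByConstReference("temp") compiles, whereas LengthByReference("temp") does not.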
Now let us look at the following case where another function is called from a function.
void DisplayMessage (std::string& strMessage); // no const used here
….
int ComputeAbsSum (const std::string& strMessage,
int nV[], int nValues)
{
int nSum = 0;
if (nValues <= 0)
DisplayMessage ("Invalid number of values");
else
{
for (int i=0; i < nValues; i++)
nSum += abs(nV[i]);
DisplayMessage (strMessage);
}
return nSum;
}
This code will not compile, since strMessage is declared as a const reference within the ComputeAbsSum function and cannot be
passed to the DisplayMessage function, whose parameter is a non-const reference. (The call with the string literal fails for a related
reason: the temporary std::string created from the literal cannot bind to a non-const reference.) In other words, an argument declared
as const in one function can be passed on only to functions that also promise not to modify it, that is, functions that take it by value
or by const reference. We can also use the const qualifier with return values from a function. For example, the following function
prototype
const int HowMany ();
signifies a function that returns an integer whose value is declared const. For a built-in type such as int this const has no practical
effect, because the caller receives a copy that can be stored in a non-const variable; it matters mainly for class types. Finally, we can
use the const qualifier with calling objects. We can have a member function designed so that it does not modify the value of the
calling object.
class CPoint
{
public:
void Display () const;
…..
private:
int m_nItems;
};
In this example, by placing const at the end of the Display function declaration, we tell the compiler not to allow the function
to change the value of the calling object. The following function body will not compile, because m_nItems is a member variable of
the CPoint class and Display is a const member function.
void CPoint::Display () const
{
++m_nItems; // NOT correct
std::cout << "Number of items is " << m_nItems << "\n";
}
References
We saw the reference operator & used and discussed in Chapters 4 and 7. To make operator overloading work, we need to
understand functions that return references. Returning a reference from a function is similar to returning an alias to a variable.
If an object returned by a function is to be used as an l-value, then it must be returned by reference. Let T denote a class type.
Consider the following function prototypes.
(a) T function (); A value is returned, and the call cannot be used as an l-value. The caller receives a copy that can be changed.
The copy (or move) constructor may be used in this case, although compilers often elide the copy.
(b) const T function (); This is the same as (a), but the returned value cannot be changed.
(c) T& function (); A reference is returned, so the call can be used as an l-value and the referenced object can be changed. The
copy constructor is not used in this case. The referenced object must outlive the call; never return a reference to a local variable.
(d) const T& function (); A const reference is returned, so the call cannot be used as an l-value and the referenced object
cannot be changed through it. The copy constructor is not used in this case.
The differences will become clear as we look at a specific example.
Operator overloading
As we mentioned at the beginning of this section, C++ allows operators to be overloaded for user-defined data types, which makes
programs easier to read and maintain. The overloaded operators can be implemented either as global (non-member) functions or as
member functions of a class. The syntax is as follows.
operatorx
where operator is a C++ reserved keyword and x is the operator symbol. Here are the function names for the addition operator and
the stream insertion operator.
operator+
operator<<
First, we define the header file. Look at lines 31 through 35 to see how operators are overloaded as public functions.
Point.h
We will pay particular attention to the new features shown in lines 13-14 (operator overloading via friend functions) and the
operator overloading shown in lines 35-39, all of which will make writing the client code cleaner and easier to understand.
The rest of the server code, dealing only with the overloaded operator functions, is shown below.
point.cpp
Lines 76-87 show how the reference return type (CPoint&) is used. Because the return type is a reference (&) to a CPoint object,
statements that involve multiple assignments with CPoint objects, such as
P1 = P2 = P3;
are possible. In contrast, the following prototype
void CPoint::operator= (const CPoint& PRight);
would permit the assignment
P1 = P2;
but not the chained assignment
P1 = P2 = P3;
since P2 = P3 would then yield void, which cannot be assigned to P1. Note that a statement such as
CPoint P1 = P2;
is an initialization, not an assignment; it calls the copy constructor and is unaffected by the return type of operator=.
Line 79 is a defensive programming statement to avoid problems with statements like
P1 = P1;
and shows a nice use of the this pointer. The overloaded addition operator + has a CPoint return type to support statements
such as
P1 = P2 + P3 + P4;
The function body clearly shows that a CPoint object has to be created in the function (line 97) to carry out this addition. This
is the downside of overloaded operators. In the case of a CPoint object, not much additional storage (space for the x and
y coordinates) is used temporarily. However, with larger objects, creating temporary objects can be resource
intensive.
The boolean operators (== and !=) check whether two CPoint objects are equal, as we will see in the example. The overloaded
operators << and >> are implemented as non-member friend functions. Once again, because the return type is a reference (to the
stream), it is possible to output multiple CPoint objects, or read multiple points, using a single statement.
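The point.h and point.cpp listings are not reproduced here, so the following is a self-contained sketch of a float-based CPoint with the overloaded operators discussed above. The member names follow the text; the function bodies are illustrative, and == compares the float coordinates exactly.

```cpp
#include <cassert>    // assert, used when checking the results
#include <iostream>   // std::ostream, std::istream
#include <sstream>    // std::istringstream, std::ostringstream

class CPoint
{
public:
    CPoint() : m_fXCoor(0.0f), m_fYCoor(0.0f) {}
    CPoint(float fX, float fY) : m_fXCoor(fX), m_fYCoor(fY) {}
    CPoint(const CPoint& P) : m_fXCoor(P.m_fXCoor), m_fYCoor(P.m_fYCoor) {}

    // Returning a reference to *this allows P1 = P2 = P3.
    CPoint& operator=(const CPoint& PRight)
    {
        if (this != &PRight) {          // guard against P1 = P1
            m_fXCoor = PRight.m_fXCoor;
            m_fYCoor = PRight.m_fYCoor;
        }
        return *this;
    }
    // Returns a new (temporary) CPoint, so P1 = P2 + P3 + P4 works.
    CPoint operator+(const CPoint& PRight) const
    {
        return CPoint(m_fXCoor + PRight.m_fXCoor, m_fYCoor + PRight.m_fYCoor);
    }
    CPoint operator-(const CPoint& PRight) const
    {
        return CPoint(m_fXCoor - PRight.m_fXCoor, m_fYCoor - PRight.m_fYCoor);
    }
    bool operator==(const CPoint& PRight) const
    {
        return m_fXCoor == PRight.m_fXCoor && m_fYCoor == PRight.m_fYCoor;
    }
    bool operator!=(const CPoint& PRight) const { return !(*this == PRight); }

    // Non-member friends; returning the stream by reference allows chaining.
    friend std::ostream& operator<<(std::ostream& os, const CPoint& P)
    {
        return os << "[" << P.m_fXCoor << "," << P.m_fYCoor << "]";
    }
    friend std::istream& operator>>(std::istream& is, CPoint& P)
    {
        return is >> P.m_fXCoor >> P.m_fYCoor;
    }

private:
    float m_fXCoor;
    float m_fYCoor;
};
```

With these definitions, statements such as P3 = P4 = P1 + P2; and std::cout << A << " " << B; compile and behave as described in the text.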
And finally, here is an example program (client code) where the use of the overloaded operators is shown using the CPoint class.
main.cpp
The use of the copy constructor is illustrated in line 22. The usage of overloaded operators is illustrated in several statements: line
26 for the addition operator, line 30 for one of the boolean operators, lines 36-37 for the stream extraction operator, line 40 for
the subtraction operator, and finally line 43 for the stream insertion operator.
Second, the static variable is defined and initialized outside of the class. Usually this is done at the beginning of the source file
containing the member function definitions. Third, in this example, it is used in the class constructor in the same manner as any
other member variable.
int CPoint::m_nObjectsDefined = 0;
CPoint::CPoint ()
{
++m_nObjectsDefined;
}
…. // rest of the class definitions follow
In a similar manner, static member functions are member functions that do not operate on a particular object’s data; they have no
this pointer. In other words, if a member function does not access the non-static member variables, then it is possible to declare
that member function as static.
Continuing with the previous example, let us assume that we need a public function to obtain the current number of CPoint
objects defined and that function is NumObjects(). The modified header file is as follows.
class CPoint
{
public:
// constructor
CPoint (); // default
static int NumObjects (); // static member function
….
private:
float m_fXCoor;
float m_fYCoor;
static int m_nObjectsDefined;
};
The static keyword is used to declare the member function but is not used in the member function definition itself. In other
words, the static member function is defined as follows.
int CPoint::NumObjects ()
{
return m_nObjectsDefined;
}
The static function can access m_nObjectsDefined since that variable is a static variable.
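Putting these pieces together gives a compilable sketch of the object counter. The class is a hypothetical, stripped-down CPoint with no coordinates; as an added assumption, the copy constructor also increments the counter so that copies are counted, and, as in the text, nothing is decremented when an object is destroyed.

```cpp
#include <cassert>   // assert, used when checking the results

class CPoint
{
public:
    CPoint() { ++m_nObjectsDefined; }
    CPoint(const CPoint&) { ++m_nObjectsDefined; }   // copies are counted too
    static int NumObjects();                          // static member function
private:
    static int m_nObjectsDefined;   // one copy shared by all CPoint objects
};

// Static data member: defined and initialized outside the class.
int CPoint::m_nObjectsDefined = 0;

// The static keyword is not repeated in the definition.
int CPoint::NumObjects()
{
    return m_nObjectsDefined;       // accesses only static data
}
```

NumObjects is called through the class name, CPoint::NumObjects(), without needing any CPoint object.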
The syntax is friend followed by the keyword class followed by the class name and a terminating semicolon.
While this class shows all member variables and functions to be static, we can design classes where only one or more functions
and variables are declared to be static in a similar fashion.
math.cpp
Lines 9-12 initialize the values of the static variables outside the class body.
Note that the static keyword is not used with the function definition. The static keyword is used only in the header file with
the function declarations.
Defensive programming would dictate that we check for math errors; for example, we should check to see if d2 is zero before
dividing (line 35). Finally, we present the client code.
main.cpp
There is no explicit variable (object) associated with the CMath class! The scope resolution operator :: is used to invoke the static
member functions, which are treated differently than non-static member functions: they have no this pointer, so they can be
called without creating any CMath object.
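Since the CMath listing is not reproduced here, the following is a hypothetical utility class in the same style (the members PI, Add and Divide are assumptions). Every member is static, the static constant is defined outside the class, and Divide includes the defensive zero check suggested above.

```cpp
#include <cassert>     // assert, used when checking the results
#include <stdexcept>   // std::domain_error

// Hypothetical utility class: every member is static, so no CMath object is needed.
class CMath
{
public:
    static const double PI;
    static double Add(double d1, double d2) { return d1 + d2; }
    static double Divide(double d1, double d2)
    {
        if (d2 == 0.0)                                  // defensive check
            throw std::domain_error("CMath::Divide: division by zero");
        return d1 / d2;
    }
};

// Static data members are defined outside the class body.
const double CMath::PI = 3.14159265358979;
```

Client code calls the functions through the class name, for example CMath::Divide(1.0, 4.0), which returns 0.25.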
template <class T>
class CPoint
{
public:
// constructor
CPoint (); // default
CPoint (const CPoint&); // copy constructor
CPoint (T, T); // overloaded
// helper function
void Display (const std::string&) const;
// modifier function
void SetCoordinates (T, T);
void SetCoordinates (const CPoint&);
// accessor function
void GetCoordinates (T&, T&) const;
void GetCoordinates (CPoint&) const;
// overloaded operators
const CPoint &operator= (const CPoint&);
CPoint operator+ (const CPoint&) const;
CPoint operator- (const CPoint&) const;
bool operator!= (const CPoint&) const;
bool operator== (const CPoint&) const;
private:
T m_XCoor; // stores x-coordinate
T m_YCoor; // stores y-coordinate
};
There are just two differences when this version is compared to the float-based CPoint version shown in the last section. The
first line reads
template <class T>
signifying that we have one template argument for the class that is identified as T. The symbol T is merely a placeholder; any other
valid identifier could be used. Second, wherever we had specified float as the data type in the non-template version of CPoint, we now
specify the corresponding data type as T. The member functions of a template class are defined using a slightly different syntax. For
example, the SetCoordinates function would be defined as follows.
template <class T>
void CPoint<T>::SetCoordinates (T fX, T fY)
{
…
}
In addition to the first line containing the template keyword, the <T> needs to be appended to the class name before the function
name and everywhere else it is used (e.g., as a function parameter).
To use the CPoint template class in a program we would do the following.
#include "point.h"
….
CPoint<int> nPA, nPB; // integer-valued points
CPoint<float> fPA, fPB; // float-valued points
point.h
A few new concepts are shown here. The stream insertion (<<) and extraction (>>) operators are overloaded (lines 14-15) so
that point data can be displayed and read. Next, the operation to scale the (x,y) coordinates by a constant c is shown via the *
operator. The result of this operation is a (new) CPoint object that is created (line 21) and returned (line 26). Here is an example
of its usage
CPoint<int> P (2, 3), Pc;
Pc = 10*P;
Only a select few member functions are shown and discussed here.
Note how the overloaded constructor is declared in lines 84 and 85 with the use of the T symbol. This syntax will be used in all
the member functions. Next, the assignment operator = and the addition operator + overloaded functions are shown.
Overloaded operators that return a bool value work similarly as shown with the == operator.
Finally, the multiplication overloaded operator * is shown next. This function will be called for the following statement
Pc = P*10.0;
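The two scaling forms need two different functions: P*c can be a member function because the CPoint is the left operand, while c*P needs a non-member (here a friend) because the left operand is a scalar. The trimmed template below is a sketch, not the book's point.h; it also shows the out-of-class definition syntax with template <class T> and CPoint<T>::.

```cpp
#include <cassert>    // assert, used when checking the results
#include <iostream>   // std::ostream
#include <sstream>    // std::ostringstream

template <class T>
class CPoint
{
public:
    CPoint() : m_XCoor(T()), m_YCoor(T()) {}
    CPoint(T X, T Y) : m_XCoor(X), m_YCoor(Y) {}

    void GetCoordinates(T& X, T& Y) const { X = m_XCoor; Y = m_YCoor; }

    // P*c: the CPoint is the left operand, so a member function works.
    CPoint operator*(T c) const;

    // c*P: the left operand is not a CPoint, so a non-member (friend) is needed.
    friend CPoint operator*(T c, const CPoint& P) { return P * c; }

    friend std::ostream& operator<<(std::ostream& os, const CPoint& P)
    {
        return os << "[" << P.m_XCoor << "," << P.m_YCoor << "]";
    }

private:
    T m_XCoor;   // stores x-coordinate
    T m_YCoor;   // stores y-coordinate
};

// Out-of-class definition: note template <class T> and CPoint<T>::
template <class T>
CPoint<T> CPoint<T>::operator*(T c) const
{
    return CPoint<T>(c * m_XCoor, c * m_YCoor);   // a new CPoint is created and returned
}
```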
The main program to illustrate the usage of the templated CPoint class is listed below.
One way of learning how these overloaded operators, copy constructors, and destructors work is to use the debugger and
step through one statement at a time. For example, how are the statements in line 29 executed? The overloaded operator=
function is called first with P1 as PRight (line 143 in point.h). The P6 values are set to those of P1. Again, the overloaded
operator= function is called and the P5 values are set. In other words, the evaluation is from right to left.
Similarly, let’s examine how line 29 is executed. The overloaded operator- function is called first with P2 as PRight. A local CPoint
object is created that contains P1-P2. This local CPoint object is then used with the copy constructor to create a copy of this
local object (say PCopy1) for later use, and the local CPoint object is destroyed. This saved copy is then used with the overloaded
operator+ function with P3 as PRight. Once again, a local CPoint object (say PCopy2) is created to store the results of the next
operation, PCopy1+P3. The copy constructor and destructor are called to save PCopy2. Finally, the overloaded operator= is used
with PCopy2 as PRight to transfer the values from PCopy2 to P4. The destructor is called to release the memory associated with
PCopy1 and PCopy2. Most of these operations take place behind the scenes without any user-supplied code. Compilers are
permitted to elide some of these copies (copy elision), so the sequence of constructor and destructor calls observed in the
debugger may be shorter than described here.
Some of the remaining overloaded operator functions are shown next. Of these, the two that require additional explanation
are lines 58 and 62. For line 58 (P5 = 2.3*P1;), the overloaded friend function is called, because the left operand is not a CPoint object.
The rest of the statements in the main program are shown below.
9.4 Arrays
One of the most important objects or data structures that we will deal with in the area of numerical analysis is the array: vectors
and two-dimensional matrices. Almost always, it can be assumed that the algorithm is able to ascertain a priori the size (number
of rows and/or columns) of the array. In this scenario, arrays provide the most convenient data structure to store and
manipulate engineering data.
We will use the template approach in defining vector and matrix classes.
9.4.1 Vector Container Class¹
In Section 8.3, we defined and used the CMyVector class. In this section, we will use a similar but improved version of that class
called CVector.
We will now define the attributes and behavior of arrays starting with the vector class. Recall that a vector is either a row vector
or a column vector. The CVector class is general enough to store any data type. It has the following properties.
(a) Both row and column vectors will be stored as a vector with n storage locations.
(b) The indexing will start at 1. In other words, the indices will be between 1 and n (both inclusive). The () operator will be
overloaded and used to access the elements of the vector, e.g., nV(j) refers to the jth element of the vector.
(c) Exception handling will occur if the vector index does not have a legal value. This check will be carried out only in the
DEBUG version of a program.
(d) Member functions will be provided to dynamically allocate as well as change the size of the vector.
(e) The = operator will be overloaded.
¹ Strictly speaking, containers have several properties such as iterators, overloaded operators etc. that we do not support with the CVector and
CMatrix classes. However, see Section 10.10.
Our template-based vector class is defined below and is followed by a sample program that uses the CVector class. Some of the
member functions are discussed below.
Function Name   Remarks
SetSize         The initial size of the vector is determined by the constructor used; the default constructor sets the size to zero. This public function can be used to set or reset the size. If the size is reset, the original contents are destroyed.
GetSize         This public function returns the current size of the vector.
Set             This public function sets all the elements of the vector to the specified value.
ErrorMessage    This public function displays the error message in the catch block if an error is thrown.
Release         This protected² function releases the memory allocated to store the elements of the vector.
We first present select portions of source code for the CVector template class that is contained entirely in a header file.
vectortemplate.h
There are seven constructors that make it easy to create CVector objects – size is the number of elements in the vector (it must be positive), iv is the initial value for all the elements in the vector, and name is the name of the vector, used only for identification purposes. Instructions for debugging this class when used with Microsoft Visual Studio are shown at the top of the file.
Several helper functions and overloaded operator functions provide added benefits. These are listed in lines 48-60.
² We will see protected functions in greater detail in Chapter 13. However, keep in mind that access to member functions and variables is restricted as follows.
                                               public   protected   private
Members of the same class or friend classes    Yes      Yes         Yes
Members of the derived class                   Yes      Yes         No
Others                                         Yes      No          No
The listing of the member functions is not shown here, and the reader is strongly urged to study the implementation in vectortemplate.h and understand it. Adding further functionality, such as overloading the << and >> operators, overloading other operators, using a member initializer list, and supporting vector operations such as the dot product and cross product, is left as an exercise.
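Since the member function listing is not reproduced, the following is a minimal sketch of how a template vector with the properties (a) to (e) might be written. It is not the book's vectortemplate.h: the class name CVectorSketch, its member names, and the choice to key the range check on the standard NDEBUG macro (rather than a Visual Studio DEBUG setting) are all assumptions made for illustration.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical 1-based vector template in the spirit of CVector (not the book's code).
template <typename T>
class CVectorSketch
{
public:
    CVectorSketch() : m_nSize(0), m_pData(nullptr) {}
    explicit CVectorSketch(int size, const T& iv = T()) : m_nSize(0), m_pData(nullptr)
    {
        SetSize(size);
        Set(iv);
    }
    CVectorSketch(const CVectorSketch& A) : m_nSize(0), m_pData(nullptr) { *this = A; }
    ~CVectorSketch() { Release(); }

    // (Re)allocates storage; the previous contents are discarded.
    void SetSize(int size)
    {
        if (size <= 0) throw std::invalid_argument("vector size must be positive");
        Release();
        m_pData = new T[size + 1];   // element 0 is unused so indexing starts at 1
        m_nSize = size;
    }
    int GetSize() const { return m_nSize; }

    // Sets every element to the same value.
    void Set(const T& value)
    {
        for (int i = 1; i <= m_nSize; i++) m_pData[i] = value;
    }

    // 1-based element access through the overloaded () operator.
    T& operator() (int i) { CheckIndex(i); return m_pData[i]; }
    const T& operator() (int i) const { CheckIndex(i); return m_pData[i]; }

    // Deep-copy assignment.
    CVectorSketch& operator= (const CVectorSketch& A)
    {
        if (this == &A) return *this;
        if (A.m_nSize == 0) { Release(); return *this; }
        SetSize(A.m_nSize);
        for (int i = 1; i <= m_nSize; i++) m_pData[i] = A.m_pData[i];
        return *this;
    }

protected:
    void Release()
    {
        delete[] m_pData;
        m_pData = nullptr;
        m_nSize = 0;
    }

private:
    void CheckIndex(int i) const
    {
#ifndef NDEBUG
        if (i < 1 || i > m_nSize) throw std::out_of_range("CVectorSketch index out of range");
#else
        (void)i;   // no check in release builds
#endif
    }
    int m_nSize;
    T* m_pData;
};
```

A client would write, for example, CVectorSketch<float> fVA(3, 0.0f); fVA(1) = 2.5f; and an access such as fVA(4) would throw std::out_of_range in a build without NDEBUG.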
Example Program 9.4.1 Using the CVector class
In this example, we write a program to define two float vectors A and B, add the contents of the two vectors, store the result
in a third vector, C, and display the contents of C. We will store and manipulate these vectors using the CVector class. This
example can then be generalized for other types of applications.
main.cpp
The client code (in main.cpp) enhances the capabilities by adding two non-member functions – Display and AddVectors. The reader is encouraged to compare how incorporating these two functionalities as member functions of the class would affect the ease of use and execution of the program versus this version. The main program declares three float vectors A, B and C in lines 40-32. The initial values are set to zero for vectors A and B. The actual values of these two vectors are defined in lines 47-51 and displayed via calls to the Display function.
The interesting part of the program is in lines 59-68, where three incorrect uses of the CVector class are illustrated – the reader should uncomment only one of these at a time and keep the other two commented out. Line 61 shows the use of an invalid index – the size of C is three, but the 4th location is being accessed. Lines 64-65 show the same vector addition operation, but with the result stored in vector CC whose length is only 2. And finally, in line 68, a vector of zero length is being created.
Note that we could have asked the user to specify the size of the A and B vectors at run time and then dynamically allocated
memory for the three vectors as follows.
int nVecSize;
std::cout << "What is the size of the vectors? ";
std::cin >> nVecSize;
CVector<float> fVA(nVecSize),fVB(nVecSize),fVC(nVecSize);
Or, if the vectors were created with the default constructor,
fVA.SetSize(nVecSize); fVB.SetSize(nVecSize); fVC.SetSize(nVecSize);
Tip: Note how line 61 fails because of the illegal (vector) index value. This is the most common programming error when using vectors and matrices. Placing a breakpoint in the ErrorHandler member function of the CVector class helps in detecting the offending statement! The standard [] access on plain C++ arrays and on std::vector is not required to check the index, so such errors typically go undetected.
float fVX[10];                  // valid indices are 0 through 9
std::vector<float> fVXX(10);    // requires #include <vector>
….
fVX[10] = 23.5f;                // illegal access, not detected
fVXX[10] = 23.5f;               // also illegal; operator[] is not required to check
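For comparison, std::vector does provide a checked accessor, at(), which throws std::out_of_range when the index is not less than size(). The small helper below (its name, AtDetectsBadIndex, is hypothetical) shows the difference in a form that can be tested.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true if writing element 'index' through std::vector::at throws.
// at() always checks the index; operator[] performs no required check.
bool AtDetectsBadIndex(std::vector<float>& v, std::size_t index)
{
    try
    {
        v.at(index) = 23.5f;   // checked access
        return false;
    }
    catch (const std::out_of_range&)
    {
        return true;           // index >= v.size()
    }
}
```

The price of at() is a comparison on every access, which is why the book's classes perform their check only in debug builds.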
Fig. 9.4.2.1 Memory map
The memory map is shown in Fig. 9.4.2.1. Arrows emanate from pMA and pVA since the pointer variables point to memory
locations. There are two key statements in the above code.
pMA = &pVA[i]; // pMA now points to the element pVA[i]
In the above statement, the address of pVA[i] is assigned to pMA, so pMA[0] is the address stored in pVA[i] (the starting address of a row). These refer to the three dotted lines in Fig. 9.4.2.1. The next important statement is the one in which the int value is accessed via
pMA[0][j]
In general, pMA[i][j] refers to the value stored at the memory location found as follows: locate the memory address stored in pMA[i], advance that address by j elements (not j bytes), and finally, get the value that is stored at that memory location!
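The pointer-to-pointer access can be checked with a few lines of code. The arrays nV1, nV2 and nV3 and their values below are hypothetical stand-ins for the book's listing (chosen so that the tens digit is the row and the units digit the column); only the names pVA and pMA come from the text, and indices here are 0-based.

```cpp
#include <cassert>

// Hypothetical data: three rows of two ints each.
int nV1[2] = {11, 12};
int nV2[2] = {21, 22};
int nV3[2] = {31, 32};

// pVA holds the starting address of each row.
int* pVA[3] = {nV1, nV2, nV3};

// Reads row i, column j (both 0-based) the way the text describes:
// pMA points at pVA[i], so pMA[0] is the address stored in pVA[i].
int ReadViaPointerToPointer(int i, int j)
{
    int** pMA = &pVA[i];   // address of the element pVA[i]
    return pMA[0][j];      // same as *(pVA[i] + j)
}
```

Note that pMA[0][j] adds j elements, not j bytes, to the row address, because pointer arithmetic is scaled by the size of the pointed-to type.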
We can improve on this implementation by combining what the variables pVA and pMA do (that is, store memory addresses). A refined memory map is shown in Fig. 9.4.2.2. If we store the starting address of each row of the matrix in pMA, then pMA[i] holds the starting address of row i (counting rows from 0). Since the values in each row are stored in adjacent memory locations, pMA[i][j] refers to the value stored at row i and column j!
Fig. 9.4.2.2 Refined memory map (pMA points to the first row of the matrix, 11 12 ... 1m, then the second row, 21 22 ... 2m, and so on)
so that when we use cells[1] it does point to the starting address of the first row. This is followed by setting the starting address
for each row as follows.
for (i = 2; i <= nR; i++)
    cells[i] = cells[i-1] + nC;
The above statement computes the starting address of rows 2 through the last row as the starting address of the previous row plus the number of columns in the matrix, nC (pointer arithmetic advances the address by nC elements), as shown in Fig. 9.4.2.3.
Fig. 9.4.2.3 Memory mapping implementation diagram for CMatrix class (wasted space not shown): cells[1] points to cells[1][1], the start of the 1st row, followed by the 2nd row and so on through the last row
The above implementation is for a row-oriented matrix. However, a similar implementation can be devised for a column-
oriented matrix.
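The row-oriented scheme can be sketched as two stand-alone helper functions. This is an illustration under stated assumptions, not the book's CMatrix code: the names AllocateRowMajor and ReleaseRowMajor are hypothetical, and the layout (one unused row pointer cells[0] plus one unused value slot so that cells[i][j] works for 1 ≤ i ≤ nR, 1 ≤ j ≤ nC) is one way to realize the 1-based indexing described above.

```cpp
#include <cassert>

// Hypothetical helper: cells[1..nR] hold the starting address of each row,
// and all rows live in one contiguous block of nR*nC+1 values.
// Assumes nR > 0 and nC > 0.
template <typename T>
T** AllocateRowMajor(int nR, int nC)
{
    T** cells = new T*[nR + 1];
    cells[0] = nullptr;                 // unused (wasted) row pointer
    T* block = new T[nR * nC + 1];      // block[0] is the wasted value slot
    cells[1] = block;                   // cells[1][1] is block[1]
    for (int i = 2; i <= nR; i++)
        cells[i] = cells[i - 1] + nC;   // next row starts nC elements later
    return cells;
}

template <typename T>
void ReleaseRowMajor(T** cells)
{
    delete[] cells[1];   // cells[1] is the start of the value block
    delete[] cells;
}
```

Because the rows are contiguous, the last element of row i is immediately followed in memory by the first element of row i+1, which the test below checks.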
Our template-based matrix class is defined below and is followed by a sample program that uses the CMatrix class. We first
present select portions of source code for the CMatrix template class that is contained entirely in a header file.
matrixtemplate.h
There are seven constructors that make it easy to create CMatrix objects – rows and cols are the number of rows and columns in the matrix (both must be positive), iv is the initial value for all the elements in the matrix, and name is the name of the matrix, used only for identification purposes. Instructions for debugging this class when used with Microsoft Visual Studio are shown at the top of the file. Several helper functions and overloaded operator functions provide added benefits. These are listed in lines 47-59.
The listing of the member functions is not shown here, and the reader is strongly urged to study the implementation in matrixtemplate.h and understand it. Adding further functionality, such as overloading the << and >> operators, overloading other operators, using a member initializer list, and supporting matrix operations such as addition, transpose, inverse and determinant, is left as an exercise.
The client code enhances the capabilities by adding two non-member functions – Display and AddMatrices. The main program
defines the values in two matrices A and B and then stores the sum of those two matrices in another matrix, C.
The three matrices are defined as having 3 rows and 2 columns with the initial values in A and B being zero.
Note that we could have asked the user to specify the size of these matrices at run time and then dynamically allocated memory
for the three matrices as follows.
int nR, nC;
std::cout << "How many rows and columns? ";
std::cin >> nR >> nC;
CMatrix<float> fMA(nR, nC),fMB(nR, nC),fMC(nR, nC);
Or, if the matrices were created with the default constructor,
fMA.SetSize(nR, nC); fMB.SetSize(nR, nC); fMC.SetSize(nR, nC);
Tip: Overloading operator() makes it possible to detect indexing errors with the CMatrix class, just as we saw with the CVector class.
auto BB = AA;
the following operations take place: (1) AA is constructed with storage allocated to store 6 double precision values, (2) the copy constructor is then called: BB's resource pointer is first set to nullptr, then 6 storage locations are allocated for BB and its resource pointer is updated, and (3) the values from AA are copied into the locations allocated for BB. Allocating storage and copying values are slow processes.
Sometimes, the source object (i.e., AA) isn't needed anymore. With move semantics, the compiler detects cases where the source object is a temporary that is not tied to a named variable, and performs a move operation instead. For example, since C++11, if the intent is to transfer the contents of AA into BB and not use AA anymore, one could write
auto BB = std::move (AA); // requires #include <utility>
and the move constructor would be used where the following operations take place: (1) AA is constructed with storage allocated
to store 6 double precision values, (2) the move constructor is then called, the resource pointer for BB is set to point to the same
location as that for AA, and (3) the resource pointer for AA is set to nullptr. No additional resource allocation or deallocation takes place. These actions must be coded by the person writing the class, since the C++ compiler does not know how the class uses its resource.
Moving objects only makes sense if the object type owns a resource of some sort, e.g., the CMatrix class allocates memory in
freestore and then deallocates when no longer required. If all data is contained within the object, the most efficient way to move
an object is to just copy it.
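The difference between copying and moving a resource-owning object can be made visible by counting allocations. CBuffer and its static counter s_nAllocations are hypothetical names used only for this sketch; the class stands in for a matrix that owns 6 double precision values, as in the AA and BB discussion above.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical resource-owning class used only to contrast copying with moving.
class CBuffer
{
public:
    static int s_nAllocations;   // number of new[] calls made by CBuffer objects

    explicit CBuffer(std::size_t n) : m_nSize(n), m_pData(new double[n])
    {
        ++s_nAllocations;
        for (std::size_t i = 0; i < n; i++) m_pData[i] = 0.0;
    }
    // Copy constructor: allocate new storage, then copy every value.
    CBuffer(const CBuffer& A) : m_nSize(A.m_nSize), m_pData(new double[A.m_nSize])
    {
        ++s_nAllocations;
        for (std::size_t i = 0; i < m_nSize; i++) m_pData[i] = A.m_pData[i];
    }
    // Move constructor: take over A's storage, then leave A empty.
    CBuffer(CBuffer&& A) noexcept : m_nSize(A.m_nSize), m_pData(A.m_pData)
    {
        A.m_pData = nullptr;
        A.m_nSize = 0;
    }
    CBuffer& operator= (const CBuffer&) = delete;   // not needed for this sketch
    ~CBuffer() { delete[] m_pData; }

    std::size_t Size() const { return m_nSize; }
    const double* Data() const { return m_pData; }
    double& operator[] (std::size_t i) { return m_pData[i]; }

private:
    std::size_t m_nSize;
    double* m_pData;
};

int CBuffer::s_nAllocations = 0;
```

With this class, auto BB = AA; performs one extra allocation, while auto CC = std::move(AA); performs none and leaves AA holding a null pointer.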
Let’s see how this takes place in the CMatrix class. Here are some key statements taken from matrixcontainerEXH.h file.
CMatrix (CMatrix<T>&&); // move constructor
CMatrix<T>& operator= (CMatrix<T>&&); // overloaded move = operator
The first statement shows the move constructor syntax. Note that the move constructor is not called by name; the compiler-generated code selects it whenever an object is constructed from an rvalue. This happens explicitly when, for example, std::move is used (std::move simply casts its argument to an rvalue reference), and implicitly when the source is a temporary object. The second statement shows that the CMatrix class provides its own overloaded move assignment operator.
Note how the move constructor works. Lines 229-232 simply copy the essential member variables from A (the source) to the
current CMatrix object (the target). The state of A is altered in lines 236-238 so that, while object A still exists, it cannot be used to carry out any useful matrix operations.
The overloaded move assignment operator function works in three stages. In the first stage, error checks are carried out in lines 700-705; the current matrix and A must be compatible. Second, if the current matrix already holds data, its memory is released by calling the Release() function (line 707). Finally, lines 710-713 set the resource pointer of the current matrix to that of A and copy the properties of A, and lines 717-718 set the resource pointer of A to nullptr and reset its number of rows and columns to zero.
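The three stages can be sketched in a small stand-alone class. CMoveMatrix is hypothetical and is not the CMatrix code in matrixcontainerEXH.h; in particular, stage 1 here is only a self-assignment guard, whereas the book's version also checks that the two matrices are compatible.

```cpp
#include <cassert>
#include <utility>

// Hypothetical minimal matrix owning one contiguous block of rows*cols values.
class CMoveMatrix
{
public:
    CMoveMatrix() : m_nRows(0), m_nCols(0), m_pData(nullptr) {}
    CMoveMatrix(int rows, int cols, double iv)
        : m_nRows(rows), m_nCols(cols), m_pData(new double[rows * cols])
    {
        for (int k = 0; k < rows * cols; k++) m_pData[k] = iv;
    }
    CMoveMatrix(const CMoveMatrix&) = delete;
    CMoveMatrix& operator= (const CMoveMatrix&) = delete;
    ~CMoveMatrix() { Release(); }

    // Move assignment operator.
    CMoveMatrix& operator= (CMoveMatrix&& A) noexcept
    {
        if (this == &A) return *this;   // stage 1: guard (compatibility checks omitted)
        Release();                      // stage 2: free the current storage
        m_pData = A.m_pData;            // stage 3: take over A's storage ...
        m_nRows = A.m_nRows;
        m_nCols = A.m_nCols;
        A.m_pData = nullptr;            // ... and leave A as an empty 0 x 0 matrix
        A.m_nRows = A.m_nCols = 0;
        return *this;
    }

    int Rows() const { return m_nRows; }
    int Cols() const { return m_nCols; }
    const double* Data() const { return m_pData; }
    // 1-based element access, row-oriented storage.
    double operator() (int i, int j) const { return m_pData[(i - 1) * m_nCols + (j - 1)]; }

private:
    void Release()
    {
        delete[] m_pData;
        m_pData = nullptr;
        m_nRows = m_nCols = 0;
    }
    int m_nRows, m_nCols;
    double* m_pData;
};
```

After D = std::move(Tmp); the matrix D owns the storage that Tmp allocated, and Tmp reports zero rows and a null resource pointer.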
Example Program 9.4.3 Using the CMatrix class with and without move semantics
In this example, we will show how move semantics works in the CMatrix class and why it is a useful feature of C++. To compare using and not using move semantics, the preprocessor variable __NOMOVECTOR__ must be defined when move semantics should not be used. The variable can be defined or removed using the Project->Example9_4_3 Properties menu option, selecting C++, then the Preprocessor item, and adding or removing __NOMOVECTOR__ in the Preprocessor Definitions field.
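By defining three rectangular matrices, $\mathbf{A}_{m\times n}$ with $A_{ij} = 1.0$, $\mathbf{B}_{m\times n}$ with $B_{ij} = 2.0$ and $\mathbf{C}_{m\times n}$ with $C_{ij} = 3.0$, a fourth matrix $\mathbf{D}_{m\times n}$ is defined as
$$D_{ij} = 2A_{ij} - 3B_{ij} + 4C_{ij}$$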
The easiest-to-read approach is to use overloaded operators and write the C++ statement as
D = 2.0*A - B*3.0 + 4.0*C;
Handling an expression like this is where the move semantics shows its efficiency.
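The storage requirement $n_B$ (in bytes) for a typical matrix $\mathbf{A}_{m\times n}$ is
$$n_B = (m+1)\,\mathrm{sizeof}(T^*) + (mn+1)\,\mathrm{sizeof}(T)$$
If T is double and pointers occupy 8 bytes (as on a typical 64-bit platform), then $n_B = 8(m + mn + 2)$.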
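The storage formula translates directly into a one-line function template. The name MatrixStorageBytes is hypothetical; the function simply counts m+1 row pointers and m·n+1 values, one of each being the unused slot that permits 1-based indexing.

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed by the row-pointer scheme for an m x n matrix of T.
template <typename T>
std::size_t MatrixStorageBytes(std::size_t m, std::size_t n)
{
    return (m + 1) * sizeof(T*) + (m * n + 1) * sizeof(T);
}
```

For a 3 × 2 matrix of doubles with 8-byte pointers this gives 4·8 + 7·8 = 88 bytes, which equals 8(3 + 6 + 2).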
main.cpp
We will find the execution time for this program using an unsophisticated but reasonably accurate technique of getting the
system time at two points in the program (line 20 and line 86) and taking the difference between the two. The four matrices are
declared in lines 27-30. Memory allocations take place, but the matrices are not initialized with any value. The initial values are
set in lines 38-40 for matrices A, B, and C.
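One portable way to take two time stamps and compute the elapsed wall-clock time is std::chrono::steady_clock. The book's program may use a different system-time call; the helper ElapsedSeconds below is a hypothetical sketch that times any callable.

```cpp
#include <cassert>
#include <chrono>

// Returns the wall-clock seconds taken to run 'work'.
template <typename F>
double ElapsedSeconds(F work)
{
    auto start = std::chrono::steady_clock::now();   // first time stamp
    work();
    auto stop = std::chrono::steady_clock::now();    // second time stamp
    return std::chrono::duration<double>(stop - start).count();
}
```

steady_clock is used rather than system_clock because it is guaranteed never to go backwards, so the difference of two readings cannot be negative.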
If the program is set to use overloaded operators (see line 22 where the variable can be set as true or false), lines 41-47 show
how the result is computed. In the displayed version of the program, the move semantics is deactivated, and the overloaded
operators are used.
Here is the sequence of operations if the move semantics are not turned on – compute TM1=4.0*C and store the result in TM1,
a temporary matrix; compute and store TM2=B*3.0; compute and store TM3=2.0*A; compute and store TM4=TM3-TM2, then using
the copy constructor transfer the result to TM5; compute TM6=TM4+TM1, then using the copy constructor transfer the result to TM7;
finally, assign the contents of TM7 to D. A total of 7 temporary matrices are used.
Here is the sequence of operations if the move semantics are turned on – compute TM1=4.0*C and store the result in TM1, a
temporary matrix; compute and store TM2=B*3.0; compute and store TM3=2.0*A; compute and store TM4=TM3-TM2, then use the
move constructor; compute TM5=TM4+TM1, then use the move constructor; finally, assign the contents of TM5 to D. A total of 5
temporary matrices are used, two less than when the move semantics is not used. The alternate approach is to carry out the
evaluation as shown in lines 51-57 where the operator overloading is not used.
Additional code is used to ensure that the calculations are correct as shown in lines 60-67.
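Evaluating D without overloaded operators amounts to one loop over the elements, with no temporary matrices. In the sketch below the matrices are stored row by row in std::vector<double> and the function name EvaluateD is hypothetical; the book's version operates on CMatrix objects instead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Computes D = 2A - 3B + 4C element by element; no temporaries are created.
void EvaluateD(const std::vector<double>& A, const std::vector<double>& B,
               const std::vector<double>& C, std::vector<double>& D)
{
    D.resize(A.size());
    for (std::size_t k = 0; k < A.size(); k++)
        D[k] = 2.0 * A[k] - 3.0 * B[k] + 4.0 * C[k];
}
```

With $A_{ij} = 1$, $B_{ij} = 2$ and $C_{ij} = 3$, every element of D equals 2 − 6 + 12 = 8, which provides a simple correctness check.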
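The remaining statements are shown below. Lines 84 and 86-87 display the memory allocated/deallocated and the execution time taken, respectively.
Finally, Table 9.4.1 shows the performance of the code when executed using different combinations of move semantics and overloaded operators. The allocation totals are in the ratio 4 : 9 : 11 (16 001 600 000 × 9/4 = 36 003 600 000 and 16 001 600 000 × 11/4 = 44 004 400 000), which is consistent with allocating the four matrices A, B, C and D plus either five temporary matrices (with move semantics) or seven (without).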
Table 9.4.1 Resource Usage Comparison
Move semantics                                          Yes              No               No
Overloaded operators                                    Yes              No               Yes
Freestore memory allocation and deallocation (bytes)    36 003 600 000   16 001 600 000   44 004 400 000
Wall clock time (sec)                                   11               6                15
Code readability                                        Excellent        Good             Excellent
where argc is the number of command line arguments and argv is an array of character strings (char* argv[]) containing the command line arguments.
Consider the following scenario: you have created a console application, search, and you wish to specify the file in which to search for a specific string. Now assume that you launch the program from the command line as follows.
search address.dat Iowa
With this example, argc is 3 – there are 3 command line arguments. The first argument (contained in argv[0]) is search, the second argument (contained in argv[1]) is address.dat and the last argument (contained in argv[2]) is Iowa.
The program is simple and self-explanatory. When a program is launched from a command shell, argc is at least 1 and the program name is available in argv[0] (strictly, the C++ standard only requires argc to be non-negative). One or more blank spaces separate one argument from the next. The sample output is shown in Fig. 9.5.1.
In more complex scenarios, this manner of handling errors can result in code that is complicated and difficult to read and maintain – imagine what happens when function A calls function B, which in turn calls functions C, D and E, each capable of generating an error! The reader is encouraged to read the contents of this webpage:
https://isocpp.org/wiki/faq/exceptions.
Furthermore, some C++ constructs, such as constructors and destructors, have no return value and therefore cannot report errors the usual way. So what is the solution? To find one, we first study the exception handling features provided by C++. Fig. 9.6.1 shows the standard exceptions.
Fig. 9.6.1 Standard exceptions: std::exception and derived classes including std::domain_error, std::invalid_argument, std::length_error, std::out_of_range, std::overflow_error, std::range_error, std::underflow_error, std::system_error, std::ios_base::failure, std::bad_exception, std::bad_function_call, std::bad_typeid and std::bad_weak_ptr
append. The second argument (the position, within the first argument, of the first character that is copied; the first character is denoted as 0, not 1) must not exceed 6, the number of characters in oosoft; positions 0 to 5 refer to actual characters.
main.cpp
The first example shows an exception generated by the std::string class. In line 34, the intent is for the string in rstr to be appended to the string in MS. The append function extends the string by adding characters at the end and has three arguments – the string to append (in this case rstr), the position of the first character in that string (in this case 7, which is invalid since rstr has only 6 characters and the position cannot exceed that length), and the length of the substring to be copied (in this case std::string::npos, indicating all characters until the end of rstr).
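The behavior of append with a bad position can be checked directly: std::string::append(str, pos, n) throws std::out_of_range when pos is greater than str.size(), and a position equal to the size appends nothing. The helper AppendThrows and the contents of MS in the test are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Tries target.append(source, pos, npos) and reports whether it threw.
bool AppendThrows(std::string& target, const std::string& source, std::size_t pos)
{
    try
    {
        target.append(source, pos, std::string::npos);
        return false;
    }
    catch (const std::out_of_range&)
    {
        return true;   // pos > source.size()
    }
}
```

Because the position is checked before anything is modified, the target string is left unchanged when the exception is thrown.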
The second example shows an error that occurs when an invalid value is used in the new statement, and the third example shows a similar type of error with a std::string class object. In both examples, the error is due to the fact that the value used in resource allocation becomes negative.
Finally, the last example shows how to throw and catch an error from a user-defined function. Rect_Area is created to compute the area of a rectangle. The two arguments to the function are the length and the width of the rectangle. Three error conditions are flagged: the length is less than or equal to zero, the width is less than or equal to zero, or the width is greater than the length.
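A function with these three checks might look as follows. The function name Rect_Area comes from the text, but its signature and the choice of std::invalid_argument as the thrown exception type are assumptions for this sketch, not the book's listing.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical version of Rect_Area: computes length * width after
// validating the three error conditions described in the text.
double Rect_Area(double dLength, double dWidth)
{
    if (dLength <= 0.0)
        throw std::invalid_argument("length must be positive");
    if (dWidth <= 0.0)
        throw std::invalid_argument("width must be positive");
    if (dWidth > dLength)
        throw std::invalid_argument("width must not exceed length");
    return dLength * dWidth;
}
```

A caller would wrap the call in try { ... } catch (const std::exception& e) { std::cout << e.what(); } so that any standard exception, including this one, is reported.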
catch (...)
{
std::cout << "Sorry, could not catch the error whatever it is.\n";
}
Note that catching by value copies the exception object and, if the handler names a base class, slices off the derived part (the slicing problem). Hence exceptions should be caught by reference, typically by const reference.
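Slicing can be observed by checking the dynamic type of the caught object. The two hypothetical functions below throw the same std::out_of_range; only the handler that catches by reference still sees an out_of_range object.

```cpp
#include <cassert>
#include <stdexcept>

// Catch by value: e is a copy of only the std::exception base part.
bool CaughtByValueKeepsType()
{
    try { throw std::out_of_range("bad index"); }
    catch (std::exception e)          // sliced copy
    {
        return dynamic_cast<std::out_of_range*>(&e) != nullptr;
    }
    return false;   // not reached
}

// Catch by reference: e refers to the original exception object.
bool CaughtByReferenceKeepsType()
{
    try { throw std::out_of_range("bad index"); }
    catch (const std::exception& e)   // no copy, no slicing
    {
        return dynamic_cast<const std::out_of_range*>(&e) != nullptr;
    }
    return false;   // not reached
}
```

The dynamic_cast test is used because the text returned by what() after slicing is implementation-defined, whereas the dynamic type of a sliced copy is always the base class.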
Tip 2: Use noexcept carefully
C++ permits tagging function declarations with the keyword noexcept to indicate that the function does not throw exceptions and is not designed to handle exceptional situations. Here is an example.
#include <iostream>
#include <string>
return 1;
}
int main ()
{
try
{
int x = bar ();
}
catch (std::exception& err)
{
std::cout << "Caught: " << err.what() << "\n";
}
std::cout << "Reached here ...\n";
return 0;
}
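In this example, the program will terminate abnormally. An exception that propagates out of a function declared noexcept causes std::terminate to be called, so the catch block in main is never reached and "Reached here ..." is not displayed.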
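The noexcept operator can be used to confirm, at compile time, what a function promises. Twice and CheckedHalf are hypothetical functions: the first is declared noexcept, and the second may throw and is therefore left without the keyword.

```cpp
#include <cassert>
#include <stdexcept>

// Declared non-throwing: an exception escaping it would call std::terminate.
int Twice(int n) noexcept { return 2 * n; }

// May throw, so it is not declared noexcept.
int CheckedHalf(int n)
{
    if (n % 2 != 0) throw std::invalid_argument("odd value");
    return n / 2;
}

// The noexcept operator inspects the declaration; it does not evaluate the call.
static_assert(noexcept(Twice(1)), "Twice is declared noexcept");
static_assert(!noexcept(CheckedHalf(2)), "CheckedHalf may throw");
```

If CheckedHalf were marked noexcept, CheckedHalf(3) would end the program through std::terminate instead of reaching a catch block.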
Summary
This second, more advanced look at classes and objects begins to show the strength of OOP and C++. It covered ideas associated with the this pointer, proper use of the const qualifier, friend classes, the use of references with functions, operator overloading, and template classes. Two classes, to manage vectors and two-dimensional matrices, were developed and studied. These classes will prove to be indispensable in handling numerical analysis algorithms. Both classes assume that the elements of the vectors and matrices are stored at contiguous locations.
Programming Style Tip 9.1: Pass objects including arrays by reference
To avoid making a copy of the object being passed, especially if the object is complex and resource hungry, pass objects by reference. For example, the AddMatrices function uses the pass-by-reference technique, so no local copy of the matrices is created; making such a copy can be time-consuming if the matrices are large. If the matrix should not be modified, use the const qualifier as well.
Programming Style Tip 9.2: Ask yourself if overloaded operators are really necessary?
There is no doubt that the use of overloaded operators in client code makes the code easier to read and hence maintain.
However, when resource allocation is an issue, one should be careful in implementing and using overloaded operators. As we saw, temporary copies of objects are sometimes made during execution. If these copies consume scarce resources, then it may not be a good idea to implement overloaded operators for that class.
Programming Style Tip 9.3: Need for the Big Five
C++ programmers recognize the need for the Big Five – copy constructor, overloaded copy assignment operator, move
constructor, overloaded move assignment operator, and destructor, when designing and implementing a class. In other words,
if you write your own copy constructor, then you should also write your own assignment operators, move constructor and destructor. The default
functionality provided by the C++ compiler (in the absence of your version) may not provide the exact functionality that is
necessary for a robust, efficient and correct code.
Exercises
Most of the problems below involve the development of one or more classes. In each case (a) develop a plan to test the class(es), and (b) implement the plan in a main program.
Appetizers
Problem 9.1
Problem 7.4 dealt with the development and implementation of the CFraction class. Now enhance the capabilities of the class
by overloading the following operators. F1, F2 and F3 are CFraction objects.
F1 = F2; // overloaded operator =
F3 = F1 + F2; // overloaded operator +
F3 = F1 - F2; // overloaded operator -
F3 = F1 * F2; // overloaded operator *
F3 = F1 / F2; // overloaded operator /
if (F1 == F2) // overloaded operator ==
if (F1 != F2) // overloaded operator !=
cout << F3 << "\n"; // overloaded operator <<
cin >> F3 ; // overloaded operator >>
Problem 9.2
Convert the statistical functions discussed in Example 4.4.1 into member functions of a CStatistics class. However, use the
CVector class to handle vectors instead of C++ arrays. This way you will be able to deal with any number of data values.
Write your main program in such a way that you have a mini-statistical package to support the functionalities discussed in the
example. In other words, the program you develop should ask the user for the number of data values, allocate the memory
dynamically to store those values, and then call the CStatistics member functions to compute all the statistical values and
display them on the screen. Set up the program for one time execution only.
Problem 9.3
Change the CStatistics class developed in Problem 9.2 to a template class.
Main Course
Problem 9.4
The differential equation for a transverse deflection of a beam of length $L$ subjected to an arbitrary loading $w(x)$ is given by
$$\frac{d^4 y}{dx^4} = \frac{w(x)}{EI}$$
Take $w(x)$ as a uniform load, $w$.
(a) The solution for a simply-supported beam is given as
$$y(x) = \frac{wx}{24EI}\left(x^3 - 2Lx^2 + L^3\right)$$
(b) The solution for a fixed-fixed beam is given as
$$y(x) = \frac{wx^2}{24EI}\left(x^2 - 2Lx + L^2\right)$$
(c) The solution for a cantilever beam is given as
$$y(x) = \frac{wx^2}{24EI}\left(x^2 - 4Lx + 6L^2\right)$$
Obtain the values of L, E, I, w and the units for force and length from the user. Divide the length of the beam into 20 equally
spaced points. Display a table on the screen. The table should have four columns – location and corresponding displacement
for the three beam types. Store the table data in a matrix (use the CMatrix class).
C++ Concepts
Problem 9.5
Complex numbers are used in a variety of engineering and scientific calculations. A complex number $z$ is written as
$$z = x + iy$$
The operations between two complex numbers $z_1$ and $z_2$ can be expressed as
$$z_1 + z_2 = (x_1 + x_2) + i(y_1 + y_2)$$
$$z_1 - z_2 = (x_1 - x_2) + i(y_1 - y_2)$$
$$z_1 z_2 = (x_1 x_2 - y_1 y_2) + i(x_1 y_2 + x_2 y_1)$$
$$\frac{z_1}{z_2} = \frac{x_1 x_2 + y_1 y_2}{x_2^2 + y_2^2} + i\,\frac{x_2 y_1 - x_1 y_2}{x_2^2 + y_2^2}$$
$$|z| = |x + iy| = \sqrt{x^2 + y^2}$$
Implement the CComplex class whose class definition is shown below.
#pragma once
#include <iostream>
class CComplex
{
public:
CComplex ();
CComplex (double dR, double dI);
CComplex (const CComplex&);
~CComplex ();
friend std::ostream &operator<< (std::ostream&, const CComplex&);
friend double fabs (const CComplex&);
// accessor functions
double Real ()const;
double Imaginary ()const;
void GetValues (double&, double&) const;
double Modulus ()const;
// modifier functions
void Real (double);
void Imaginary (double);
void SetValues (double, double);
// overloaded operators
CComplex operator+ (const CComplex& dR) const;
CComplex& operator+= (const CComplex& dR);
CComplex operator- (const CComplex& dR) const;
CComplex& operator-= (const CComplex& dR);
CComplex operator* (const CComplex& dR) const;
CComplex& operator*= (const CComplex& dR);
CComplex operator/ (const CComplex& dR) const;
CComplex& operator/= (const CComplex& dR);
CComplex& operator= (const CComplex& dR);
CComplex& operator= (const double);
CComplex operator- () const;
private:
double m_dReal;
double m_dImaginary;
};
Problem 9.6
Add exception handling to all the problems in this chapter using the style from Example 9.6.2.
Problem 9.7
Both the CVector and the CMatrix implementations waste storage locations in order to start indexing the contents with 1, i.e.
fVA(1) is the first element in float vector A, and fMA(1,1) is the first element in float matrix A. One storage location is wasted
for every CVector object, and two storage locations for every CMatrix object. However, noting that addresses can be manipulated
by very simple math operations, modify the source code in vectortemplate.h and matrixtemplate.h so that there are no
wasted storage spaces.
References
http://www.stroustrup.com/C++.html
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4296.pdf
https://isocpp.org/std/status
https://isocpp.org/wiki/faq/exceptions
https://www.acodersjourney.com/top-15-c-exception-handling-mistakes-avoid/
http://www.gotw.ca/publications/mill22.htm
http://stdcxx.apache.org/doc/stdlibref/2-3.html
https://en.cppreference.com/w/cpp/error/nested_exception
https://en.cppreference.com/w/cpp/error/throw_with_nested
https://dzone.com/articles/some-useful-facts-to-know-when-using-c-exceptions
Chapter 10
Matrix Algebra
“To me, mathematics, computer science, and the arts are insanely related. They’re all creative expressions.” Sebastian Thrun
“The art of doing mathematics consists in finding that special case which contains all the germs of generality.” David Hilbert
In Chapter 6, we saw the solution of an equation in a single unknown. The challenge was to find the roots of a nonlinear
equation. More often than not, engineering problems are described by several (hundreds, perhaps millions of) equations. These
equations are usually linear in nature or can be approximated as such. In this chapter, we will start by first reviewing the basics
of matrix algebra – vector and matrix operations. With that background, we will start to look at different methods to obtain the
solution of linear algebraic equations. Finally, we will be in a position to develop an object-oriented matrix toolbox based on the
CVector and CMatrix classes developed in Chapter 9. This toolbox will be used in almost all the later chapters dealing with
numerical analysis.
Objectives
To understand matrix algebra.
To understand the role of matrix algebra in engineering and scientific analysis.
To understand and implement the steps to solve a system of linear algebraic equations.
To understand and implement a template matrix toolbox.
$$\mathbf{B}_{3\times 3} = \begin{bmatrix} 12 & -3 & 1 \\ -3 & 8 & 0 \\ 1 & 0 & 22 \end{bmatrix}$$
is a symmetric matrix.
Diagonal Matrix: A square matrix such that $A_{ij} = 0$ if $i \neq j$ is a diagonal matrix. For example,
$$\mathbf{B}_{3\times 3} = \begin{bmatrix} 12 & 0 & 0 \\ 0 & 8 & 0 \\ 0 & 0 & 22 \end{bmatrix}$$
is a diagonal matrix.
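Identity Matrix: A diagonal matrix such that $A_{ii} = 1$ and $A_{ij} = 0$ for $i \neq j$ is an identity matrix and is denoted $\mathbf{I}_{n\times n}$. For example, the following is an identity (or unit) matrix of order, or size, 3.
$$\mathbf{I}_{3\times 3} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$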
Upper Triangular Matrix: A square matrix such that $A_{ij} = 0$ if $i > j$ is an upper triangular matrix. For example,
$$\mathbf{B}_{3\times 3} = \begin{bmatrix} 12 & 55 & 0 \\ 0 & 8 & 10 \\ 0 & 0 & 22 \end{bmatrix}$$
is an upper triangular matrix.
Lower Triangular Matrix: A square matrix such that $A_{ij} = 0$ if $i < j$ is a lower triangular matrix. For example,
$$\mathbf{B}_{3\times 3} = \begin{bmatrix} 12 & 0 & 0 \\ 55 & 8 & 0 \\ 0 & 10 & 22 \end{bmatrix}$$
is a lower triangular matrix.
Positive Definite Matrix: A square matrix such that all its eigenvalues are positive. We will look at eigenvalues in Chapter 16.
Orthogonal Matrix: A square matrix such that its transpose is equal to its inverse. In other words, $\mathbf{A}^T\mathbf{A} = \mathbf{A}\mathbf{A}^T = \mathbf{I}$.
Hermitian or Self-Adjoint Matrix: A square matrix that is equal to the complex conjugate of its transpose. In other words, $\mathbf{A} = \mathbf{A}^{\dagger}$. For a real matrix, Hermitian means the same as symmetric.
10.1.2 Operations
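Addition and Subtraction: Two matrices of the same size can be added or subtracted from one another. For example, if
$$\mathbf{A}_{m\times n} = \mathbf{B}_{m\times n} + \mathbf{C}_{m\times n} \quad \text{then} \quad A_{ij} = B_{ij} + C_{ij} \tag{10.1.2-1}$$
and, if
$$\mathbf{A}_{m\times n} = \mathbf{B}_{m\times n} - \mathbf{C}_{m\times n} \quad \text{then} \quad A_{ij} = B_{ij} - C_{ij} \tag{10.1.2-2}$$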
Consider the following example. Let
12 3 1 0 12 1
B33 3 8 0 and C33 15 8 1
1 0 22 11 0 7
Then
12 9 0 12 15 2
A B C 12 16 1 and A B C 18 0 1
12 0 29 10 0 15
Multiplication: Two matrices can be multiplied as follows
$$\mathbf{A}_{m\times n} = \mathbf{B}_{m\times o}\,\mathbf{C}_{o\times n} \tag{10.1.2-3}$$
provided the number of columns in $\mathbf{B}$ is equal to the number of rows in $\mathbf{C}$. This condition makes the two matrices conformable. The resulting matrix $\mathbf{A}$ has its number of rows equal to the number of rows in $\mathbf{B}$ and its number of columns equal to the number of columns in $\mathbf{C}$. To generate the elements of the resulting matrix $\mathbf{A}$ we need
$$A_{ij} = \sum_{k=1}^{o} B_{ik} C_{kj} \tag{10.1.2-4}$$
In other words, summing the products of the corresponding elements from row i of $\mathbf{B}$ and column j of $\mathbf{C}$ yields $A_{ij}$. This operation is similar to computing the dot product.
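For example, let
$$\mathbf{B}_{3\times 3} = \begin{bmatrix} 12 & -3 & 1 \\ -3 & 8 & 0 \\ 1 & 0 & 22 \end{bmatrix} \quad \text{and} \quad \mathbf{C}_{3\times 2} = \begin{bmatrix} 0 & 12 \\ 15 & 8 \\ 11 & 0 \end{bmatrix}$$
Then $\mathbf{A}_{3\times 2} = \mathbf{B}_{3\times 3}\mathbf{C}_{3\times 2}$ can be computed by writing the three matrices as follows.
$$\begin{bmatrix} 12 & -3 & 1 \\ -3 & 8 & 0 \\ 1 & 0 & 22 \end{bmatrix} \begin{bmatrix} 0 & 12 \\ 15 & 8 \\ 11 & 0 \end{bmatrix} = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \\ A_{31} & A_{32} \end{bmatrix}$$
where
$A_{11}$ = the product of the first row of $\mathbf{B}$ times the first column of $\mathbf{C}$ $= (12)(0) + (-3)(15) + (1)(11) = -34$
$A_{12}$ = the product of the first row of $\mathbf{B}$ times the second column of $\mathbf{C}$ $= (12)(12) + (-3)(8) + (1)(0) = 120$
$A_{21}$ = the product of the second row of $\mathbf{B}$ times the first column of $\mathbf{C}$ $= (-3)(0) + (8)(15) + (0)(11) = 120$
$A_{22}$ = the product of the second row of $\mathbf{B}$ times the second column of $\mathbf{C}$ $= (-3)(12) + (8)(8) + (0)(0) = 28$
$A_{31}$ = the product of the third row of $\mathbf{B}$ times the first column of $\mathbf{C}$ $= (1)(0) + (0)(15) + (22)(11) = 242$
$A_{32}$ = the product of the third row of $\mathbf{B}$ times the second column of $\mathbf{C}$ $= (1)(12) + (0)(8) + (22)(0) = 12$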
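Eqn. (10.1.2-4) is the classic triple loop. The sketch below uses std::vector<std::vector<double>> with 0-based indices instead of the book's CMatrix class (an assumption made to keep it self-contained); the alias Matrix and the function Multiply are hypothetical names. It reproduces the worked example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;   // rows of values, 0-based

// A = B C with A_ij = sum over k of B_ik C_kj (Eqn. 10.1.2-4).
// Assumes B is m x o and C is o x n (conformable); no size checks are made.
Matrix Multiply(const Matrix& B, const Matrix& C)
{
    const std::size_t m = B.size(), o = C.size(), n = C[0].size();
    Matrix A(m, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < m; i++)
        for (std::size_t j = 0; j < n; j++)
            for (std::size_t k = 0; k < o; k++)
                A[i][j] += B[i][k] * C[k][j];   // row i of B times column j of C
    return A;
}
```

The innermost loop is exactly the dot product of row i of B with column j of C; the operation count is m·n·o multiplications.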
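Transpose: The transpose of matrix $\mathbf{A}_{m\times n}$ is denoted $\mathbf{A}^T_{n\times m}$. The transpose matrix is constructed such that
$$A^T_{ij} = A_{ji} \tag{10.1.2-5}$$
As can be seen from Eqn. (10.1.2-5), the transpose matrix is obtained by interchanging the rows and columns of the original matrix. Let
$$\mathbf{C}_{3\times 2} = \begin{bmatrix} 0 & 12 \\ 15 & 8 \\ 11 & 0 \end{bmatrix}. \quad \text{Then} \quad \mathbf{C}^T_{2\times 3} = \begin{bmatrix} 0 & 15 & 11 \\ 12 & 8 & 0 \end{bmatrix}$$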
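Eqn. (10.1.2-5) can be sketched the same way, again with std::vector<std::vector<double>> and 0-based indices standing in for CMatrix (the names Matrix and Transpose are hypothetical).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;   // rows of values, 0-based

// Returns the n x m transpose of an m x n matrix: T[i][j] = A[j][i].
Matrix Transpose(const Matrix& A)
{
    const std::size_t m = A.size(), n = A[0].size();
    Matrix T(n, std::vector<double>(m));
    for (std::size_t i = 0; i < n; i++)
        for (std::size_t j = 0; j < m; j++)
            T[i][j] = A[j][i];   // rows and columns interchanged
    return T;
}
```

Applied to the 3 × 2 matrix C above, it returns the 2 × 3 matrix whose first row is 0, 15, 11 and whose second row is 12, 8, 0.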
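or,
$$\det(\mathbf{A}) = \sum_{i=1}^{n} \bar{a}_{ij} A_{ij} \quad \text{for any } j = 1, 2, \ldots, n \tag{10.1.2-6b}$$
where the minor $M_{ij}$ is the determinant of the $(n-1)\times(n-1)$ submatrix obtained by deleting the ith row and jth column, and the cofactor $\bar{a}_{ij}$ associated with $M_{ij}$ is defined to be $\bar{a}_{ij} = (-1)^{i+j} M_{ij}$. While it will not be necessary for us to compute the determinant of a matrix, we still need to understand the concept. Let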
determinant of a matrix, we still need to understand the concept. Let
4 3 8 3
A 2 2 and B22 16 6
1 6
Then using Eqn. (10.1.2-6a) with i 1 ,
n
det( A ) a ij Aij a11 A11 a12 A12 4 a11 3a 12
j 1
a11 ( 1) 1 1
det 6 (1)(6) 6 a12 ( 1)1 2 det 1 ( 1)(1) 1
det( A ) 4 a11 3a12 4(6) 3( 1) 27
n
and, det( B ) bij Bij b11 B11 b12 B12 8b11 3b12
j 1
b11 ( 1) 1 1
det 6 (1)(6) 6 b12 ( 1)1 2 det 16 ( 1)(16) 16
det( B ) 8b11 3b12 8(6) 3( 16) 0
Since the determinant of B is zero, B is known as a singular matrix. In the next section we will see a more efficient way to
compute the determinant of a matrix.
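For small matrices, the cofactor expansion along the first row (Eqn. 10.1.2-6a with i = 1) translates directly into a recursive function. This is a minimal sketch with illustrative names (`minorMatrix`, `determinant`) built on `std::vector`. Its cost grows like n!, which is why a more efficient approach is needed for anything but tiny matrices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Submatrix of the square matrix A with row r and column c deleted (its
// determinant is the minor M_rc).
Matrix minorMatrix(const Matrix& A, std::size_t r, std::size_t c)
{
    Matrix M;
    for (std::size_t i = 0; i < A.size(); ++i) {
        if (i == r) continue;
        std::vector<double> row;
        for (std::size_t j = 0; j < A.size(); ++j)
            if (j != c) row.push_back(A[i][j]);
        M.push_back(row);
    }
    return M;
}

// det(A) = sum over j of A_1j * (-1)^(1+j) * M_1j (expansion along row 1).
double determinant(const Matrix& A)
{
    const std::size_t n = A.size();
    assert(n > 0 && A[0].size() == n);   // square matrices only
    if (n == 1) return A[0][0];
    double det = 0.0;
    double sign = 1.0;                    // (-1)^(1+j) for j = 1, 2, ...
    for (std::size_t j = 0; j < n; ++j) {
        const double cofactor = sign * determinant(minorMatrix(A, 0, j));
        det += A[0][j] * cofactor;
        sign = -sign;
    }
    return det;
}
```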
If m > n, we have an overdetermined system. On the other hand, if m < n, there is no unique solution.
Equation solvers can be categorized as being either direct or iterative. Direct solvers are those that solve Eqns. (10.2.1) in a non-
iterative fashion. The procedure typically involves one pass through the n equations. On the other hand, iterative solvers
transform the problem into an equivalent problem and the solution is improved iteratively.
There are other issues such as parallelizing and vectorizing the solution procedure on specialized hardware and software systems,
but they are beyond the scope of the discussions here.
10.2.1 Storage Scheme
The coefficient matrix $\mathbf{A}_{n\times n}$ can take on many different forms depending on the application. The matrix can be full, meaning
that all the elements in the matrix are nonzero. At the other end of the spectrum the matrix can be sparse. For example, solution
of some of the popular partial differential equations involves solving a system of equations where the nonzero entries make up
a few percent (1-10%) of the entire matrix. The coefficient matrix can have other forms and properties – symmetric, banded,
anti-symmetric, skyline, positive definite, indefinite, etc. The implementation efficiency of an algorithm can be tied to its storage
scheme.
Full: This is the simplest storage form, where the matrix can be stored either rowwise or columnwise. If $\mathbf{A}_{n\times n}$ is stored rowwise, then the elements of the matrix are stored as

$$A_{11}, A_{12}, \ldots, A_{1n}, A_{21}, A_{22}, \ldots, A_{2n}, \ldots, A_{n1}, A_{n2}, \ldots, A_{nn}$$

On the other hand, if $\mathbf{A}_{n\times n}$ is stored columnwise, then the elements of the matrix are stored as

$$A_{11}, A_{21}, \ldots, A_{n1}, A_{12}, A_{22}, \ldots, A_{n2}, \ldots, A_{1n}, A_{2n}, \ldots, A_{nn}$$

In FORTRAN, matrix elements are stored columnwise. In statically allocated C++ matrices, matrix elements are stored rowwise. The CMatrix class that we developed in Chapter 9 is designed to store the elements rowwise and, with a little tweaking, can also be designed to store the elements columnwise. Either way, the total storage requirement is $n^2$ locations.
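The two orderings differ only in how the pair (i, j) maps to a position in a one-dimensional array. The sketch below is illustrative (the function names are hypothetical, not part of CMatrix); it takes the 1-based indices used in the text and returns a 0-based array offset for an m x n matrix.

```cpp
#include <cassert>
#include <cstddef>

// Rowwise (C/C++ order): the rows are laid end to end.
std::size_t rowwiseIndex(std::size_t i, std::size_t j, std::size_t m, std::size_t n)
{
    assert(i >= 1 && i <= m && j >= 1 && j <= n);
    return (i - 1) * n + (j - 1);
}

// Columnwise (FORTRAN order): the columns are laid end to end.
std::size_t columnwiseIndex(std::size_t i, std::size_t j, std::size_t m, std::size_t n)
{
    assert(i >= 1 && i <= m && j >= 1 && j <= n);
    return (j - 1) * m + (i - 1);
}
```

For a 3 x 3 matrix, $A_{21}$ lands at offset 3 in rowwise storage but at offset 1 in columnwise storage.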
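Banded and Skyline: In the context of finite element analysis, $\mathbf{A}_{n\times n}$ is mostly symmetric and positive definite. Under certain scenarios, A has a special form that can be exploited from a storage perspective. Consider the symmetric matrix shown in Fig. 10.2.1. The figure shows only the nonzero upper triangular components.

$$\begin{bmatrix}
A_{11} & A_{12} & & & A_{15} & A_{16} & & & & \\
 & A_{22} & & & A_{25} & A_{26} & & & & \\
 & & A_{33} & A_{34} & A_{35} & A_{36} & & & & \\
 & & & A_{44} & A_{45} & A_{46} & & & & \\
 & & & & A_{55} & A_{56} & A_{57} & A_{58} & A_{59} & A_{5,10}\\
 & & & & & A_{66} & A_{67} & A_{68} & A_{69} & A_{6,10}\\
 & & & & & & A_{77} & A_{78} & & \\
 & & & & & & & A_{88} & & \\
 & & & & & & & & A_{99} & A_{9,10}\\
 & & & & & & & & & A_{10,10}
\end{bmatrix}$$

Fig. 10.2.1 Upper triangular portion of the system stiffness matrix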
We will draw a special box that encompasses all the nonzero elements in the upper triangular portion of the matrix. As can be
seen from Fig. 10.2.2, the nonzero elements are all contained within a band. The width of this band is known as the half-band
width (HBW) of the matrix. To find the HBW, we scan each row of the matrix starting with the diagonal element and look for
the last nonzero entry in that row. The maximum distance from the last nonzero element to the diagonal element (in a particular
row) for all the rows gives the HBW. The formula for the HBW of row i is given as

$$HBW_i = c_i - i + 1 \tag{10.2.2a}$$

where $c_i$ is the column number of the last nonzero element in row i (note, the +1 is used so that the distance includes both the last nonzero element and the diagonal element) and

$$HBW = \max_i HBW_i \tag{10.2.2b}$$

For example, in row 1, since the last nonzero element is $A_{16}$, this distance is 6 − 1 + 1 = 6. As can be deduced from Fig. 10.2.2, the half-band width (HBW) of the given matrix is 6.
$$A_{i,j} = 0 \quad \text{if } j < i \tag{10.2.4}$$

$$\text{else}\quad A_{i,j} = A^{banded}_{i,\,j-i+1} \tag{10.2.5}$$
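Eqns. (10.2.2a-b) and (10.2.5) can be sketched together: scan each row of the upper triangle for its last nonzero entry, then copy the band into an n x HBW array indexed by (i, j − i + 1). This is a minimal sketch assuming a dense symmetric `std::vector` matrix as input and 0-based indices in code. The names `halfBandWidth`, `toBanded` and `bandedAt` are illustrative. Lower triangular entries are recovered through symmetry.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// HBW = max over rows of (c_i - i + 1), Eqns. (10.2.2a-b).
std::size_t halfBandWidth(const Matrix& A)
{
    const std::size_t n = A.size();
    std::size_t hbw = 0;
    for (std::size_t i = 0; i < n; ++i) {
        std::size_t last = i;                      // the diagonal always counts
        for (std::size_t j = i; j < n; ++j)
            if (A[i][j] != 0.0) last = j;
        if (last - i + 1 > hbw) hbw = last - i + 1;
    }
    return hbw;
}

// Packs the upper band into an n x hbw array: A_ij -> Abanded(i, j - i + 1).
Matrix toBanded(const Matrix& A, std::size_t hbw)
{
    assert(hbw >= 1);
    const std::size_t n = A.size();
    Matrix B(n, std::vector<double>(hbw, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i; j < n && j < i + hbw; ++j)
            B[i][j - i] = A[i][j];
    return B;
}

// Reads A_ij (0-based) from banded storage; A_ij = A_ji for j < i.
double bandedAt(const Matrix& B, std::size_t i, std::size_t j)
{
    if (j < i) { const std::size_t t = i; i = j; j = t; }
    const std::size_t hbw = B.empty() ? 0 : B[0].size();
    return (j - i < hbw) ? B[i][j - i] : 0.0;      // outside the band: zero
}
```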
As we can see in Fig. 10.2.2, a good number of the elements within the band are zero! We can capitalize even more on this
characteristic by storing only the elements within the skyline profile that is shown in Fig. 10.2.4.
[Figure: the upper triangular entries of Fig. 10.2.1 with the skyline profile of each column outlined]
Fig. 10.2.4 The “skyline” profile
The original matrix, in fact, can be stored in a vector as follows.

$$\mathbf{A}^{skyline}_{35\times1} = \{A_{11}, A_{22}, A_{12}, A_{33}, A_{44}, A_{34}, \ldots, A_{10,10}, A_{9,10}, A_{8,10}, A_{7,10}, A_{6,10}, A_{5,10}\} \tag{10.2.6}$$

Note that each column is stored starting with the diagonal element of that column, followed by all the other elements in that column up to the last nonzero entry in that column. To facilitate mapping the original elements of the matrix, an additional indexing vector, $\mathbf{D}^{loc}_{(n+1)\times1}$, is created that has (n + 1) elements. These elements store the location of the diagonal element (of each column), with the last element storing the (last element + 1) in the stiffness matrix. In other words, the last element contains one more than the total number of entries in the skyline profile. Going back to the current example, we have the following $\mathbf{D}^{loc}$ vector.

$$\mathbf{D}^{loc}_{11\times1} = \{1, 2, 4, 5, 7, 12, 18, 21, 25, 30, 36\} \tag{10.2.7}$$
The relationship (mapping) between the elements in the original A in Fig. 10.2.2 and the skyline form $\mathbf{A}^{skyline}$ in Fig. 10.2.4 can be derived as follows.

$$A_{i,j} = 0 \quad \text{if } j - i \ge D^{loc}_{j+1} - D^{loc}_{j} \tag{10.2.8}$$

$$A_{i,j} = 0 \quad \text{if } j < i \tag{10.2.9}$$

$$A_{i,j} \Rightarrow l = D^{loc}_{j} + j - i, \qquad A_{i,j} = A^{skyline}_{l} \tag{10.2.10}$$

For example, to locate $A_{68}$ we note that (a) i = 6, j = 8, (b) $D^{loc}_{8} = 21$, (c) l = 21 + 8 − 6 = 23. Hence, $A_{68} = A^{skyline}_{23}$.
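Building $\mathbf{A}^{skyline}$ and $\mathbf{D}^{loc}$ and then applying Eqns. (10.2.8) and (10.2.10) can be sketched as follows. This is a minimal sketch under stated assumptions: the input is a dense symmetric matrix held in `std::vector`, `Skyline`, `toSkyline` and `skylineAt` are illustrative names, and `dloc` keeps the text's 1-based locations so that the numbers match Eqn. (10.2.7).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

struct Skyline {
    std::vector<double> a;          // A^skyline: each column, diagonal first, then upward
    std::vector<std::size_t> dloc;  // 1-based diagonal locations, plus (last + 1)
};

// Builds skyline storage from the upper triangle of a symmetric matrix.
Skyline toSkyline(const Matrix& A)
{
    const std::size_t n = A.size();
    Skyline s;
    for (std::size_t j = 0; j < n; ++j) {
        std::size_t top = j;                       // highest nonzero row in column j
        for (std::size_t i = 0; i < j; ++i)
            if (A[i][j] != 0.0) { top = i; break; }
        s.dloc.push_back(s.a.size() + 1);
        for (std::size_t i = j + 1; i-- > top; )   // rows j, j-1, ..., top
            s.a.push_back(A[i][j]);
    }
    s.dloc.push_back(s.a.size() + 1);
    return s;
}

// A_ij for 1-based i, j via Eqns. (10.2.8) and (10.2.10); symmetry for j < i.
double skylineAt(const Skyline& s, std::size_t i, std::size_t j)
{
    if (j < i) { const std::size_t t = i; i = j; j = t; }
    assert(i >= 1 && j + 1 <= s.dloc.size());
    if (j - i >= s.dloc[j] - s.dloc[j - 1]) return 0.0;  // above the skyline
    const std::size_t l = s.dloc[j - 1] + j - i;         // Eqn. (10.2.10)
    return s.a[l - 1];                                    // 1-based l to 0-based
}
```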
Finally, let us compare the storage requirements of the three schemes (full, banded, and skyline), assuming that the stiffness matrix is stored in the double precision format and that two integer words make up a single double precision word ($q = hbw$, $m = D^{loc}_{n+1} - 1$).

Storage Scheme    What is to be stored?                                                  Equivalent integer words
Full              $\mathbf{A}_{n\times n}$                                                  $2n^2$
Banded            $\mathbf{A}^{banded}_{n\times hbw}$                                       $2nq$
Skyline           $\mathbf{A}^{skyline}_{m\times1}$, $\mathbf{D}^{loc}_{(n+1)\times1}$        $2m + n + 1$

Using our example (n = 10, q = 6, m = 35), we have the three values in the last column as 200, 120 and 81 integer words: significant savings with increasing sophistication of the storage scheme.
Sparse: It is not evident with our simple example that the matrix is sparse. The percent sparsity of the matrix is $\frac{35}{100}\times100 = 35\%$. For some engineering problems, as the size of the problem increases, the sparsity of the matrix increases (or the number of nonzero terms decreases). In one of the sparse storage schemes, three vectors are used to track the locations of the nonzero entries. The matrix is stored in a vector rowwise, with only the nonzero entries being stored¹ in $\mathbf{A}^{sparse}_{m\times1}$. Using our previous example, we have the following.

$$\mathbf{A}^{sparse}_{31\times1} = \{A_{11}, A_{12}, A_{15}, A_{16}, A_{22}, A_{25}, A_{26}, \ldots, A_{88}, A_{99}, A_{9,10}, A_{10,10}\} \tag{10.2.11}$$
To access the entries in the matrix, two additional (indexing) vectors are needed. The first, $\mathbf{C}_{m\times1}$, is used to store the column numbers of the nonzero entries. Again, we have with our example

$$\mathbf{C}_{31\times1} = \{1, 2, 5, 6, 2, 5, 6, 3, 4, 5, 6, \ldots, 8, 9, 10, 10\} \tag{10.2.12}$$

The second, $\mathbf{R}_{(n+1)\times1}$, stores the starting location of each row. Again, we have with our example

$$\mathbf{R}_{11\times1} = \{1, 5, 8, 12, 15, 21, 26, 28, 29, 31, 32\} \tag{10.2.13}$$
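The three vectors of Eqns. (10.2.11)-(10.2.13) can be produced and queried as sketched below. This is a minimal sketch assuming, as in the example, that only the upper triangle of a symmetric matrix is stored and that `col` and `row` use the text's 1-based numbering. The struct and function names are hypothetical. Lookup scans row i between `row[i-1]` and `row[i]` for the requested column.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

struct SparseUpper {
    std::vector<double> a;         // A^sparse: nonzero values, row by row
    std::vector<std::size_t> col;  // C: column number of each stored value
    std::vector<std::size_t> row;  // R: start of each row in a, plus (last + 1)
};

SparseUpper toSparse(const Matrix& A)
{
    SparseUpper s;
    const std::size_t n = A.size();
    for (std::size_t i = 0; i < n; ++i) {
        s.row.push_back(s.a.size() + 1);
        for (std::size_t j = i; j < n; ++j)
            if (A[i][j] != 0.0) { s.a.push_back(A[i][j]); s.col.push_back(j + 1); }
    }
    s.row.push_back(s.a.size() + 1);
    return s;
}

// A_ij for 1-based i <= j; entries that are not stored are zero.
double sparseAt(const SparseUpper& s, std::size_t i, std::size_t j)
{
    assert(i >= 1 && i <= j && j + 1 <= s.row.size());
    for (std::size_t l = s.row[i - 1]; l < s.row[i]; ++l)
        if (s.col[l - 1] == j) return s.a[l - 1];
    return 0.0;
}
```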
Discussion of sparse equation solvers is outside the scope of this book.
Offline: There are special situations when the solution procedure operates on matrices that are written on and retrieved from
computer hard disk. This is typically done when the size of the stiffness matrix and other associated vectors/matrices in any
storage format is several times the size of the computer’s random-access memory (RAM). Discussion of offline equation solvers
is outside the scope of this book.
10.2.1 Direct Solvers
In this section we will see several solution methods starting with the Gaussian Elimination technique.
Gaussian Elimination
There are three operations that can be used on a system of equations to obtain another equivalent system.
(1) We can interchange the order of two equations.
(2) Both sides of an equation may be multiplied by a nonzero constant.
(3) A multiple of one equation can be added to another equation.
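Consider the following set of equations.

$$\begin{bmatrix}8 & 1\\ -4 & 7\end{bmatrix}\begin{Bmatrix}x_1\\ x_2\end{Bmatrix} = \begin{Bmatrix}-6\\ 18\end{Bmatrix}$$

(1) Interchanging the two equations, we have the following set of equations.

$$\begin{bmatrix}-4 & 7\\ 8 & 1\end{bmatrix}\begin{Bmatrix}x_1\\ x_2\end{Bmatrix} = \begin{Bmatrix}18\\ -6\end{Bmatrix}$$

Note that row interchanges do not require changing the order of the unknowns.
(2) We could multiply the second equation by a constant (say 1.5) to obtain the following set of equations.

$$\begin{bmatrix}8 & 1\\ -6 & 10.5\end{bmatrix}\begin{Bmatrix}x_1\\ x_2\end{Bmatrix} = \begin{Bmatrix}-6\\ 27\end{Bmatrix}$$

(3) If we multiply the first equation by 0.5 and add it to the second equation, we obtain the following set of equations.

$$\begin{bmatrix}8 & 1\\ 0 & 7.5\end{bmatrix}\begin{Bmatrix}x_1\\ x_2\end{Bmatrix} = \begin{Bmatrix}-6\\ 15\end{Bmatrix}$$

¹ It should be noted that zero entries within the skyline profile may become nonzero during the solution phase.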
The central idea in the Gaussian Elimination technique is based on generating an equivalent set of equations using row operation (3) discussed earlier.
There are two phases to the solution – forward elimination and backward substitution. In the forward elimination phase, the
basic idea is to take the set of equations from the original form
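$$\begin{bmatrix}
A_{11} & A_{12} & A_{13} & \cdots & A_{1i} & \cdots & A_{1n}\\
A_{21} & A_{22} & A_{23} & \cdots & A_{2i} & \cdots & A_{2n}\\
A_{31} & A_{32} & A_{33} & \cdots & A_{3i} & \cdots & A_{3n}\\
\vdots & & & & & & \vdots\\
A_{i1} & A_{i2} & A_{i3} & \cdots & A_{ii} & \cdots & A_{in}\\
\vdots & & & & & & \vdots\\
A_{n1} & A_{n2} & A_{n3} & \cdots & A_{ni} & \cdots & A_{nn}
\end{bmatrix}
\begin{Bmatrix}x_1\\ x_2\\ x_3\\ \vdots\\ x_i\\ \vdots\\ x_n\end{Bmatrix} =
\begin{Bmatrix}b_1\\ b_2\\ b_3\\ \vdots\\ b_i\\ \vdots\\ b_n\end{Bmatrix} \tag{10.2.14}$$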
In the backward substitution phase (dropping the superscript), we first compute the value of the last unknown $x_n$, followed by the other unknowns, as shown below.

$$x_n = \frac{b_n}{A_{nn}} \tag{10.2.18}$$

$$x_i = \frac{b_i - \sum_{j=i+1}^{n} A_{ij}x_j}{A_{ii}}, \quad i = n-1, n-2, \ldots, 1 \tag{10.2.19}$$
Algorithm
Step 1: Forward Elimination. Loop through rows, $k = 1, \ldots, n-1$.
Step 2: Check if $|A_{kk}| \le \varepsilon$, where ε is a small tolerance. If yes, stop. The equations are linearly dependent (or A is singular).
Step 3: Loop through rows, $i = k+1, \ldots, n$.
Step 4: Compute constant, $c = A_{ik}/A_{kk}$.
Step 5: Loop through columns $j = k+1, \ldots, n$.
Step 6: Set $A_{ij} = A_{ij} - cA_{kj}$.
Step 7: End loop j.
Step 8: Set $b_i = b_i - cb_k$.
Step 9: End loop i.
Step 10: End loop k.
Step 11: Backward substitution. Set $x_n = b_n/A_{nn}$.
Step 12: Loop through all rows, $i = n-1, \ldots, 1$.
Step 13: Compute $sum = \sum_{j=i+1}^{n} A_{ij}x_j$.
Step 14: Compute $x_i = (b_i - sum)/A_{ii}$.
Step 15: End loop i.
Example 10.2.1
Solve the following set of equations using Gaussian Elimination method.
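$$\begin{bmatrix}10 & -5 & 2\\ 3 & 20 & 5\\ -2 & 7 & 15\end{bmatrix}\begin{Bmatrix}x_1\\ x_2\\ x_3\end{Bmatrix} = \begin{Bmatrix}6\\ 58\\ 57\end{Bmatrix}$$

Solution
Forward Elimination
Noting that n = 3, the successive snapshots as we go through the algorithm are as follows.

$$k = 1:\quad \begin{bmatrix}10 & -5 & 2\\ & 21.5 & 4.4\\ & 6 & 15.4\end{bmatrix}\begin{Bmatrix}x_1\\ x_2\\ x_3\end{Bmatrix} = \begin{Bmatrix}6\\ 56.2\\ 58.2\end{Bmatrix}$$

$$k = 2:\quad \begin{bmatrix}10 & -5 & 2\\ & 21.5 & 4.4\\ & & 14.172\end{bmatrix}\begin{Bmatrix}x_1\\ x_2\\ x_3\end{Bmatrix} = \begin{Bmatrix}6\\ 56.2\\ 42.5163\end{Bmatrix}$$

Backward Substitution

$$i = 3:\quad x_3 = \frac{42.5163}{14.172} = 3$$

$$i = 2:\quad x_2 = \frac{56.2 - (4.4)(3)}{21.5} = 2$$

$$i = 1:\quad x_1 = \frac{6 - (-4)}{10} = 1$$

Hence the solution is $\mathbf{x}_{3\times1} = \{1, 2, 3\}^T$.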
3
Pivoting and Scaling: Numerical problems in the form of truncation and round-off errors can be problematic if the coefficient
matrix A is not well-conditioned. A small change in b resulting in a large change in x is symptomatic of this ill-conditioning.
The culprit in the standard implementation of the Gaussian Elimination technique is Akk that is used in Step 4. With rounding
errors, this number could turn out to be very small and using this value could result in gross errors. Pivoting can be used to
improve the computations and reduce the effects of the numerical errors.
Partial pivoting: For $1 \le k \le n-1$, at the kth stage, let

$$c_k = \max_{k \le i \le n} \left|A^{(k)}_{ik}\right| \tag{10.2.20a}$$

Let i be the row index such that $i \ge k$ for which we obtain $c_k$. If $i > k$, we switch rows k and i in both A and b. By using the largest remaining element in A, we prevent the creation of elements in $\mathbf{A}^{(k)}$ of greatly varying size that lead to numerical errors.
Total pivoting: For $1 \le k \le n-1$, at the kth stage, let

$$c_k = \max_{k \le i, j \le n} \left|A^{(k)}_{ij}\right| \tag{10.2.20b}$$

Let i, j be the row and column indices such that $i, j \ge k$ for which we obtain $c_k$. If $i > k$, we switch rows k and i in both A and b. If $j > k$, we switch columns k and j in A and the order of the unknowns in x. At the end of the solution, we need to switch the order of the unknowns back to the original form.
Numerical studies have shown that total pivoting prevents the catastrophic accumulation of roundoff errors. However, total
pivoting is an expensive process. Partial pivoting provides adequate relief for most practical problems.
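Partial pivoting only adds a search for the largest $|A_{ik}|$ in column k and a row swap before each elimination stage. This is a minimal sketch of plain partial pivoting per Eqn. (10.2.20a). It swaps rows physically rather than recording them in the piv vector used by the book's scaled algorithm below, and the name `gaussSolvePivot` and the tolerance `TOL` are illustrative assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;
using Vector = std::vector<double>;

// Gaussian elimination with partial pivoting, Eqn. (10.2.20a).
Vector gaussSolvePivot(Matrix A, Vector b, double TOL = 1.0e-12)
{
    const std::size_t n = b.size();
    assert(A.size() == n);
    for (std::size_t k = 0; k < n; ++k) {
        // Row p >= k that holds c_k = max |A_ik|
        std::size_t p = k;
        for (std::size_t i = k + 1; i < n; ++i)
            if (std::fabs(A[i][k]) > std::fabs(A[p][k])) p = i;
        if (std::fabs(A[p][k]) <= TOL)
            throw std::runtime_error("matrix is singular to working precision");
        if (p != k) {                      // switch rows k and p in both A and b
            std::swap(A[k], A[p]);
            std::swap(b[k], b[p]);
        }
        for (std::size_t i = k + 1; i < n; ++i) {
            const double c = A[i][k] / A[k][k];
            for (std::size_t j = k; j < n; ++j) A[i][j] -= c * A[k][j];
            b[i] -= c * b[k];
        }
    }
    Vector x(n, 0.0);                      // backward substitution, Eqn. (10.2.19)
    for (std::size_t i = n; i-- > 0; ) {
        double sum = 0.0;
        for (std::size_t j = i + 1; j < n; ++j) sum += A[i][j] * x[j];
        x[i] = (b[i] - sum) / A[i][i];
    }
    return x;
}
```

With the nearly singular system of Example 10.2.2, this sketch (in double precision) returns a solution close to the pivoted result reported there, with residual entries many orders of magnitude smaller than the solution.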
Scaling: Round-off errors are likely to dominate if the elements of the coefficient matrix A vary greatly in value. One solution to
this problem is to scale A so that this variation in value is less, and this can be achieved by multiplying the rows and columns
by an appropriate constant. In practice, it is necessary to scale the rows so that they are approximately equal in magnitude.
Similarly, x should be scaled so that all the unknowns are approximately equal.
Let $\mathbf{S}_1$ and $\mathbf{S}_2$ be diagonal scaling matrices so that

$$\tilde{\mathbf{A}} = \mathbf{S}_1\mathbf{A}\mathbf{S}_2 \tag{10.2.21a}$$

which is a modification of the condition used in partial pivoting (Eqn. (10.2.20a)). The modified Gaussian Elimination algorithm is presented below. There is an additional vector piv that is needed to store the row interchange information. In other words, if piv(k) = k, then no row interchange took place. Otherwise, piv(k) = i indicates that rows i and k were interchanged at step k.
Algorithm
Step 1: Forward Elimination. Compute $s_i = \max_{1 \le j \le n} |A_{ij}|$, $i = 1, 2, \ldots, n$.
Step 17: Compute $x_i = (b_i - sum)/A_{ii}$.
Step 18: End loop i.
Error Analysis
It is practical and necessary at the end of the solution process to examine the quality of the solution. If x is indeed the solution,
then the residual vector r = Ax − b should be zero. Numerically, the residual vector is not zero, and one must ascertain the magnitude of the residual terms. One can define the absolute and relative errors

$$\epsilon_{abs} = \|\mathbf{r}\| \tag{10.2.22}$$

$$\epsilon_{rel} = \frac{\|\mathbf{r}\|}{\|\mathbf{b}\|} \tag{10.2.23}$$
Provided both these measures are small, the solution x is acceptable.
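The residual and the two error measures take only a few lines to compute. This is a minimal sketch under one stated assumption: Eqns. (10.2.22)-(10.2.23) do not specify a norm here, so the Euclidean (L2) norm is used. The function names are illustrative. The check below uses the unpivoted solution of Example 10.2.2.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;
using Vector = std::vector<double>;

// r = A x - b
Vector residual(const Matrix& A, const Vector& x, const Vector& b)
{
    assert(A.size() == b.size());
    Vector r(b.size());
    for (std::size_t i = 0; i < b.size(); ++i) {
        double sum = 0.0;
        for (std::size_t j = 0; j < x.size(); ++j) sum += A[i][j] * x[j];
        r[i] = sum - b[i];
    }
    return r;
}

// Euclidean norm (assumed choice of norm).
double norm2(const Vector& v)
{
    double s = 0.0;
    for (double vi : v) s += vi * vi;
    return std::sqrt(s);
}

// Absolute and relative errors, Eqns. (10.2.22)-(10.2.23); b must be nonzero.
void solutionErrors(const Matrix& A, const Vector& x, const Vector& b,
                    double& absErr, double& relErr)
{
    absErr = norm2(residual(A, x, b));
    relErr = absErr / norm2(b);
}
```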
Example 10.2.2
Solve the following set of equations using Gaussian Elimination method with and without partial pivoting. Compute the residual
vector in each case.
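$$\begin{bmatrix}0.7 & 0.8 & 0.9\\ 1.0000001 & 1.0 & 1.0\\ 1.3 & 1.2 & 1.1\end{bmatrix}\begin{Bmatrix}x_1\\ x_2\\ x_3\end{Bmatrix} = \begin{Bmatrix}0.7\\ 0.8\\ 1.0\end{Bmatrix}$$

Solution
Numerically, the coefficient matrix is nearly singular (the first and third rows add up to almost exactly twice the second row).
The solution and residual vector using pivoting are as follows.

$$\mathbf{x} = \begin{Bmatrix}-499999.9994\\ 1000000.649\\ -499999.7994\end{Bmatrix}, \qquad \mathbf{r} = \begin{Bmatrix}1.16415(10^{-10})\\ 5.82076(10^{-11})\\ 0\end{Bmatrix}$$

The solution and residual vector without pivoting are as follows.

$$\mathbf{x} = \begin{Bmatrix}0.7\\ -0.2000001\\ 0.09999988\end{Bmatrix}, \qquad \mathbf{r} = \begin{Bmatrix}-0.280000188\\ -0.20000015\\ -0.220000252\end{Bmatrix}$$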
LU Factorization
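The matrix A in Ax = b can be factored² as A = LU, where L is a lower triangular matrix with nonzero values on the diagonal and below, and U is an upper triangular matrix with nonzero values on the diagonal and above. For example, we could describe the situation symbolically as follows.

$$\begin{bmatrix}A_{11} & A_{12} & A_{13} & A_{14}\\ A_{21} & A_{22} & A_{23} & A_{24}\\ A_{31} & A_{32} & A_{33} & A_{34}\\ A_{41} & A_{42} & A_{43} & A_{44}\end{bmatrix} =
\begin{bmatrix}L_{11} & & & \\ L_{21} & L_{22} & & \\ L_{31} & L_{32} & L_{33} & \\ L_{41} & L_{42} & L_{43} & L_{44}\end{bmatrix}
\begin{bmatrix}U_{11} & U_{12} & U_{13} & U_{14}\\ & U_{22} & U_{23} & U_{24}\\ & & U_{33} & U_{34}\\ & & & U_{44}\end{bmatrix} \tag{10.2.24}$$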
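The attractive aspect of this procedure is that we can solve the original equations (10.2.1) as follows.

$$\mathbf{A}_{n\times n}\mathbf{x}_{n\times1} = \mathbf{b}_{n\times1} \tag{10.2.25a}$$

$$\text{or}\quad \mathbf{L}_{n\times n}\mathbf{U}_{n\times n}\mathbf{x}_{n\times1} = \mathbf{b}_{n\times1} \tag{10.2.25b}$$

$$\text{Let}\quad \mathbf{L}_{n\times n}\mathbf{y}_{n\times1} = \mathbf{b}_{n\times1} \tag{10.2.25c}$$

and solve for y. Then solve

$$\mathbf{U}_{n\times n}\mathbf{x}_{n\times1} = \mathbf{y}_{n\times1} \tag{10.2.25d}$$

for x. The implication is that once L and U have been obtained, a new RHS vector requires just the forward and backward substitutions in Eqns. (10.2.25c-d). In other words, the forward substitution involves computing

$$y_1 = \frac{b_1}{L_{11}} \tag{10.2.26a}$$

$$y_i = \frac{b_i - \sum_{j=1}^{i-1} L_{ij}y_j}{L_{ii}}, \quad i = 2, 3, \ldots, n \tag{10.2.26b}$$

Similarly, the backward substitution involves computing

$$x_n = \frac{y_n}{U_{nn}} \tag{10.2.27a}$$

$$x_i = \frac{y_i - \sum_{j=i+1}^{n} U_{ij}x_j}{U_{ii}}, \quad i = n-1, n-2, \ldots, 1 \tag{10.2.27b}$$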
The major question that remains is how do we obtain the L and U matrices? If we multiply the L and U matrices, we obtain for a typical term

$$A_{ij} = L_{i1}U_{1j} + L_{i2}U_{2j} + \cdots \tag{10.2.28}$$

However, we should note that not all elements of the L and U matrices exist (L is zero above its diagonal and U is zero below its diagonal). Hence,

$$i < j:\quad A_{ij} = L_{i1}U_{1j} + L_{i2}U_{2j} + \cdots + L_{ii}U_{ij} \quad \text{(upper triangular elements)} \tag{10.2.29a}$$

$$i = j:\quad A_{ij} = L_{i1}U_{1j} + L_{i2}U_{2j} + \cdots + L_{ii}U_{jj} \quad \text{(diagonal elements)} \tag{10.2.29b}$$

$$i > j:\quad A_{ij} = L_{i1}U_{1j} + L_{i2}U_{2j} + \cdots + L_{ij}U_{jj} \quad \text{(lower triangular elements)} \tag{10.2.29c}$$

These three sets of equations indicate that we have $n^2$ equations for $n^2 + n$ unknowns, the elements in the L and U matrices. Crout’s Algorithm simply sets the diagonal entries in L to unity, i.e. $L_{ii} = 1$. Now we can solve $n^2$ equations in $n^2$ unknowns. An examination of the three equations will show that we can start with column 1 and compute $U_{11}$. Once we know $U_{11}$, we can compute all the elements in the first column of L, i.e. $L_{i1}$, $i = 2, \ldots, n$. Next, we compute the elements in the second column of U, i.e. $U_{12}$, $U_{22}$. After that we can compute the elements in the second column of L, i.e. $L_{i2}$, $i = 3, \ldots, n$. We repeat this process until the elements in all the columns of L and U have been computed.
Algorithm
Phase 1: Factorization
Step 1: Loop through all diagonal entries in L and set them to unity, $L_{ii} = 1$, $i = 1, 2, \ldots, n$.
Step 2: Loop through $j = 1, 2, 3, \ldots, n$.
Step 3: Loop through $i = 1, 2, \ldots, j$. Set $U_{ij} = A_{ij} - \sum_{k=1}^{i-1} L_{ik}U_{kj}$.
Step 4: Check if $|U_{jj}| \le \varepsilon$. If yes, stop. The equations are linearly dependent (or A is singular).
Step 5: Loop through $i = j+1, j+2, \ldots, n$. Set $L_{ij} = \left(A_{ij} - \sum_{k=1}^{j-1} L_{ik}U_{kj}\right)\big/U_{jj}$.
Step 6: End loop through j.
Phase 2: Forward and backward substitution
Step 1: Solve Eqn. (10.2.25c): Solve for $y_1$ (Eqn. 10.2.26a). Loop through $i = 2, 3, \ldots, n$ and solve for $y_i$ using Eqn. (10.2.26b).
Step 2: Solve Eqn. (10.2.25d): Solve for $x_n$ (Eqn. 10.2.27a). Loop through $i = n-1, n-2, \ldots, 1$ and solve for $x_i$ using Eqn. (10.2.27b).
The attractive aspect of this algorithm is that we really do not need (additional) storage space for the L and U matrices. The
original A matrix can be overwritten with these matrices such that
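$$\begin{bmatrix}A_{11} & A_{12} & A_{13} & A_{14}\\ A_{21} & A_{22} & A_{23} & A_{24}\\ A_{31} & A_{32} & A_{33} & A_{34}\\ A_{41} & A_{42} & A_{43} & A_{44}\end{bmatrix} \Rightarrow
\begin{bmatrix}U_{11} & U_{12} & U_{13} & U_{14}\\ L_{21} & U_{22} & U_{23} & U_{24}\\ L_{31} & L_{32} & U_{33} & U_{34}\\ L_{41} & L_{42} & L_{43} & U_{44}\end{bmatrix}$$

Furthermore, we do not need space for y. We first implement Step 1 in Phase 2 using x to store the y values. Then Step 2 can be computed with every successive $y_i$ value replaced with the $x_i$ value, starting with $y_n \rightarrow x_n$.
The LU Factorization provides a convenient and efficient way to compute the determinant of a matrix. Since A = LU, we have det(A) = det(L) det(U). The determinant of a triangular matrix is the product of its diagonal entries, and every $L_{ii} = 1$, so det(L) = 1. Hence the determinant of the original matrix is simply

$$\det(\mathbf{A}) = \det(\mathbf{U}) = U_{11}U_{22}\cdots U_{nn}$$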
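Both phases, including the in-place storage of L and U and the reuse of x for y, fit in three short functions. This is a minimal sketch without pivoting that mirrors the steps above. The function names and the default tolerance are illustrative and are not the book's CMatToolBox interface.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Matrix = std::vector<std::vector<double>>;
using Vector = std::vector<double>;

// Phase 1: overwrite A with U (on and above the diagonal) and L (below it;
// L_ii = 1 is implied and not stored).
void luFactor(Matrix& A, double TOL = 1.0e-12)
{
    const std::size_t n = A.size();
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = 0; i <= j; ++i) {            // Step 3: U_ij
            double sum = A[i][j];
            for (std::size_t k = 0; k < i; ++k) sum -= A[i][k] * A[k][j];
            A[i][j] = sum;
        }
        if (std::fabs(A[j][j]) <= TOL)                     // Step 4
            throw std::runtime_error("zero pivot U_jj: A may be singular");
        for (std::size_t i = j + 1; i < n; ++i) {         // Step 5: L_ij
            double sum = A[i][j];
            for (std::size_t k = 0; k < j; ++k) sum -= A[i][k] * A[k][j];
            A[i][j] = sum / A[j][j];
        }
    }
}

// Phase 2: x holds b on entry, y after the forward pass, and x on exit.
void luSolve(const Matrix& LU, Vector& x)
{
    const std::size_t n = LU.size();
    assert(x.size() == n);
    for (std::size_t i = 1; i < n; ++i)                   // Eqn. (10.2.26b), L_ii = 1
        for (std::size_t j = 0; j < i; ++j) x[i] -= LU[i][j] * x[j];
    for (std::size_t i = n; i-- > 0; ) {                  // Eqns. (10.2.27a-b)
        for (std::size_t j = i + 1; j < n; ++j) x[i] -= LU[i][j] * x[j];
        x[i] /= LU[i][i];
    }
}

// det(A) = U_11 U_22 ... U_nn
double luDeterminant(const Matrix& LU)
{
    double det = 1.0;
    for (std::size_t i = 0; i < LU.size(); ++i) det *= LU[i][i];
    return det;
}
```

Because `luFactor` and `luSolve` are separate, additional right-hand sides only require further calls to `luSolve`.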
Example 10.2.3
Solve the following set of equations using LU Factorization and compute the determinant of the coefficient matrix.
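$$\begin{bmatrix}10 & -5 & 2\\ 3 & 20 & 5\\ -2 & 7 & 15\end{bmatrix}\begin{Bmatrix}x_1\\ x_2\\ x_3\end{Bmatrix} = \begin{Bmatrix}6\\ 58\\ 57\end{Bmatrix}$$

Solution
Factorization (n = 3)

$$\begin{aligned}
j=1,\ i=1:&\quad U_{11} = A_{11} = 10\\
j=1,\ i=2:&\quad L_{21} = \frac{A_{21}}{U_{11}} = \frac{3}{10} = 0.3\\
j=1,\ i=3:&\quad L_{31} = \frac{A_{31}}{U_{11}} = \frac{-2}{10} = -0.2\\
j=2,\ i=1:&\quad U_{12} = A_{12} = -5\\
j=2,\ i=2:&\quad U_{22} = A_{22} - L_{21}U_{12} = 20 - (0.3)(-5) = 21.5\\
j=2,\ i=3:&\quad L_{32} = \frac{A_{32} - L_{31}U_{12}}{U_{22}} = \frac{7 - (-0.2)(-5)}{21.5} = 0.27907\\
j=3,\ i=1:&\quad U_{13} = A_{13} = 2\\
j=3,\ i=2:&\quad U_{23} = A_{23} - L_{21}U_{13} = 5 - (0.3)(2) = 4.4\\
j=3,\ i=3:&\quad U_{33} = A_{33} - L_{31}U_{13} - L_{32}U_{23} = 15 - (-0.2)(2) - (0.27907)(4.4) = 14.1721
\end{aligned}$$

Hence the new coefficient matrix, with its elements replaced by those of the L and U matrices, is as follows.

$$\mathbf{A} = \begin{bmatrix}10 & -5 & 2\\ 0.3 & 21.5 & 4.4\\ -0.2 & 0.27907 & 14.1721\end{bmatrix}$$

Forward Substitution

$$\begin{aligned}
i=1:&\quad y_1 = \frac{b_1}{L_{11}} = \frac{6}{1} = 6\\
i=2:&\quad y_2 = \frac{b_2 - L_{21}y_1}{L_{22}} = \frac{58 - (0.3)(6)}{1} = 56.2\\
i=3:&\quad y_3 = \frac{b_3 - L_{31}y_1 - L_{32}y_2}{L_{33}} = \frac{57 - (-0.2)(6) - (0.27907)(56.2)}{1} = 42.5163
\end{aligned}$$

Backward Substitution

$$\begin{aligned}
i=3:&\quad x_3 = \frac{y_3}{U_{33}} = \frac{42.5163}{14.1721} = 3\\
i=2:&\quad x_2 = \frac{y_2 - U_{23}x_3}{U_{22}} = \frac{56.2 - (4.4)(3)}{21.5} = 2\\
i=1:&\quad x_1 = \frac{y_1 - U_{12}x_2 - U_{13}x_3}{U_{11}} = \frac{6 - (-5)(2) - (2)(3)}{10} = 1
\end{aligned}$$

Determinant of A

$$\det(\mathbf{A}) = U_{11}U_{22}U_{33} = (10)(21.5)(14.1721) = 3047$$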
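$$\mathbf{A} = \mathbf{L}\mathbf{D}\mathbf{L}^T = \begin{bmatrix}
D_1 & D_1L_{21} & D_1L_{31} & \cdots & D_1L_{n1}\\
 & D_1L_{21}^2 + D_2 & D_1L_{21}L_{31} + D_2L_{32} & \cdots & D_1L_{21}L_{n1} + D_2L_{n2}\\
 & & D_1L_{31}^2 + D_2L_{32}^2 + D_3 & \cdots & D_1L_{31}L_{n1} + D_2L_{32}L_{n2} + D_3L_{n3}\\
 & & & \ddots & \vdots\\
\text{sym} & & & & D_1L_{n1}^2 + D_2L_{n2}^2 + \cdots + D_n
\end{bmatrix} \tag{10.2.30}$$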
By comparing the RHS of Eqn. (10.2.30) with the LHS we have the mechanism to compute the L and D matrices given the
A matrix.
To obtain the solution once the decomposition or factorization is completed, we have the following steps.
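$$\mathbf{L}\mathbf{D}\mathbf{L}^T\mathbf{x} = \mathbf{b} \tag{10.2.31}$$

$$\text{Let}\quad \mathbf{L}\mathbf{y} = \mathbf{b} \tag{10.2.32}$$

$$\text{Then}\quad \mathbf{D}\mathbf{L}^T\mathbf{x} = \mathbf{y} \tag{10.2.33}$$

Note that $\mathbf{D}\mathbf{L}^T$ is of the upper triangular form.

$$\mathbf{D}\mathbf{L}^T = \begin{bmatrix}D_1 & D_1L_{21} & \cdots & D_1L_{n1}\\ & D_2 & \cdots & D_2L_{n2}\\ & & \ddots & \vdots\\ 0 & & & D_n\end{bmatrix} \tag{10.2.34}$$

³ Strictly speaking, Cholesky Decomposition $\mathbf{A} = \mathbf{U}^T\mathbf{U}$ can only be used with a symmetric positive definite A.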
We can solve Eqns. (10.2.32) for y through the forward substitution procedure that we have seen before. Once y has been
computed, we can solve Eqns. (10.2.33) through the backward substitution process.
Algorithm
Factorization Phase
Step 1: Loop through rows, $i = 1, \ldots, n$.
Step 2: Set $D_i = A_{ii} - \sum_{j=1}^{i-1} L_{ij}^2D_j$. If $D_i \le \varepsilon$, stop. The matrix is not positive definite.
Step 3: For $j = i+1, \ldots, n$, set $L_{ji} = \left(A_{ji} - \sum_{k=1}^{i-1} L_{jk}D_kL_{ik}\right)\big/D_i$.
Step 4: End loop i.
Forward and Backward Substitutions
Step 5: Forward Substitution. Set $y_1 = b_1$.
Step 6: For $i = 2, \ldots, n$, set $y_i = b_i - \sum_{j=1}^{i-1} L_{ij}y_j$. This ends the Forward Substitution phase.
Step 7: Backward Substitution. Set $x_n = y_n/D_n$.
Step 8: For $i = n-1, \ldots, 1$, set $x_i = \dfrac{y_i}{D_i} - \sum_{j=i+1}^{n} L_{ji}x_j$. This ends the Backward Substitution phase.
A careful examination of the steps will show that no extra storage is required. The storage locations in A can be used to store
both D and L . Similarly, the storage locations in x can be used to store the elements of y .
Similar to LU Decomposition, Cholesky Decomposition requires no special effort to solve additional RHS vectors, since the factorization and the forward/backward substitution steps are separate. Once the factorization step has been completed, the forward/backward substitution steps can be repeated as many times as required.
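Steps 1-8 can be sketched as two C++ functions that store D on the diagonal and L in the strict lower triangle of A, so that no extra storage is used. This is a minimal sketch of the LDL^T variant the text calls Cholesky Decomposition. The names `ldltFactor` and `ldltSolve` and the default tolerance are illustrative assumptions. The check uses the coefficient matrix of Example 10.2.4 as reconstructed below, whose off-diagonal signs are themselves a reconstruction.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Matrix = std::vector<std::vector<double>>;
using Vector = std::vector<double>;

// Steps 1-4. On exit the diagonal holds D and the strict lower triangle holds
// L (unit diagonal implied). Only the lower half of the symmetric A is read.
void ldltFactor(Matrix& A, double TOL = 1.0e-12)
{
    const std::size_t n = A.size();
    for (std::size_t i = 0; i < n; ++i) {
        double d = A[i][i];                                       // Step 2
        for (std::size_t j = 0; j < i; ++j) d -= A[i][j] * A[i][j] * A[j][j];
        if (d <= TOL) throw std::runtime_error("matrix is not positive definite");
        A[i][i] = d;
        for (std::size_t j = i + 1; j < n; ++j) {                 // Step 3
            double s = A[j][i];
            for (std::size_t k = 0; k < i; ++k) s -= A[j][k] * A[k][k] * A[i][k];
            A[j][i] = s / d;
        }
    }
}

// Steps 5-8. x holds b on entry, y after the forward pass, and x on exit.
void ldltSolve(const Matrix& LD, Vector& x)
{
    const std::size_t n = LD.size();
    assert(x.size() == n);
    for (std::size_t i = 1; i < n; ++i)                           // Step 6
        for (std::size_t j = 0; j < i; ++j) x[i] -= LD[i][j] * x[j];
    for (std::size_t i = n; i-- > 0; ) {                          // Steps 7-8
        x[i] /= LD[i][i];
        for (std::size_t j = i + 1; j < n; ++j) x[i] -= LD[j][i] * x[j];
    }
}
```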
Example 10.2.4
Solve the following set of equations using Cholesky Decomposition method.
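$$\begin{bmatrix}
3.5120 & 0.7679 & 0 & 0 & 0\\
0.7679 & 3.1520 & 0 & 2 & 0\\
0 & 0 & 3.5120 & 0.7679 & -0.7679\\
0 & 2 & 0.7679 & 3.1520 & -1.1520\\
0 & 0 & -0.7679 & -1.1520 & 3.1520
\end{bmatrix}\begin{Bmatrix}x_1\\ x_2\\ x_3\\ x_4\\ x_5\end{Bmatrix} = \begin{Bmatrix}0\\ 0\\ 0\\ 0.04\\ 0\end{Bmatrix}$$

Solution
Factorization (n = 5)

$$\begin{aligned}
i=1:&\quad D_1 = A_{11} = 3.5120\\
&\quad j=2:\ L_{21} = \frac{A_{21}}{D_1} = \frac{0.7679}{3.5120} = 0.21865. \text{ Also, } L_{31} = L_{41} = L_{51} = 0.\\
i=2:&\quad D_2 = A_{22} - L_{21}^2D_1 = 3.1520 - (0.21865)^2(3.5120) = 2.9841\\
&\quad j=3:\ L_{32} = \frac{A_{32} - L_{31}D_1L_{21}}{D_2} = 0\\
&\quad j=4:\ L_{42} = \frac{A_{42} - L_{41}D_1L_{21}}{D_2} = \frac{2 - 0}{2.9841} = 0.670219. \text{ Also, } L_{52} = 0.\\
i=3:&\quad D_3 = A_{33} - L_{31}^2D_1 - L_{32}^2D_2 = 3.5120 - 0 - 0 = 3.5120\\
&\quad j=4:\ L_{43} = \frac{A_{43} - L_{41}D_1L_{31} - L_{42}D_2L_{32}}{D_3} = \frac{0.7679 - 0 - 0}{3.5120} = 0.21865\\
&\quad j=5:\ L_{53} = \frac{A_{53} - L_{51}D_1L_{31} - L_{52}D_2L_{32}}{D_3} = \frac{-0.7679 - 0 - 0}{3.5120} = -0.21865\\
i=4:&\quad D_4 = A_{44} - L_{41}^2D_1 - L_{42}^2D_2 - L_{43}^2D_3 = 1.64366\\
&\quad j=5:\ L_{54} = \frac{A_{54} - L_{51}D_1L_{41} - L_{52}D_2L_{42} - L_{53}D_3L_{43}}{D_4} = \frac{-1.1520 - 0 - 0 - (-0.21865)(3.5120)(0.21865)}{1.64366} = -0.598724\\
i=5:&\quad D_5 = A_{55} - L_{51}^2D_1 - L_{52}^2D_2 - L_{53}^2D_3 - L_{54}^2D_4
\end{aligned}$$

Forward Substitution Ly = b

$$\begin{bmatrix}
1 & 0 & 0 & 0 & 0\\
0.21865 & 1 & 0 & 0 & 0\\
0 & 0 & 1 & 0 & 0\\
0 & 0.670219 & 0.21865 & 1 & 0\\
0 & 0 & -0.21865 & -0.598724 & 1
\end{bmatrix}\begin{Bmatrix}y_1\\ y_2\\ y_3\\ y_4\\ y_5\end{Bmatrix} = \begin{Bmatrix}0\\ 0\\ 0\\ 0.04\\ 0\end{Bmatrix}$$

$$\begin{aligned}
i=1:&\quad y_1 = b_1 = 0\\
i=2:&\quad y_2 = b_2 - L_{21}y_1 = 0\\
i=3:&\quad y_3 = b_3 - L_{31}y_1 - L_{32}y_2 = 0\\
i=4:&\quad y_4 = b_4 - L_{41}y_1 - L_{42}y_2 - L_{43}y_3 = 0.04\\
i=5:&\quad y_5 = b_5 - L_{51}y_1 - L_{52}y_2 - L_{53}y_3 - L_{54}y_4 = 0 - (-0.598724)(0.04) = 0.023949
\end{aligned}$$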
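$$\begin{bmatrix}\vdots & & \vdots & & \vdots & & \vdots\\ A_{n1} & \cdots & A_{ni} & \cdots & A_{nj} & \cdots & A_{nn}\end{bmatrix}\begin{Bmatrix}\vdots\\ x_n\end{Bmatrix} = \begin{Bmatrix}\vdots\\ b_n\end{Bmatrix}$$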
Note that the modified equations Ax = b still have a symmetric and positive definite coefficient matrix. The only question left to answer is “What is the suitable value for the large number C?” A popular choice that seems to work effectively is to make the constant a function of the largest element in the coefficient matrix.

$$C = 10^4 \max |A_{pq}|, \quad 1 \le p, q \le n \tag{10.2.41}$$
Example 10.2.5
Modify the following equations
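$$\begin{bmatrix}10 & -5 & 2\\ 3 & 20 & 5\\ -2 & 7 & 15\end{bmatrix}\begin{Bmatrix}x_1\\ x_2\\ x_3\end{Bmatrix} = \begin{Bmatrix}6\\ 58\\ 57\end{Bmatrix}$$

so that the condition $x_2 = 3$ can be imposed.
Solution
Using the general procedure listed above, we have the following modified equations.

$$\begin{bmatrix}10 & 0 & 2\\ 0 & 1 & 0\\ -2 & 0 & 15\end{bmatrix}\begin{Bmatrix}x_1\\ x_2\\ x_3\end{Bmatrix} = \begin{Bmatrix}6 - (-5)(3)\\ 3\\ 57 - 7(3)\end{Bmatrix} = \begin{Bmatrix}21\\ 3\\ 36\end{Bmatrix}$$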
These equations can now be solved using LU Decomposition.
Example 10.2.6
Modify the following equations
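$$\begin{bmatrix}10 & -5 & 2\\ -5 & 20 & 5\\ 2 & 5 & 15\end{bmatrix}\begin{Bmatrix}x_1\\ x_2\\ x_3\end{Bmatrix} = \begin{Bmatrix}6\\ 58\\ 57\end{Bmatrix}$$

so that the condition $2x_1 + x_3 = 3$ can be imposed.
Solution
Using the procedure listed above, with $C = 10^4 \times 20$, we have the following modified equations.

$$\begin{bmatrix}10 + 20(10^4)(2)(2) & -5 & 2 + 20(10^4)(2)(1)\\ -5 & 20 & 5\\ 2 + 20(10^4)(2)(1) & 5 & 15 + 20(10^4)(1)(1)\end{bmatrix}\begin{Bmatrix}x_1\\ x_2\\ x_3\end{Bmatrix} = \begin{Bmatrix}6 + 20(10^4)(3)(2)\\ 58\\ 57 + 20(10^4)(3)(1)\end{Bmatrix}$$

Or, simplifying

$$\begin{bmatrix}800010 & -5 & 400002\\ -5 & 20 & 5\\ 400002 & 5 & 200015\end{bmatrix}\begin{Bmatrix}x_1\\ x_2\\ x_3\end{Bmatrix} = \begin{Bmatrix}1200006\\ 58\\ 600057\end{Bmatrix}$$

These equations can now be solved using $\mathbf{L}\mathbf{D}\mathbf{L}^T$ Decomposition.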
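The numbers in Example 10.2.6 are consistent with adding $C\,\mathbf{a}\mathbf{a}^T$ to A and $C\,q\,\mathbf{a}$ to b for a constraint $\mathbf{a}^T\mathbf{x} = q$, with C from Eqn. (10.2.41). The sketch below implements that reading. It is an assumption inferred from the example, since the general procedure is not reproduced in this section, and `imposeConstraintPenalty` is a hypothetical name. Because $\mathbf{a}\mathbf{a}^T$ is symmetric and positive semidefinite, the modified matrix keeps the properties noted above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;
using Vector = std::vector<double>;

// Penalty form of the constraint a^T x = q:
//   A <- A + C a a^T,  b <- b + C q a,  with C = 1e4 * max |A_pq| (Eqn. 10.2.41).
void imposeConstraintPenalty(Matrix& A, Vector& b, const Vector& a, double q)
{
    const std::size_t n = b.size();
    assert(A.size() == n && a.size() == n);
    double amax = 0.0;
    for (const auto& row : A)
        for (double v : row) amax = std::fmax(amax, std::fabs(v));
    const double C = 1.0e4 * amax;
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) A[i][j] += C * a[i] * a[j];
        b[i] += C * q * a[i];
    }
}
```

For the constraint $2x_1 + x_3 = 3$, the call uses a = {2, 0, 1} and q = 3, and it reproduces the simplified equations above.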
A sample client code that uses three of the functions from the matrix toolbox is shown below. Development of the remaining
functionalities is left as an exercise – See Problem 10.5.
Two versions of the matrix toolbox – float and double (defined in lines 15-16), are used in the test program. The float version
is used in testing vector addition and matrix-matrix multiplication. The double version is used in testing the solution to
simultaneous linear algebraic equations.
main.cpp
Only the vector addition test is shown here. The loop that contains the three tests starts in line 18. The try block is wrapped
around all the tests so that if an exception is thrown it can be caught and handled before the next test is executed. The vector
addition takes place on line 34 using float vectors of size 3.
The program terminates with the floating-point operation count statistics from the double precision version of the toolbox.
Summary
Matrix algebra forms the foundation of most numerical engineering analysis. We saw several matrix operations and solution
techniques in this chapter. We will see more solution techniques such as the numerical solution to eigenproblems, solving
ordinary and partial differential equations, numerical optimization and computer graphics in later chapters.
Exercises
Appetizers
Problem 10.1
Solve the following set of equations by hand using LU Factorization. Also compute the residual vector and the absolute and
relative errors.
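$$\begin{bmatrix}6 & 2 & 2\\ 2 & \tfrac{2}{3} & \tfrac{1}{3}\\ 1 & 2 & -1\end{bmatrix}\begin{Bmatrix}x_1\\ x_2\\ x_3\end{Bmatrix} = \begin{Bmatrix}-2\\ 1\\ 0\end{Bmatrix}$$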
Main Course
Problem 10.2
Solve the following set of equations by hand using Cholesky Decomposition. Also compute the residual vector and the absolute
and relative errors.
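$$\begin{bmatrix}2.25 & 3.0 & 4.5\\ 3.0 & 5.0 & 10.0\\ 4.5 & 10.0 & 34.0\end{bmatrix}\begin{Bmatrix}x_1\\ x_2\\ x_3\end{Bmatrix} = \begin{Bmatrix}1000.0\\ 0\\ 500.0\end{Bmatrix}$$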
Problem 10.3
Write two functions that help solve Ax = b where A is a symmetric, positive definite matrix stored in the rectangular format.
The function prototypes are given below.
void CholeskyFactorizationBanded (CMatrix<T>& A, const T TOL);
void CholeskySolveBanded (const CMatrix<T>& A, CVector<T>& x);
Problem 10.4
Write two functions that help solve Ax = b where A is a symmetric, positive definite matrix stored in the skyline format. The
function prototypes are given below.
void LDLTFactorizationSkyline (CVector<T>& A, CVector<int>& DLoc,
const int n, const T TOL);
void LDLTSolveSkyline (const CVector<T>& A, const CVector<int>&
DLoc, CVector<T>& x, const int n);
Chapter
Regression Analysis
“If your experiment needs statistics, you ought to have done a better experiment.” Ernest Rutherford
“There is great correlation between music and images.” Graham Nash
Engineers and scientists deal with a large amount of data sometimes involving several variables. The data is used to understand
the relationship between these variables. Often, a model is built that facilitates this understanding and can be used as a predictive
tool. In later chapters, we will see some of these models in use with sophisticated numerical techniques.
In this chapter, the model will be described by simple, yet powerful, functions. To understand how well the model captures the data, a function is created and used. This function is called a figure-of-merit or merit function, and it measures the agreement between the data and the fitting model for a particular choice of the parameters¹. The agreement is good when the value of the merit function is small. In the process known as regression, the parameters are adjusted based on the value of the merit function until the smallest value is obtained. This produces a best fit, and the corresponding parameters, which give the smallest value of the merit function, are known as the best-fit parameters (Press et al. 1992, p. 498).
Objectives
To understand what regression analysis is.
To understand how to fit data to a model.
To understand some of the basics of distinguishing good fits and poor fits.
¹ http://mathworld.wolfram.com/MeritFunction.html
Fig. 11.1.1 (a) Experimental setup to measure the tip displacement of a cantilever beam and (b) Model schematics
Placing a few different known weights at the tip and measuring the tip displacement (deflection) generates the data set. A sample
data set may look like (0.22 lb, 0.022 in), (0.44 lb, 0.045 in) and (1.10 lb, 0.11 in). Using the data, one can postulate a reasonable
model of the form

$$\hat{y}(P) = a_0 + a_1P \tag{11.1.1}$$

since the plot of the load-deflection data (Fig. 11.1.2) suggests a linear relationship.
We will look at one of the simplest examples dealing with the problem.
Example 11.2.1
Consider the data shown below.
x y
1 3
2 4
3 5
4 6
5 6.2188
Let us assume that we wish to fit a linear polynomial $\hat{y}(x) = a_0 + a_1x$ using the data. Then the problem can be expressed as

$$\text{Find } a_0, a_1 \tag{11.2.3}$$

$$\text{to minimize}\quad f(\mathbf{a}; \mathbf{x}) = \sum_{i=1}^{5}\left(a_0 + a_1x_i - y_i\right)^2 \tag{11.2.4}$$
[Plot: y(x) versus x for the data above, with the fitted line y = 0.8438x + 2.3125, R² = 0.9668]
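$$\begin{aligned}
\frac{\partial f}{\partial a_0} &= -2\sum_{i=1}^{n}\left[y_i - \left(a_0 + a_1x_i + \cdots + a_kx_i^k\right)\right] = 0\\
\frac{\partial f}{\partial a_1} &= -2\sum_{i=1}^{n}\left[y_i - \left(a_0 + a_1x_i + \cdots + a_kx_i^k\right)\right]x_i = 0\\
&\ \ \vdots\\
\frac{\partial f}{\partial a_k} &= -2\sum_{i=1}^{n}\left[y_i - \left(a_0 + a_1x_i + \cdots + a_kx_i^k\right)\right]x_i^k = 0
\end{aligned} \tag{11.2.6}$$

$$\begin{bmatrix}
n & \sum x_i & \cdots & \sum x_i^k\\
\sum x_i & \sum x_i^2 & \cdots & \sum x_i^{k+1}\\
\vdots & \vdots & \ddots & \vdots\\
\sum x_i^k & \sum x_i^{k+1} & \cdots & \sum x_i^{2k}
\end{bmatrix}\begin{Bmatrix}a_0\\ a_1\\ \vdots\\ a_k\end{Bmatrix} = \begin{Bmatrix}\sum y_i\\ \sum x_iy_i\\ \vdots\\ \sum x_i^ky_i\end{Bmatrix} \tag{11.2.7}$$

where all sums run from i = 1 to n.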
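The coefficient matrix in Eqn. (11.2.7) can be written as $\mathbf{X}^T\mathbf{X}$, where X is the Vandermonde matrix of the following (generally overdetermined) system

$$\begin{bmatrix}1 & x_1 & x_1^2 & \cdots & x_1^k\\ 1 & x_2 & x_2^2 & \cdots & x_2^k\\ \vdots & & & & \vdots\\ 1 & x_n & x_n^2 & \cdots & x_n^k\end{bmatrix}\begin{Bmatrix}a_0\\ a_1\\ \vdots\\ a_k\end{Bmatrix} = \begin{Bmatrix}y_1\\ y_2\\ \vdots\\ y_n\end{Bmatrix} \tag{11.2.8}$$

$$\text{Or,}\quad \mathbf{y} = \mathbf{X}\mathbf{a} \tag{11.2.9}$$

Hence the least-squares solution to Eqn. (11.2.7) or (11.2.8) is simply

$$\mathbf{a} = \left(\mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^T\mathbf{y} \tag{11.2.10}$$

In practice, the solution is not obtained as shown in Eqn. (11.2.10) but by solving $(\mathbf{X}^T\mathbf{X})\mathbf{a} = \mathbf{X}^T\mathbf{y}$.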
Algorithm
Step 1: Obtain the input data and the degree of polynomial, k, to fit the data.
Step 2: Form the coefficient matrix X and the RHS vector y shown in Eqn. (11.2.8).
Step 3: Solve the system of equations $(\mathbf{X}^T\mathbf{X})\mathbf{a} = \mathbf{X}^T\mathbf{y}$ to obtain a.
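Steps 1-3 can be sketched by accumulating $\mathbf{X}^T\mathbf{X}$ and $\mathbf{X}^T\mathbf{y}$ row by row from the Vandermonde matrix and then solving the small system. This is a minimal sketch, not the book's CLSF class: `polyFit` and `solve` are illustrative names, the linear solver is plain Gaussian elimination with partial pivoting, and the singularity threshold is an arbitrary absolute value.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;
using Vector = std::vector<double>;

// Solves M a = r by Gaussian elimination with partial pivoting.
Vector solve(Matrix M, Vector r)
{
    const std::size_t n = r.size();
    for (std::size_t k = 0; k < n; ++k) {
        std::size_t p = k;
        for (std::size_t i = k + 1; i < n; ++i)
            if (std::fabs(M[i][k]) > std::fabs(M[p][k])) p = i;
        if (std::fabs(M[p][k]) < 1.0e-14)
            throw std::runtime_error("normal equations are singular");
        if (p != k) { std::swap(M[k], M[p]); std::swap(r[k], r[p]); }
        for (std::size_t i = k + 1; i < n; ++i) {
            const double c = M[i][k] / M[k][k];
            for (std::size_t j = k; j < n; ++j) M[i][j] -= c * M[k][j];
            r[i] -= c * r[k];
        }
    }
    Vector a(n, 0.0);
    for (std::size_t i = n; i-- > 0; ) {
        double s = r[i];
        for (std::size_t j = i + 1; j < n; ++j) s -= M[i][j] * a[j];
        a[i] = s / M[i][i];
    }
    return a;
}

// Degree-k least-squares fit: forms X^T X and X^T y (Eqn. 11.2.8) and solves.
Vector polyFit(const Vector& x, const Vector& y, std::size_t k)
{
    assert(x.size() == y.size() && x.size() > k);
    const std::size_t m = k + 1;
    Matrix XtX(m, Vector(m, 0.0));
    Vector Xty(m, 0.0);
    for (std::size_t p = 0; p < x.size(); ++p) {
        Vector row(m, 1.0);                          // row p of X: 1, x, x^2, ...
        for (std::size_t j = 1; j < m; ++j) row[j] = row[j - 1] * x[p];
        for (std::size_t i = 0; i < m; ++i) {
            Xty[i] += row[i] * y[p];
            for (std::size_t j = 0; j < m; ++j) XtX[i][j] += row[i] * row[j];
        }
    }
    return solve(XtX, Xty);
}
```

With the data of Example 11.2.1 and k = 1, `polyFit` returns approximately a = {2.3125, 0.8438}.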
Example Program 11.2.1 Least Squares Polynomial Fit Program
We will develop a program to carry out a least squares polynomial fit given a set of data points. The theory and algorithm are
encapsulated in the CLSF class that has a public member function Fit to carry out the least-squares fit.
main.cpp
The client code for solving data from Example 11.2.1 is shown below. We will leverage the tools developed earlier – the CVector,
CMatrix and CMatToolBox classes. CLSF’s default constructor is used in line 29 followed by the call to the Fit function in line
32.
An error message is shown if a fit is not possible. Otherwise, the output contains details of the fit (see next section for details of the residual and R² values). The output from the program is shown in Fig. 11.2.1.
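$$\mathbf{X} = \begin{bmatrix}1 & 1\\ 1 & 2\\ 1 & 3\\ 1 & 4\\ 1 & 5\end{bmatrix}, \qquad \mathbf{y} = \begin{Bmatrix}1.09861\\ 1.38629\\ 1.60944\\ 1.79176\\ 1.82759\end{Bmatrix}$$

where the entries of y are the natural logarithms of the measured y values. Setting up and solving $(\mathbf{X}^T\mathbf{X})\mathbf{a} = \mathbf{X}^T\mathbf{y}$ yields the solution $\mathbf{a} = \{0.983709, 0.186343\}^T$. Since $e^{0.983709} = 2.67436$,

$$y = 2.67436\,e^{0.186343x}$$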
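Similarly, a logarithmic function can be written as

$$y = A + B\ln x \tag{11.2.17}$$

and the problem can be reduced to

$$\begin{bmatrix}1 & \ln x_1\\ 1 & \ln x_2\\ \vdots & \vdots\\ 1 & \ln x_n\end{bmatrix}\begin{Bmatrix}A\\ B\end{Bmatrix} = \begin{Bmatrix}y_1\\ y_2\\ \vdots\\ y_n\end{Bmatrix} \tag{11.2.18}$$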
Example 11.3.2
Use the data from Example 11.2.1 to fit a logarithmic curve.
Using the data we obtain the following.
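$$\mathbf{X} = \begin{bmatrix}1 & 0\\ 1 & 0.693147\\ 1 & 1.098612\\ 1 & 1.386294\\ 1 & 1.609438\end{bmatrix}, \qquad \mathbf{y} = \begin{Bmatrix}3\\ 4\\ 5\\ 6\\ 6.218876\end{Bmatrix}$$

Setting up and solving $(\mathbf{X}^T\mathbf{X})\mathbf{a} = \mathbf{X}^T\mathbf{y}$ yields the solution $\mathbf{a} = \{2.827, 2.1063\}^T$. Hence

$$y = 2.827 + 2.1063\ln x$$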
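Both the exponential and the logarithmic fits reduce to a straight-line least-squares problem after a change of variable: ln y against x, or y against ln x. The sketch below solves the 2 x 2 normal equations in closed form. It is a minimal illustration with hypothetical function names (`lineFit`, `exponentialFit`, `logarithmicFit`) and is not part of the CLSF class. Note that the exponential fit minimizes the squared error in ln y, not in y.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Straight-line least squares u = c0 + c1 t; the 2 x 2 normal equations
// (X^T X) c = X^T u are solved in closed form.
void lineFit(const std::vector<double>& t, const std::vector<double>& u,
             double& c0, double& c1)
{
    assert(t.size() == u.size() && t.size() >= 2);
    const double n = static_cast<double>(t.size());
    double st = 0.0, su = 0.0, stt = 0.0, stu = 0.0;
    for (std::size_t i = 0; i < t.size(); ++i) {
        st += t[i]; su += u[i]; stt += t[i] * t[i]; stu += t[i] * u[i];
    }
    const double det = n * stt - st * st;      // zero if all t_i are equal
    assert(det != 0.0);
    c0 = (stt * su - st * stu) / det;
    c1 = (n * stu - st * su) / det;
}

// y = A e^(Bx): fit ln y = ln A + B x (requires y > 0).
void exponentialFit(const std::vector<double>& x, const std::vector<double>& y,
                    double& A, double& B)
{
    std::vector<double> lny;
    for (double yi : y) { assert(yi > 0.0); lny.push_back(std::log(yi)); }
    double lnA = 0.0;
    lineFit(x, lny, lnA, B);
    A = std::exp(lnA);
}

// y = A + B ln x: fit y against ln x (requires x > 0), Eqn. (11.2.18).
void logarithmicFit(const std::vector<double>& x, const std::vector<double>& y,
                    double& A, double& B)
{
    std::vector<double> lnx;
    for (double xi : x) { assert(xi > 0.0); lnx.push_back(std::log(xi)); }
    lineFit(lnx, y, A, B);
}
```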
Once the fit is completed, the obvious question to ask is “How good or bad is the fit?” The coefficient of determination, $R^2$ (for a straight-line fit, the square of the correlation coefficient), is a value that describes the quality of a fit for a set of data. In least-squares fits, the value of $R^2$ is a fraction between 0 and 1 and is unitless. When $R^2$ equals 0, the curve does not fit the data any better than a horizontal line through the mean of the data would. When $R^2$ equals 1, all the points lie on the curve and there is no scatter.
$R^2$ is calculated from the sum of the squares of the vertical distances of the points from the best-fit curve, $SS_{reg}$, and the sum of the squares of the vertical distances of the points from a horizontal line through the mean of all y values, $SS_{tot}$. The equations for $SS_{reg}$, $SS_{tot}$ and $R^2$ are as follows:

$$SS_{reg} = \sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \tag{11.2.19}$$

$$SS_{tot} = \sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2 \tag{11.2.20}$$

$$R^2 = 1 - \frac{SS_{reg}}{SS_{tot}} \tag{11.2.21}$$

where $\hat{y}_i$ are the $y_i$ values corresponding to the best-fit curve and $\bar{y}$ is the mean for the given set of data.
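Example 11.3.3
Consider the data analyzed in Example 11.2.1. The solution to the problem was $\hat{y}(x) = 2.3125 + 0.8438x$. Using the solution,
we can construct the table shown below.
x y ŷ
1 3 3.1563
2 4 4.0001
3 5 4.8439
4 6 5.6877
5 6.2188 6.5315
The $R^2$ value can be computed as follows.
$$\bar{y} = \frac{3 + 4 + 5 + 6 + 6.218876}{5} = 4.8438$$
$$SS_{tot} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = (3 - 4.8438)^2 + (4 - 4.8438)^2 + (5 - 4.8438)^2 + \cdots = 7.364$$
$$SS_{reg} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = (3 - 3.1563)^2 + (4 - 4.0001)^2 + \cdots = 0.244$$
$$R^2 = 1 - \frac{SS_{reg}}{SS_{tot}} = 1 - \frac{0.244}{7.364} \approx 0.9669$$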
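Equations (11.2.19) to (11.2.21) translate directly into a short function. The sketch below is an illustration under stated assumptions (the name RSquared and the std::vector interface are not from the book), and it keeps the text's naming, in which $SS_{reg}$ is the sum of squares about the fitted curve.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// R^2 = 1 - SSreg/SStot (Eq. 11.2.21), using the text's notation:
// SSreg = sum (y_i - yhat_i)^2 about the fitted curve  (Eq. 11.2.19)
// SStot = sum (y_i - ybar)^2 about the mean            (Eq. 11.2.20)
double RSquared(const std::vector<double>& y, const std::vector<double>& yhat)
{
    assert(y.size() == yhat.size() && !y.empty());
    double ybar = 0.0;
    for (double v : y)
        ybar += v;
    ybar /= static_cast<double>(y.size());

    double SSreg = 0.0, SStot = 0.0;
    for (std::size_t i = 0; i < y.size(); ++i)
    {
        SSreg += (y[i] - yhat[i]) * (y[i] - yhat[i]);
        SStot += (y[i] - ybar) * (y[i] - ybar);
    }
    assert(SStot > 0.0);   // R^2 is undefined when all y_i are equal
    return 1.0 - SSreg / SStot;
}
```

Feeding it the y and ŷ columns of the table above should reproduce $R^2 \approx 0.9669$; passing the measured values as both arguments gives exactly 1.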
Summary
We looked at the basics of data modeling in this chapter. There is much more to this topic than what has been discussed in this
chapter. Other closely associated topics include statistical analysis and design of experiments. Engineering and the sciences move
forward when experiments are performed to understand a topic that is then usually followed by construction of one or more
models that can then be used in lieu of experiments, as a predictive analysis or design tool.
Exercises
For each of the following problems use the CLSQ class used in Example 11.2.1. It may be useful to modify the class
to accept a pointer to the function(s) that computes the
Appetizers
Problem 11.1
Consider the following data points (2.08, 1.45), (2.30, 2.85), (3.01, 2.15), (4.71, 4.74) and (5.50, 7.73). (a) Fit a polynomial function
and compute the $R^2$ value. (b) Fit an exponential function and compute the $R^2$ value.
Problem 11.2
Consider the following data points (0.5, 4.2), (1.0, 4.4), (3.5, 5.8), (5.5, 7.0), (7.5, 8.6), (8.5, 9.5), (10.0, 10.5) and (12.8, 15.0). (a)
Fit a polynomial function and compute the $R^2$ value. (b) Fit an exponential function and compute the $R^2$ value.
Problem 11.3
Consider the following data points (0.5, 9.7), (1.0, 10.05), (3.5, 10.6), (5.5, 10.8), (7.5, 10.9), (8.5, 11.1), (10.0, 11.15) and (12.8,
11.2). (a) Fit a polynomial function and compute the $R^2$ value. (b) Fit a logarithmic function and compute the $R^2$ value.
Main Course
Problem 11.4
Derive the least-squares solution for the function $y(x) = a_1 x^{a_2}$ by linearizing the function.
Problem 11.5
Using the following data (1, 2.45), (2, 3.08), (3, 3.47), (4, 3.79), (5, 4.05), (6, 4.28), (7, 4.48) and (8, 4.67), fit the function
$y(x) = a_1 x^{a_2}$. How good is the fit?
References
Press, Flannery, Teukolsky and Vetterling, Numerical Recipes in C, Cambridge University Press, 1988.
Kottegoda and Rosso, Applied Statistics for Civil and Environmental Engineers, Blackwell Publishing, 2008.
Walpole, Myers, Myers and Ye, Probability & Statistics for Engineers & Scientists, 2002.
Montgomery, Design and Analysis of Experiments, John Wiley & Sons, 2013.
Chapter 12
File Handling
“Torture the data, and it will confess to anything.” Ronald Coase
“You must be the change you wish to see in the world.” Mahatma Gandhi
Almost all computer programs require some input from the user, and they generate some output. In programs with a graphical
user interface (GUI), the input is via the keyboard, the mouse or even an external file. The output is shown graphically or as a
report, either on the screen, in printed form, or stored in an external file so that it can be viewed later. There are
innumerable occasions when data need to be transported from one program to another, or when data created by a program need to
be retained even after the program has finished execution. Typically, the data in these scenarios are stored in an external file
on a computer's hard disk. If the contents of the file can be viewed meaningfully with a text editor¹, then the file holds text data.
Otherwise, the file holds binary data. In this chapter, we will see how to handle files and manipulate the data in them
using C++.
Some of the C++ file handling concepts require an understanding of advanced object-oriented concepts such as inheritance.
However, the value of file handling takes precedence at this stage. We will study inheritance and the related OO concepts in
Chapter 13, when we will be in a better position to understand all the nuances of the C++ file handling classes.
Objectives
To understand what external files are.
To understand how to read the data from an input file.
To understand how to write data to an output file.
1 WordPad and Notepad are examples of Windows text editors; vi and emacs are examples of Linux text editors.
Fig. 12.1.1 Class hierarchy (istream, ostream, fstream)
In this section, we will see how to read data from an external file and how to output information to an external file. Note that
files are either text (meaning that they can be viewed in a text editor such as Windows Notepad) or binary (meaning that
the information is stored as 0's and 1's whose meaning is interpreted by the program reading and writing the data). Text
files are usually accessed sequentially, while binary files can also be accessed randomly.
First, we need to create an object associated with the ifstream class. The general syntax is the usual syntax associated with the
declaration of a class-related object.
ifstream object_name;
For example,
ifstream FileForInput;
FileForInput is the variable or object. Once the object is declared, it can then be used to open the file using the open member
function. For example,
FileForInput.open ("DataSet1.dat", ios::in);
opens a file called DataSet1.dat for reading only. The member function open has two parameters. The first is the character
string that contains the name of the file (e.g., DataSet1.dat) and the second defines the openmode value. The openmode value in
this case is ios::in, which opens an existing file for reading only; if the file does not exist, the open operation fails. Once the file is
opened, it is no longer referred to by its name. Instead, the object name is used for the rest of the program as an alias
for the file.
When the file is no longer needed, it should be closed using the close member function. For
example,
FileForInput.close ();
closes the file that was opened earlier. (If close is not called, the file is closed automatically when the ifstream object is destroyed.)
To read the data from the file, we can use the >> operator. For example, if we wish to read an integer followed by a float from
the file, using the example discussed above, we would have the following statement.
FileForInput >> nIntV >> fFloatV;
where nIntV is an int variable and fFloatV is a float variable.
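Putting the declaration, open, fail check, read and close steps together gives something like the following sketch. The function names ReadIntAndFloat and ReadIntAndFloatFromFile are hypothetical, and passing a std::string directly to open assumes a C++11 or later compiler.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <string>

// Reads an int followed by a float from an already opened stream.
// Returns true only if both values were extracted successfully.
bool ReadIntAndFloat(std::istream& In, int& nIntV, float& fFloatV)
{
    In >> nIntV >> fFloatV;
    return !In.fail();
}

// Opens the named file for reading, reads the two values and closes the file.
// Returns false if the file cannot be opened or the values cannot be read.
bool ReadIntAndFloatFromFile(const std::string& strFileName, int& nIntV, float& fFloatV)
{
    assert(!strFileName.empty());
    std::ifstream FileForInput;
    FileForInput.open(strFileName, std::ios::in);   // std::string overload: C++11
    if (FileForInput.fail())
        return false;                                // e.g., the file does not exist
    const bool bOK = ReadIntAndFloat(FileForInput, nIntV, fFloatV);
    FileForInput.close();
    return bOK;
}
```

Separating the reading from the opening lets ReadIntAndFloat accept any input stream, including std::cin.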
12.2.2 Opening a text file for writing
A file must be opened first before data can be written to it. To open a file, one must know the name of the file. A file can be
opened for writing so that (a) it appends data to an existing file, (b) it overwrites an existing file, or (c) it creates a new file. The
header file and the accompanying statements are as follows.
#include <fstream>
using std::ofstream;
First, we need to create an object associated with the ofstream class. For example,
ofstream FileForOutput;
declares FileForOutput as an ofstream variable or object. Once the object is declared, it can then be used to open the file. Two
examples are shown below.
FileForOutput.open ("DataSet1.out", ios::out);
FileForOutput.open ("DataSet1.out", ios::out | ios::app);
In the first example, ios::out indicates that the file is created if it does not exist and that its contents are discarded if it does exist.
In the second example, ios::out | ios::app indicates that if the file exists, new information is appended to the end of the file.
When the file is no longer needed, it must be closed before the program terminates using the close member function. For
example,
FileForOutput.close ();
closes the file. It is possible that you may not see the expected contents in a file if you do not close the file. This is because the
contents are buffered. The contents are put in a special location and are written to the file only if the buffer is full or the close
member function is invoked. The member function flush can be explicitly called to ‘flush the buffer’; flush is automatically
called by the close function.
To write data to a file, we can use the << operator. For example, if we wish to write an integer followed by a float on a single
line, using the example discussed above, we would have the following statement.
FileForOutput << nIntV << " " << fFloatV << "\n";
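The writing steps can be combined in the same way. In this hypothetical helper (WriteIntAndFloat, not from the book) a flag selects between overwriting (ios::out) and appending (ios::out | ios::app). Checking fail() after close() catches errors that occur when the buffered data are finally written.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Writes the line "nIntV fFloatV" to the named file. When bAppend is true the
// line goes to the end of an existing file (ios::out | ios::app); otherwise any
// existing contents are discarded (ios::out). Returns false on any failure.
bool WriteIntAndFloat(const std::string& strFileName, int nIntV, float fFloatV, bool bAppend)
{
    assert(!strFileName.empty());
    const std::ios::openmode Mode =
        bAppend ? (std::ios::out | std::ios::app) : std::ios::out;
    std::ofstream FileForOutput;
    FileForOutput.open(strFileName, Mode);
    if (FileForOutput.fail())
        return false;
    FileForOutput << nIntV << " " << fFloatV << "\n";
    FileForOutput.close();            // close() flushes the buffer to the file
    return !FileForOutput.fail();     // a failed write or close leaves failbit/badbit set
}
```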
Tip: Text files are created and accessed sequentially. If a file is being created, the contents are first written to line 1, followed by
line 2, and so on. A new line is created only if the newline character is output to the file. One cannot go back and rewrite a
specific line in a text file. Similarly, if a file is being read, the contents are read sequentially starting at the beginning of the file.
To read a specific line that is not the first line in the file, one must skip the previous lines by reading the contents and discarding
them, before reading the required line.
fail: This member function is used to test if the stream operation has failed or not by accessing the failbit or badbit value.
The fail function returns true if the last stream operation was unsuccessful. For example, to see if the open function worked
or not, we could have the following code.
FileForInput.open ("DataSet1.dat", ios::in);
if (FileForInput.fail())
{
    std::cout << "Could not open the specified input file.\n";
    // take appropriate action
    …
}
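eof: This member function is used to test whether the end-of-file condition has been reached by accessing the eofbit value.
A file cannot be read beyond its end. Note that eofbit is set only after a read operation has tried to go past the end of the file, so
the success of each read must be checked before its result is used; testing eof() at the top of the loop may process the last value
twice. Here is example code that repeatedly reads an integer from a file until there is no more input.
int nV;
while (FileForInput >> nV)    // true only if an integer was extracted
{
    // do what's appropriate with the input
    …
}
if (FileForInput.eof())
{
    // normal termination: all the input has been read
}
Note that on modern file systems no special end-of-file character is stored in the file. The end-of-file condition is detected by the
stream when a read is attempted after the last byte of the file has been consumed.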
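clear: This member function clears all of the stream's error state flags (eofbit, failbit and badbit). For example, after the
end of the file has been reached, the flags must be cleared before the stream can be used again, as shown below.
FileForInput.clear();
To read the file once again from the beginning, the read position must also be reset, e.g., with FileForInput.seekg(0). The
overloaded form of this function, clear (state), sets the stream state to state, clearing all the other flags.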
good: This member function checks to see if the stream is OK by accessing the goodbit value.
bad: This member function checks to see if a fatal error has occurred by accessing the badbit value.
rdstate: This member function returns the currently set flags. The following example checks to see if the failbit is set and
clears it if necessary.
if (FileForInput.rdstate() & std::ios::failbit)
{
    FileForInput.clear (FileForInput.rdstate() & ~std::ios::failbit);
}
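The state functions are easiest to understand by watching them change. The sketch below (hypothetical names ReadIntegers and ReadIntegersSkippingBadTokens) reads integers until extraction fails and then uses eof() to tell a clean end of input from a bad token; clear() resets the failbit so that reading can resume. It uses std::istringstream as a stand-in for an input file, since the state flags behave the same way for std::ifstream.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads integers until extraction fails. Returns true if reading stopped at
// the end of the input (eofbit set) and false if it stopped at a token that is
// not an integer (failbit set while eofbit is still clear).
bool ReadIntegers(std::istream& In, std::vector<int>& Values)
{
    int nV;
    while (In >> nV)          // the stream tests false once failbit or badbit is set
        Values.push_back(nV);
    return In.eof();
}

// Reads all integers in Text, discarding any token that is not an integer.
std::vector<int> ReadIntegersSkippingBadTokens(const std::string& Text)
{
    std::istringstream In(Text);
    std::vector<int> Values;
    while (!ReadIntegers(In, Values))
    {
        In.clear();            // reset failbit so that reading can resume
        std::string strBad;
        In >> strBad;          // extract and discard the offending token
    }
    return Values;
}
```

For the input "10 20 abc 30", ReadIntegersSkippingBadTokens should return 10, 20 and 30.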
In line 13 the required header file <fstream> is included. For opening both the input and output files, a do loop is used that
repeatedly executes until the file is opened successfully. Note the way the file names obtained from the user are passed to
the open function (lines 37 and 59). Before C++11, the open function did not accept a std::string object; instead it expected a C-style
character string. The std::string member function c_str() returns a pointer to a null-terminated character array holding the string's
contents. (Since C++11, open also accepts a std::string directly.) Also note that the clear member function is called if for some reason
the file cannot be opened.
The reading of the input file starts with the for loop in line 82. Each input line is read in two parts. First, the point number is
read (line 84) followed by the statement to read the coordinates and the temperature on line 86. After both reads, the end-of-
file condition is checked using the eof member function.
The data is immediately written to the output file (line 89). The formatting statements (using setiosflags and std::setw
functions) are similar to those that we saw before. They ensure that the column headings and the column data are properly
aligned. Finally, the close member functions are called to close both the input and output files (lines 99-100).
Tip: The >> operator skips leading whitespace (blanks, tabs and newline characters) to get to the next input value. In other
words, one can reformat the sample data file for the program as follows.
1
12.0
-45.6
33.3
2 16.0 45.6 88.6
The error-checking mechanism in the program is not very robust. For example, omitting even a single value will lead to erroneous
results. We will develop a much more robust and versatile way of reading a text input file as we progress through this
chapter.
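The whitespace-skipping behavior and a basic completeness check can be sketched as follows. The record layout (a point number followed by x, y and the temperature) is assumed from the program description above, and the names PointRecord and ReadPointRecords are hypothetical. The function detects input that ends part-way through a record or holds a non-numeric value, but an omitted value in the middle of the file still shifts the remaining values, which is why the line-by-line approach developed later in the chapter is more robust.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <vector>

// Hypothetical record: a point number followed by x, y and the temperature.
struct PointRecord
{
    int nPoint;
    double dX, dY, dTemp;
};

// Reads records until the input is exhausted. Because >> skips blanks and
// newline characters, the four values of a record may be spread over any
// number of lines. Returns false if the input ends in the middle of a
// record or contains a non-numeric value.
bool ReadPointRecords(std::istream& In, std::vector<PointRecord>& Records)
{
    PointRecord R{};
    while (In >> R.nPoint)                     // start of a new record
    {
        if (!(In >> R.dX >> R.dY >> R.dTemp))  // rest of the record
            return false;                      // incomplete or bad record
        Records.push_back(R);
    }
    return In.eof();                           // true only at a clean end of input
}

// Usage (std::istringstream stands in for an input file here):
//   std::istringstream In("1\n12.0\n-45.6\n33.3\n2 16.0 45.6 88.6\n");
//   std::vector<PointRecord> Records;
//   bool bOK = ReadPointRecords(In, Records);   // two records are read
```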
The get (char& c) member function assigns the next character (including a blank or newline character) to the passed argument and
returns the stream. The state of the stream indicates whether the read was successful. Here is an example.
char c;
FileForInput.get(c);
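The getline (char* str, std::streamsize count) member function reads up to count-1 characters into str and appends a terminating null
character. Reading stops earlier if the next character is the delimiter, which is the newline character (\n) by default or the character
supplied as an optional third argument, delim. The delimiter is extracted from the stream but is not stored in str. If the line is longer
than count-1 characters, the failbit is set. The state of the stream indicates whether the read was successful. Here is an example.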
const int MAXCHARS = 256;
char szUserInput[MAXCHARS];
FileForInput.getline(szUserInput, MAXCHARS);
As we have seen before with other objects, stream objects should be passed by reference; in fact they must be, because stream objects cannot be copied.
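The difference between get (char&) and getline is easy to see in a small sketch. CountCharacters and ReadLine are hypothetical helpers; both take the stream by reference, and ReadLine reports a line that does not fit in the buffer through the failbit that getline sets.

```cpp
#include <cassert>
#include <istream>
#include <sstream>

// Counts every character with get(char&), which (unlike >>) does not skip
// blanks or newline characters, and counts the newline characters separately.
void CountCharacters(std::istream& In, int& nChars, int& nNewLines)
{
    nChars = 0;
    nNewLines = 0;
    char c;
    while (In.get(c))
    {
        ++nChars;
        if (c == '\n')
            ++nNewLines;
    }
}

// Reads one line into szBuffer (capacity nSize, including the '\0').
// Returns false if the line did not fit (getline sets failbit) or no line
// could be read; on a too-long line the first nSize-1 characters are stored.
bool ReadLine(std::istream& In, char* szBuffer, std::streamsize nSize)
{
    assert(szBuffer != nullptr && nSize > 1);
    In.getline(szBuffer, nSize);
    return !In.fail();
}
```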
A typical line in the file is either a line that starts with Rem: (a remark or comment that should be ignored) or is a blank line (that
should be ignored) or is a data line. The data line contains three fields – the city name, the date of travel and the current odometer
reading.
main.cpp
The fundamental idea in this approach is to read one entire line at a time and then parse the input based on what we expect
to see on a typical line. Reading one line at a time is done in the function ReadNextLine. The input is read into character string
strInputString using the getline function (line 19). Error checks are carried out immediately (lines 22-28). If an end-of-file
condition is reached, a true value is returned, and reading of the file is terminated. If a fatal error is encountered, the function
throws an exception. Otherwise, a false value is returned to indicate that the end-of-file condition was not detected.
The main program starts by opening the database file, dbfile.txt, a text file for reading. Five variables are used to facilitate
reading the file: lines 47-48 declare the variables that store the data from each line, while the contents of the line are stored in strInputLine and nLine is
used to track the current line number.
The ReadNextLine function is called in the for loop (line 51) until a true value is returned because the end-of-file condition is
detected. It is assumed that each line will not have more than 255 characters. Once the contents of a line are captured in the
strInputLine, we determine if the input line is a remark line or a blank line by checking to see if the first four characters are
Rem: or if the length of the input string is zero, respectively. If neither condition is satisfied, then we can assume that
the input line contains data. The istringstream object is then used to read (or parse) the three input fields. First, in line 61, the
contents are copied into the istringstream object, strFormatString. Next, in line 62, the actual reading is done using the now
familiar >> operator. The read contents are displayed on the screen. Finally, if the end-of-file condition was detected in line 53,
the break statement is used to exit the for loop. The error states detected and thrown in the main program and the functions
are caught and the error message displayed in lines 73-89. Since a character string is thrown with each error, the first catch block
is used to display those errors.
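A self-contained sketch of the same line-by-line strategy is shown below. It is not the book's listing: it uses std::getline with a std::string (so no 255-character limit applies), returns false with the offending line number instead of throwing, and assumes the city name and the date are single words without blanks. The names TripRecord and ReadTripLog are hypothetical.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical data line layout: city, date of travel, odometer reading.
struct TripRecord
{
    std::string strCity;
    std::string strDate;
    double dOdometer;
};

// Reads the whole file line by line, skipping blank lines and remark lines
// that start with "Rem:". Returns false on a malformed data line, leaving
// its line number in nLine.
bool ReadTripLog(std::istream& In, std::vector<TripRecord>& Trips, int& nLine)
{
    std::string strInputLine;
    nLine = 0;
    while (std::getline(In, strInputLine))
    {
        ++nLine;
        if (strInputLine.empty() || strInputLine.compare(0, 4, "Rem:") == 0)
            continue;                              // blank line or remark
        std::istringstream strFormatString(strInputLine);
        TripRecord T;
        if (!(strFormatString >> T.strCity >> T.strDate >> T.dOdometer))
            return false;                          // malformed data line
        Trips.push_back(T);
    }
    return !In.bad();                              // stopped at end of file
}
```

Parsing each line from its own istringstream means that a bad line cannot shift the values of the following lines, and the line number is available for the error message.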
where BinaryFile is an fstream object. This form of the open statement opens the file for both reading and writing with random
access. A few things to note about random access files:
(1) Data cannot be read unless data is written first to the file.
(2) The file needs to be positioned first before reading from or writing to that location. It is possible to overwrite the
contents at the specified location.
(3) The IO operations take place in bytes using char data type and type casting is done during the function calls.
The following functions are used for reading and writing. The letter g is used to signify get and the letter p is for put.
The appropriate header files are shown in lines 10-12. Files that are used for reading and writing must be associated with the
fstream class. This declaration takes place on line 18 and the file is opened on line 19. Note the multiple openmode flags that are
used. The trunc flag is needed if the file does not exist or if the file exists and the contents can be overwritten. The trunc flag
should be omitted if the file exists and the contents need to be reused.
The loop starting at line 29 is used to interactively obtain the point data from the user. The nRecords variable keeps track of
how many point records have been created. It is then used to compute the location on the file where the reading or writing of
the point data is to take place. The corresponding computations are on lines 37 and 47. In line 237, the seekp function is used
to position the file before writing. Next the write function is used to write the data. There are two arguments to the function –
the first argument is const char* type and the second argument is std::streamsize (effectively an integer). The
reinterpret_cast<const char*> type casting is used to convert the data into const char* type before writing.
Similarly, before reading the data back, the seekg function is used to position the file. The read member function is called to
read the data. Once again type casting needs to take place and is done using the reinterpret_cast<char*> construct. Finally,
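the file is closed in line 54. Writing or reading sizeof(CPoint) bytes at once works because the data members of a CPoint object are
stored in one contiguous block of memory and, since the tag is held in a fixed-size char array, the object contains no pointers to
memory elsewhere. Hence one does not have to write or read the tag, the x coordinate and the y coordinate values individually. This
technique is safe only for such trivially copyable types; for an object holding a std::string or a pointer, only the address, not the
data it points to, would be written to the file.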
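The seek-then-read/write pattern can be sketched with a hypothetical fixed-size record, PointRec, standing in for CPoint. The sketch assumes a tag of at most 10 characters held in a char array, and the static_assert documents the requirement that the type be trivially copyable. Record n (counting from 0) is stored at byte offset n × sizeof(PointRec).

```cpp
#include <cassert>
#include <cstring>
#include <fstream>
#include <string>
#include <type_traits>

// Hypothetical fixed-size record standing in for CPoint. Holding the tag in a
// char array (not a std::string) keeps the object trivially copyable, so its
// bytes can be written to and read from a file directly.
struct PointRec
{
    char szTag[11];   // at most 10 characters plus the terminating '\0'
    double dX;
    double dY;
};
static_assert(std::is_trivially_copyable<PointRec>::value,
              "raw byte I/O requires a trivially copyable type");

// Copies strTag into the record; rejects tags longer than 10 characters.
bool SetTag(PointRec& P, const std::string& strTag)
{
    if (strTag.size() >= sizeof(P.szTag))
        return false;
    std::memcpy(P.szTag, strTag.c_str(), strTag.size() + 1);  // includes '\0'
    return true;
}

// Byte offset of record nRecord (0-based) in the file.
std::streamoff RecordOffset(long nRecord)
{
    assert(nRecord >= 0);
    return static_cast<std::streamoff>(nRecord) *
           static_cast<std::streamoff>(sizeof(PointRec));
}

bool WriteRecord(std::fstream& File, long nRecord, const PointRec& P)
{
    File.seekp(RecordOffset(nRecord));                 // position before writing
    File.write(reinterpret_cast<const char*>(&P),
               static_cast<std::streamsize>(sizeof(PointRec)));
    return !File.fail();
}

bool ReadRecord(std::fstream& File, long nRecord, PointRec& P)
{
    File.seekg(RecordOffset(nRecord));                 // position before reading
    File.read(reinterpret_cast<char*>(&P),
              static_cast<std::streamsize>(sizeof(PointRec)));
    return !File.fail();
}
```

A file opened with std::ios::in | std::ios::out | std::ios::trunc | std::ios::binary can then have any record overwritten in place simply by writing to the same record number again.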
The overloaded friend function in the CPoint class is shown below. Line 160 prevents invalid identification tags, including tags
longer than 10 characters (CPoint::MAXIDTAG), from being stored.
The strcpy_s function is used to copy from a std::string object to a char string variable and back.
[Figure: planar surface discretized into numbered points and triangles]
Fig. 12.3.4 Class diagram showing the solution used in Example Program 12.3.3 (classes C2DSurface, CFileIO, CParser, CPoint, CTriangle and CVector)
The primary focus will be on the CParser class. Features of this class will be used to read the input file and provide the parsed
data to the C2DSurface class for validation and storage. The point and triangle data will be stored as CVector objects. We
will start by looking at the main program.
main.cpp
The one and only C2DSurface object is declared in line 17 and the program execution is a simple two-step process. In the first
step (line 21), the input file is read. Once it has been successfully read, the next step is to compute the properties of the discretized
planar surface that in this example is to just compute the total area as the sum of the areas of all the triangles. If an error is
thrown in either step, it is caught, and a generic error message is displayed. It is expected that a more detailed error message
would be displayed closer to where the error is first detected, e.g., in the C2DSurface class.
2DSurface.h
Next, we will look at the ReadData member function in the C2DSurface class. The variable CurCommand is used to hold the current
command handled by the program. Every line in the input file is broken into tokens using the delimiter character, and the tokens are
stored as character strings in strVTokens. For example, if the input line is triangle, 4, 18, 4, 17, then the parser detects 5 tokens
(stored in nTokens), and the tokens (triangle, 4, 18, 4, 17) are stored as strings in strVTokens[0]…strVTokens[4]. The CParser
object Parse is declared in line 31 with the first argument as the delimiter character(s), the second argument as the character(s)
that start a comment line, and the third argument specifying whether all the input is converted to lowercase characters. The variable
bEOF is used to store whether the end of the file has been reached when the parser is called. The CFileIO object is used in lines 35-36
to open the input file: the user is prompted repeatedly for the input file name until the OpenInputFilebyName member
function is able to open the file.
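The tokenizing step can be approximated by the sketch below. Tokenize is a hypothetical helper, not the CParser API: it splits a line at any of the given delimiter characters, returns no tokens for a comment line, optionally converts the line to lowercase and, as a simplifying assumption, discards all blanks, so a token cannot contain spaces.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical tokenizer (not the CParser API). Splits strLine at any character
// in strDelimiters; a line starting with strComment yields no tokens; bToLower
// converts the line to lowercase first. All blanks are discarded, so a token
// cannot contain spaces, and empty tokens are skipped.
std::vector<std::string> Tokenize(std::string strLine, const std::string& strDelimiters,
                                  const std::string& strComment, bool bToLower)
{
    assert(!strDelimiters.empty());
    std::vector<std::string> strVTokens;
    if (!strComment.empty() && strLine.compare(0, strComment.size(), strComment) == 0)
        return strVTokens;                            // comment line

    if (bToLower)
        for (char& c : strLine)
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));

    std::string strToken;
    for (char c : strLine)
    {
        if (strDelimiters.find(c) != std::string::npos)
        {
            if (!strToken.empty())
                strVTokens.push_back(strToken);       // end of a token
            strToken.clear();
        }
        else if (!std::isspace(static_cast<unsigned char>(c)))
            strToken += c;                            // blanks are dropped
    }
    if (!strToken.empty())
        strVTokens.push_back(strToken);               // last token on the line
    return strVTokens;
}
```

With a comma as the delimiter, Tokenize("Triangle, 4, 18, 4, 17", ",", "**", true) should return the five tokens triangle, 4, 18, 4 and 17.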
Lines 39-41 are local variables that are used to store various pieces of input data – point number, triangle number, x and y
coordinates, and the list of point numbers that define a triangle.
The reading of the input file starts in line 44 and ends in line 125. The parser’s member function GetTokens is called in line 47.
The input argument is IFile (see line 34), and the function returns the current line number, the tokens in the current line,
the number of tokens in the line, and whether the end of the file was reached while reading the current line. If the current line is
empty and the end of the file has been reached (line 48), the program exits the for loop (line 49). The next seven sections of the code
(if …else if) deal with how to handle each command: *model, *point, *polygon, and *end.
As soon as the *end line is read, the break statement (line 68) is used to exit the for loop. The model data consist of the number
of points and the number of triangles (lines 72-83). Hence, the number of tokens (nTokens) needs to be two, with the number of
points stored in m_strVTokens[0] and the number of triangles in m_strVTokens[1]. To help get the integer values from these
strings, the parser has the member function GetIntValue (with return type bool). An error is thrown if the return
value is false (lines 76, 79). If the values are legal, memory for the two vectors, m_PointData and m_TriangleData,
is allocated by calling the SetSize member function (lines 84-85).
Each line containing point data has three pieces of input (lines 89-98) – the point #, x-coordinate and y-coordinate. Hence, the
number of tokens (nTokens) needs to be three with the point # stored at m_strVTokens[0], x-coordinate in m_strVTokens[1]
and the y-coordinate in m_strVTokens[2]. To help get the integer and float values from these strings, the parser has member
functions GetIntValue and GetFloatValue (with return type bool) that can be used. An error is thrown if the return value is
false. If not, the values are stored in the m_PointData vector (line 99).
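Converting a token safely requires checking that the whole string is a number, not just its beginning. The helpers ToInt and ToDouble below are hypothetical stand-ins for GetIntValue and GetFloatValue, built on the standard std::strtol and std::strtod functions.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>

// Hypothetical stand-ins for GetIntValue/GetFloatValue: each succeeds only if
// the entire token is a valid number that fits in the target type.
bool ToInt(const std::string& strToken, int& nValue)
{
    if (strToken.empty())
        return false;                     // strtol would silently return 0
    char* pEnd = nullptr;
    errno = 0;
    const long lValue = std::strtol(strToken.c_str(), &pEnd, 10);
    if (errno == ERANGE || *pEnd != '\0' || lValue < INT_MIN || lValue > INT_MAX)
        return false;                     // out of range or trailing characters
    nValue = static_cast<int>(lValue);
    return true;
}

bool ToDouble(const std::string& strToken, double& dValue)
{
    if (strToken.empty())
        return false;
    char* pEnd = nullptr;
    errno = 0;
    const double dParsed = std::strtod(strToken.c_str(), &pEnd);
    if (errno == ERANGE || *pEnd != '\0')
        return false;                     // out of range or trailing characters
    dValue = dParsed;
    return true;
}
```

Tokens such as "18x" or "4.5" are rejected by ToInt because pEnd does not reach the end of the string.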
Reading of the triangle data is done very similarly (lines 103-118). If there are no errors, the values are stored in the
m_TriangleData vector (lines 119-121).
An input line that does not conform to the input file format is detected and dealt with in line 124. Upon successfully reading
the input file, the summary statistics (lines 128-130) are shown on the screen.
Finally, the function that computes the total area is shown in lines 153-160.
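The area computation reduces to summing triangle areas obtained from the cross product of two edge vectors. Pt, Tri and TotalArea are hypothetical stand-ins for CPoint, CTriangle and the C2DSurface member function, and the vertex indices are assumed to be 0-based.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Pt  { double x, y; };       // hypothetical stand-in for CPoint
struct Tri { int n1, n2, n3; };    // hypothetical stand-in for CTriangle (0-based vertex indices)

// Half the magnitude of the cross product of edges (b - a) and (c - a);
// the absolute value makes the result independent of the vertex ordering.
double TriangleArea(const Pt& a, const Pt& b, const Pt& c)
{
    return 0.5 * std::fabs((b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y));
}

// Total area of the discretized surface as the sum of the triangle areas.
double TotalArea(const std::vector<Pt>& Points, const std::vector<Tri>& Triangles)
{
    auto Valid = [&Points](int n) {
        return n >= 0 && static_cast<std::size_t>(n) < Points.size();
    };
    double dArea = 0.0;
    for (const Tri& t : Triangles)
    {
        assert(Valid(t.n1) && Valid(t.n2) && Valid(t.n3));
        dArea += TriangleArea(Points[t.n1], Points[t.n2], Points[t.n3]);
    }
    return dArea;
}
```

A unit square split into two triangles should give a total area of exactly 1.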
The reader is encouraged to edit the input file, create potentially erroneous input, and see how the program detects and handles
the error.
Lastly, it would be helpful to make a list of potential improvements that can be made to the program, both in terms of program
architecture and in its implementation.
(1) Fig. 12.3.4 is key to understanding how the key components store and access the data. Duplicate data is stored as the
same CPoint data is also stored in the CTriangle class. While some duplication is healthy, storing several copies of a
piece of data can lead to excessive storage requirements.
(2) This program creates the data but does not edit the data. For example, what would happen to the triangle data if a
point is moved? Deleted?
(3) How easily can the capabilities of the program be enhanced if the surface is discretized with other planar shapes such
as quadrilaterals?
(4) Error checking of the model data can be vastly improved, e.g., have all the points and triangles been defined? Are there
any duplicates? Are the three vertices of each triangle distinct?
(5) A careful examination of the given data (Fig. 12.3.3(a)) would show that the surface area is
$(10)(14) - \frac{\pi (4)^2}{2} = 114.867$ square units. Comparing this value with the one shown in Fig. 12.3.6 shows that the
discretization error is about 2.5%. Can the reader spot the discretization error in Fig. 12.3.3(b)? How can the
discretization error be decreased?
Summary
The ability to manipulate information stored in external files opens a lot of doors for us. Rarely does a single piece of
software take care of all our needs. Data need to be taken as input, manipulated using a numerically based algorithm, and
finally exported for use by other programs.
In this chapter we saw how files can be opened and closed, how text and binary data can be read from and written to files in a
dynamic manner and how C++ provides functions to ensure that these are taking place in a safe and efficient manner. In later
chapters, we will have several opportunities to use these concepts for solving practical problems.
2 https://en.wikipedia.org/wiki/Big_data
3 By Myworkforwiki - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=29452425
Exercises
For the following problems, where appropriate use the CVector and CMatrix template classes to store the data.
Design and implement appropriate classes to store and manipulate the data. In other words, you need to implement
an object-oriented solution to each problem.
Appetizers
Problem 12.1
The program is required to create data to plot a cubic polynomial $y(x) = a + bx + cx^2 + dx^3$. The user interactively supplies
the values of the 4 coefficients, the starting value of $x$, the ending value of $x$ and the increment to be used. Write a program
that will create an input file (say, a comma-separated file) for Microsoft Excel so that the graphing features in MS Excel can be
used. Name this file P12-1.csv.
Main Course
Problem 12.2
Look at the statistical functions defined in Problem 4.11. Create a template class, CStatPak, that has the functionalities defined in
the problem. Note that the class definition shown below is incomplete.
template <class T>
class CStatPak
{
public:
T StatMean (const CVector<T>& fV);
T StatMedian (const CVector<T>& fV);
T StatStandardDeviation (const CVector<T>& fV);
T StatVariance (const CVector<T>& fV);
T StatCoVariance (const CVector<T>& fVA,
const CVector<T>& fVB);
private:
};
Now write a non-member function (that is called from the main ()) whose prototype is as follows.
int ReadInput (std::ifstream& InpFile, CVector<double>& dVStats);
The function should read the data from the file associated with InpFile and compute the statistical values using the CStatPak
class. The results from the statistical analysis should be stored in the dVStats vector with the mean stored in the first location,
followed by the median, standard deviation and finally variance. The return value is 0 if no error was encountered and 1 if an
error was encountered. The input file has the following format.
Line 1: Number of data values, n
Line 2: First value
Line 3: Second Value
…
Line n+1: Last value
Carry out sufficient error checks to make this function robust. Create the template statistical functions in file statpak.h and the
non-member functions in file readinput.cpp.
Problem 12.3
Write a program that will analyze experimental data and compare the results with an analytical model. The program will read
data from an input file. It will then filter the data as per the specifications. It will compare the input to the theoretical value and
then create the output file properly formatted. The sequence of events in the program is as follows.
1. Ask the user for the input file name. Open the input file.
2. Ask the user for the output file name. Open the output file.
3. Read the input file one line at a time.
Filtering the Data: The displacement values (in inches) are expected to be between 0 and 10, and the load values (in pounds)
are expected to be between 0 and 10000.
Output File Format: Create the output file that displays the data in tabular form as follows and using the filtered data, compute
the area under the load-deflection curve.
Name:
Date:
Filtered Data
Data Point    Load (lb)    Experimental Displacement (in)    Theoretical Displacement (in)    Percent Difference
Bad Data
Data Point    Load (lb)    Displacement (in)
Area under the load-deflection curve = xxxx lb-in (load on the y-axis and deflection on the x-axis).
Percent Difference
$$\%\,\text{Difference} = \frac{\text{Analytical} - \text{Experimental}}{\text{Analytical}} \times 100$$
Problem 12.4
It is required to write a computer program to compute the amount of earthwork or excavation necessary at a site. The site (in
plan view or x-y plane, Fig. P12.4) is divided into a regular grid with the grid spacing at 3 feet in both directions. The contractor
using surveying techniques has created an input file that contains the (x,y,z) values of these grid points.
Fig. P12.4 Sample grid (not all grid points are labeled)
The program flow is expected to be as follows (you must use dynamically defined matrices to store the data).
Ask the user for the input file name. Open the input file.
Ask the user for the output file name. Open the output file.
Read the input file one line at a time and store the data.
Interpolate to fill missing values resulting from bad data (zero elevation). The interpolated value is the average of all adjacent
(good) grid points.
Adjacent points are the immediate next points along the row and column. Write a function to implement this task.
Compute the average elevation and excavation/earthwork quantity.
The volume of a grid cell with corner heights $h_1, h_2, h_3, h_4$ can be taken as $(\text{plan area})(h_{avg} - h_{ex})$, where $h_{avg}$ is the average of the four values.
Write a function to implement this task.
Create the output file. Write a function to implement this task.
Terminate the program.
Input File Format: The input file contains the following data on the first line.
Number of grid points in the x direction(blank)Number of grid points in the y direction
The second line contains the excavation elevation value.
Z coordinate representing the excavation elevation (this is $h_{ex}$)
Then it contains the (x,y,z) coordinates of the grid points rowwise.
X coordinate(blank)Y Coordinate(blank)Z Coordinate
X coordinate(blank)Y Coordinate(blank)Z Coordinate
Coordinate values are in feet.
An example of the input file:
5 4
8.1
0.0 0.0 12.1
3.0 0.0 12.5
6.0 0.0 11.9
9.0 0.0 11.5
12.0 0.0 11.4
0.0 3.0 12.6
3.0 3.0 12.4
6.0 3.0 12.3
Name:
Date:
INPUT DATA
Grid Point X Coordinate Y Coordinate Z Coordinate Status
1 Good
2 Interpolated
3 Good
EXCAVATION DATA
Grid Points Excavation (ft^3)
1-2-6-7
C++ Concepts
Problem 12.5
Write a computer program to handle a cross-sections database. The database supports circular, hollow circular, rectangular and
hollow rectangular cross-sections. Devise a scheme for identifying each cross-section in the database with a unique identification
tag. The computed properties to be stored in the database include cross-sectional area and the two principal moments of inertia.
Internally, the program should be able to store the data in a pre-defined set of units. However, it should allow the user to select
the units at run time. The computer program should support the following top-level commands.
units: Used to specify the length units.
create: Used to create information in the database.
edit: Used to edit information already in the database.
find: Used to find the cross-sectional properties of a cross-section already in the database.
search: To find the closest cross-section that meets a specified search criterion.
delete: Delete the cross-section from the database.
Create and store the database file as a binary file whose life extends beyond the single execution of the program.
Problem 12.6
Standard C++ before C++17 did not have a function to check if a file exists (C++17 introduced std::filesystem::exists). Develop a function with the following prototype.
bool FileExists (const char* filename);
that will return true if the file exists, false otherwise. Will this function work on all operating systems?
Chapter 13
“When asked how much educated men were superior to those uneducated,” Aristotle answered, “As much as the living are to the dead.” Diogenes Laertius
We were introduced to fundamental object-oriented concepts in Chapter 7. We learnt about classes and objects. We learnt how
to develop and write server code. We also learnt how to develop and write client code where objects would be declared and
manipulated. As we saw in Chapters 8 and 9, object-oriented concepts help us define and use classes in a powerful yet safe
manner. In this chapter, we will first look at the process that leads to the development of a computer program. Second, we will study inheritance and associated concepts such as polymorphism. Through inheritance and data abstraction, we will be able to enhance the capabilities already available in existing classes by building specialized classes on top of them. Finally, we will see the concepts associated with functors and suggest how they can be used in writing user-defined functions commonly used in numerical analysis.
Objectives
To understand the basics of software engineering.
To understand how to leverage already developed classes and build additional functionality.
To understand the concepts of polymorphism.
To understand the concepts associated with functors and use them as a tool in writing user-defined functions for numerical
analysis.
(b) These W36x925s are the heaviest hot-rolled wide-flange sections in the world. They are over 43 in. deep with 4.5-in.-thick flanges and a 3-in. web. This 60-ft-long beam is roughly the size of a humpback whale, weighing approximately 27.75 tons. (https://www.aisc.org/modernsteel/news/2015/may/steel-shots-mega-steel/)
Fig. 13.1.1 Wide-flange beams used in steel structures
This design problem is typical of engineering design problems: an off-the-shelf (OTS) or custom solution exists that can be used in a (numerical) model to check whether the performance requirements are met (see Fig. 13.1.2). Among the many tasks involved in obtaining a realistic solution is finding a “best” solution.
Fig. 13.1.2 Typical solution blocks for an engineering design problem (blocks: OTS or Custom Solution, Model, Performance Requirements)
With reference to the given problem, the OTS solution consists of commercially available wide-flange beams, the performance
requirements are given in Eqns. (13.1.1) and (13.1.2), and the model used to obtain the largest bending moment, the largest
tensile force, and the largest compressive force that the beam is subjected to, is not discussed here (see the author’s book on
structural analysis and design). There are several similar examples – design of an air conditioning system for a room in a building
using off-the-shelf air conditioners, design of a piping network for a small factory using commercially available pipes, etc.
We will see models for solving engineering problems described via partial differential equations in Chapter 15.
Solution: Analyzing the problem statement, we can identify the following nouns or noun clauses.
Beam I-Section User Input
Largest Moment Tensile Force Compressive Force
Lightest Cross-section Cross-sections Database
The next step in the process is to identify the different entities that describe the system. A closer examination of these nouns
will show the following. The I-Section is in fact the beam cross-section that is the end product of the design process. All the
different cross-sections are in a Cross-sections Database. The User Input contains the Largest Moment, Tensile Force and
Compressive Force. Once we recognize how to use the user input, we can generate the criterion to locate the Lightest Cross-
section.
The distinct entities can now be captured in classes. We could use a class CISection to store the properties of a typical I-section
such as the cross-sectional area, moment of inertia, etc. We could store the properties of all the available I-sections using a class
CXSDatabase. The class would allow access to all the individual I-sections, provide the mechanism to add new sections, or to
delete sections that are no longer manufactured. Finally, to allow the selection of the lightest cross-section from the I-section
database, we could define and use a CXSSelector class.
Let us review our plan of action to assure ourselves that these classes are the major classes to achieve our objectives. The
program would use the CXSDatabase class to load all the information about existing I-sections. The program would then ask the
user for the input and based on the values compute the ‘smallest’ properties of the I-section necessary to meet the strength
requirements. Then the CXSSelector class would be used to find the lightest I-section (we will assume that the I-section database
is arranged in order of increasing weights). Finally, the answer (the properties of the I-section) will be communicated to the user
using the CISection class. The responsibilities of a class define what the class is capable of doing; they identify its attributes and behavior, which are implemented through member variables and member functions. Sometimes, achieving these objectives requires the help of other classes, the helper classes. The Class Responsibility Collaborator (CRC) cards describing the classes are shown in Fig. 13.1.3.
(a) CISection
Responsibilities: know properties; allow access to properties
Helpers: std::string
(b) CXSDatabase
Responsibilities: know all I-sections; allow access to individual I-section; add new sections; remove existing sections
Helpers: CISection, std::vector, std::ifstream
(c) CXSSelector
Responsibilities: get an I-section
Helpers: CISection, CXSDatabase
Fig. 13.1.3 Program classes described using CRC cards
Using the CRC cards as a guide, we can start writing the code by defining the class members and variables. The CISection class
is described first.
isection.h
As usual, we define three constructors (lines 13-16), a destructor, and accessor, modifier, and helper functions. There are five member variables declared in lines 32 through 36. The identification tag for the section shown in Fig. 13.1.1 is W36x925. While manufacturers provide many more properties that are used by design engineers, we store the identification tag together with the values of the cross-sectional area, the section moduli about the y and z axes, and the weight per unit length.
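The isection.h listing is not reproduced in this section. The following is a minimal sketch of what such a class might look like, assuming only what the text describes: three constructors, a destructor, accessor, modifier and helper functions, and five member variables. Apart from m_strID and m_fArea, which appear elsewhere in the text, the member names and the argument order (area, Syy, Szz, weight) are hypothetical.

```cpp
#include <cassert>   // used by the unit tests of this class
#include <iostream>
#include <string>

// Minimal sketch of an I-section class (not the book's listing).
// Member names other than m_strID and m_fArea are hypothetical.
class CISection
{
public:
    CISection () : m_strID("undefined"), m_fArea(0.0f), m_fSyy(0.0f),
                   m_fSzz(0.0f), m_fWeight(0.0f) {}             // default
    CISection (const std::string& strID, float fArea, float fSyy,
               float fSzz, float fWeight)                        // overloaded
        : m_strID(strID), m_fArea(fArea), m_fSyy(fSyy),
          m_fSzz(fSzz), m_fWeight(fWeight) {}
    CISection (const CISection& ISA) = default;                  // copy
    ~CISection () = default;

    // accessor functions
    std::string GetID () const { return m_strID; }
    void GetProperties (float& fArea, float& fSyy, float& fSzz,
                        float& fWeight) const
    { fArea = m_fArea; fSyy = m_fSyy; fSzz = m_fSzz; fWeight = m_fWeight; }

    // modifier function
    void SetProperties (const std::string& strID, float fArea, float fSyy,
                        float fSzz, float fWeight)
    {
        m_strID = strID; m_fArea = fArea; m_fSyy = fSyy;
        m_fSzz = fSzz; m_fWeight = fWeight;
    }

    // helper function
    void Display (std::ostream& os = std::cout) const
    {
        os << m_strID << ": A=" << m_fArea << " Syy=" << m_fSyy
           << " Szz=" << m_fSzz << " W=" << m_fWeight << "\n";
    }

private:
    std::string m_strID;  // identification tag (the key), e.g., W36X300
    float m_fArea;        // cross-sectional area
    float m_fSyy;         // section modulus about the y axis
    float m_fSzz;         // section modulus about the z axis
    float m_fWeight;      // weight per unit length
};
```

A unit test in the spirit of Example Program 13.1.1 can create objects with the overloaded constructor, the modifier function and the copy constructor, and then check the stored values with assert instead of inspecting the Display output by eye.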
Next, we will look at the class that stores the cross-section data, the CXSDatabase class. The database is stored in a std::vector object, m_listofISections, declared in line 26. I-sections can be added and deleted one at a time. As the names suggest, the Add and the Remove modifier functions add one I-section to, and remove one I-section from, the cross-section database. The member variable m_nSize stores the current number of I-sections in the database, and the database file is read via the std::ifstream object, m_IFile. The function GetOne obtains the CISection object that is at the ith location in the database, i.e., the ith element of m_listofISections.
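The following sketch suggests how the responsibilities on the CXSDatabase CRC card could be met with a std::vector; it is not the book's xsdatabase.h. Assumptions: the constructor reads from any std::istream (so a std::istringstream can stand in for the std::ifstream during testing), each record holds the five values of the sample database file in the order ID, area, two section moduli, weight, GetOne uses 1-based indexing like the book's CVector class, and the vector's size() replaces the m_nSize counter. CISection is reduced to a hypothetical plain struct to keep the block self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>   // std::istringstream can stand in for the data file
#include <string>
#include <vector>

// Hypothetical minimal I-section record.
struct CISection
{
    std::string m_strID;                                  // key, e.g., W36X300
    float m_fArea = 0, m_fSyy = 0, m_fSzz = 0, m_fWeight = 0;
};

class CXSDatabase
{
public:
    // Reads records of the form "ID area Syy Szz weight" until input ends.
    explicit CXSDatabase (std::istream& IFile)
    {
        CISection section;
        while (IFile >> section.m_strID >> section.m_fArea >> section.m_fSyy
                     >> section.m_fSzz >> section.m_fWeight)
            Add(section);
    }

    void Add (const CISection& section) { m_listofISections.push_back(section); }

    // Removes the first section with a matching ID; false if none is found.
    bool Remove (const std::string& strID)
    {
        for (auto it = m_listofISections.begin(); it != m_listofISections.end(); ++it)
            if (it->m_strID == strID) { m_listofISections.erase(it); return true; }
        return false;
    }

    std::size_t GetSize () const { return m_listofISections.size(); }

    // Returns the section at the ith location (1-based, assumed).
    const CISection& GetOne (std::size_t i) const
    {
        assert(i >= 1 && i <= m_listofISections.size());
        return m_listofISections[i - 1];
    }

private:
    std::vector<CISection> m_listofISections;
};
```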
xsdatabase.h
The next class to develop is the CXSSelector class that is used to implement Eqns. (13.1.1)-(13.1.2) using the user-supplied data
and select the lightest I-beam that meets the performance requirements via the public member function GetXSection.
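Eqns. (13.1.1)-(13.1.2) are not reproduced in this section, so the following sketch assumes, purely for illustration, that the user input has already been converted into two “smallest” admissible properties: a required area and a required section modulus. Because the text assumes that the database is arranged in order of increasing weight, the first section that meets both requirements is the lightest one. The struct, the member names and the constructor arguments are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical minimal I-section record.
struct CISection
{
    std::string m_strID;
    float m_fArea, m_fSyy, m_fSzz, m_fWeight;
};

class CXSSelector
{
public:
    CXSSelector (float fAreaReq, float fSyyReq)
        : m_fAreaReq(fAreaReq), m_fSyyReq(fSyyReq) {}

    // Scans a list sorted by increasing weight. The first section that meets
    // both (assumed) requirements is the lightest one; returns false if none does.
    bool GetXSection (const std::vector<CISection>& list, CISection& best) const
    {
        for (const CISection& s : list)
            if (s.m_fArea >= m_fAreaReq && s.m_fSyy >= m_fSyyReq)
            {
                best = s;
                return true;
            }
        return false;
    }

private:
    float m_fAreaReq;   // smallest admissible area (assumed requirement)
    float m_fSyyReq;    // smallest admissible section modulus (assumed)
};
```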
xsselector.h
We seem to have defined all the major classes. But have we? How do we obtain the user input? How do we process the user
input? One way is to define a new class CWizard that would solicit the required input from the user, process the input, and
display the results. This new class is shown in Fig. 13.1.4.
CWizard
Responsibilities: know user input; process user input; display the results
Helpers: CISection, CXSDatabase, CXSSelector
Our initial design appears to be complete. Once all the classes are identified and defined, it is necessary to build a blueprint or a
software architecture. The blueprint can then be used by a software developer to “put the pieces together” so as to build and
test the software system.
Tip: Objects such as the different I-sections need identification. Sometimes the identification needs to be unique. For example,
every person in the US has a unique identification that is the Social Security Number. The attributes that are associated with the
identification constitute the key. In other words, the key is the identification tag. The data type associated with the key plays an
important role in the manner in which the object is stored.
In the context of this exercise, we should recognize that there may be tens or even hundreds of I-sections. How do we differentiate
one I-section from the next? This is usually done through the section identifier, e.g., W36x300. In the context of the class
definition, the variable m_strID is the key. We will see more about keys and classes later.
Object-Oriented Testing: As we can see even with the simple example discussed in this section, there are plenty of opportunities for making errors: modeling, design, and coding errors.
Once the initial blueprint is completed and the coding phase has started, testing should begin immediately. First, the classes can be tested individually. Then collaborating classes can be tested. Next, subsystems formed by several collaborating classes can be tested. If it is found that the current state of the model and design is incomplete, incorrect, or extraneous, the modeling and design may have to be redone.
Example Program 13.1.1 Testing the CISection class
We will develop the first program to test the CISection class.
The listing of the main program (client code that carries out unit testing) is shown below. The include file isection.h is used in line 7. Three distinct I-sections are created as three objects, First, Second and Third, by declaring them in lines 11, 14, and 17. The overloaded constructor is used for the First and Second objects while the modifier function is used for the Third object. To check whether the values in the objects are correct, the Display function is used. Finally, the Fourth object is created in line 21 using the copy constructor. The testing program can be improved since not all the functionalities are tested; the GetProperties and the overloaded SetProperties functions are skipped.
main.cpp
A sample database text file (xsdatabase.dat) containing data for just three I-sections is shown below.
W21X201 59.2 461.0 86.1 201.0
W36X300 88.3 1110.0 156.0 300.0
W44X335 98.3 1410.0 150.0 335.0
In the constructor, the database file is opened, and the data is read one line at a time (line 28). A CISection object is modified
in line 31 with the read data and a new section is added to the database in line 32. We are now ready to test the class.
main.cpp
Note how the CISection class previously developed is included in this test program in line 8. When line 14 is executed the cross-
section database with the 3 cross-sections is loaded into the DBISection object. In line 17, a fourth cross-section is added. In
line 19 the first cross-section is obtained, and the properties displayed in line 20, whereas in line 22, the last cross-section is
obtained, and the properties displayed in line 23. Sample execution details are shown in Fig. 13.1.4.
Observation: Software development is an evolutionary process. The specifications, classes, and program architecture are likely
to change over time. It is important to integrate the testing and prototyping process early in the software development cycle.
13.2 Inheritance
Object-oriented methodology provides building blocks. One such building block is inheritance, a process in which a new class, called the derived (or child) class, is created by building it on top of an existing class, called the base (or parent) class. The derived class then has all the member variables and ordinary member functions of the base class and can, in addition, define more variables and functions. Let us understand this concept.
In the earlier section, we saw the use of beams with an I-shaped cross-section. Clearly, other shapes can also be used. Fig. 13.2.1 shows five common cross-sectional shapes that are used as structural members. All five cross-sectional shapes have the same set of (derived) attributes: cross-sectional area, moments of inertia, section modulus, and other similar properties. However, they are described differently. For example, a circular cross-section (Fig. 13.2.1(b)) is defined in terms of a single attribute, the radius r, whereas a rectangular hollow section (Fig. 13.2.1(c)) is described in terms of four attributes, h, b, t_h and t_b.
Fig. 13.2.1 Suite of cross-section types (a) Rectangular solid (b) Circular solid (c) Rectangular hollow (d) Circular hollow (e) Symmetric I-section (each shape is drawn in its local y-z axes; dimension labels: (a) h, w; (c) h, b, th, tb; (d) ri, t; (e) fw, ft, wh, wt)
Inheritance provides the means of enhancing the capabilities of existing classes. If a new class is to be defined and an existing
class is available, the new class can inherit the properties from the existing class. For example, we can define a base class, CXSType,
for all cross-sectional shapes. This class would store and make available such things as the cross-section identification and the
sectional properties. We could then define derived classes for different shapes such as CISection for I-sections and CCircSolid for
circular solid cross-sections. If we later need to add a new shape, such as a rectangular solid, we could simply define a new derived class
inheriting the properties of the base class, CXSType. The inheritance diagram for the cross-sections is shown in Fig. 13.2.2.
Fig. 13.2.2 Inheritance diagram (CXSType is the base class; the derived classes shown include CRectHollow and CCircHollow)
In this example, the base class contains what is generic to all cross-sectional shapes. Base classes such as this one are sometimes described as abstract since objects of the base class itself are usually not constructed. (In the strict C++ sense, a class is abstract only if it has at least one pure virtual function, as we will see in Section 13.3.) CXSType is abstract in the first sense since in a program we will not define an object directly tied to this class. However, we will define and use the constructor to store pertinent data associated with the class
– the cross-section identification, the cross-sectional properties such as area and so on, and the cross-sectional dimensions (for
example, height and width for a rectangular section). The base class also provides the generic accessor and modifier functions
for both the cross-sectional properties and dimensions. What the base class cannot provide are the functions for computing
the cross-sectional properties using the cross-sectional dimensions, and displaying the cross-sectional dimensions. This is
because these are dependent on the cross-sectional shape that the base class does not know. Each cross-section has the following
attributes and properties.
(1) An identification tag, e.g., W36x300.
(2) The cross-section area.
(3) The section modulus about the (local) y and z axes.
(4) The number of dimensions associated with the cross-section and the dimension values stored in a vector.
This information can be stored for every cross-section in the base class. However, the base class has no way of knowing how
to compute the cross-sectional properties that must be computed separately for each cross-section type. For the sake of
simplicity, we will ignore the issue of units and assume that the values are stored in consistent units (e.g., inches). The base class,
CXSType, is shown below.
xstype.h
Note the use of the protected qualifier. Earlier we saw two other qualifiers, public and private. Members declared protected are accessible to the class itself and to its derived classes, but not to client code. Using private would preclude access from the derived classes. Using this base class definition, we can define the derived class. The derived class, CISection, is shown below.
isection.h
With this definition, the CISection class inherits all the non-specialized public member functions in the CXSType class such as
DisplayProperties(), and GetDimensions(), and the member variables m_strID, m_fArea, etc. The specialized member
functions of a class such as the constructor, the destructor, the copy constructor and the assignment operator = are not inherited.
A derived class can also have a member function with the same name and argument types as a base class member function. This is called
redefining³ the inherited member function. In this example, the DisplayDimensions() function is redefined. As we will see
later, the context in which the use (in the client code) takes place determines which function (base or derived) gets called.
Note how the derived class is defined. There is a colon after the derived class name, followed by the keyword public and the base class name. As mentioned earlier, the constructor in the base class is not inherited. When a derived class object is instantiated, the base class part of the object should be initialized explicitly by calling a base class constructor in the member initializer list of the derived class constructor. If no base class constructor is explicitly invoked, then the default base class constructor is called, and the code will not compile if the base class does not have a default constructor. Here is the code from the derived class, CISection, that shows the instantiation of the derived class using an overloaded version of the base class constructor. The statement
CISection::CISection (const string& strID,
const CVector<float>& fV) : CXSType (numISDimensions)
shows how the overloaded CISection constructor initializes the base class part of the object through the overloaded CXSType constructor. The overloaded base class constructor is called first, followed by the body of the derived class constructor.
The design of this class is such that the cross-sectional properties are computed as soon as the cross-section is defined in terms of its ID and its dimensions. This design choice is the reason why default constructors are not provided in the derived classes: a default-constructed object would have no dimensions, so there would be no way to call the derived class's function to compute the cross-sectional properties. In the next section, we will see how this potential problem can be avoided.
³ There is a difference between redefining and overloading. If there are two or more functions with the same name in the same class (base or derived), then the functions are overloaded. If a function in the derived class has the same name as a function in the base class, then the base class function is redefined: the derived class version hides the base class version, whether the parameter lists are the same or different.
One should remember that if class B is derived from class A, then instantiating a class B object would involve class A constructor
being invoked first followed by class B constructor. In the same vein, if object B goes out of scope, the destructor for class B is
called first followed by the destructor for class A.
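A minimal sketch of these rules, using hypothetical, simplified versions of CXSType and CRectSolid (not the book's listings): the derived class constructor initializes its base class part through the member initializer list, and each constructor and destructor records its name in a global log so that the order of the calls can be checked.

```cpp
#include <cassert>
#include <string>
#include <vector>

std::vector<std::string> g_log;   // records the order of calls (illustration only)

class CXSType                     // simplified, hypothetical base class
{
public:
    explicit CXSType (int nDimensions) : m_nDimensions(nDimensions)
    { g_log.push_back("CXSType constructor"); }
    ~CXSType () { g_log.push_back("CXSType destructor"); }
    int GetNumDimensions () const { return m_nDimensions; }
protected:
    int m_nDimensions;            // accessible in derived classes, not in client code
};

class CRectSolid : public CXSType
{
public:
    // The base class part is initialized first, via the member initializer list.
    CRectSolid (float fH, float fW) : CXSType(2), m_fH(fH), m_fW(fW)
    { g_log.push_back("CRectSolid constructor"); }
    ~CRectSolid () { g_log.push_back("CRectSolid destructor"); }
    float GetArea () const { return m_fH * m_fW; }
private:
    float m_fH, m_fW;             // height and width
};
```

When a CRectSolid object goes out of scope, the log reads: CXSType constructor, CRectSolid constructor, CRectSolid destructor, CXSType destructor.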
In the following example we show how the cross-sections inheritance concept is used in a client code.
Example Program 13.2.1 Using inheritance with cross-sectional shapes
Develop a program that has a base class for storing data connected with various cross-sectional shapes. Build two derived classes
to specifically store data connected with the I-section and the rectangular solid section.
We will first look at the base class functions.
xstype.cpp
Note how the base class provides several functionalities that are common for all possible cross-sectional shapes. There are two
constructors that initialize the values of the cross-sectional dimensions and the ID. The overloaded constructor also allocates
memory (line 18) to store the values of the cross-sectional dimensions.
The DisplayProperties() member function displays the values of the cross-sectional properties that are common to all cross-
sections. However, the DisplayDimensions() function displays an error message since the base class does not know the meaning
of these dimensions. It would be inappropriate to call this function; the derived classes’ DisplayDimensions() function should be called instead. The GetProperties and GetDimensions functions are not shown. They provide access to the cross-sectional properties and dimensions, respectively.
Next, we show the sample client code.
main.cpp
The program illustrates the use of public inheritance with the base class CXSType and the derived classes, CISection and CRectSolid. The base class is used only as an abstraction: there is no object in the program that is associated with the base class alone! When the object ISection1 is declared (line 23), it contains all the data and attributes of the CXSType class in addition to the data and attributes, if any, that the CISection class adds. The DisplayProperties() functionality is provided by the base class while the DisplayDimensions() functionality is provided by each derived class.
Access to Base Class’s Functions
In lines 24 and 32, the derived class’s version of DisplayDimensions is called. What if we wish to call the base class version for some reason? We can do so by using the scope resolution operator ::. For example, replacing line 24 with
ISection1.CXSType::DisplayDimensions();
calls the base class’s (CXSType) version of the DisplayDimensions function.
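The difference between the two calls can be sketched with hypothetical, stripped-down classes whose DisplayDimensions functions return a string instead of printing, so that the result can be checked.

```cpp
#include <cassert>   // used by the checks below
#include <string>

class CXSType
{
public:
    std::string DisplayDimensions () const
    { return "Error: base class does not know the dimensions."; }
};

class CISection : public CXSType
{
public:
    // Redefines (hides) the base class version.
    std::string DisplayDimensions () const { return "I-section dimensions"; }
};
```

With CISection ISection1;, the call ISection1.DisplayDimensions() uses the derived class version, while ISection1.CXSType::DisplayDimensions() uses the base class version.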
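Other Types of Inheritance
Protected and Private: In the previous example, the derived classes CISection and CRectSolid were publicly derived from the CXSType base class. If the inheritance is protected instead, as in
class CISection: protected CXSType
{
…
};
then the public and protected members of the base class (CXSType) become protected members of the derived class. If the inheritance is private, then the public and protected members of the base class become private members of the derived class; they can be used inside the derived class but not by client code or by classes derived from it. In every case, the private members of the base class remain inaccessible in the derived class. The protected and private inheritances are seldom used in
practice.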
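One way to see the effect of the three kinds of inheritance is sketched below with hypothetical classes. Inside each derived class the inherited public member is usable, but only under public inheritance can client code treat the derived object as a base class object. The static_assert checks use std::is_convertible, which tests the pointer conversion from a context outside all of these classes.

```cpp
#include <cassert>   // used by the checks below
#include <type_traits>

class CBase
{
public:
    int GetValue () const { return 42; }
};

class CPublic : public CBase { };      // GetValue stays public

class CProtected : protected CBase     // GetValue becomes protected
{
public:
    int UseBase () const { return GetValue(); }   // fine inside the class
};

class CPrivate : private CBase         // GetValue becomes private
{
public:
    int UseBase () const { return GetValue(); }   // fine inside the class
};

// Client code may convert a derived pointer to a base pointer only
// when the inheritance is public.
static_assert(std::is_convertible<CPublic*, CBase*>::value, "public: is-a");
static_assert(!std::is_convertible<CProtected*, CBase*>::value, "protected: hidden from clients");
static_assert(!std::is_convertible<CPrivate*, CBase*>::value, "private: hidden from clients");
```

A client statement such as CProtected b; b.GetValue(); would not compile, whereas b.UseBase() is fine.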
Multiple: A class may be derived from more than one base class. We saw an example of this in Chapter 12 when we looked at the C++ file stream classes. For example, the iostream class is derived from two classes, istream and ostream. Here are the class definitions.
class istream // handles input
{
…
};
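The istream, ostream and iostream definitions in the standard library are considerably more involved, so the following sketch uses hypothetical CReader and CWriter classes to show the same pattern: a class derived from two base classes has the members of both and can be used through a pointer to either base. The static_assert lines confirm, for the standard library itself, that std::iostream is derived from both std::istream and std::ostream.

```cpp
#include <cassert>   // used by the checks below
#include <iostream>
#include <string>
#include <type_traits>

class CReader                 // handles input (hypothetical)
{
public:
    std::string Read () const { return "read"; }
};

class CWriter                 // handles output (hypothetical)
{
public:
    std::string Write () const { return "write"; }
};

// Derived from two base classes, in the spirit of iostream.
class CReaderWriter : public CReader, public CWriter
{
};

static_assert(std::is_base_of<std::istream, std::iostream>::value, "iostream is an istream");
static_assert(std::is_base_of<std::ostream, std::iostream>::value, "iostream is an ostream");
```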
13.3 Polymorphism
In the previous section, we saw an example of inheritance with cross-sectional shapes. What would happen if (a) the client code does not know ahead of time which specific cross-sectional shapes are going to be used during a program run, (b) new cross-sectional shapes are to be added, and (c) the process must be robust enough that undefined situations are detected? The answer lies in the idea of polymorphism.
Polymorphism is the ability of different objects to respond in their own way to the same message. In C++ it is achieved by means of virtual functions, a mechanism also known as late binding or dynamic binding. In the context of the cross-sectional shape example, we want the program to automatically call the function that computes the cross-sectional properties for whatever cross-sectional shape is involved. In other words, we want the appropriate ComputeProperties() to be called when that function is invoked. We can achieve this objective by declaring the ComputeProperties() function as a virtual function. When the C++ compiler encounters a call to a virtual function, it generates instructions to call the appropriate version of the function based on the actual type of the object encountered during program execution. For example, if a CRectSolid object is created during program execution and a call is made to ComputeProperties, then CRectSolid’s version of ComputeProperties is called.
Example Program 13.3.1 Using inheritance with virtual member functions
Rewrite Example Program 13.2.1 using CXSType as an abstract base class but with DisplayDimensions and ComputeProperties
as virtual functions.
As we will see, the changes that we need to make to the program from the preceding section are small. We start by making changes to the base class. Only the changes in the base class are shown.
xstype.h
There are two changes. We declare the DisplayDimensions and the ComputeProperties functions as virtual functions; the virtual keyword precedes the function declaration. If a function is declared to be virtual and the function is redefined in the derived class, then for any object of the derived class the derived class version of the virtual function is used, not the base class version, even when the call is made through a base class pointer or reference. Both these functions are declared as public functions so that they can be called directly by the client code. Since in this example the use of the base class functions is not desirable, we have embedded the following error messages in those two functions.
xstype.cpp
We will illustrate the changes that we need to make in the derived class by looking at the CISection class definition.
isection.h
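While not required by the C++ standard, we have declared the two functions as virtual functions in the derived class just to remind ourselves that these are virtual functions. C++ requires the virtual keyword only in the base class; a function that overrides a virtual function is automatically virtual (C++11 also allows the override specifier to be added to the derived class declaration so that the compiler checks that a base class virtual function is actually being overridden). Note that both these functions are declared as public functions so that they can be called directly by the client code. The virtual keyword must not be used when defining the body of the member function outside the class definition.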
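The following is a minimal sketch of the idea behind Example Program 13.3.1, not the book's listing. The simplified classes are hypothetical: ComputeProperties returns only the area, the base class version returns -1 as an error value instead of printing a message, and the derived classes mark their versions with override. The function AreaOf knows only the base class, yet the derived class version is called.

```cpp
#include <cassert>   // used by the checks below
#include <string>

class CXSType   // simplified base class (hypothetical members)
{
public:
    virtual ~CXSType () = default;   // virtual destructor (discussed later)
    virtual float ComputeProperties () const { return -1.0f; }   // error value
    virtual std::string DisplayDimensions () const
    { return "Error: CXSType::DisplayDimensions called"; }
};

class CRectSolid : public CXSType
{
public:
    CRectSolid (float fH, float fW) : m_fH(fH), m_fW(fW) {}
    float ComputeProperties () const override { return m_fH * m_fW; }   // area
    std::string DisplayDimensions () const override { return "h x w"; }
private:
    float m_fH, m_fW;
};

class CCircSolid : public CXSType
{
public:
    explicit CCircSolid (float fR) : m_fR(fR) {}
    float ComputeProperties () const override { return 3.14159265f * m_fR * m_fR; }
    std::string DisplayDimensions () const override { return "r"; }
private:
    float m_fR;
};

// Client code that knows only the base class.
float AreaOf (const CXSType& section) { return section.ComputeProperties(); }
```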
Finally, we do not have to make any changes to the client code (main.cpp) at all. In fact, this version of the program is more
robust!
The base class functions are called only if the derived class fails to provide its own version of these functions. C++ provides a mechanism to force the derived class to provide its own version of these functions. If the base class declares these functions to be pure virtual functions, the derived class must then define its version of these functions. To declare a member function as a pure virtual function in the base class, one must declare the function as follows.
virtual return_type functionname (...) = 0;
A class containing one or more pure virtual functions is called an abstract class; no object of an abstract class can be created, since the class is incomplete. Similarly, a derived class is also abstract if it does not override every pure virtual function that it inherits. We will now illustrate the definition and use of pure virtual functions using the same cross-sections example.
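A minimal sketch with hypothetical, simplified classes: std::is_abstract confirms at compile time that CXSType, with its pure virtual function, cannot be instantiated, that a derived class that overrides nothing (the hypothetical CIncomplete) remains abstract, and that CRectSolid, which overrides the function, is concrete.

```cpp
#include <cassert>   // used by the checks below
#include <type_traits>

class CXSType   // abstract base class (simplified)
{
public:
    virtual ~CXSType () = default;
    virtual float ComputeProperties () const = 0;   // pure virtual function
};

class CRectSolid : public CXSType
{
public:
    CRectSolid (float fH, float fW) : m_fH(fH), m_fW(fW) {}
    float ComputeProperties () const override { return m_fH * m_fW; }
private:
    float m_fH, m_fW;
};

// Overrides nothing, so it inherits the pure virtual function and stays abstract.
class CIncomplete : public CXSType { };

static_assert(std::is_abstract<CXSType>::value, "CXSType cannot be instantiated");
static_assert(std::is_abstract<CIncomplete>::value, "CIncomplete is still abstract");
static_assert(!std::is_abstract<CRectSolid>::value, "CRectSolid is concrete");
```

A declaration such as CXSType xsection; is therefore rejected by the compiler, while a CXSType pointer or reference may still refer to a CRectSolid object.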
Example Program 13.3.2 Using inheritance with pure virtual member functions
Rewrite Example Program 13.3.1 using CXSType as an abstract base class but with DisplayDimensions and ComputeProperties
as pure virtual functions.
Once again, the changes are small, starting with the base class header file.
xstype.h
Next, we delete the bodies of the two member functions (DisplayDimensions and ComputeProperties) from the base class (xstype.cpp). No changes need to take place in the code associated with the derived classes. We finally have a program that is just about right!
private:
float m_fMC; // moisture content
};
We have seen that every I-section or rectangular solid section is a cross-section derived from the base class. Assuming a version of CXSType that is not abstract (i.e., it has no pure virtual functions) and that has a default constructor, the following statements compile fine.
CVector<float> fVDim(2); fVDim(1) = 4.0f; fVDim(2) = 2.0f;
CXSType xsection;
CRectSolid rsolid ("R2x4", fVDim, 10.5f);
xsection = rsolid;
It is perfectly valid to assign a derived class object to a base class object⁴. However, there is a slicing problem. The information that is available only in the derived class cannot be stored and made available in the base class. In other words, the moisture content value is not available (it is sliced off with the rest of the derived class data). A statement such as
std::cout << "Moisture content is " << xsection.GetMC() << "\n";
will not compile since GetMC() is not a member function of the base class.
⁴ It should also be noted that a base class object cannot be assigned to a derived class object. For example, the following statement will not compile.
rsolid = xsection;
C++ provides a solution to this problem: manipulating derived class information via a base class pointer. For example, if we change the above code as follows, then the statements compile and execute correctly, and no data are sliced off since pXsection points to the complete CRectSolid object.
CVector<float> fVDim(2); fVDim(1) = 4.0f; fVDim(2) = 2.0f;
CXSType *pXsection;
CRectSolid *pRsolid = new CRectSolid("R2x4", fVDim, 10.5f);
pXsection = pRsolid;
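The difference between assigning to a base class object and pointing to the derived object with a base class pointer can be checked with the following sketch. The classes are hypothetical simplifications; in particular, CXSType is deliberately not abstract here so that a base class object can be declared, and a virtual GetType function reports which version is called.

```cpp
#include <cassert>   // used by the checks below
#include <string>

class CXSType   // not abstract in this sketch, so an object can be declared
{
public:
    virtual ~CXSType () = default;
    virtual std::string GetType () const { return "CXSType"; }
};

class CRectSolid : public CXSType
{
public:
    explicit CRectSolid (float fMC) : m_fMC(fMC) {}
    std::string GetType () const override { return "CRectSolid"; }
    float GetMC () const { return m_fMC; }
private:
    float m_fMC;   // moisture content
};
```

After xsection = rsolid; the object xsection is a plain CXSType (the derived part, including m_fMC, is sliced off), whereas a CXSType pointer set to &rsolid still reaches the CRectSolid version of GetType.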
Finally, a point about destructors. When inheritance is used, it is a good idea to define virtual destructors. In the above code, when the statement
delete pRsolid;
is executed, the destructor of class CRectSolid is called. If, on the other hand, the code had been written as follows
CVector<float> fVDim(2); fVDim(1) = 4.0f; fVDim(2) = 2.0f;
CXSType *pXsection = new CRectSolid("R2x4", fVDim, 10.5f);
delete pXsection;
then, unless the base class destructor is virtual, the last statement has undefined behavior according to the C++ standard; in practice, typically only the destructor of class CXSType is called. Since the destructor of class CRectSolid is not called, this process can be dangerous, especially if resource allocation and deallocation take place in the CRectSolid class. The derived class destructors are called automatically if the destructor of the base class is declared as a virtual destructor. In other words, if a base class destructor is tagged virtual, then deleting a derived class object through a base class pointer calls the derived class destructor first, followed by the base class destructor. It should also be noted that if the destructor of the base class is virtual, then the destructors of the derived classes are automatically virtual as well. This scheme provides a very nice approach to information management, as we will see in the example later in this section.
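The following sketch illustrates the virtual destructor rule with hypothetical classes. CRectSolid owns a dynamically allocated buffer (standing in for any resource), and global counters record the destructor calls, so it can be checked that deleting through a base class pointer runs both destructors.

```cpp
#include <cassert>   // used by the checks below

int g_nBaseDtor = 0;      // counts CXSType destructor calls (illustration only)
int g_nDerivedDtor = 0;   // counts CRectSolid destructor calls

class CXSType
{
public:
    virtual ~CXSType () { ++g_nBaseDtor; }   // virtual destructor
};

class CRectSolid : public CXSType
{
public:
    CRectSolid () : m_pfBuffer(new float[4]) {}   // acquires a resource
    CRectSolid (const CRectSolid&) = delete;        // no copies of the owner
    CRectSolid& operator= (const CRectSolid&) = delete;
    ~CRectSolid () override { delete[] m_pfBuffer; ++g_nDerivedDtor; }
private:
    float* m_pfBuffer;
};
```

Removing the virtual keyword from the base class destructor would make delete pXsection; undefined behavior, and the CRectSolid buffer would typically leak.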
Example Program 13.3.3 Using inheritance with pure virtual member functions and virtual destructors
Develop a program that demonstrates how to dynamically generate, store and manipulate information dealing with cross-sections. The type of the cross-section and its associated data are known only at runtime. The design should be such that adding new types of cross-sections follows a well-defined, easy, and robust path.
The solution is built on all the discussions we have had in preceding sections. We explain the solution by showing the developed
source code.
xstype.h
Compared to the earlier uses of the CXSType class, there are no changes to the definition of the base class except to declare the destructor as a virtual destructor. This has major implications, as we discussed earlier.
xstype.cpp
Compared to the earlier uses of the CXSType class, there are no changes to the implementation of the base class.
rectsolid.h
Compared to the earlier uses of the CRectSolid class, there are a few changes. We have added a private member variable, m_fMC,
to store the maximum moisture content. To handle this extra member variable, changes are made to the overloaded constructor
(lines 14-15) and two new member functions are declared (lines 20 and 21).
rectsolid.cpp
Compared to the earlier uses of the CRectSolid class, there are a few changes. These relate to the new member variable, m_fMC
that is handled by the overloaded constructor and two new member functions.
main.cpp
This client code merits a close look. We use the CVector class as a container. The basic idea is to store, in a vector (line 17), a pointer to each cross-section as it is defined during execution. As we saw earlier in this section, base class pointers can be used to manipulate derived class objects. In this sample program we define only two cross-sections. There is no reason why the size of the vector cannot be set dynamically, as we have done in earlier chapters using the SetSize member function. Line 26 shows how memory is dynamically allocated to store one object of type CISection. After that point, the program is similar to the previous version. Line 34 shows another dynamic memory allocation to store another cross-section object of type CRectSolid. Finally, note that the memory must be deallocated (to avoid a memory leak); the statements in line 46 achieve this objective. Lines 41 and 43 are explained next.
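As an alternative to a CVector of raw pointers with explicit deallocation, the same design can be sketched with std::vector and std::unique_ptr from the standard library, which delete the objects automatically when the vector goes out of scope. This is a substitute technique, not the approach used in the book's listing, and the simplified classes (ComputeArea in particular) are hypothetical.

```cpp
#include <cassert>   // used by the checks below
#include <memory>
#include <string>
#include <vector>

class CXSType
{
public:
    explicit CXSType (const std::string& strID) : m_strID(strID) {}
    virtual ~CXSType () = default;            // needed for deletion via base pointer
    virtual float ComputeArea () const = 0;   // pure virtual
    const std::string& GetID () const { return m_strID; }
private:
    std::string m_strID;
};

class CRectSolid : public CXSType
{
public:
    CRectSolid (const std::string& strID, float fH, float fW)
        : CXSType(strID), m_fH(fH), m_fW(fW) {}
    float ComputeArea () const override { return m_fH * m_fW; }
private:
    float m_fH, m_fW;
};

class CCircSolid : public CXSType
{
public:
    CCircSolid (const std::string& strID, float fR) : CXSType(strID), m_fR(fR) {}
    float ComputeArea () const override { return 3.14159265f * m_fR * m_fR; }
private:
    float m_fR;
};

// Total area of all stored sections, whatever their concrete types.
float TotalArea (const std::vector<std::unique_ptr<CXSType>>& sections)
{
    float fSum = 0.0f;
    for (const auto& p : sections)
        fSum += p->ComputeArea();
    return fSum;
}
```

No explicit delete statements are needed: each std::unique_ptr calls the (virtual) destructor of the object it owns.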
hierarchy (from the base class to a derived class). It should be noted that if a reference is used (instead of a pointer), a casting failure results in a std::bad_cast exception rather than a null pointer (nullptr).
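A minimal sketch of both forms of dynamic_cast, using hypothetical, simplified classes: the pointer form yields nullptr when the object is not a CRectSolid, while the reference form throws std::bad_cast. dynamic_cast requires a polymorphic base class, which is why CXSType declares a virtual destructor.

```cpp
#include <cassert>   // used by the checks below
#include <typeinfo>  // std::bad_cast

class CXSType
{
public:
    virtual ~CXSType () = default;   // makes the class polymorphic
};

class CRectSolid : public CXSType
{
public:
    explicit CRectSolid (float fMC) : m_fMC(fMC) {}
    float GetMC () const { return m_fMC; }
private:
    float m_fMC;   // moisture content
};

class CISection : public CXSType { };

// Pointer form: returns the moisture content if pXS points to a CRectSolid,
// otherwise -1 (a hypothetical convention for this sketch).
float MoistureContentOf (const CXSType* pXS)
{
    if (const CRectSolid* pR = dynamic_cast<const CRectSolid*>(pXS))
        return pR->GetMC();
    return -1.0f;
}

// Reference form: a failed cast throws std::bad_cast.
bool IsRectSolid (const CXSType& xs)
{
    try {
        (void)dynamic_cast<const CRectSolid&>(xs);
        return true;
    } catch (const std::bad_cast&) {
        return false;
    }
}
```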
C++ also provides a typeid operator that can be used to obtain type information. This operator takes as its argument either an expression or a type name and returns a reference to a const std::type_info object. Here are a couple of examples to illustrate the usage.
#include <typeinfo>
The typeid operator can also be used to test or compare type_info values.
#include <typeinfo>
CRectSolid* pR1 = new CRectSolid ("R20x10", fV, 10.5f);
if (typeid(pR1) == typeid(CRectSolid *))
std::cout << "Cross section is a rectangular solid.\n";
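Note that
if (typeid(pR1) == typeid(CRectSolid *))
is not the same as
if (typeid(pR1) == typeid(CRectSolid))
The first condition compares two pointer types and is true because pR1 is declared as CRectSolid*. The second compares a pointer type with a class type and is always false. To test the type of the object that a (base class) pointer points to, dereference the pointer, as in typeid(*pR1) == typeid(CRectSolid); for a polymorphic class this yields the dynamic (run-time) type of the object.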
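The following sketch, built on hypothetical classes CShapeBase and CRectShape (not the book's classes), contrasts the static type of a pointer with the dynamic type of the object it points to. Only the latter depends on run-time information, and the null check is there because applying typeid to a dereferenced null pointer throws std::bad_typeid.

```cpp
#include <typeinfo>

// Hypothetical polymorphic hierarchy.
class CShapeBase {
public:
    virtual ~CShapeBase() = default;
};
class CRectShape : public CShapeBase {};

// True when the object pointed to is a CRectShape, whatever the pointer's static type.
bool PointsToRectShape(const CShapeBase* p) {
    return p != nullptr && typeid(*p) == typeid(CRectShape);
}
```

With CShapeBase* pBase pointing to a CRectShape object, typeid(pBase) == typeid(CShapeBase*) is true (pointer types are not polymorphic), whereas typeid(*pBase) == typeid(CRectShape) is also true because it inspects the object itself.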
One should be cautious in defining virtual member functions. If the design requires virtual functions, then use them; otherwise, do not. There is an overhead associated with virtual functions: each call is made indirectly through a table of function addresses and usually cannot be inlined, which can noticeably slow down a program when, for example, a small virtual function is called inside a tight loop. C++ compilers typically create a virtual function table for any class that has virtual functions, and the address of each of these virtual functions is stored in the table. Managing this table can be quite involved when multiple inheritance is used and member functions are overridden.
Nested classes
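A class can be defined inside another class. A nested class is a member of the enclosing class and, since C++11, has the same access rights as any other member. Hence the following class definition is valid in C++11 and later even though class CC refers to class CB, which is private (the older C++98 rules did not grant nested classes this access, and the definition was not valid).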
class CA
{
    float m_fX;       // by default, private
    class CB { };     // by default, private
    class CC          // by default, private
    {
        CB m_b;
    };
};
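However, the member variable m_fX cannot be used by itself inside class CB or class CC. Because m_fX is a non-static member of CA, a nested class can access it only through a CA object (for example, a.m_fX, where a is an object of type CA). Since C++11 such access is allowed even though m_fX is private.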
In the above example, class CA is the enclosing class and classes CB and CC are nested classes. Nested classes are independent
classes. An object of the nested type does not have members defined by the enclosing class. Conversely, an object of the
enclosing class does not have members defined by the nested class. Here is an example that is syntactically correct.
class A // enclosing or outer class
{
private:
    float m_fX;
    class B { };           // private nested or inner class
public:
    class C                // public nested or inner class
    {
        void XYZ (int n);  // private by default
        …..
    public:
        void ABC (int n);
    };
};
A nested class can be either public or private. If it is public, then it can be used outside the enclosing class. Using the above
example, one can declare a variable as
A::C aCobject;
Nested classes are typically used to avoid naming conflicts: placing a class name inside the scope of another class allows two different classes to each contain a nested class with the same name but different functionality.
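As a minimal sketch of the C++11 access rules discussed above (the class and member names are hypothetical), a public nested class can use a private nested type of its enclosing class and can read a private data member, provided it has an enclosing-class object to read it from:

```cpp
// Hypothetical classes illustrating C++11 access rules for nested classes.
class COuter {
private:
    float m_fX = 1.5f;            // private member of the enclosing class
    class CHidden {               // private nested class
    public:
        int Twice(int n) const { return 2 * n; }
    };
public:
    class CHelper {               // public nested class, usable as COuter::CHelper
    public:
        // A nested class is a member, so it may name the private type CHidden...
        int UseHidden(int n) const { CHidden h; return h.Twice(n); }
        // ...and read private data, but only through an object of COuter.
        float ReadX(const COuter& outer) const { return outer.m_fX; }
    };
};
```

Client code can declare COuter::CHelper helper; because CHelper is public, but it cannot declare a COuter::CHidden object.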
Example Program 13.3.4 Nested Classes
Develop a program to store basic material properties and allow for conversion of values from one set of units to another (SI
and US Customary units). Assume that the SI-related units are kg, m, s, and the USC-related units are slug, ft, s.
We will define a base class CMaterial designed to have member functions and variables to store the material data. We will derive
two classes CMaterialSI and CMaterialUSC to define objects to store material properties in SI and US Customary units
respectively. In each of these classes we will define a nested class CConvert that will convert the material values from SI to USC
units and back.
material.h
Lines 11 through 39 define the base class. There are four member variables – name of the material, mass density, yield strength
and modulus of elasticity.
Each derived class contains a nested class, CConvert, with a member function GetValue that converts a value from one set of units to the other. For example, the GetValue function defined within the CMaterialSI::CConvert class converts values from SI to USC units. The static member variable m_dVConvert stores the conversion factor from SI to USC units for each of the material properties. A member variable with the same name is also defined in CMaterialUSC::CConvert.
Since the member variable is static, the vector is initialized at the top of the material.cpp file. Selected functions are shown below.
material.cpp
main.cpp
A CMaterialSI object for steel, a widely used material, is defined in line 23, and its current values are displayed in line 24. An object that performs the unit conversion is defined in line 19. Note how the scope resolution operator :: is used, and that this definition is possible only because the nested class is declared public. Another object storing the values for steel in USC units is defined in lines 27-29. Note how the conversion object is used in conjunction with the GetValue member function. The reverse process, in lines 33 through 35, then defines a clone of the original (steel) object. We check the program by making sure that the calls to the Display function in lines 30 and 37 display the same values.
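A condensed sketch of the design of Example Program 13.3.4 is given below. It is not the book's listing: the getter names, the Property enumeration, the signature of GetValue and the use of std::array are assumptions. The conversion factors are rounded values of 1 slug/ft³ = 515.379 kg/m³ and 1 lb/ft² = 47.8803 Pa, and stresses in USC units are assumed to be expressed in lb/ft² (consistent with slug, ft and s).

```cpp
#include <array>
#include <string>

// Base class storing the four material properties.
class CMaterial {
public:
    CMaterial(const std::string& name, double density, double yield, double E)
        : m_strName(name), m_dDensity(density), m_dYield(yield), m_dE(E) {}
    virtual ~CMaterial() = default;
    const std::string& GetName() const { return m_strName; }
    double GetDensity() const { return m_dDensity; }
    double GetYield() const { return m_dYield; }
    double GetE() const { return m_dE; }
private:
    std::string m_strName;
    double m_dDensity;   // mass density
    double m_dYield;     // yield strength
    double m_dE;         // modulus of elasticity
};

// Hypothetical property indices used by GetValue.
enum Property { DENSITY = 0, YIELD = 1, MODULUS = 2 };

class CMaterialSI : public CMaterial {
public:
    using CMaterial::CMaterial;          // inherit the constructor
    class CConvert {                     // public nested class: SI -> USC
    public:
        double GetValue(Property p, double v) const { return v * m_dVConvert[p]; }
    private:
        static const std::array<double, 3> m_dVConvert;
    };
};

class CMaterialUSC : public CMaterial {
public:
    using CMaterial::CMaterial;
    class CConvert {                     // same name, different class: USC -> SI
    public:
        double GetValue(Property p, double v) const { return v * m_dVConvert[p]; }
    private:
        static const std::array<double, 3> m_dVConvert;
    };
};

// Static members are defined once at file scope (the book does this in material.cpp).
const std::array<double, 3> CMaterialSI::CConvert::m_dVConvert = {
    1.0 / 515.379, 1.0 / 47.8803, 1.0 / 47.8803 };
const std::array<double, 3> CMaterialUSC::CConvert::m_dVConvert = {
    515.379, 47.8803, 47.8803 };
```

Converting an SI object to USC with CMaterialSI::CConvert and back with CMaterialUSC::CConvert should reproduce the original values to within round-off, which is the same check the example program performs.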
Note that the names of the variables in the two examples are ptQuadraticRoots and ptLinearInterp. Once the function pointer
variables are declared, they can be used just like any other variable.
Example Program 13.4.1
In this example we will illustrate how pointers to non-member functions can be used.
main.cpp
In lines 10 and 11 we define two simple functions completely. Both these functions return a float value and accept two float
variables as function arguments. However, their names and functionalities are completely different. In lines 13 through 27 we
define a function, Use_In_A_Function, that has a single argument – a pointer to a function that returns a float value and accepts
two float variables as the function arguments.
We start illustrating the usage of the function pointer variables in the main program.
In line 33 we define a variable called ptAddFunction that is a pointer to a function that returns a float value and accepts two float
variables as function arguments. The variable is initialized to nullptr. In line 36, the variable is given a value, the address of the
function AddSimpleTwo that was declared and defined in line 10. The function is invoked in line 39 with the arguments as 2.3
and 3.4.
How the variable can be used as an argument in a function call is shown in line 42 where the function Use_In_A_Function is
called. More function pointer usage is shown in that function. In line 16, the == equality operator is used to check if the variable
is initialized with an appropriate address. In lines 13 and 15 the same == operator is used to check what function is passed as the
argument. Finally, line 26 shows how the variable is used to invoke the function similar to the usage in line 39.
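The function-pointer usage described above can be sketched as follows. AddSimpleTwo and Use_In_A_Function are the names used in the text, but their bodies here, the second function MultiplySimpleTwo, and the choice of returning 0 for an unassigned pointer are assumptions for illustration.

```cpp
#include <iostream>

// Two simple functions with the same signature but different behavior.
float AddSimpleTwo(float a, float b) { return a + b; }
float MultiplySimpleTwo(float a, float b) { return a * b; }   // hypothetical

// Accepts a pointer to any function returning float and taking two floats.
float Use_In_A_Function(float (*ptFunction)(float, float)) {
    if (ptFunction == nullptr) {                  // not yet assigned an address
        std::cout << "Function pointer is not initialized.\n";
        return 0.0f;
    }
    if (ptFunction == AddSimpleTwo)               // compare addresses with ==
        std::cout << "AddSimpleTwo was passed.\n";
    else if (ptFunction == MultiplySimpleTwo)
        std::cout << "MultiplySimpleTwo was passed.\n";
    return ptFunction(2.0f, 3.0f);                // invoke through the pointer
}
```

In the main program the variable would be declared as float (*ptAddFunction)(float, float) = nullptr;, assigned with ptAddFunction = AddSimpleTwo;, and then either called directly, as in ptAddFunction(2.3f, 3.4f), or passed as Use_In_A_Function(ptAddFunction).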
The major difference in the declaration of the member function Use_In_A_Function is the use of the scope resolution operator :: to specify that the function pointer variable is associated with the CMyAddLibrary class (line 12).
myaddlibrary.cpp
Once again we point out the major differences in the definition and the usage. Lines 29 and 31 must now contain the class name (CMyAddLibrary::) so that the scope of the usage is clear. In line 35, the member function is invoked through the pointer with the pointer-to-member operator, as in (*this.*ptrFunc), followed by the argument list; (this->*ptrFunc) is an equivalent form.
main.cpp
Several changes are needed for the class-based (object-oriented) usage. As before, we declare a pointer variable to a function in line 14; however, we need to add the class qualifier (CMyAddLibrary::). In addition, we need an object of the class; this is specified in line 15. Note that lines 12, 15 and 18 are essentially the same as before except that we now need the class qualifier. Lines 21 and 22 show how the variable can be reassigned and reused.
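A minimal sketch of the member-function version follows. Only the names CMyAddLibrary and Use_In_A_Function come from the text; the member functions AddTwo and AddTwoSquared and the extra arguments of Use_In_A_Function are hypothetical.

```cpp
// Hypothetical sketch of a class whose member functions are called through pointers.
class CMyAddLibrary {
public:
    float AddTwo(float a, float b) { return a + b; }
    float AddTwoSquared(float a, float b) { return a * a + b * b; }

    // The parameter is a pointer to a member function of CMyAddLibrary.
    float Use_In_A_Function(float (CMyAddLibrary::*ptrFunc)(float, float),
                            float a, float b) {
        if (ptrFunc == nullptr) return 0.0f;   // nothing assigned yet
        return (*this.*ptrFunc)(a, b);         // same as (this->*ptrFunc)(a, b)
    }
};
```

In client code the pointer is declared as float (CMyAddLibrary::*ptFunc)(float, float) = &CMyAddLibrary::AddTwo; and, given an object lib, invoked as (lib.*ptFunc)(2.0f, 3.0f). Unlike an ordinary function pointer, it cannot be called without an object.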
Functors
Function pointers can be used in the context of callback functions. A callback function is a function that is called through a function pointer. We have already seen callback functions not only in this section but also in Chapter 6. A function object, or functor, is any object that can be called as if it were a function, that is, an object of a class that overloads operator (). Functors can encapsulate C and C++ function pointers using templates and polymorphism. Thus, you can build up a list of pointers to member functions of arbitrary classes and call them all through the same interface without bothering about their class or the need for a pointer to an instance5. However, all the functions must have the same return type and calling parameters. Functors are sometimes (loosely) called closures, and they can also be used to implement callbacks.
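The definition of a functor can be illustrated with a short sketch (the class and function names are hypothetical). CQuadratic stores its coefficients and overloads operator (), so its objects can be called like functions and passed wherever a callable is expected.

```cpp
// Hypothetical functor: behaves like a function float(float) and carries state.
class CQuadratic {
public:
    CQuadratic(float a, float b, float c) : m_a(a), m_b(b), m_c(c) {}
    // Evaluates a x^2 + b x + c using Horner's scheme.
    float operator()(float x) const { return (m_a * x + m_b) * x + m_c; }
private:
    float m_a, m_b, m_c;
};

// A callback-style routine that accepts any callable with signature float(float).
template <typename F>
float EvaluateAtTwo(const F& f) { return f(2.0f); }
```

Because EvaluateAtTwo is a template, it accepts a CQuadratic object, a lambda or an ordinary function pointer alike; the base class approach described next achieves a similar uniformity through a virtual operator () instead.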
We start the implementation by first defining a base class FunctorABC that provides a pure virtual operator () with which one will be able to call the required member function. From the base class we derive a template class whose constructor takes a pointer to an object and a pointer to a member function. The derived class overrides the operator () of the base class, and the overriding version calls the member function using the stored pointers to the object and to the member function. Here is an example that illustrates the ideas.
Note that FunctorABC is a template class whose parameter Type can be int, float, double, etc. The () operator is overloaded and is a pure virtual function. Next, we define our math library.
math2lib.h
Our math library is a template class derived from the abstract base class, FunctorABC. The class template is instantiated (see line 7) by specifying one of the math functions in the library and the data type associated with the computations. The constructor (lines 12 and 13) takes a pointer to an object associated with one of the math functions and a pointer to the member function that actually carries out the computation. For example
CMath2Lib<MathFunction, DataType> object(&MathFunctionObject, &MathFunction::MemberFunction);
Using the functors is a four-step process, with the first three steps setting up the usage and step 4 being the actual use. In the first step, we instantiate objects associated with the two functions (lines 7 and 8) in the math library. The next step instantiates the functors that will be used to invoke the two functions. To make this invocation simple, in the third step we define an array of pointers to the abstract base class, where each element contains the address of one of the functor objects. From this point onwards, the math functions can be invoked simply by using the elements of this array, as seen in lines 25 and 27.
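The four steps can be sketched as follows. FunctorABC and CMath2Lib follow the description above, but their exact interfaces are assumptions (here operator () takes and returns a single value of type Type), and CSquare and CSqrt are hypothetical stand-ins for the library's math function classes.

```cpp
#include <cmath>

// Abstract base class with a pure virtual operator ().
template <typename Type>
class FunctorABC {
public:
    virtual ~FunctorABC() = default;
    virtual Type operator()(Type x) = 0;
};

// Derived template storing a pointer to an object and a pointer to its member function.
template <typename MathFunction, typename Type>
class CMath2Lib : public FunctorABC<Type> {
public:
    CMath2Lib(MathFunction* pObject, Type (MathFunction::*pMember)(Type))
        : m_pObject(pObject), m_pMember(pMember) {}
    // Overrides the base class operator () and forwards to the stored member function.
    Type operator()(Type x) override { return (m_pObject->*m_pMember)(x); }
private:
    MathFunction* m_pObject;
    Type (MathFunction::*m_pMember)(Type);
};

// Two hypothetical math function classes.
class CSquare { public: double Compute(double x) { return x * x; } };
class CSqrt   { public: double Compute(double x) { return std::sqrt(x); } };
```

In client code, step 1 creates CSquare sq; and CSqrt rt;, step 2 creates CMath2Lib<CSquare, double> fSq(&sq, &CSquare::Compute); (and similarly for rt), step 3 stores &fSq and the other functor in an array of FunctorABC<double>*, and step 4 calls an element as (*functors[0])(3.0).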
It may be worthwhile to look at the Example13_4_4 project that illustrates how user-defined functions can be integrated in
programs using numerical solution techniques with the Newton-Raphson root finding technique as an example. The example
is not discussed here since the next section presents a more comprehensive example – development of a
Some of these ideas are illustrated in the next example program. Since this is the last example in the last chapter that deals primarily with modern C++, most of the previous examples are updated and brought under a common framework, with the intent that the reader will spend time enhancing its features so that the developed code will be useful for engineering and scientific applications.
6 https://docs.microsoft.com/en-us/cpp/cpp/errors-and-exception-handling-modern-cpp?view=vs-2017
[Figure: organization of the example program, showing Unit Testing, std::exception, CNFToolBox (Newton-Raphson, Numerical Differentiation, ....), CArrayBase, CVector and CMatrix]
Each error in the list of errors needs to be caught in the CVector or CMatrix class and sent to its own ErrorHandler function. Two examples are shown below. The first involves a system error, std::bad_alloc, which is handled via the VECTOR_ALLOCATION_ERROR as shown in Fig. E13.5.1(d).
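A minimal sketch of how a vector class can catch std::bad_alloc and report it through its own error handler follows. CVectorDemo is a simplified, hypothetical class rather than the book's CVector; the enumeration VectorError and the choice of having ErrorHandler rethrow the code for the client's catch block are assumptions.

```cpp
#include <cstddef>
#include <new>

// Hypothetical error code modeled on the VECTOR_ALLOCATION_ERROR named in the text.
enum class VectorError { VECTOR_ALLOCATION_ERROR };

// Minimal vector-like class that converts std::bad_alloc into its own error code.
class CVectorDemo {
public:
    CVectorDemo() = default;
    CVectorDemo(const CVectorDemo&) = delete;             // keep ownership simple
    CVectorDemo& operator=(const CVectorDemo&) = delete;
    ~CVectorDemo() { delete[] m_pData; }

    void SetSize(std::size_t n) {
        try {
            double* pNew = new double[n]();   // may throw std::bad_alloc
            delete[] m_pData;                 // release old storage only on success
            m_pData = pNew;
            m_nSize = n;
        } catch (const std::bad_alloc&) {
            ErrorHandler(VectorError::VECTOR_ALLOCATION_ERROR);
        }
    }
    std::size_t GetSize() const { return m_nSize; }

private:
    // Rethrows the failure as the class's own error code for the client to catch.
    [[noreturn]] void ErrorHandler(VectorError err) const { throw err; }
    double* m_pData = nullptr;
    std::size_t m_nSize = 0;
};
```

A client catches the failure with a catch (VectorError e) block. Requesting an impossibly large size, for example static_cast<std::size_t>(-1) elements, makes new throw std::bad_array_new_length, which is derived from std::bad_alloc, and the vector keeps its previous contents.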
Numerical Functions Toolbox: The second part of the program is the numerical functions toolbox. This class is derived
from the std::exception class so as to enable exception handling functionalities. The public portion of the class is shown in
Fig. E13.5.1(g). There are four errors identified in lines 19 and 20 – (Newton-Raphson Method) when the derivative used to
compute the next estimate of the root is close to zero, when the root cannot be found in a specified number of iterations and
tolerance, when a floating point exception takes place, and (Ridders’ Method) when an invalid input is specified by the user for
the step size.
Three error handler functions are declared: in line 34 for the errors listed in lines 19-20, in line 35 for handling errors from the CVector and CMatrix classes, and in line 36 for handling system-generated exceptions. In lines 37 and 39 we see functions that can be called to check whether overflow or underflow has taken place. The two solution methods are available via their compute functions (NRCompute and RMCompute) shown in lines 22-31, where an argument is used to point to the user-supplied function for computing the required function and derivative values.
A revised version of the code presented in Example Program 6.4.1 is used here for the Newton-Raphson Method. Note that the Newton-Raphson technique update takes place as
$$x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)}, \qquad i = 1, 2, \ldots \qquad (13.5.1)$$
where a polynomial of order $n$ has $(n+1)$ terms, $f(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$.
The first error condition (DERIVATIVEISZERO) is detected in lines 43-44 so that a potential divide-by-zero or overflow condition does not occur when Eqn. (13.5.1) is used. The second error condition (UNABLETOCONVERGE) is detected outside the iterative loop, in line 67. Finally, instead of checking every single floating point computation, overflow and underflow conditions (FLOATEXCEPTION) are detected in line 63-44. The reader is urged to look at the two functions bool CNFToolBox::OverFlowDetected () and bool CNFToolBox::UnderFlowDetected () to see what system function calls are necessary to obtain the floating point exceptions. The main program is shown below.
main.cpp
A menu of test options (lines 19-22) is shown in line 28 via the GetInteractive function in line 27. It should be a relatively easy matter to add more methods to the toolbox and, hence, to the menu and the test cases. The test cases are created and solved in the try block (lines 33-47). The files TestCases.h and TestCases.cpp contain the code for the classes that act as an intermediary between the main program and the numerical toolbox. These intermediary test classes are not strictly needed, but they help place the user-defined functions in identifiable classes. Adding new functions to these test cases is as simple as commenting and uncommenting lines in the source code.
There are four catch blocks: the toolbox errors are caught in lines 45-48, the CVector and CMatrix errors in lines 50-53, the system-generated errors in lines 55-58, and the remaining errors that do not belong to these three categories in the catch-all block in lines 60-63.
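The error-handling structure of the toolbox can be sketched in a few lines. This is a simplified, hypothetical version of NRCompute, not the book's listing: the NFError enumeration, the user-function signature, the derivative threshold of 1.0e-12, and the use of std::feclearexcept and std::fetestexcept from <cfenv> to detect FLOATEXCEPTION are assumptions.

```cpp
#include <cfenv>
#include <cmath>

// Hypothetical error codes modeled on the names used in the text.
enum class NFError { DERIVATIVEISZERO, UNABLETOCONVERGE, FLOATEXCEPTION };

// User-supplied function: returns f(x) and f'(x) through the references.
using UserFunction = void (*)(double x, double& f, double& fp);

// Newton-Raphson iteration, Eqn. (13.5.1): x_{i+1} = x_i - f(x_i)/f'(x_i).
double NRCompute(UserFunction pFn, double x0, int maxIter, double tol) {
    std::feclearexcept(FE_ALL_EXCEPT);            // start with clean FP flags
    double x = x0;
    for (int i = 0; i < maxIter; ++i) {
        double f = 0.0, fp = 0.0;
        pFn(x, f, fp);
        if (std::fabs(fp) < 1.0e-12)              // avoid divide-by-zero/overflow
            throw NFError::DERIVATIVEISZERO;
        const double dx = f / fp;
        x -= dx;
        if (std::fetestexcept(FE_OVERFLOW | FE_UNDERFLOW))
            throw NFError::FLOATEXCEPTION;        // one check per iteration
        if (std::fabs(dx) <= tol) return x;       // converged
    }
    throw NFError::UNABLETOCONVERGE;              // detected outside the loop
}
```

A client would wrap the call in a try block followed by catch (NFError e), catch (const std::exception& e) and catch (...) handlers, mirroring the ordering of the catch blocks described above. A captureless lambda such as [](double x, double& f, double& fp) { f = x * x - 2.0; fp = 2.0 * x; } converts implicitly to UserFunction.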
Finally, two executions of the program are shown. An unusual polynomial (Fig. E13.5.1(i)) whose root is difficult to find is used to illustrate the exception handling concepts, and the use of Ridders’ Method in computing derivatives is shown in Fig. E13.5.1(j).
Summary
This chapter completes the formal treatment of C++ related object-oriented techniques in this book. In the rest of the chapters,
we will see how to use these OO ideas to develop and implement algorithms and solve engineering and scientific problems
numerically.
Exercises
As a reader, learner and practitioner, you have reached a high level of maturity in organizing your thoughts, breaking
a non-trivial problem into small, manageable pieces, developing specifications, and implementing detailed
algorithms via C++. The problems below involve this higher level of thinking. There is no unique, best answer.
There are poor and then there are well thought-out solutions!
Appetizers
Problem 13.1
Review Section 6.1. Write the specifications for a class that can be used to find the roots of a function. Implement, test and
document the class.
Problem 13.2
Review Section 6.2. Write the specifications for a class that can be used to differentiate a function. Implement, test and document
the class.
Problem 13.3
Review Section 6.3. Write the specifications for a class that can be used to integrate a function. Implement, test and document
the class.
Main Course
Problem 13.4
Review Section 10.3 and your solution to Problem 10.5. Rewrite the specifications for the matrix (template) toolbox class. What
changes did you make and why? Implement, test and document the class.
Problem 13.5
Review Section 11.3. Write the specifications for a class that can be used for data fitting via regression analysis. Implement, test
and document the class.
C++ Concepts
Problem 13.6
Develop a template class (COOMatrix) to store two-dimensional arrays similar to the CMatrix class that we saw in Chapter 9.
However, derive this class from the CVector class and store the elements of the matrix in a vector. Note that if we have a matrix $A_{m \times n}$, then element $A_{ij}$ is located at $(i-1)n + j$ if the matrix is stored row-wise and at $(j-1)m + i$ if the matrix is stored column-wise. Write a few test programs to compare the runtime performance of this implementation with the CMatrix
implementation.
Problem 13.7
Rework Example Program 13.5.1 by not using C++ exceptions but error codes, and then compare the two approaches to
handling and managing errors.
References
A good source for a number of C++ libraries: http://en.cppreference.com/w/cpp/links/libs
Chapter
“Science is a differential equation. Religion is a boundary condition.” Alan Turing
A differential equation contains derivatives of the dependent variable. We have an ordinary differential equation (ODE) when
the equation contains only one independent variable. There are several engineering and scientific models that involve derivatives
that are related to other parameters via known expressions. For example, you may have seen in your first course in physics the velocity, $v$, of an object defined mathematically as $v = \frac{ds}{dt}$, the rate of change of its position. Or, in a college course in mechanics, the differential equation describing the transverse displacement, $y$, of a beam (see Chapter 11.1) can be expressed as $\frac{d^4 y(x)}{dx^4} = \frac{w(x)}{EI}$. Solving such differential equations either analytically or numerically provides a wealth of information in understanding the complex relationship between the dependent and independent variables.
Objectives
To understand the different types of ordinary differential equations.
To understand how to use the Runge-Kutta Methods to solve ODEs.
taking $y_{i+1} = f(x_i + h)$, $y_i = f(x_i)$. The attractive feature of Eqn. (14.1.9) is that it is possible to solve for $y_1$ using $y_0$, $y_2$ using $y_1$, and so on.
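$$k_2 = f(x_i + p_2h,\; y_i + a_{21}hk_1) \qquad (14.2.2)$$
…
$$k_n = f(x_i + p_nh,\; y_i + a_{n1}hk_1 + a_{n2}hk_2 + \cdots + a_{n,n-1}hk_{n-1}) \qquad (14.2.3)$$
In general, we can use Eqns. (14.2.1)-(14.2.3) to generate a general solution as
$$y_{i+1} = y_i + h\,\phi(x_i, y_i, h) = y_i + h\sum_{j=1}^{n} c_jk_j = y_i + h\sum_{j=1}^{n} c_j\,f\!\left(x_i + p_jh,\; y_i + h\sum_{l=1}^{j-1} a_{jl}k_l\right) \qquad (14.2.4)$$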
We will now see how to customize Eqn. (14.2.4) to generate various forms of Runge-Kutta methods.
14.2.1 First-Order Runge-Kutta Method
Using the general form of the method, Eqn. (14.2.4), with $n = 1$ and $c_1 = 1$, we have the expression for the first-order Runge-Kutta method as
$$y_{i+1} = y_i + hc_1k_1 = y_i + hf(x_i, y_i) \qquad (14.2.1.1)$$
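Eqn. (14.2.1.1) is the explicit Euler method. A minimal sketch follows; the function name EulerSolve and its interface are hypothetical.

```cpp
#include <functional>

// First-order Runge-Kutta (Euler) method, Eqn. (14.2.1.1):
// y_{i+1} = y_i + h f(x_i, y_i), applied nSteps times starting from (x0, y0).
double EulerSolve(const std::function<double(double, double)>& f,
                  double x0, double y0, double h, int nSteps) {
    double x = x0, y = y0;
    for (int i = 0; i < nSteps; ++i) {
        y += h * f(x, y);   // k1 = f(x_i, y_i) with c1 = 1
        x += h;
    }
    return y;
}
```

For $dy/dx = y$ with $y(0) = 1$ and $h = 0.1$, ten steps give $1.1^{10} \approx 2.5937$, compared with the exact value $e \approx 2.7183$; the first-order error is why smaller $h$ values or higher-order methods are needed.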
Fig. 14.2.1 Solution to the given ODE with two different h values
$$k_2 = f(x_i + p_2h,\; y_i + a_{21}hk_1) \qquad (14.2.2.6)$$
We can use Taylor’s series expansion to expand Eqn. (14.2.2.6) as
$$k_2 = f(x_i + p_2h,\; y_i + a_{21}hk_1) = f_i + p_2hf_{x_i} + a_{21}hf_{y_i}f_i + O(h^2) \qquad (14.2.2.7)$$
and
$$y_{i+1} = y_i + hf_i(c_1 + c_2) + h^2f_{x_i}c_2p_2 + h^2f_if_{y_i}c_2a_{21} + O(h^3) \qquad (14.2.2.8)$$
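$$k_2 = f\left(x_i + \frac{h}{2},\; y_i + \frac{hk_1}{2}\right) \qquad (14.2.2.14)$$
$$k_3 = f\left(x_i + \frac{h}{2},\; y_i + \frac{hk_2}{2}\right) \qquad (14.2.2.15)$$
$$k_4 = f(x_i + h,\; y_i + hk_3) \qquad (14.2.2.16)$$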
We will use this version to develop an example program that implements the 4th order Runge-Kutta method.
The main program details are explained first. The variables associated with the two problems are as follows: the value of $h$ (lines 23 and 32), the range of $x$ values (lines 24-25 and 33-34) and the initial value of $y$ (lines 24 and 33) for each problem. The problem to be solved is selected in line 16. The rest of the required details are computed next. The range of $x$ values is divided into equal parts using the value of $h$ (line 37), and the computed number of points is used to allocate the memory required to store the $y$ and $dy/dx$ values at each point in line 38. The ODE solver object is defined in line 41 by invoking the overloaded constructor. The public member function GoCompute is called to obtain the solution (line 42), and the results are displayed in tabular form (lines 44-55).
main.cpp
The errors thrown by the overloaded constructor are caught and handled in lines 58-65.
This design of the ODE solver class makes it possible to hide the details of the solver from the user. The big question to ask is “Where does the function evaluation take place to compute $f_i = f(x_i, y_i)$?” The GoCompute function (file ODESolver.cpp) reveals where the 4th-order Runge-Kutta method class is used in obtaining the solution (lines 33-36). Note the last argument in the Solve function: it is the name of the function where $f_i$ is computed. The member function UserFE is where the function evaluation takes place; $x_i$ and $y_i$ come in as input, and $y'_i$ is updated with the computed $f_i$ value. Finally, we will look at the Runge-Kutta class. The class has three public functions: the constructor, the destructor and the Solve function (RK4OM.h file).
The Solve function can be divided into three parts. In the first part, the function value at the initial point is computed (lines 24-29).
The second part involves the loop where the solution at all the other points is computed. This involves computing the four $k_i$ values given by Eqns. (14.2.2.13)-(14.2.2.16) (lines 34-55). For each $k_i$, the user function is called to obtain the corresponding value of $f(x, y)$.
In the final part, the recurrence formula, Eqn. (14.2.2.12), is used to compute the value at the next point, $y_{i+1}$ (line 58), and preparations are made for the computations at the next point.
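The three parts of the Solve function can be sketched as follows. This is not the book's RK4OM class: RK4Solve and the UserFE signature are hypothetical, and the recurrence is the classical fourth-order formula $y_{i+1} = y_i + \frac{h}{6}(k_1 + 2k_2 + 2k_3 + k_4)$ with $k_1 = f(x_i, y_i)$, which we assume is the form of Eqn. (14.2.2.12).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical user function: given x and y, returns f(x, y) = dy/dx through dydx.
using UserFE = void (*)(double x, double y, double& dydx);

// Classical 4th-order Runge-Kutta on equally spaced points starting at x0.
// y[0] must hold the initial value; y and dydx are filled at every point.
void RK4Solve(double x0, double h, std::vector<double>& y,
              std::vector<double>& dydx, UserFE fe) {
    const std::size_t n = y.size();
    dydx.assign(n, 0.0);
    if (n == 0) return;
    // Part 1: function value at the initial point.
    fe(x0, y[0], dydx[0]);
    // Part 2: loop over the remaining points.
    for (std::size_t i = 0; i + 1 < n; ++i) {
        const double x = x0 + static_cast<double>(i) * h;
        const double k1 = dydx[i];
        double k2 = 0.0, k3 = 0.0, k4 = 0.0;
        fe(x + 0.5 * h, y[i] + 0.5 * h * k1, k2);   // Eqn. (14.2.2.14)
        fe(x + 0.5 * h, y[i] + 0.5 * h * k2, k3);   // Eqn. (14.2.2.15)
        fe(x + h, y[i] + h * k3, k4);               // Eqn. (14.2.2.16)
        // Part 3: recurrence for the next point, then its slope (the next k1).
        y[i + 1] = y[i] + h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0;
        fe(x + h, y[i + 1], dydx[i + 1]);
    }
}
```

For $dy/dx = y$ with $y(0) = 1$ and $h = 0.1$, the value at $x = 1$ agrees with $e$ to about six significant figures, a large improvement over the first-order method with the same step size.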
We will solve the problem using $h = 0.2$. Converting the original problem into two first-order differential equations and using $y_1 = y$, we have an equivalent problem of the form
$$\frac{dy_1}{dx} = y_2, \qquad y_1(x=0) = 0$$
dy 2
y 2 5 y1 y 2 ( x 0) 1
dx
The slopes at the beginning of the interval are f f x , y1 , y 2
Problem setup takes place in lines 19-21. The problem selection is in line 19, with the problem details defined in lines 23-44. The ODE solver object is created in line 50 and the system of ODEs is solved in line 51.
The output showing the solution is produced in lines 53-70. The errors detected in the CODESolver constructor are caught in lines 73-80.
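For a system of first-order ODEs, each $k$ becomes a vector and the same recurrence is applied component by component. The sketch below (hypothetical names, one RK4 step per call) is not the book's CODESolver; it is checked on $y_1' = y_2$, $y_2' = -y_1$, an illustrative system that is not the problem solved in the text.

```cpp
#include <cstddef>
#include <vector>

using State = std::vector<double>;
// Hypothetical right-hand side: fills dydx (already sized) given x and y.
using SystemFE = void (*)(double x, const State& y, State& dydx);

// One classical RK4 step of size h for a system of first-order ODEs.
State RK4SystemStep(double x, const State& y, double h, SystemFE fe) {
    const std::size_t m = y.size();
    State k1(m), k2(m), k3(m), k4(m), tmp(m);
    fe(x, y, k1);
    for (std::size_t j = 0; j < m; ++j) tmp[j] = y[j] + 0.5 * h * k1[j];
    fe(x + 0.5 * h, tmp, k2);
    for (std::size_t j = 0; j < m; ++j) tmp[j] = y[j] + 0.5 * h * k2[j];
    fe(x + 0.5 * h, tmp, k3);
    for (std::size_t j = 0; j < m; ++j) tmp[j] = y[j] + h * k3[j];
    fe(x + h, tmp, k4);
    State next(m);
    for (std::size_t j = 0; j < m; ++j)
        next[j] = y[j] + h * (k1[j] + 2.0 * k2[j] + 2.0 * k3[j] + k4[j]) / 6.0;
    return next;
}
```

With $y_1(0) = 0$, $y_2(0) = 1$ and five steps of $h = 0.2$, $y_1(1)$ approximates $\sin 1 \approx 0.8415$ to about four decimal places.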
Summary
Differential equations are used to understand problems in a number of engineering and scientific disciplines. In this chapter, we saw a very powerful technique, the Runge-Kutta Method, to solve a system of ordinary differential equations. In the next chapter, we will examine partial differential equations, where there is more than one independent variable. In some cases, the PDEs can be converted into a system of ODEs.
Exercises
Appetizers
Problem 14.1
dy( x )
Solve the following problem - x 2 y , 0 x 2 with y( x 0) 1 , by hand using h 0.1 .
dx
Problem 14.2
Modify the program Example_14_2_3 to solve Problem 14.1.
Main Course
Problem 14.3
d2 y dy dy
Solve the following second-order differential equation: 4 2
5 3 y 0 with y( x 0) 3, ( x 0) 1 in the
dx dx dx
interval 0 x 5 .
Problem 14.4
The motion of a damped spring-mass system is given by $m\frac{d^2x}{dt^2} + c\frac{dx}{dt} + kx = 0$, where $x$ is the displacement from the equilibrium position, $m$ is the mass, $c$ is the damping coefficient, $k$ is the spring stiffness, and $t$ is time. The problem description is complete if the initial ($t = 0$) displacement and velocity are known, along with the time interval over which the solution is required, i.e., $0 \le t \le t_{final}$. Write a program that uses the 4th order Runge-Kutta method to solve this problem.
Chapter
“The purpose of computing is insight, NOT numbers.” Richard Hamming
In Chapter 14, we saw differential equations where the dependent variable was a function of a single independent variable. In
scientific and engineering applications, there are numerous problems where a dependent variable is a function of more than one
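independent variable (and their derivatives). For example, if $y = y(x_1, x_2, \ldots, x_n)$, then a partial differential equation (PDE) may be written as $f\left(x_1, x_2, \ldots, x_n, \frac{\partial y}{\partial x_1}, \frac{\partial y}{\partial x_2}, \ldots, \frac{\partial y}{\partial x_n}, \frac{\partial^2 y}{\partial x_1^2}, \ldots, \frac{\partial^2 y}{\partial x_n^2}, \ldots\right) = 0$.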
This chapter is geared for scientists and engineers who wish to learn a powerful numerical technique called the finite element method. This book is not devoted exclusively to the finite element method; the main aim of this chapter is to show how the method can be used to solve one-dimensional PDEs.
Objectives
To understand the basic concepts associated with partial differential equations.
To understand the different engineering examples of PDEs.
To understand the basics of the finite element method (FEM) and, specifically, Galerkin’s Method.
To understand and practice numerical solution of one-dimensional PDEs via the finite element method.
SUPPLEMENTAL MATERIAL
A Windows-based program, 1DBVP© can be downloaded from the book’s web site and used to solve the problems discussed
in this chapter. The program is based on the finite element method.
15.1 Background
One-dimensional boundary-value problems arise in a variety of engineering areas; some of the more popular examples are listed below.
Specialty Area | Problem Description
Solid Mechanics | Transverse deflection of a cable
Hydrodynamics | One-dimensional flow in an inviscid, incompressible fluid
Magnetostatics | One-dimensional magnetic potential distribution
Heat Conduction | One-dimensional heat flow in a solid medium
Electrostatics | One-dimensional electric potential distribution
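Consider a one-dimensional boundary value problem (also known as an equilibrium problem) given by
$$\frac{d^2y}{dx^2} + P(x)\frac{dy}{dx} + Q(x)\,y = F(x) \qquad (15.1.1)$$
If $P$ and $Q$ are constants, then
$$\frac{d^2y}{dx^2} + P\frac{dy}{dx} + Qy = F(x) \qquad (15.1.2)$$
The total solution is
$$y(x) = Ae^{\lambda_1 x} + Be^{\lambda_2 x} + C_0F(x) + C_1F'(x) + \cdots \qquad (15.1.3)$$
where $\lambda_1$ and $\lambda_2$ are the (assumed distinct) roots of the characteristic equation $\lambda^2 + P\lambda + Q = 0$, and the constants of integration $A$ and $B$ can be found by substituting the two boundary conditions into Eqn. (15.1.3).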
Note that we need two boundary conditions for the problem to be well-posed. The boundary conditions are of three types.
(a) The function $y$ may be specified. This is known as the Dirichlet or Essential boundary condition.
(b) The derivative $y'$ may be specified. This is known as the Neumann or Natural boundary condition.
(c) A linear combination of the function and its derivative, $\alpha y + \beta y'$, may be specified. This is known as the Mixed or Robin boundary condition.
In the FE solution, an approximate or trial solution $\tilde{y}(x)$ is constructed and solved for1. The FE approach has three distinct operations.
(a) A trial solution $\tilde{y}(x)$ is constructed.
(b) An optimizing criterion is applied to $\tilde{y}(x)$.
(c) An estimation of the accuracy of $\tilde{y}(x)$ is made.
Trial Solution
The trial solution is constructed with a finite number of terms as
$$\tilde{y}(x;\mathbf{a}) = \phi_0(x) + a_1\phi_1(x) + a_2\phi_2(x) + \cdots + a_n\phi_n(x) \qquad (15.1.4)$$
where $\phi_i(x)$ are known trial or basis functions and the coefficients $a_i$ are undetermined parameters known as degrees of freedom (DOF). The purpose of the trial function $\phi_0(x)$ is to satisfy some or all of the boundary conditions. The most common form of trial solutions is to use polynomials. We will see more about this later.
1 Those familiar with the finite difference solution will note that there are two approaches to solving the boundary-value ODE – the shooting
Optimizing Criterion
The optimizing criterion is used to generate the appropriate equations so that we can solve for the numerical values of the coefficients $a_i$. As you can guess, the optimizing criterion is not unique and different approaches define what is meant by the “best possible approximation” to the exact solution. The two most common forms are
(a) The Method of Weighted Residuals (MWR) – applicable when the problem is described by differential equations,
and
(b) The Ritz Variational Method (RVM) - applicable when the problem is described by integral (or, variational) equations.
In this lesson, the focus is on the former method. In the MWR, the criteria minimize an expression of error in the differential
equation. In the RVM, an attempt is made to extremize (typically, minimize) a physical quantity. We will see this approach in
the second module of this course.
Collocation Method
In this method, for each parameter $a_i$, a point $x_i$ is chosen and the residual is set to zero at that point.
$$R(x_i;\mathbf{a}) = 0, \qquad i = 1, \ldots, n \qquad (15.1.7)$$
The points $x_i$ are known as the collocation points. Note that we are setting the residual, not the error in the solution, to zero.
Subdomain Method
In this method, for each parameter $a_i$, an interval $\Delta x_i$ is chosen and the average of the residual over that interval is set to zero.
$$\frac{1}{\Delta x_i}\int_{\Delta x_i} R(x;\mathbf{a})\,dx = 0, \qquad i = 1, \ldots, n \qquad (15.1.8)$$
Least-squares Method
In this method, we minimize, with respect to each $a_i$, the integral of the square of the residual over the entire domain.
$$\frac{\partial}{\partial a_i}\int R^2(x;\mathbf{a})\,dx = 0, \qquad i = 1, \ldots, n \qquad (15.1.9)$$
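Galerkin’s Method
In this method, the basis functions $\phi_i(x)$ are used as the weighting functions, and each weighted integral of the residual is set to zero.
$$\int R(x;\mathbf{a})\,\phi_i(x)\,dx = 0, \qquad i = 1, \ldots, n \qquad (15.1.10)$$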
The natural question is “Which of these techniques is the most appropriate?” A detailed answer is outside the scope of this text. However, experience has shown that Galerkin’s Method is the most suitable for the type of finite element applications discussed in this text.
Example 15.1.1
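Let us look at an example to see how Galerkin’s Method works. Consider the problem
DE: $$\frac{d}{dx}\left(x\frac{dy(x)}{dx}\right) = \frac{2}{x^2}, \qquad 1 < x < 2 \qquad (15.1.11)$$
BC: $$y(x=1) = 2 \qquad (15.1.12)$$
$$\left. x\frac{dy}{dx}\right|_{x=2} = -\frac{1}{2} \qquad (15.1.13)$$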
$$\tilde{y} = 2 - \frac{1}{4}(x-1) + a_3(x-1)(x-3) + a_4(x-1)(x^2+x-11) \qquad (15.1.19)$$
This equation is now in the familiar form
$$\tilde{y}(x;\mathbf{a}) = \phi_0(x) + a_1\phi_1(x) + a_2\phi_2(x) \qquad (15.1.20)$$
where $\phi_0(x) = 2 - \frac{1}{4}(x-1)$
$\phi_1(x) = (x-1)(x-3)$
$\phi_2(x) = (x-1)(x^2+x-11)$
and we will assume for the sake of convenience that $a_1$ and $a_2$ in (15.1.20) are the same as $a_3$ and $a_4$ in (15.1.19). We will now write the residual as
$$R(x;\mathbf{a}) = \frac{d}{dx}\left(x\frac{d\tilde{y}(x)}{dx}\right) - \frac{2}{x^2} \neq 0 \qquad (15.1.21)$$
Substituting Eqn. (15.1.20) in the above equation, we have
$$R(x;\mathbf{a}) = -\frac{1}{4} + 4(x-1)a_1 + 3(3x^2-4)a_2 - \frac{2}{x^2} \qquad (15.1.22)$$
Now using Galerkin’s Method (Eqn. (15.1.10))
$$\int_1^2 \left[-\frac{1}{4} + 4(x-1)a_1 + 3(3x^2-4)a_2 - \frac{2}{x^2}\right](x-1)(x-3)\,dx = 0$$
$$\int_1^2 \left[-\frac{1}{4} + 4(x-1)a_1 + 3(3x^2-4)a_2 - \frac{2}{x^2}\right](x-1)(x^2+x-11)\,dx = 0 \qquad (15.1.23a)$$
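Integrating yields two equations
$$\begin{bmatrix} \frac{5}{3} & \frac{41}{5} \\ \frac{41}{5} & \frac{81}{2} \end{bmatrix} \begin{Bmatrix} a_1 \\ a_2 \end{Bmatrix} = \begin{Bmatrix} 8\ln 2 - \frac{29}{6} \\ 24\ln 2 - \frac{211}{16} \end{Bmatrix} \qquad (15.1.23b)$$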
Solving these equations gives $a_1 = 2.138$ and $a_2 = -0.348$.
Hence, substituting in Eqn. (15.1.20) yields
$$\tilde{y}(x;\mathbf{a}) = 2 - \frac{1}{4}(x-1) + 2.138(x-1)(x-3) - 0.348(x-1)(x^2+x-11)$$
$$= -0.348x^3 + 2.138x^2 - 4.629x + 4.839 \qquad (15.1.24)$$
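The Galerkin equations of this example can also be assembled numerically, which is how a program would handle integrands that are awkward to integrate by hand. The sketch below (function names are hypothetical) evaluates the integrals in Eqn. (15.1.23a) with composite Simpson's rule and solves the resulting 2×2 system by Cramer's rule. Its matrix and right-hand side are the negatives of those in Eqn. (15.1.23b), so the solution is the same.

```cpp
#include <functional>

// Composite Simpson's rule on [a, b] with n subintervals (n must be even).
double Simpson(const std::function<double(double)>& g, double a, double b, int n) {
    const double h = (b - a) / n;
    double sum = g(a) + g(b);
    for (int k = 1; k < n; ++k)
        sum += (k % 2 == 1 ? 4.0 : 2.0) * g(a + k * h);
    return sum * h / 3.0;
}

struct Coefficients { double a1, a2; };

// Galerkin solution of Example 15.1.1 with the trial solution of Eqn. (15.1.20):
// int_1^2 R(x; a) phi_i(x) dx = 0 for i = 1, 2, integrals evaluated numerically.
Coefficients GalerkinExample15_1_1(int n = 200) {
    auto phi1 = [](double x) { return (x - 1.0) * (x - 3.0); };
    auto phi2 = [](double x) { return (x - 1.0) * (x * x + x - 11.0); };
    // d/dx(x dphi_j/dx): the multipliers of a1 and a2 in Eqn. (15.1.22)
    auto L1 = [](double x) { return 4.0 * (x - 1.0); };
    auto L2 = [](double x) { return 3.0 * (3.0 * x * x - 4.0); };
    // part of the residual that does not depend on a1 or a2
    auto R0 = [](double x) { return -0.25 - 2.0 / (x * x); };

    // K_ij = int L_j phi_i dx and f_i = -int R0 phi_i dx, so that K a = f
    const double K11 = Simpson([&](double x) { return L1(x) * phi1(x); }, 1.0, 2.0, n);
    const double K12 = Simpson([&](double x) { return L2(x) * phi1(x); }, 1.0, 2.0, n);
    const double K21 = Simpson([&](double x) { return L1(x) * phi2(x); }, 1.0, 2.0, n);
    const double K22 = Simpson([&](double x) { return L2(x) * phi2(x); }, 1.0, 2.0, n);
    const double f1 = -Simpson([&](double x) { return R0(x) * phi1(x); }, 1.0, 2.0, n);
    const double f2 = -Simpson([&](double x) { return R0(x) * phi2(x); }, 1.0, 2.0, n);

    // Solve the 2x2 system by Cramer's rule.
    const double det = K11 * K22 - K12 * K21;
    return { (f1 * K22 - K12 * f2) / det, (K11 * f2 - K21 * f1) / det };
}
```

The computed values are $a_1 \approx 2.1378$ and $a_2 \approx -0.3477$, in agreement with the values above. The determinant of this system is small (about 0.26 in the scaling of Eqn. (15.1.23b)), so the integrals must be evaluated accurately.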
There is an important derived quantity, the flux, that we have not yet discussed. Flux is defined as
$$\sigma(x) = -x\frac{dy}{dx} \qquad (15.1.25)$$
Flux has a physical interpretation, and we will discuss the concepts in the next topic. Substituting in Eqn. (15.1.15), we have
$$\tilde{\sigma}(x;\mathbf{a}) = \frac{x}{4} - 4.276\,x(x-2) + 1.043\,x(x-2)(x+2)$$
$$= 1.043x^3 - 4.276x^2 + 4.629x \qquad (15.1.26)$$
Finally, note that this solution is called the theoretical (or, classical) solution because of the manner in which the two boundary
conditions were imposed. In this methodology, the BCs were imposed on the trial solution itself so that the trial solution would
satisfy the BCs exactly. However, this in no way implies that the solution is correct in the interior of the problem domain.
$$\tilde{y}(x;\mathbf{a}) = 2 - \frac{1}{4}(x-1) + 3.3725(x-1)(x-3) - 0.8881(x-1)(x^2+x-11) + 0.0864(x-1)(x^3+x^2+x-31)$$
$$= 0.0864x^4 - 0.888x^3 + 3.3725x^2 - 5.848x + 5.277 \qquad (15.2.30)$$
and
$$\tilde{\sigma}(x;\mathbf{a}) = \frac{x}{4} - 6.745\,x(x-2) + 2.664\,x(x-2)(x+2) - 0.346\,x(x-2)(x^2+2x+4)$$
$$= -0.346x^4 + 2.664x^3 - 6.745x^2 + 5.848x \qquad (15.2.31)$$
We will now plot the three trial functions and the exact solution both for the function and the flux (Fig. 15.1.1).
[Fig. 15.1.1 Comparison of Solutions: two panels over $1.0 \le x \le 2.0$, showing $y$ versus $x$ and the flux versus $x$. Curves: exact solution $2/x + 0.5\ln x$ (flux $2/x - 0.5$); quadratic trial solution $0.427x^2 - 1.958x + 3.531$ (flux $-0.854x^2 + 1.958x$); cubic trial solution $-0.348x^3 + 2.138x^2 - 4.629x + 4.839$ (flux $1.043x^3 - 4.276x^2 + 4.626x$); quartic trial solution $0.0864x^4 - 0.888x^3 + 3.3725x^2 - 5.848x + 5.277$ (flux $-0.346x^4 + 2.664x^3 - 6.745x^2 + 5.848x$).]
The message here is quite clear: (a) small errors in the function do not translate into small errors in the flux, which involves the derivative of the function, and (b) trial functions of increasing (polynomial) order yield better, converging solutions².
The residual function is the function corresponding to the original differential equation (with all the terms on the LHS) in which
an approximate solution is substituted. It measures how close the approximation is to satisfying the DE but does not tell us
how close the approximation is to the exact solution. The Method of Weighted Residuals converts the original DE into a set of
algebraic equations that are much easier to solve. There are different approaches to weighting the residual. Of all the techniques discussed, Galerkin's Method is by far the most suitable for the type of problems encountered in engineering analysis.
In the lessons dealing with this topic, we saw how to assume an approximate solution, called the trial solution, and use Galerkin's Method to generate the algebraic equations. The trial solution is usually a polynomial because of the properties that polynomials possess (continuous, differentiable, easy to handle, etc.). We saw how to enforce the boundary conditions on the trial solution. We also looked at the flux term. Finally, we saw how to obtain more accurate solutions by increasing the order of the polynomial in the trial solution.
There are two problems with this classical (or theoretical) approach. First, the trial solution is valid for the entire problem domain. Hence it is not possible to accurately model problems in which there are known discontinuities in the solution and the flux. Second, the manner in which the boundary conditions are enforced can become cumbersome for more complex problems. In the next topic, we will see how to overcome both these drawbacks.
² Later we will see that the trial functions must have certain properties for this to be true. If we keep increasing the order of the trial solution, will we converge to the exact solution for this problem?
Exercises
Problem 15.1.1
Consider the differential equation
$$\frac{d^2y}{dx^2} + \frac{y}{4} = 0, \qquad 0 \le x \le \pi$$
with $y(0) = 1$ and $y(\pi) = 0$. Find the solution using Galerkin's Method by assuming the trial solution as $\tilde{y}(x;\mathbf{a}) = a_1 + a_2x + a_3x^2 + a_4x^3$. Compare with the exact solution.
Problem 15.1.2
Consider the differential equation
$$-\frac{d}{dx}\left(x^2\frac{dy}{dx}\right) = \frac{1}{12}\left(30x^4 - 204x^3 + 351x^2 - 110x\right), \qquad 0 \le x \le 4$$
with $y(0) = 1$ and $y(4) = 0$. Find the solution using Galerkin's Method by assuming the trial solution as (i) a quadratic polynomial, and (ii) a cubic polynomial. Compare to the exact solution.
Problem 15.1.3
Consider the differential equation
$$\frac{d}{dx}\left[(x+1)\frac{dy(x)}{dx}\right] = 0, \qquad 1 \le x \le 2$$
with $y(x=1) = 1$ and $(x+1)\dfrac{dy}{dx}\Big|_{x=2} = 1$.
(a) Using Galerkin's method with a quadratic polynomial for the trial solution, obtain an approximate solution for both the function and the flux.
(b) Obtain a second approximate solution using a cubic polynomial for the trial solution.
(c) Compare the two solutions with each other and with the exact solution.
We will stick with the example from the previous topic. Using the definition of the residual, we have
$$R(x;\mathbf{a}) = \frac{d}{dx}\left(x\frac{d\tilde{y}(x)}{dx}\right) - \frac{2}{x^2} \tag{15.2.2}$$
Since we have an n-parameter trial function, we need n residual equations (see Eqn. (15.1.10))
$$\int_{x_a}^{x_b} R(x;\mathbf{a})\,\phi_i(x)\,dx = 0, \qquad i = 1,\ldots,n \tag{15.2.3}$$
Notice that we have changed the limits of integration (instead of using 1 and 2 as the limits) just to make the derivation general
and more useful. Substituting, we have
$$\int_{x_a}^{x_b}\left[\frac{d}{dx}\left(x\frac{d\tilde{y}(x)}{dx}\right) - \frac{2}{x^2}\right]\phi_i(x)\,dx = 0, \qquad i = 1,\ldots,n \tag{15.2.4}$$
Step 2: Integrate by parts the highest derivative term in the residual equations (the first term, which contains the second-order derivative). Rearranging the terms, we have
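$$\int_{x_a}^{x_b} x\frac{d\tilde{y}}{dx}\frac{d\phi_i}{dx}\,dx = -\int_{x_a}^{x_b}\frac{2}{x^2}\phi_i\,dx + \left[\phi_i\,x\frac{d\tilde{y}}{dx}\right]_{x_a}^{x_b}, \qquad i = 1,\ldots,n \tag{15.2.5}$$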
Three observations here: (a) the highest-order derivative in Eqn. (15.2.5) is now lower (only first-order derivatives, compared to second-order in Eqn. (15.2.4)), (b) the "stiffness" term is on the LHS and the "loading" terms are on the RHS, and (c) the loading terms include the boundary flux term.
Step 3: Substituting Eqn. (15.2.1) in Eqn. (15.2.5) and noting that
$$\frac{d\tilde{y}(x;\mathbf{a})}{dx} = \frac{d\phi_0(x)}{dx} + \sum_{j=1}^{n} a_j\frac{d\phi_j(x)}{dx} \tag{15.2.6}$$
we have
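$$\sum_{j=1}^{n}\left[\int_{x_a}^{x_b} x\frac{d\phi_i}{dx}\frac{d\phi_j}{dx}\,dx\right]a_j = -\int_{x_a}^{x_b}\frac{2}{x^2}\phi_i\,dx + \left[\phi_i\,x\frac{d\tilde{y}}{dx}\right]_{x_a}^{x_b} - \int_{x_a}^{x_b} x\frac{d\phi_i}{dx}\frac{d\phi_0}{dx}\,dx, \qquad i = 1,\ldots,n \tag{15.2.7}$$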
Let
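$$K_{ij} = \int_{x_a}^{x_b} x\frac{d\phi_i}{dx}\frac{d\phi_j}{dx}\,dx \quad\text{and}\quad F_i = -\int_{x_a}^{x_b}\frac{2}{x^2}\phi_i\,dx + \left[\phi_i\,x\frac{d\tilde{y}}{dx}\right]_{x_a}^{x_b} - \int_{x_a}^{x_b} x\frac{d\phi_i}{dx}\frac{d\phi_0}{dx}\,dx \tag{15.2.8}$$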
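Then we can rewrite Eqn. (15.2.7) in matrix notation as
$$\begin{bmatrix}K_{11} & K_{12} & \cdots & K_{1n}\\ K_{21} & K_{22} & \cdots & K_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ K_{n1} & K_{n2} & \cdots & K_{nn}\end{bmatrix}\begin{Bmatrix}a_1\\a_2\\\vdots\\a_n\end{Bmatrix} = \begin{Bmatrix}F_1\\F_2\\\vdots\\F_n\end{Bmatrix} \tag{15.2.9}$$
or,
$$\mathbf{K}_{n\times n}\,\mathbf{a}_{n\times 1} = \mathbf{F}_{n\times 1} \tag{15.2.10}$$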
The stiffness matrix $\mathbf{K}$ is symmetric since
$$K_{ji} = \int_{x_a}^{x_b} x\frac{d\phi_j}{dx}\frac{d\phi_i}{dx}\,dx = K_{ij} \tag{15.2.8b}$$
implying that
$$\phi_1(x) = 1, \qquad \phi_2(x) = x, \qquad \phi_3(x) = x^2 \tag{15.2.12}$$
The only difference between Eqn. (15.2.1) and the above equation is the $\phi_0$ term. In this modified approach we will be applying the BCs numerically, and hence there is no need for the $\phi_0$ term. Now we are ready to compute the terms in Eqn. (15.2.8).
Using Eqn. (15.2.12),
$$\frac{d\phi_1}{dx} = 0, \qquad \frac{d\phi_2}{dx} = 1, \qquad \frac{d\phi_3}{dx} = 2x \tag{15.2.13}$$
We will compute a typical term in the stiffness matrix
$$K_{23} = \int_{x_a}^{x_b}(1)(x)(2x)\,dx = \frac{2}{3}\left(x_b^3 - x_a^3\right) \tag{15.2.14}$$
$$\tilde{\tau} = -x\frac{d\tilde{y}}{dx} = -a_2x - 2a_3x^2 \tag{15.2.17}$$
Step 5: Generate the algebraic equations as
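$$\begin{bmatrix}0 & 0 & 0\\ 0 & \frac{x_b^2 - x_a^2}{2} & \frac{2}{3}\left(x_b^3 - x_a^3\right)\\ 0 & \frac{2}{3}\left(x_b^3 - x_a^3\right) & x_b^4 - x_a^4\end{bmatrix}\begin{Bmatrix}a_1\\a_2\\a_3\end{Bmatrix} = \begin{Bmatrix}2\left(\frac{1}{x_b} - \frac{1}{x_a}\right) + \tilde{\tau}\big|_{x_a} - \tilde{\tau}\big|_{x_b}\\ -2\ln\frac{x_b}{x_a} + x_a\,\tilde{\tau}\big|_{x_a} - x_b\,\tilde{\tau}\big|_{x_b}\\ -2\left(x_b - x_a\right) + x_a^2\,\tilde{\tau}\big|_{x_a} - x_b^2\,\tilde{\tau}\big|_{x_b}\end{Bmatrix} \tag{15.2.18}$$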
We can now proceed further by substituting the problem data.
Step 6: Substituting the numerical data.
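$x_a = 1$, $x_b = 2$:
$$\begin{bmatrix}0 & 0 & 0\\ 0 & \frac{3}{2} & \frac{14}{3}\\ 0 & \frac{14}{3} & 15\end{bmatrix}\begin{Bmatrix}a_1\\a_2\\a_3\end{Bmatrix} = \begin{Bmatrix}-1 + \tilde{\tau}\big|_{x=1} - \tilde{\tau}\big|_{x=2}\\ -2\ln 2 + \tilde{\tau}\big|_{x=1} - (2)\,\tilde{\tau}\big|_{x=2}\\ -2 + \tilde{\tau}\big|_{x=1} - (4)\,\tilde{\tau}\big|_{x=2}\end{Bmatrix} \tag{15.2.19}$$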
Step 7: Applying the BCs
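First we will apply the natural boundary condition $-x\dfrac{dy}{dx}\Big|_{x=2} = \dfrac{1}{2}$:
$$\begin{bmatrix}0 & 0 & 0\\ 0 & \frac{3}{2} & \frac{14}{3}\\ 0 & \frac{14}{3} & 15\end{bmatrix}\begin{Bmatrix}a_1\\a_2\\a_3\end{Bmatrix} = \begin{Bmatrix}-1 + \tilde{\tau}\big|_{x=1} - \frac{1}{2}\\ -2\ln 2 + \tilde{\tau}\big|_{x=1} - 1\\ -2 + \tilde{\tau}\big|_{x=1} - 2\end{Bmatrix} \tag{15.2.20}$$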
Since the NBC is applied to the system equations, there is no guarantee that the final solution will satisfy the NBC exactly. In the classical approach, we enforced the NBC up front on the trial solution itself, so the final solution did satisfy the NBC.
Applying the EBC $y(x=1) = 2$ is different from what we saw in the Direct Stiffness Method (Topic 2). This is because there is no equation directly associated with $y(x)$. Imposing the EBC on the trial solution itself (Eqn. (15.2.11)) gives
$$a_1 + a_2 + a_3 = 2 \tag{15.2.21}$$
We can write $a_3$ in terms of the other two parameters as
$$a_3 = 2 - a_1 - a_2 \tag{15.2.22}$$
Eliminating $a_3$ from Eqns. (15.2.20) using the above equation, we have
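$$\begin{bmatrix}0 & 0\\ -\frac{14}{3} & \frac{3}{2} - \frac{14}{3}\\ -15 & \frac{14}{3} - 15\end{bmatrix}\begin{Bmatrix}a_1\\a_2\end{Bmatrix} = \begin{Bmatrix}\tilde{\tau}\big|_{x=1} - \frac{3}{2}\\ \tilde{\tau}\big|_{x=1} - 2\ln 2 - \frac{31}{3}\\ \tilde{\tau}\big|_{x=1} - 34\end{Bmatrix} \tag{15.2.23}$$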
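Eliminating the last equation (Eqn. (1) - Eqn. (3) and Eqn. (2) - Eqn. (3) above), which also removes the unknown boundary flux $\tilde{\tau}\big|_{x=1}$, we have
$$\begin{bmatrix}15 & \frac{31}{3}\\ \frac{31}{3} & \frac{43}{6}\end{bmatrix}\begin{Bmatrix}a_1\\a_2\end{Bmatrix} = \begin{Bmatrix}\frac{65}{2}\\ \frac{71}{3} - 2\ln 2\end{Bmatrix} \tag{15.2.24}$$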
The final equations are symmetric, do not contain any boundary terms and can be solved numerically.
Step 8: Solving the system equations
$$a_1 = 3.719, \qquad a_2 = -2.254 \tag{15.2.25}$$
Substituting in Eqn. (15.2.22),
$$a_3 = 0.535 \tag{15.2.26}$$
Hence
$$\tilde{y} = 3.719 - 2.254x + 0.535x^2 \tag{15.2.27}$$
and
$$\tilde{\tau} = 2.254x - 1.070x^2 \tag{15.2.28}$$
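The arithmetic of Step 8 is easy to reproduce. The sketch below assumes the coefficients of Eqn. (15.2.24), solves the 2×2 system by Cramer's rule and recovers a3 from Eqn. (15.2.22); the names `ThreeParameters` and `solveModifiedGalerkin` are hypothetical.

```cpp
#include <cassert>
#include <cmath>

struct ThreeParameters { double a1, a2, a3; };

// Solves Eqn. (15.2.24) by Cramer's rule, then applies the EBC a1 + a2 + a3 = 2.
ThreeParameters solveModifiedGalerkin()
{
    const double K11 = 15.0,       K12 = 31.0 / 3.0;
    const double K21 = 31.0 / 3.0, K22 = 43.0 / 6.0;
    const double F1 = 65.0 / 2.0;
    const double F2 = 71.0 / 3.0 - 2.0 * std::log(2.0);

    const double det = K11 * K22 - K12 * K21;   // 0.7222...
    assert(std::fabs(det) > 1e-12);
    ThreeParameters p;
    p.a1 = (F1 * K22 - K12 * F2) / det;
    p.a2 = (K11 * F2 - K21 * F1) / det;
    p.a3 = 2.0 - p.a1 - p.a2;                   // Eqn. (15.2.22)
    return p;
}
```

The result reproduces a1 ≈ 3.719, a2 ≈ -2.254 and a3 ≈ 0.535 to the three decimals quoted.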
While we have overcome some obstacles with this procedure, the process still has a major drawback. We again revisit the issue of imposing the boundary conditions.
One-Element Solution
Step 4: We will once again stick with the same problem as in the last section. Let us assume that the domain is discretized into a single element, and consider a typical element as shown in Fig. 15.2.1.
Fig. 15.2.1 A typical linear element: nodes 1 and 2 at $x_1$ and $x_2$, element length $L$, nodal values $y_1$ and $y_2$.
The element is described by two nodes, labeled 1 and 2, located at $x_1$ and $x_2$, respectively; the length of the element is $L$. Let us also assume that the solution varies linearly over the element and is $y_1$ at node 1 and $y_2$ at node 2, noting that at this stage these values (called nodal values) are unknown. We can describe the variation of the solution over the element as a linear interpolation using the two nodal values. In other words
$$\tilde{y}(x) = a_1 + a_2x = \phi_1(x)y_1 + \phi_2(x)y_2 \tag{15.2.29}$$
This is indeed the trial solution that we have used as the starting point in the preceding sections. Using the end
conditions
$$\tilde{y}(x = x_1) = y_1, \qquad \tilde{y}(x = x_2) = y_2 \tag{15.2.30}$$
and substituting in the first part of Eqn. (15.2.29) we obtain
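$$y_1 = a_1 + a_2x_1, \qquad y_2 = a_1 + a_2x_2 \tag{15.2.31}$$
Solving for the $a_i$'s, we have
$$a_1 = \frac{x_2y_1 - x_1y_2}{x_2 - x_1}, \qquad a_2 = \frac{y_2 - y_1}{x_2 - x_1} \tag{15.2.32}$$
Therefore,
$$\tilde{y}(x) = \frac{x_2y_1 - x_1y_2}{x_2 - x_1} + \frac{y_2 - y_1}{x_2 - x_1}\,x \tag{15.2.33}$$
Rearranging
$$\tilde{y}(x) = \frac{x_2 - x}{x_2 - x_1}\,y_1 + \frac{x - x_1}{x_2 - x_1}\,y_2 = \frac{x_2 - x}{L}\,y_1 + \frac{x - x_1}{L}\,y_2 \tag{15.2.34}$$
The trial functions
$$\phi_1 = \frac{x_2 - x}{L}, \qquad \phi_2 = \frac{x - x_1}{L} \tag{15.2.35}$$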
are known as the shape functions. They have special properties, as shown in Fig. 15.2.2: each shape function equals 1 at its own node and 0 at the other node.
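Fig. 15.2.2 Linear shape functions on the element from $x_1$ to $x_2$ (length $L$): $\phi_1$ is 1 at node 1 and 0 at node 2; $\phi_2$ is 0 at node 1 and 1 at node 2.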
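The shape functions of Eqn. (15.2.35) are simple enough to code directly. The sketch below uses a hypothetical `LinearElement` type that evaluates φ1 and φ2, their constant derivatives, and the interpolation of Eqn. (15.2.34); evaluating `phi1` and `phi2` at the two nodes reproduces the properties shown in Fig. 15.2.2.

```cpp
#include <cassert>
#include <cmath>

// Two-node linear element on [x1, x2] (hypothetical helper type).
struct LinearElement {
    double x1, x2;

    double length() const { assert(x2 > x1); return x2 - x1; }
    // Shape functions, Eqn. (15.2.35).
    double phi1(double x) const { return (x2 - x) / length(); }
    double phi2(double x) const { return (x - x1) / length(); }
    // Their derivatives are constant over the element.
    double dphi1() const { return -1.0 / length(); }
    double dphi2() const { return  1.0 / length(); }
    // Interpolated solution y(x) = phi1(x) y1 + phi2(x) y2, Eqn. (15.2.34).
    double interpolate(double x, double y1, double y2) const {
        return phi1(x) * y1 + phi2(x) * y2;
    }
};
```

For example, `LinearElement{1.0, 2.0}.interpolate(2.0, 2.0, 1.409)` returns the node-2 value 1.409, and `phi1(x) + phi2(x)` equals 1 anywhere in the element.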
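Going back to Step 4 in the preceding section, we now have no $\phi_0(x)$ term³, and the $\phi_i$'s are given by Eqn. (15.2.35). Hence, for a typical element
$$\begin{bmatrix}k_{11} & k_{12}\\ k_{21} & k_{22}\end{bmatrix}\begin{Bmatrix}y_1\\y_2\end{Bmatrix} = \begin{Bmatrix}F_1\\F_2\end{Bmatrix} \tag{15.2.36}$$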
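Let us evaluate a couple of typical terms.
$$k_{12} = \int_{x_1}^{x_2} x\frac{d\phi_1}{dx}\frac{d\phi_2}{dx}\,dx = \int_{x_1}^{x_2} x\left(-\frac{1}{L}\right)\left(\frac{1}{L}\right)dx = -\frac{x_1 + x_2}{2L} \tag{15.2.37}$$
$$F_1^{\text{int}} = -\int_{x_1}^{x_2}\frac{2}{x^2}\left(\frac{x_2 - x}{x_2 - x_1}\right)dx = -\frac{2}{x_1} + \frac{2}{x_2 - x_1}\ln\frac{x_2}{x_1} \tag{15.2.38}$$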
$$F_2^{\text{bnd}} = x\frac{d\tilde{y}}{dx}\bigg|_{x_2} = -\tilde{\tau}\big|_{x_2} \tag{15.2.39}$$
Step 5: With all the other terms evaluated similarly,
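$$\frac{1}{2L}\begin{bmatrix}x_1 + x_2 & -(x_1 + x_2)\\ -(x_1 + x_2) & x_1 + x_2\end{bmatrix}\begin{Bmatrix}y_1\\y_2\end{Bmatrix} = \begin{Bmatrix}-\frac{2}{x_1} + \frac{2}{L}\ln\frac{x_2}{x_1} + \tilde{\tau}\big|_{x_1}\\ \frac{2}{x_2} - \frac{2}{L}\ln\frac{x_2}{x_1} - \tilde{\tau}\big|_{x_2}\end{Bmatrix} \tag{15.2.40}$$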
These element equations are similar to Eqn. (T2L2-4). The flux expression is of the form
$$\tilde{\tau} = -x\frac{d\tilde{y}}{dx} = \frac{x}{x_2 - x_1}\left(y_1 - y_2\right) \tag{15.2.41}$$
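Step 6: Substituting the numerical values $x_1 = 1$ and $x_2 = 2$, we have
$$\frac{1}{2}\begin{bmatrix}3 & -3\\ -3 & 3\end{bmatrix}\begin{Bmatrix}y_1\\y_2\end{Bmatrix} = \begin{Bmatrix}-2 + 2\ln 2 + \tilde{\tau}\big|_{x=1}\\ 1 - 2\ln 2 - \tilde{\tau}\big|_{x=2}\end{Bmatrix} \tag{15.2.42}$$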
Step 7: Applying the boundary conditions
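First we will apply the natural boundary condition $\tilde{\tau}\big|_{x=2} = \frac{1}{2}$. This results in
$$\frac{1}{2}\begin{bmatrix}3 & -3\\ -3 & 3\end{bmatrix}\begin{Bmatrix}y_1\\y_2\end{Bmatrix} = \begin{Bmatrix}-2 + 2\ln 2 + \tilde{\tau}\big|_{x=1}\\ 1 - 2\ln 2 - \frac{1}{2}\end{Bmatrix} \tag{15.2.43}$$
Now the EBC $y(x=1) = y_1 = 2$ is applied in the manner described in the section containing Eqn. (T2L2-39). The equations reduce to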
3 It is not necessary to have this term since the BCs will be imposed numerically.
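$$\begin{bmatrix}1 & 0\\ 0 & \frac{3}{2}\end{bmatrix}\begin{Bmatrix}y_1\\y_2\end{Bmatrix} = \begin{Bmatrix}2\\ \frac{7}{2} - 2\ln 2\end{Bmatrix} \tag{15.2.43}$$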
Step 8: Solving
$$y_1 = 2, \qquad y_2 = 1.409 \tag{15.2.44}$$
Hence, the approximate solution over the element is found by substituting these values in Eqns. (15.2.34) and (15.2.41) yielding
$$\tilde{y} = 2.591 - 0.591x \tag{15.2.45}$$
and
$$\tilde{\tau} = 0.591x \tag{15.2.46}$$
Comparing this solution to the exact solution shows that the results are considerably in error. This is because a linear trial solution was assumed and only one element was used.
Two-Element Solution
The finite element mesh for the two-element solution is shown in Fig. 15.2.3.
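Fig. 15.2.3 Two-element mesh: nodes 1, 2 and 3 at $x = 1$, $x = 1.5$ and $x = 2$ with nodal values $y_1$, $y_2$, $y_3$; element 1 connects nodes 1 and 2, and element 2 connects nodes 2 and 3.
This is a uniform mesh since both elements are geometrically identical. The unknowns at the three nodes are labeled $y_1$, $y_2$, $y_3$. Just as in the Direct Stiffness Method, we will generate the element equations.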
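Element 1: $x_1 = 1$ and $x_2 = 1.5$
$$\frac{1}{2(0.5)}\begin{bmatrix}1 + 1.5 & -(1 + 1.5)\\ -(1 + 1.5) & 1 + 1.5\end{bmatrix}\begin{Bmatrix}y_1\\y_2\end{Bmatrix} = \begin{Bmatrix}-\frac{2}{1} + \frac{2}{0.5}\ln\frac{1.5}{1} + \tilde{\tau}\big|_{x=1}\\ \frac{2}{1.5} - \frac{2}{0.5}\ln\frac{1.5}{1} - \tilde{\tau}\big|_{x=1.5}\end{Bmatrix} \tag{15.2.47}$$
Element 2: $x_1 = 1.5$ and $x_2 = 2$
$$\frac{1}{2(0.5)}\begin{bmatrix}1.5 + 2 & -(1.5 + 2)\\ -(1.5 + 2) & 1.5 + 2\end{bmatrix}\begin{Bmatrix}y_2\\y_3\end{Bmatrix} = \begin{Bmatrix}-\frac{2}{1.5} + \frac{2}{0.5}\ln\frac{2}{1.5} + \tilde{\tau}\big|_{x=1.5}\\ \frac{2}{2} - \frac{2}{0.5}\ln\frac{2}{1.5} - \tilde{\tau}\big|_{x=2}\end{Bmatrix} \tag{15.2.48}$$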
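We need to expand the notation to differentiate between the two elements, i.e., $\tilde{\tau}^{(1)}\big|_{x=1}$ represents the flux at $x = 1$ for element 1. Assembling the two sets of element equations to create the system equations, we obtain
$$\begin{bmatrix}2.5 & -2.5 & 0\\ -2.5 & 6.0 & -3.5\\ 0 & -3.5 & 3.5\end{bmatrix}\begin{Bmatrix}y_1\\y_2\\y_3\end{Bmatrix} = \begin{Bmatrix}-2 + 4\ln\frac{3}{2} + \tilde{\tau}^{(1)}\big|_{x=1}\\ 4\ln\frac{4}{3} - 4\ln\frac{3}{2} + \tilde{\tau}^{(2)}\big|_{x=1.5} - \tilde{\tau}^{(1)}\big|_{x=1.5}\\ 1 - 4\ln\frac{4}{3} - \tilde{\tau}^{(2)}\big|_{x=2}\end{Bmatrix}$$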
4 This is similar to the imposition of the natural boundary condition at the system level.
Fig. 15.2.4 $y(x)$ and $\tau(x)$ versus $x$ for $1 \le x \le 2$.
(1) Both solutions satisfy the EBC at $x = 1$. They should, since we imposed the EBC on the final system equations.
(2) There is an improvement in the two-element solution with respect to $y(x)$ (an error of 4.6% and 1.3% for the one-element and two-element solutions, respectively, at $x = 2$).
(3) The error in the one-element flux is large throughout the domain. The error in the flux from the two-element solution is much smaller (an error of 136% and 49% for the one-element and two-element solutions, respectively, at $x = 2$).
(4) While the function is continuous throughout the domain, the flux is discontinuous for the two-element model at the element interface at $x = 1.5$.
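The one- and two-element calculations generalize directly to n equal elements. The sketch below is an illustration, not the 1DBVP program: it assembles the element equations (15.2.40) into a tridiagonal system, applies the NBC τ̃(2) = 1/2 and the EBC y(1) = 2 by elimination, and solves with the Thomas algorithm. It assumes that the inter-element flux terms cancel when the element equations are assembled; the function name `solveModelProblem` is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// n-element linear FE solution of d/dx(x dy/dx) = 2/x^2 on [1, 2],
// y(1) = 2 (EBC), tau(2) = 1/2 (NBC), tau = -x dy/dx.
// Returns the nodal values y_1 ... y_{n+1}.
std::vector<double> solveModelProblem(int n)
{
    assert(n >= 1);
    const double xa = 1.0, xb = 2.0, yA = 2.0, tauB = 0.5;
    const double L = (xb - xa) / n;
    const int N = n + 1;                              // number of nodes
    std::vector<double> diag(N, 0.0), off(N - 1, 0.0), F(N, 0.0);

    for (int e = 0; e < n; ++e) {                     // Eqn. (15.2.40)
        const double x1 = xa + e * L, x2 = x1 + L;
        const double s = (x1 + x2) / (2.0 * L);       // k11 = k22 = s, k12 = -s
        const double lnr = std::log(x2 / x1);
        diag[e] += s;  diag[e + 1] += s;  off[e] -= s;
        F[e]     += -2.0 / x1 + (2.0 / L) * lnr;      // F1^int
        F[e + 1] +=  2.0 / x2 - (2.0 / L) * lnr;      // F2^int
    }
    F[N - 1] -= tauB;          // NBC: boundary load -tau at x = 2
    F[1] -= off[0] * yA;       // EBC by elimination: move column 1 to the RHS

    // Thomas algorithm on the unknowns y_2 ... y_{n+1} (indices 1 .. N-1).
    std::vector<double> cp(N, 0.0), dp(N, 0.0);
    for (int i = 1; i < N; ++i) {
        const double a = (i > 1) ? off[i - 1] : 0.0;  // sub-diagonal
        const double c = (i < N - 1) ? off[i] : 0.0;  // super-diagonal
        const double m = diag[i] - a * cp[i - 1];
        cp[i] = c / m;
        dp[i] = (F[i] - a * dp[i - 1]) / m;
    }
    std::vector<double> y(N);
    y[0] = yA;
    y[N - 1] = dp[N - 1];
    for (int i = N - 2; i >= 1; --i)
        y[i] = dp[i] - cp[i] * y[i + 1];
    return y;
}
```

With n = 1 this returns y(2) ≈ 1.409, as in Eqn. (15.2.44), and with n = 2 it returns y(2) ≈ 1.365, against the exact value 1 + 0.5 ln 2 ≈ 1.347. Increasing n drives the nodal values toward the exact solution.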
where $n$ is equal to the number of DOF in the element, $y_i$ are the nodal values and $\phi_i(x)$ are the shape functions. Construct the residual and substitute the trial solution in the residual equations. Note that there are as many residual equations as there are DOF in the element.
Step 2: Integrate by parts the highest derivative term. Integrating by parts will not only lower the highest derivative term but also
generate a boundary (flux) term.
Step 3: Rewrite the equations so that the stiffness related terms are on the left and the load terms (interior and boundary) are on
the right.
Step 4: To generate the equations completely we need to assume the exact form of the trial solution, i.e., fix the value of $n$. This will enable us to generate the terms in the stiffness matrix and the load vector. We can now generate the element equations in the form
$$\mathbf{k}_{n\times n}\,\mathbf{u}_{n\times 1} = \mathbf{f}_{n\times 1}$$
We also need to generate the expression for the flux.
Step 5: Using the problem data, we can generate the element equations for all the elements in the model. These equations are then assembled into the system or global equations of the form
$$\mathbf{K}_{N\times N}\,\mathbf{U}_{N\times 1} = \mathbf{F}_{N\times 1}$$
for a model with $N$ system DOF.
Step 6: Using the problem data, we impose the NBCs first. Then we impose the EBCs, resulting in modified equations of the form
$$\mathbf{K}_{N\times N}\,\mathbf{U}_{N\times 1} = \mathbf{F}_{N\times 1}$$
Step 7: Solve the system equations for the primary nodal unknowns $\mathbf{U}$.
Step 8: Using the primary unknowns, we can compute the flux in each element.
Exercises
In the following problems the quadratic element is to be used. To obtain the shape functions for the quadratic element refer to
Section 15.6.
Problem 15.2.1
Consider the differential equation
$$\frac{d}{dx}\left[(x+1)\frac{dy(x)}{dx}\right] = 0, \qquad 1 \le x \le 2$$
with $y(x=1) = 1$ and $(x+1)\dfrac{dy}{dx}\Big|_{x=2} = 1$.
(a) Using the element concept, use a linear polynomial as the trial solution. Apply the steps outlined in this section to obtain an approximate solution for both the function and the flux.
(b) Repeat the problem, but now use a quadratic polynomial. Compare the two solutions.
Problem 15.2.2
Consider the differential equation
$$-\frac{d}{dx}\left(x^2\frac{dy}{dx}\right) = \frac{1}{12}\left(30x^4 - 204x^3 + 351x^2 - 110x\right), \qquad 0 \le x \le 4$$
with $y(0) = 1$ and $y(4) = 0$. Using a quadratic polynomial as the (element) trial solution, obtain an approximate solution for both the function and the flux. Compare this to the solution obtained earlier.
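As in the previous section, we do not have the $\phi_0(x)$ term since the boundary conditions will be imposed numerically. We will drop the ~ (tilde) notation for convenience's sake. Substituting in the residual equations and integrating over the domain of the element, we have
$$\int_{x_1}^{x_2}\left[-\frac{d}{dx}\left(\alpha(x)\frac{dy(x)}{dx}\right) + \beta(x)y(x) - f(x)\right]\phi_i(x)\,dx = 0, \qquad i = 1,2,\ldots,n \tag{15.2.59}$$
Integrating the first term by parts (with $\tau = -\alpha\,dy/dx$) leads to
$$\int_{x_1}^{x_2}\left(\alpha\frac{d\phi_i}{dx}\frac{dy}{dx} + \beta\,\phi_i\,y\right)dx = \int_{x_1}^{x_2} f(x)\,\phi_i(x)\,dx - \left[\phi_i\,\tau\right]_{x_1}^{x_2}, \qquad i = 1,2,\ldots,n \tag{15.2.61}$$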
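Step 4: Let us use the linear-interpolation two-node element from the earlier section ($n = 2$)
$$\phi_1 = \frac{x_2 - x}{L}, \qquad \phi_2 = \frac{x - x_1}{L} \tag{15.2.62}$$
Substituting Eqn. (15.2.62) in Eqn. (15.2.61), we have the element equations as
$$\begin{bmatrix}k_{11} & k_{12}\\ k_{21} & k_{22}\end{bmatrix}\begin{Bmatrix}y_1\\y_2\end{Bmatrix} = \begin{Bmatrix}F_1^{\text{int}} + F_1^{\text{bnd}}\\ F_2^{\text{int}} + F_2^{\text{bnd}}\end{Bmatrix} \tag{15.2.63}$$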
Let us look at a typical stiffness term first.
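$$k_{11} = \int_{x_1}^{x_2}\alpha(x)\left(\frac{d\phi_1}{dx}\right)^2dx + \int_{x_1}^{x_2}\beta(x)\,\phi_1^2\,dx = \int_{x_1}^{x_2}\alpha(x)\left(\frac{-1}{x_2 - x_1}\right)^2dx + \int_{x_1}^{x_2}\beta(x)\left(\frac{x_2 - x}{x_2 - x_1}\right)^2dx \tag{15.2.64}$$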
To evaluate the above equation, we need to know $\alpha(x)$ and $\beta(x)$. The functions can then be substituted and the integral evaluated. Instead, we will assume that we know the numerical values of these two functions at the centroid of the element, i.e., at $x = \frac{x_1 + x_2}{2}$. For this element, in which the solution is assumed to vary linearly, this is the most accurate point at which to evaluate the integral. Let us denote the centroidal values (constants) simply as $\alpha$ and $\beta$. Now, integrating
$$k_{11} = \frac{\alpha}{L} + \frac{\beta L}{3} \tag{15.2.65}$$
We can use a similar strategy with the load terms. When all the terms are evaluated, we have,
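$$\begin{bmatrix}\frac{\alpha}{L} + \frac{\beta L}{3} & -\frac{\alpha}{L} + \frac{\beta L}{6}\\ -\frac{\alpha}{L} + \frac{\beta L}{6} & \frac{\alpha}{L} + \frac{\beta L}{3}\end{bmatrix}\begin{Bmatrix}y_1\\y_2\end{Bmatrix} = \begin{Bmatrix}\frac{fL}{2} + F_1^{\text{bnd}}\\ \frac{fL}{2} + F_2^{\text{bnd}}\end{Bmatrix} \tag{15.2.66}$$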
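The term that requires special treatment here is the last term. In general
$$F_i^{\text{bnd}} = -\phi_i\,\tau\big|_{x_2} + \phi_i\,\tau\big|_{x_1} \tag{15.2.67}$$
From Eqn. (15.2.57), we have
$\tau = -(gy + c)$ for $x = x_a$ and $\tau = (hy + c)$ for $x = x_b$.
Substituting this in Eqn. (15.2.67) for $i = 1$
$$F_1^{\text{bnd}} = -\phi_1\,\tau\big|_{x_2} + \phi_1\,\tau\big|_{x_1} = 0 + \tau\big|_{x_1} = -(gy + c)\big|_{x_1} = -(g_1y_1 + c_1) \tag{15.2.68}$$
Similarly, for $i = 2$
$$F_2^{\text{bnd}} = -\phi_2\,\tau\big|_{x_2} + \phi_2\,\tau\big|_{x_1} = -\tau\big|_{x_2} + 0 = -(hy + c)\big|_{x_2} = -(h_2y_2 + c_2) \tag{15.2.69}$$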
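Note that the unknowns $y_1$ and $y_2$ now appear on the RHS of these two equations and must be moved to the LHS. Therefore, the modified element equations are
$$\left(\begin{bmatrix}\frac{\alpha}{L} + \frac{\beta L}{3} & -\frac{\alpha}{L} + \frac{\beta L}{6}\\ -\frac{\alpha}{L} + \frac{\beta L}{6} & \frac{\alpha}{L} + \frac{\beta L}{3}\end{bmatrix} + g_1\begin{bmatrix}1 & 0\\ 0 & 0\end{bmatrix} + h_2\begin{bmatrix}0 & 0\\ 0 & 1\end{bmatrix}\right)\begin{Bmatrix}y_1\\y_2\end{Bmatrix} = \begin{Bmatrix}\frac{fL}{2} - c_1\\ \frac{fL}{2} - c_2\end{Bmatrix} \tag{15.2.70}$$
The $g_1$ term exists provided $x_1 = x_a$. Similarly, the $h_2$ term exists provided $x_2 = x_b$. The flux at the center of the element is given by
$$\tau = -\alpha\frac{dy}{dx} = -\alpha\,\frac{y_2 - y_1}{L} \tag{15.2.71}$$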
The element equations are ready, and we now need to look at engineering problems that are described as 1D BVP to illustrate
these steps and the rest of the solution.
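Before turning to those problems, the element equations (15.2.70) and the flux (15.2.71) can be packaged as a small routine. This is a sketch with hypothetical names (`LinearElementEquations`, `linearElementEquations`, `linearElementFlux`); it assumes constant centroidal values of α, β and f and the mixed-BC sign convention τ = -(g y + c) at x_a and τ = h y + c at x_b, which places +g1 and +h2 on the diagonal and -c1, -c2 on the RHS.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Two-node linear element for -d/dx(alpha dy/dx) + beta y = f, Eqn. (15.2.70).
struct LinearElementEquations {
    std::array<std::array<double, 2>, 2> k;   // element stiffness matrix
    std::array<double, 2> rhs;                // element load vector
};

// g1, c1: mixed BC at x_a (pass zeros unless node 1 lies on x_a).
// h2, c2: mixed BC at x_b (pass zeros unless node 2 lies on x_b).
LinearElementEquations linearElementEquations(double alpha, double beta, double f,
                                              double L, double g1 = 0.0,
                                              double c1 = 0.0, double h2 = 0.0,
                                              double c2 = 0.0)
{
    assert(L > 0.0);
    const double a  = alpha / L;         // conduction ("stiffness") part
    const double b3 = beta * L / 3.0;    // diagonal beta part
    const double b6 = beta * L / 6.0;    // off-diagonal beta part
    LinearElementEquations e;
    e.k[0][0] =  a + b3 + g1;   e.k[0][1] = -a + b6;
    e.k[1][0] = -a + b6;        e.k[1][1] =  a + b3 + h2;
    e.rhs[0] = f * L / 2.0 - c1;
    e.rhs[1] = f * L / 2.0 - c2;
    return e;
}

// Constant element flux, Eqn. (15.2.71).
double linearElementFlux(double alpha, double L, double y1, double y2)
{
    return -alpha * (y2 - y1) / L;
}
```

For instance, a conduction element with α = 20, L = 0.3 and left-end convection g1 = 25, c1 = -25(800) gives k11 = 91.6667, k12 = -66.6667 and a first load entry of 20,000.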
Finally, a warning: boundary conditions can be dangerous to your health! Applying the BCs incorrectly is one of the most common sources of error.
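Fig. 15.4.1 Axially loaded rod with distributed load $w$ and axial displacement $u$ along $x$; end conditions such as $u = 0$ and $F = 0$ are indicated.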
The governing differential equation is given as
$$-\frac{d}{dx}\left(A(x)E(x)\frac{du(x)}{dx}\right) = w(x)A(x) \tag{15.4.1}$$
with the boundary conditions as
At $x = x_a$: $u(x = x_a) = u_a$ or NBC (15.4.2)
At $x = x_b$: $u(x = x_b) = u_b$ or NBC (15.4.3)
The natural BC is of the form
$$X = \sigma_x n_x \tag{15.4.3a}$$
where $n_x$ is the direction cosine of the outward normal, and $X$ is the force per unit area, positive if it acts in the positive x direction. Let us look at some possibilities with respect to the boundary conditions.
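Rod has a known displacement $u_a$ (incl. zero) at the left end:
$u = u_a$
There is a concentrated force $F_a$ applied at the left end in the positive x direction:
$F_a = -AE\dfrac{du}{dx}\Big|_{x=x_a}$ or $\sigma_a = -E\dfrac{du}{dx}\Big|_{x=x_a}$
There is a concentrated force $F_b$ applied at the right end in the positive x direction:
$F_b = AE\dfrac{du}{dx}\Big|_{x=x_b}$ or $\sigma_b = E\dfrac{du}{dx}\Big|_{x=x_b}$
Using the process discussed in the previous lesson, Eqn. (15.2.70) reduces to
$$\frac{AE}{L}\begin{bmatrix}1 & -1\\ -1 & 1\end{bmatrix}\begin{Bmatrix}u_1\\u_2\end{Bmatrix} = \begin{Bmatrix}\frac{wAL}{2} + F_1\\ \frac{wAL}{2} + F_2\end{Bmatrix} \tag{15.4.4}$$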
Fig. 15.4.2(a) Stepped rod with cross-sectional areas of 250 mm² and 400 mm², loaded by P = 300 kN.
Solution: Let us use m and N as the problem units. Let us use a three-element model, placing a node where the concentrated force acts⁵. The FE model is shown in Fig. 15.4.2(b). We have EBCs at the left and the right ends.
Fig. 15.4.2(b) Three-element model: nodes 1 to 4 along X, with element lengths of 0.15 m, 0.15 m and 0.3 m.
⁵ This is not necessary, since we can use a Dirac delta function $w(x)$ to define the concentrated force and then compute the equivalent forces acting at the nodes of the element.
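Element 1: $E = 200(10^9)$ N/m², $A = 250(10^{-6})$ m², $L = 0.15$ m, $w = 0$. Hence the element equations are
$$\begin{bmatrix}3.333(10^8) & -3.333(10^8)\\ -3.333(10^8) & 3.333(10^8)\end{bmatrix}\begin{Bmatrix}U_1\\U_2\end{Bmatrix} = \begin{Bmatrix}F_1^1\\F_2^1\end{Bmatrix}$$
Element 2: $E = 200(10^9)$ N/m², $A = 250(10^{-6})$ m², $L = 0.15$ m, $w = 0$. Hence the element equations are
$$\begin{bmatrix}3.333(10^8) & -3.333(10^8)\\ -3.333(10^8) & 3.333(10^8)\end{bmatrix}\begin{Bmatrix}U_2\\U_3\end{Bmatrix} = \begin{Bmatrix}F_1^2\\F_2^2\end{Bmatrix}$$
Element 3: $E = 200(10^9)$ N/m², $A = 400(10^{-6})$ m², $L = 0.30$ m, $w = 0$. Hence the element equations are
$$\begin{bmatrix}2.667(10^8) & -2.667(10^8)\\ -2.667(10^8) & 2.667(10^8)\end{bmatrix}\begin{Bmatrix}U_3\\U_4\end{Bmatrix} = \begin{Bmatrix}F_1^3\\F_2^3\end{Bmatrix}$$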
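Assembling these three sets of element equations follows the same pattern as the Direct Stiffness Method. The sketch below (hypothetical names `Mat4` and `assembleRod`) computes AE/L for each element from the data above and adds each 2×2 block of Eqn. (15.4.4) into a 4×4 global stiffness matrix. It stops before the load vector and boundary conditions, which depend on the problem statement of Fig. 15.4.2(a).

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat4 = std::array<std::array<double, 4>, 4>;

// Global stiffness of the three-element rod model of Fig. 15.4.2(b),
// assembled from k_e = (A E / L) [1 -1; -1 1], Eqn. (15.4.4).
Mat4 assembleRod()
{
    const double E = 200e9;                          // N/m^2
    const double A[3] = {250e-6, 250e-6, 400e-6};    // m^2
    const double L[3] = {0.15, 0.15, 0.30};          // m
    Mat4 K{};                                        // zero-initialized
    for (int e = 0; e < 3; ++e) {                    // element e joins nodes e and e+1
        assert(L[e] > 0.0);
        const double s = A[e] * E / L[e];            // element stiffness AE/L
        K[e][e]         += s;  K[e][e + 1]     -= s;
        K[e + 1][e]     -= s;  K[e + 1][e + 1] += s;
    }
    return K;
}
```

The interior diagonal entries come out as 6.667(10⁸) at node 2 (elements 1 and 2) and 6.0(10⁸) at node 3 (elements 2 and 3); applying the EBCs at nodes 1 and 4 and the nodal load then completes the solution as in the text.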
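Fig. 15.5.1 Rod with cross-sectional area $A$ (varying to $A + dA$) and perimeter $l$: heat flows $q$ and $q + dq$ across a slice of length $dx$, internal heat generation $Q$, end heat flow $q_a$ and end temperature $T = T_0$.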
The governing differential equation is given as
$$-\frac{d}{dx}\left(A(x)k(x)\frac{dT(x)}{dx}\right) + h(x)l(x)T(x) = Q(x)A(x) + h(x)l(x)T_\infty \tag{15.5.1a}$$
or, for a cylindrical rod with a constant cross-sectional area
$$-\frac{d}{dx}\left(k(x)\frac{dT(x)}{dx}\right) + \frac{hl}{A}T(x) = Q(x) + \frac{hl}{A}T_\infty \tag{15.5.1b}$$
with the boundary conditions as
At $x = x_a$: $T = T_a$, or $q = c_a$, or $q = -h_a(T - T_{\infty a})$ (15.5.2a)
At $x = x_b$: $T = T_b$, or $q = c_b$, or $q = h_b(T - T_{\infty b})$ (15.5.2b)
Looking ahead to two- and three-dimensional problems, these boundary conditions are special cases of the general form (for specified heat flow)
$$q_xn_x + q_yn_y + q_zn_z = -q_S \tag{15.5.3a}$$
if heat $q_S$ is flowing into the surface $S$, and $n_x$, $n_y$, $n_z$ are the direction cosines of the outward normal from the surface. Similarly, for free convection from surface $S$, we have
$$q_xn_x + q_yn_y + q_zn_z = h\left(T_S - T_\infty\right) \tag{15.5.3b}$$
Let us look at some possibilities with respect to the boundary conditions at the left end.
Left end is at a known temperature, $T_a$:
$T = T_a$
Heat ($q_a$) is flowing into the left end:
$q = q_a$
Left end is insulated:
$q = 0$
Free convection is taking place at the left end (ambient temperature $T_{\infty a}$, convective coefficient $h_a$):
$q = -h_aT + h_aT_{\infty a}$, or $-q = h_aT - h_aT_{\infty a}$
Let us look at some possibilities with respect to the boundary conditions at the right end.
Right end is at a known temperature, $T_b$:
$T = T_b$
Heat ($q_b$) is flowing out of the right end:
$q = q_b$
Right end is insulated:
$q = 0$
Free convection is taking place at the right end (ambient temperature $T_{\infty b}$, convective coefficient $h_b$):
$q = h_bT - h_bT_{\infty b}$
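Comparing these equations to the general form of the 1D BVP⁶,
$$y(x) = T(x), \qquad \alpha(x) = k(x), \qquad \beta(x) = \frac{hl}{A} \tag{15.5.4a}$$
$$f(x) = Q(x) + \frac{hl}{A}T_\infty, \qquad \tau = -k\frac{dT}{dx} = q \tag{15.5.4b}$$
$$g_1 = h_a, \qquad h_2 = h_b \tag{15.5.4c}$$
Using the natural and mixed boundary conditions shown above, Eqn. (15.2.70) reduces to
$$\left(\begin{bmatrix}\frac{k}{L} + \frac{hlL}{3A} & -\frac{k}{L} + \frac{hlL}{6A}\\ -\frac{k}{L} + \frac{hlL}{6A} & \frac{k}{L} + \frac{hlL}{3A}\end{bmatrix} + h_1\begin{bmatrix}1 & 0\\ 0 & 0\end{bmatrix} + h_2\begin{bmatrix}0 & 0\\ 0 & 1\end{bmatrix}\right)\begin{Bmatrix}T_1\\T_2\end{Bmatrix} = \frac{L}{2}\left(Q + \frac{hl}{A}T_\infty\right)\begin{Bmatrix}1\\1\end{Bmatrix} + \begin{Bmatrix}q_1 + h_1T_{\infty 1}\\ -q_2 + h_2T_{\infty 2}\end{Bmatrix} \tag{15.5.5a}$$
$$\mathbf{k}_{2\times 2}\,\mathbf{u}_{2\times 1} = \mathbf{f}_{2\times 1} \tag{15.5.5b}$$
⁶ The sign convention is as follows: energy (or heat flow) into the surface or boundary is positive.
Note that on the LHS and the RHS some of the components will be zero depending on whether the boundary condition is an NBC or mixed. With E denoting energy, t time, L length and T temperature, a dimensional analysis will show that $T \sim T$, $A \sim L^2$, $k \sim \frac{E}{tLT}$, $h \sim \frac{E}{tL^2T}$, $l \sim L$, $Q \sim \frac{E}{tL^3}$ and $c \sim \frac{E}{tL^2}$, so that both the LHS and the RHS have the units $\frac{E}{tL^2}$.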
Example 15.5.1
Fig. 15.5.2(a) shows a composite wall made of three materials. The outer temperature is 20 °C. Convective heat transfer takes place on the inner surface of the wall with $T_\infty = 800$ °C and $h = 25$ W/(m²·°C). The thermal conductivities are $k_1 = 20$ W/(m·°C), $k_2 = 30$ W/(m·°C) and $k_3 = 50$ W/(m·°C). We need to determine the temperature distribution in the wall.
Fig. 15.5.2(a) Composite wall of conductivities $k_1$, $k_2$, $k_3$, with convection ($h$, $T_\infty$) on the inner surface and $T_0 = 20$ °C on the outer surface.
Solution: We will use a three-element model as shown in Fig. 15.5.2(b). Note the following: (a) this is a problem where convective heat exchange (gain) takes place at the left end (mixed BC); (b) there is no convective heat exchange from the top and the bottom of the wall, which are assumed to be very tall compared to the thickness of the wall, hence $h = 0$ and $l = 0$. We will assume a unit area for all computations, i.e., $A = 1$ m². There is no internal heat generation, i.e., $Q = 0$; (c) the right end is at a specified temperature (EBC). For the sake of convenience, we will not include the boundary flux load terms (assuming inter-element flux continuity).
Fig. 15.5.2(b) Three-element model: nodes 1 to 4 along X, with element lengths of 0.3 m, 0.15 m and 0.15 m.
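Element 1: $k = 20$ W/(m·°C), $L = 0.3$ m, $A = 1$ m², $h = 0$, $l = 0$, $h_1 = 25$ W/(m²·°C), $T_{\infty 1} = 800$ °C, $h_2 = 0$, $Q = 0$, $c_1 = 0$ and $c_2 = 0$. The element equations are as follows.
$$\begin{bmatrix}\frac{20}{0.3} + 25 & -\frac{20}{0.3}\\ -\frac{20}{0.3} & \frac{20}{0.3}\end{bmatrix}\begin{Bmatrix}T_1\\T_2\end{Bmatrix} = \begin{Bmatrix}25(800)\\0\end{Bmatrix}$$
Element 2: $k = 30$ W/(m·°C), $L = 0.15$ m, $A = 1$ m², $h = 0$, $l = 0$, $h_1 = 0$, $h_2 = 0$, $Q = 0$, $c_1 = 0$ and $c_2 = 0$. The element equations are as follows.
$$\begin{bmatrix}\frac{30}{0.15} & -\frac{30}{0.15}\\ -\frac{30}{0.15} & \frac{30}{0.15}\end{bmatrix}\begin{Bmatrix}T_2\\T_3\end{Bmatrix} = \begin{Bmatrix}0\\0\end{Bmatrix}$$
Element 3: $k = 50$ W/(m·°C), $L = 0.15$ m, $A = 1$ m², $h = 0$, $l = 0$, $h_1 = 0$, $h_2 = 0$, $Q = 0$, $c_1 = 0$ and $c_2 = 0$. The element equations are as follows.
$$\begin{bmatrix}\frac{50}{0.15} & -\frac{50}{0.15}\\ -\frac{50}{0.15} & \frac{50}{0.15}\end{bmatrix}\begin{Bmatrix}T_3\\T_4\end{Bmatrix} = \begin{Bmatrix}0\\0\end{Bmatrix}$$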
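Assembling the equations
$$\begin{bmatrix}91.6667 & -66.6667 & 0 & 0\\ -66.6667 & 266.6667 & -200.0 & 0\\ 0 & -200.0 & 533.3333 & -333.3333\\ 0 & 0 & -333.3333 & 333.3333\end{bmatrix}\begin{Bmatrix}T_1\\T_2\\T_3\\T_4\end{Bmatrix} = \begin{Bmatrix}20000\\0\\0\\0\end{Bmatrix}$$
There is no natural boundary condition (the mixed BC was taken care of when the element equations were generated). To enforce the EBC $T_4 = 20$, we use the elimination approach. The modified equations are
$$\begin{bmatrix}91.6667 & -66.6667 & 0 & 0\\ -66.6667 & 266.6667 & -200.0 & 0\\ 0 & -200.0 & 533.3333 & 0\\ 0 & 0 & 0 & 1\end{bmatrix}\begin{Bmatrix}T_1\\T_2\\T_3\\T_4\end{Bmatrix} = \begin{Bmatrix}20000\\0\\6666.6667\\20\end{Bmatrix}$$
Solving the equations, we have (note the decreasing temperature from node 1 to node 4)
$$\{T_1, T_2, T_3, T_4\} = \{304.76,\ 119.05,\ 57.14,\ 20\}\ ^\circ\text{C}$$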
We can now compute the flux in each element.
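Element 1:
$$q_1 = -\frac{k_1}{L_1}\left(T_2 - T_1\right) = -\frac{20}{0.3}\left(119.05 - 304.76\right) = 12380.95\ \text{W/m}^2$$
Element 2:
$$q_2 = -\frac{k_2}{L_2}\left(T_3 - T_2\right) = -\frac{30}{0.15}\left(57.14 - 119.05\right) = 12380.95\ \text{W/m}^2$$
Element 3:
$$q_3 = -\frac{k_3}{L_3}\left(T_4 - T_3\right) = -\frac{50}{0.15}\left(20.0 - 57.14\right) = 12380.95\ \text{W/m}^2$$
We will now examine the results. The solution satisfies the EBC at the right end. The flux is constant throughout the domain, as it should be.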
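The whole of Example 15.5.1 fits in a few lines of code. The sketch below (hypothetical names `compositeWallTemperatures` and `wallFlux`) assembles the three conduction elements, adds the convection terms h1 and h1·T∞ at node 1, eliminates the EBC T4 = 20 °C as in the text, and solves the remaining 3×3 system by Gaussian elimination without pivoting, which is adequate here because the matrix is symmetric and diagonally dominant.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Example 15.5.1: three linear conduction elements (A = 1 m^2, h = l = 0 along the
// wall, Q = 0), convection h1 = 25 W/m^2C with T_inf = 800 C at node 1, T4 = 20 C.
std::array<double, 4> compositeWallTemperatures()
{
    const double k[3] = {20.0, 30.0, 50.0};    // W/(m C)
    const double L[3] = {0.3, 0.15, 0.15};     // m
    const double h1 = 25.0, Tinf = 800.0, T4 = 20.0;

    double K[4][4] = {};                       // global matrix, zero-initialized
    double F[4] = {};
    for (int e = 0; e < 3; ++e) {              // assemble (k/L)[1 -1; -1 1]
        const double s = k[e] / L[e];
        K[e][e] += s;      K[e][e + 1] -= s;
        K[e + 1][e] -= s;  K[e + 1][e + 1] += s;
    }
    K[0][0] += h1;                             // mixed BC: h1 on the LHS ...
    F[0] += h1 * Tinf;                         // ... and h1*T_inf on the RHS

    // Elimination of the EBC: move column 4 to the RHS and drop row 4.
    for (int i = 0; i < 3; ++i) F[i] -= K[i][3] * T4;

    // Gaussian elimination on the 3x3 system for T1..T3.
    for (int p = 0; p < 3; ++p)
        for (int i = p + 1; i < 3; ++i) {
            const double m = K[i][p] / K[p][p];
            for (int j = p; j < 3; ++j) K[i][j] -= m * K[p][j];
            F[i] -= m * F[p];
        }
    std::array<double, 4> T{};
    T[3] = T4;
    for (int i = 2; i >= 0; --i) {             // back substitution
        double s = F[i];
        for (int j = i + 1; j < 3; ++j) s -= K[i][j] * T[j];
        T[i] = s / K[i][i];
    }
    return T;
}

// Element flux q = -(k/L)(T_right - T_left).
double wallFlux(double k, double L, double Tleft, double Tright)
{
    assert(L > 0.0);
    return -k / L * (Tright - Tleft);
}
```

The returned temperatures match 304.76, 119.05, 57.14 and 20 °C, and `wallFlux` gives about 12,380.95 W/m² in every element. The same flux follows from the series thermal resistance 1/25 + 0.3/20 + 0.15/30 + 0.15/50 = 0.063 m²·°C/W, since (800 - 20)/0.063 ≈ 12,380.95.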
Example 15.5.2
Fig. 15.5.3 shows a circular cross-section pin fin. It has a diameter of 0.3125" and a length of 5". At the root, the temperature is $T_0 = 150$ °F. The ambient temperature is $T_\infty = 80$ °F, the convective coefficient is $h = 6$ BTU/(h·ft²·°F), and the thermal conductivity is $k = 24.8$ BTU/(h·ft·°F). Determine the temperature distribution in the fin.
Fig. 15.5.3 Pin fin of diameter $d$ with root temperature $T_0$.
Solution: We will use a two-element model to solve the problem. The FE model is shown in Fig. 15.5.4.
Fig. 15.5.4 Two-element model: nodes 1, 2 and 3 along X, with element lengths of 2.5" and 2.5".
Note the following: (a) in this problem the left end is tied to an EBC; (b) there is no convective heat exchange from the right end (NBC with $q = 0$); (c) there is no internal heat generation, i.e., $Q = 0$. We will select ft as the unit of length.
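Element 1: $k = 24.8$ BTU/(h·ft·°F), $L = 0.208$ ft, $A = \frac{\pi d^2}{4} = 5.326(10^{-4})$ ft², $T_\infty = 80$ °F, $h = 6$ BTU/(h·ft²·°F), $l = \pi d = 0.0818$ ft, $h_1 = 0$, $h_2 = 0$, $Q = 0$, $c_1 = 0$ and $c_2 = 0$. The element equations are as follows.
The NBC term is zero. To apply the EBC we use the elimination technique, resulting in
$$\begin{bmatrix}1 & 0 & 0\\ 0 & 366.245 & -87.285\\ 0 & -87.285 & 183.12\end{bmatrix}\begin{Bmatrix}T_1\\T_2\\T_3\end{Bmatrix} = \begin{Bmatrix}150\\28427\\7667\end{Bmatrix}$$
Solving, we have
$$\{T_1, T_2, T_3\} = \{150,\ 98.82,\ 88.97\}\ ^\circ\text{F}$$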
The temperature decreases from left to right as expected. The element flux is computed as follows.
Element 1:
$$q_1 = -\frac{k_1}{L_1}\left(T_2 - T_1\right) = -\frac{24.8}{0.208}\left(98.82 - 150\right) = 6102\ \text{BTU/(h·ft}^2\text{)}$$
Element 2:
$$q_2 = -\frac{k_2}{L_2}\left(T_3 - T_2\right) = -\frac{24.8}{0.208}\left(88.97 - 98.82\right) = 1174\ \text{BTU/(h·ft}^2\text{)}$$
Let us carry out a few checks. The solution satisfies the EBC of 150 °F at $x = 0$. The flux at the right end must be zero; however, with this crude two-element model the error is large. To obtain better accuracy we must refine the mesh, i.e., add more elements. We will look at this aspect in the next two sections.
Fig. 15.6.1 Linear element: nodes 1 and 2 at $x_1$ and $x_2$ (length $L$), nodal values $y_1$ and $y_2$, and shape functions $\phi_1$ and $\phi_2$.
Fig. 15.6.2 shows the quadratic element (a) on which the solution is defined by a quadratic polynomial, (b) which is described by three nodes with nodal values $y_1$, $y_2$ and $y_3$, and (c) the resulting shape functions that use the nodal values to interpolate the solution.
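$$\tilde{y}(x) = a_1 + a_2x + a_3x^2 = \phi_1(x)y_1 + \phi_2(x)y_2 + \phi_3(x)y_3 \tag{15.6.1}$$
Fig. 15.6.2 Quadratic element: nodes 1, 2 and 3 at $x_1$, $x_2$ and $x_3$ (spacing $L/2$), nodal values $y_1$, $y_2$, $y_3$, and shape functions $\phi_1$, $\phi_2$, $\phi_3$, each equal to 1 at its own node.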
These equations are similar to Eqn. (15.2.31). Solving for the $a_i$'s and collecting like terms, we have
$$\tilde{y}(x) = \frac{(x - x_2)(x - x_3)}{(x_1 - x_2)(x_1 - x_3)}y_1 + \frac{(x - x_1)(x - x_3)}{(x_2 - x_1)(x_2 - x_3)}y_2 + \frac{(x - x_1)(x - x_2)}{(x_3 - x_1)(x_3 - x_2)}y_3 \tag{15.6.4}$$
Hence,
$$\phi_1(x) = \frac{(x - x_2)(x - x_3)}{(x_1 - x_2)(x_1 - x_3)} \tag{15.6.5a}$$
$$\phi_2(x) = \frac{(x - x_1)(x - x_3)}{(x_2 - x_1)(x_2 - x_3)} \tag{15.6.5b}$$
$$\phi_3(x) = \frac{(x - x_1)(x - x_2)}{(x_3 - x_1)(x_3 - x_2)} \tag{15.6.5c}$$
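Eqns. (15.6.5a) to (15.6.5c) are the Lagrange interpolation polynomials through three points, so they translate line for line into code. The sketch below uses a hypothetical `QuadraticElement` type that evaluates the three shape functions, their (linear) derivatives, and the flux τ = -α dy/dx from the nodal values in the manner of Eqn. (15.6.8).

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Three-node quadratic element (hypothetical helper type), x1 < x2 < x3.
struct QuadraticElement {
    double x1, x2, x3;

    // Shape functions, Eqns. (15.6.5a)-(15.6.5c).
    std::array<double, 3> phi(double x) const {
        assert(x1 < x2 && x2 < x3);
        return {{ (x - x2) * (x - x3) / ((x1 - x2) * (x1 - x3)),
                  (x - x1) * (x - x3) / ((x2 - x1) * (x2 - x3)),
                  (x - x1) * (x - x2) / ((x3 - x1) * (x3 - x2)) }};
    }
    // Derivatives d(phi_i)/dx, which vary linearly over the element.
    std::array<double, 3> dphi(double x) const {
        return {{ (2.0 * x - x2 - x3) / ((x1 - x2) * (x1 - x3)),
                  (2.0 * x - x1 - x3) / ((x2 - x1) * (x2 - x3)),
                  (2.0 * x - x1 - x2) / ((x3 - x1) * (x3 - x2)) }};
    }
    // Flux tau = -alpha dy/dx at x, computed from the nodal values y.
    double flux(double x, double alpha, const std::array<double, 3>& y) const {
        const std::array<double, 3> d = dphi(x);
        return -alpha * (d[0] * y[0] + d[1] * y[1] + d[2] * y[2]);
    }
};
```

Because a quadratic element reproduces any quadratic exactly, feeding it the nodal values of y = x² on nodes 0, 0.5 and 1 gives dy/dx = 0.6 at x = 0.3, so `flux(0.3, 1.0, y)` returns -0.6. Unlike the linear element, the flux now varies within the element.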
There is an easier way to compute the shape functions that we shall see in the next module. We still have two questions to
answer before we can generate the element equations for the quadratic element. First, “Where is node 2 located within the
element?” Again, a detailed answer will be generated in Module 2. For the time being let us assume that it is located at the center
of the element. Hence,
$$(x_2 - x_1) = (x_3 - x_2) = \frac{L}{2}, \qquad x_3 - x_1 = L \tag{15.6.6}$$
Second, "How do we handle the $\alpha(x)$, $\beta(x)$ and $f(x)$ terms?" We will develop a sophisticated technique in Module 2. For the time being, let us assume that they are constants within the element. Hence,
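$$\left(\frac{\alpha}{3L}\begin{bmatrix}7 & -8 & 1\\ -8 & 16 & -8\\ 1 & -8 & 7\end{bmatrix} + \frac{\beta L}{30}\begin{bmatrix}4 & 2 & -1\\ 2 & 16 & 2\\ -1 & 2 & 4\end{bmatrix} + g_1\begin{bmatrix}1 & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 0\end{bmatrix} + h_3\begin{bmatrix}0 & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 1\end{bmatrix}\right)\begin{Bmatrix}y_1\\y_2\\y_3\end{Bmatrix} = \begin{Bmatrix}\frac{fL}{6} - c_1\\ \frac{4fL}{6}\\ \frac{fL}{6} - c_3\end{Bmatrix} \tag{15.6.7}$$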
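The quadratic element equations can be generated the same way as the linear ones. The sketch below (hypothetical names `Mat3`, `Vec3` and `quadraticElement`) builds the 3×3 matrix and load vector of Eqn. (15.6.7) under the same assumptions as the text: constant α, β and f over the element and the middle node at the element center. The mixed-BC coefficients g1 and h3 and the constants c1 and c3 default to zero.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat3 = std::array<std::array<double, 3>, 3>;
using Vec3 = std::array<double, 3>;

// Quadratic element equations, Eqn. (15.6.7).
void quadraticElement(double alpha, double beta, double f, double L,
                      Mat3& k, Vec3& rhs,
                      double g1 = 0.0, double h3 = 0.0,
                      double c1 = 0.0, double c3 = 0.0)
{
    assert(L > 0.0);
    static const double S[3][3] = {{7, -8, 1}, {-8, 16, -8}, {1, -8, 7}};  // times alpha/(3L)
    static const double M[3][3] = {{4, 2, -1}, {2, 16, 2}, {-1, 2, 4}};    // times beta*L/30
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            k[i][j] = alpha / (3.0 * L) * S[i][j] + beta * L / 30.0 * M[i][j];
    k[0][0] += g1;                       // mixed BC at x_a (node 1)
    k[2][2] += h3;                       // mixed BC at x_b (node 3)
    rhs = {{f * L / 6.0 - c1, 4.0 * f * L / 6.0, f * L / 6.0 - c3}};
}
```

Two quick consistency checks follow from the form of Eqn. (15.6.7): with β = 0, each row of the matrix sums to zero (a constant y produces no flux), and the load entries always sum to fL, the total load on the element.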
The flux in the element is given by
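$$\tilde{\tau}(x) = -\alpha\frac{d\tilde{y}}{dx} = -\alpha\left[\frac{2(2x - x_2 - x_3)}{L^2}y_1 - \frac{4(2x - x_1 - x_3)}{L^2}y_2 + \frac{2(2x - x_1 - x_2)}{L^2}y_3\right]$$
$$= -\frac{\alpha}{L^2}\left[x\left(4y_1 - 8y_2 + 4y_3\right) + x_1\left(4y_2 - 2y_3\right) - 2x_2\left(y_1 + y_3\right) - 2x_3\left(y_1 - 2y_2\right)\right] \tag{15.6.8}$$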
In a similar manner, we can generate higher-order elements involving cubic, quartic, quintic, etc. trial functions with four, five, six, etc. nodes per element. These elements are usually identified as follows: they are designated as 1D-$C^m$ interpolation-order elements, where $C^m$ denotes that the problem variable and its derivatives up to order $m$ are continuous across element boundaries. In the element formulation that we have discussed so far, only the problem variable is continuous across the element boundaries. Hence the elements are designated 1D-$C^0$ linear element, 1D-$C^0$ quadratic element, etc.
Example 15.6.1
Let us re-solve the last problem from the previous section, this time using the quadratic element, and let us assume that the number of nodes is the same. The FE model is shown in Fig. 15.6.3.
Fig. 15.6.3 One quadratic element: nodes 1, 2 and 3 along X, spaced 2.5" apart.
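Element 1: $k = 24.8$ BTU/(h·ft·°F), $L = 0.416$ ft, $A = \frac{\pi d^2}{4} = 5.326(10^{-4})$ ft², $T_\infty = 80$ °F, $h = 6$ BTU/(h·ft²·°F), $l = \pi d = 0.0818$ ft, $h_1 = 0$, $h_2 = 0$, $Q = 0$, $c_1 = 0$ and $c_2 = 0$. The element equations are as follows ($\alpha = k$, $\beta = \frac{hl}{A}$, $f = \frac{hlT_\infty}{A}$):
$$\left(\frac{24.8}{3(0.416)}\begin{bmatrix}7 & -8 & 1\\ -8 & 16 & -8\\ 1 & -8 & 7\end{bmatrix} + \frac{hl}{A}\,\frac{0.416}{30}\begin{bmatrix}4 & 2 & -1\\ 2 & 16 & 2\\ -1 & 2 & 4\end{bmatrix}\right)\begin{Bmatrix}T_1\\T_2\\T_3\end{Bmatrix} = \begin{Bmatrix}\frac{(73721.4)(0.416)}{6}\\ \frac{4(73721.4)(0.416)}{6}\\ \frac{(73721.4)(0.416)}{6}\end{Bmatrix}$$
Or,
$$\begin{bmatrix}181.7 & -116.38 & -1.4256\\ -116.38 & 488.33 & -116.38\\ -1.4256 & -116.38 & 181.7\end{bmatrix}\begin{Bmatrix}T_1\\T_2\\T_3\end{Bmatrix} = \begin{Bmatrix}5111.3\\20445\\5111.3\end{Bmatrix}$$
Applying the boundary conditions (EBC for $T_1$), we have
$$\begin{bmatrix}1 & 0 & 0\\ 0 & 488.33 & -116.38\\ 0 & -116.38 & 181.7\end{bmatrix}\begin{Bmatrix}T_1\\T_2\\T_3\end{Bmatrix} = \begin{Bmatrix}150\\37902\\5325.2\end{Bmatrix}$$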
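Solving, and comparing with the two linear-element solution of Example 15.5.2:
Node | T (°F), two linear elements | T (°F), one quadratic element
1 | 150 | 150
2 | 98.82 | 99.84
3 | 88.97 | 93.26
Flux, BTU/(h·ft²): two linear elements, 6102 and 1174; one quadratic element, 6382 and 383.1.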
The nodal temperatures are close, but the flux values are quite different.
Example 15.7.1
Let us solve Example 15.5.2 using the concepts discussed earlier. We will monitor three response quantities: the temperature at the right end, the flux at the left end (the highest flux in the model), and the flux at the right end, which should converge to zero. The results are summarized below.
Model ID | Number of elements | Element type | Temperature at right end (°F) | Flux at right end (BTU/h·ft²) | Flux at left end (BTU/h·ft²)
1 | 2 | Linear | 89 | 1174 | 6102
2 | 4 | Linear | 90.5 | 542 | 7804
3 | 8 | Linear | 90.9 | 266 | 8970
4 | 16 | Linear | 91 | 132 | 9664
5 | 32 | Linear | 91 | 66 | 10044
6 | 64 | Linear | 91 | 33 | 10244
7 | 128 | Linear | 91 | 17 | 10346
The temperature at the right end converges very rapidly to 91 °F. The flux at the right end appears to converge linearly: it roughly halves each time the number of elements is doubled. The flux at the left end converges much more slowly for the linear element than for the quadratic element. Figs. 15.7.1 and 15.7.2 show the plots of the flux at the left and right ends for the linear and quadratic elements.
Figs. 15.7.1 and 15.7.2 Flux at the right end and flux at the left end versus the number of elements.
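The linear-element column of this study can be reproduced with a short mesh-refinement sketch. The function `pinFinTemperatures` below is hypothetical: it assembles n equal linear elements using Eqn. (15.5.5a) with h1 = h2 = 0, eliminates the EBC T(0) = 150 °F, leaves the insulated tip with no load term, and solves the tridiagonal system with the Thomas algorithm. As an assumption, it converts the fin dimensions to feet exactly (d = 0.3125/12 ft, length 5/12 ft) instead of using the rounded L = 0.208 ft and l = 0.0818 ft of Example 15.5.2, so the two-element temperatures differ from the text in the second decimal place.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Pin fin of Examples 15.5.2 and 15.7.1 with n equal linear elements:
//   -k T'' + beta T = beta T_inf,  beta = h l / A,  T(0) = T0,  q = 0 at the tip.
// Returns the nodal temperatures T_1 ... T_{n+1} in deg F.
std::vector<double> pinFinTemperatures(int n)
{
    assert(n >= 1);
    const double pi = 3.14159265358979323846;
    const double d = 0.3125 / 12.0, length = 5.0 / 12.0;      // ft
    const double k = 24.8, h = 6.0, Tinf = 80.0, T0 = 150.0;
    const double A = pi * d * d / 4.0, l = pi * d;
    const double beta = h * l / A, f = beta * Tinf, L = length / n;

    const int N = n + 1;
    std::vector<double> diag(N, 0.0), off(N - 1, 0.0), F(N, 0.0);
    for (int e = 0; e < n; ++e) {                  // Eqn. (15.5.5a), h1 = h2 = 0
        diag[e]     += k / L + beta * L / 3.0;
        diag[e + 1] += k / L + beta * L / 3.0;
        off[e]      += -k / L + beta * L / 6.0;
        F[e]        += f * L / 2.0;
        F[e + 1]    += f * L / 2.0;
    }
    F[1] -= off[0] * T0;           // EBC T(0) = T0 by elimination

    // Thomas algorithm on the unknowns T_2 ... T_{n+1} (indices 1 .. N-1).
    std::vector<double> cp(N, 0.0), dp(N, 0.0);
    for (int i = 1; i < N; ++i) {
        const double a = (i > 1) ? off[i - 1] : 0.0;   // sub-diagonal
        const double c = (i < N - 1) ? off[i] : 0.0;   // super-diagonal
        const double m = diag[i] - a * cp[i - 1];
        cp[i] = c / m;
        dp[i] = (F[i] - a * dp[i - 1]) / m;
    }
    std::vector<double> T(N);
    T[0] = T0;
    T[N - 1] = dp[N - 1];
    for (int i = N - 2; i >= 1; --i)
        T[i] = dp[i] - cp[i] * T[i + 1];
    return T;
}
```

Calling it with n = 2, 4, ..., 128 should reproduce the trend in the table: about 88.9 °F at the tip for two elements, rising to about 90.97 °F for 128 elements. That value agrees with the classical insulated-tip fin result, T∞ + (T0 - T∞)/cosh(mL) with m = √(hl/(kA)).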
We achieved two primary goals in this chapter. First, we generalized Galerkin's Method by tying it to the element concept. This made it easier to generate trial solutions that are valid over an arbitrary problem subdomain (the element!). A side benefit is that we can handle problems with known discontinuities in the solution, e.g., in the flux. Second, we looked at how the boundary conditions could be applied more easily. While the problem's essential boundary conditions were satisfied exactly, the higher-order boundary conditions were satisfied only in the limit.
The derivation of the element equations is perhaps the most important step when implementing the FE method. We saw how
we could generate a family of trial solutions and the associated elements. There are two important issues that we will deal with
in the next module that will make it easier to generate the element equations – a more rational way to generate the shape
functions, and the isoparametric formulation.
It should be pointed out that seemingly different engineering problems are all governed by the same differential equation.
However, one still needs to contend with the physics behind the parameters and the equations.
Better solutions are obtained by (a) increasing the number of elements in the mesh or, (b) by increasing the order of the element
(trial solution), or (c) both. With the easy availability of FE computer programs, it is now possible to obtain accurate solutions
in a relatively short period of time. The 1DBVP program illustrates (in its own manner) the aspects associated with pre-
processing, solution and post-processing.
Summary
As we saw in this chapter, there are a number of engineering problems described by partial differential equations. The finite
element method is a powerful numerical technique that is routinely used in solving PDEs in a variety of engineering and scientific
areas. Very powerful computer programs capable of solving multi-dimensional, multi-physics problems are used routinely in
the design of airplanes, automobiles, electronic components, consumer goods etc.
Exercises
In all the problems below, carry out all the steps by hand except the solution of the system equations, which may be obtained using a programmable calculator or a computer program. Finally, check your answers using the 1DBVP© program. Explain the differences in answers, if any.
Problem 15.1
Consider the 4 in. bar shown in Fig. P15.1 that is loaded by an axial surface traction given by the function $w(x) = x^2$ lb/in.
Fig. P15.1
The bar properties are as follows: $E = 30(10^6)$ psi and $A = 2$ in². Determine the stresses in the bar using
Comment on the results by comparing the answers with the analytical solution.
Problem 15.2
Resolve Problem 15.1 but use quadratic elements instead.
Problem 15.3
The structure shown in Fig. P15.3 is subjected to an increase in temperature $\Delta T = 80^\circ$C. In addition, the loads are given as follows: $P_1 = 60$ kN, $P_2 = 75$ kN. Determine the displacements, stresses, and support reactions using linear elements. The material properties are as follows.

Material    A (mm²)    E (GPa)    α (mm/mm/°C)
Bronze      2400       83         18.9e-6
Aluminum    1200       70         23e-6
Steel       600        200        11.7e-6

Fig. P15.3 (bronze, aluminum and steel segments; loads P1 and P2)
Problem 15.4
Consider a brick wall (Fig. P15.4) of thickness $L = 30$ cm and $k = 0.7$ W/m°C. The inner surface is at 28°C and the outer surface is exposed to cold air at $T_\infty = -15^\circ$C. The heat transfer coefficient associated with the outside surface is $h = 40$ W/m²°C. Determine the steady-state temperature distribution within the wall and also the heat flux through the wall. Assume one-dimensional heat flow. Use a two linear-element model.
Fig. P15.4 (wall 30 cm thick; inner surface at 28°C; outer surface exposed to h, T∞)
Problem 15.5
Fig. P15.5 shows a thin, cylindrical rod, 1 m long, composed of two different materials: the two end sections, each 40 cm long, are made of steel ($k = 0.12$ cal·cm/(s·cm²·°C)), and the center section, 20 cm long, is made of copper ($k = 0.92$ cal·cm/(s·cm²·°C)). The cross section is circular, with a radius of 2 cm. Heat is flowing into the left end at a steady rate of 0.1 cal/(s·cm²). The temperature of the right end is maintained at a constant 0°C. The rod is in contact with air at an ambient temperature of 20°C, so there is free convection from the lateral surface. The convective coefficient is $1.5\times10^{-4}$ cal/(s·cm²·°C). Determine the temperature and flux distribution in the rod. Use both linear and quadratic elements.
Fig. P15.5 (rod with a heat input of 0.1 cal/(s·cm²) at the left end)
Problem 15.6
A variable-area rectangular cross-section fin transmits heat away from a mass as indicated in Fig. P15.6. The thickness of the fin in the direction perpendicular to the paper is ten times that shown in the plane of the paper. There is convection on the entire lateral surface.
Fig. P15.6 (mass at temperature T0; fin depths 3h0 and h0; lengths L1 and L2; convection hc, T∞ on the lateral surfaces; T = T∞)
With $h_0 = 5$ cm, $L_1 = L_2 = 10$ cm, $T_0 = 400^\circ$C, $T_\infty = 100^\circ$C, $h_c = 10^{-3}$ W/mm²°C and $k = 0.30$ W/mm°C, find the
Chapter 16
Eigensystems
"Excellence is a continuous process and not an accident." A.P.J. Abdul Kalam
"When a man sits with a pretty girl for an hour, it seems like a minute. But let him sit on a hot stove for a minute and it's longer than any hour. That's relativity." Albert Einstein
There are several dynamic systems where energy stored in system components gives rise to the dynamic nature of the system.
Examples include rotating masses, compressed springs etc. Without external excitation, the energy within the system will decay
to a minimum state or will oscillate between extreme states. The rate of decay of the natural modes and the frequency of
oscillation are determined by the eigenvalues of the matrix that represents the system. In this chapter, we will examine
eigensystems with the emphasis being on numerical techniques to compute eigenvalues and eigenvectors.
Objectives
To understand what eigenpairs represent and their properties.
To understand how to compute eigenpairs numerically: vector iteration methods and the Jacobi Method.
To extend our knowledge of PDEs from Chapter 15 to one-dimensional eigenproblems suitable for modeling engineering problems.
16.1 Eigenproblems
A square matrix $\mathbf{A}_{n\times n}$ has an eigenpair $(\lambda, \mathbf{x})$ consisting of the eigenvalue $\lambda$ and the corresponding eigenvector $\mathbf{x}_{n\times 1}$ if
$$\mathbf{A}_{n\times n}\,\mathbf{x}_{n\times 1} = \lambda\,\mathbf{x}_{n\times 1} \tag{16.1.1}$$
From the above equation it should be clear that if $\mathbf{x}$ is an eigenvector then any nonzero multiple of $\mathbf{x}$ is also an eigenvector (though not a distinct eigenvector). Eqn. (16.1.1) has a nontrivial solution $\mathbf{x} \neq \mathbf{0}$ if and only if
$$\det(\mathbf{A} - \lambda\mathbf{I}) = 0 \tag{16.1.2}$$
Hence one way of computing the eigenpairs is to expand Eqn. (16.1.2) into an $n$th-order polynomial whose roots are the eigenvalues $\lambda$. These roots could be real or complex depending on the properties of $\mathbf{A}$. Finding the roots of an $n$th-order polynomial is neither easy nor efficient. Hence, in the rest of this chapter, we will see increasingly more efficient techniques to compute the eigenpairs.
The system of equations described by Eqn. (16.1.1) is called the standard eigenproblem. There is another class of problem described by
$$\mathbf{A}_{n\times n}\,\mathbf{X}_{n\times n} = \mathbf{B}_{n\times n}\,\mathbf{X}_{n\times n}\,\boldsymbol{\Lambda}_{n\times n} \tag{16.1.3}$$
that is called the generalized eigenproblem, where $\boldsymbol{\Lambda}_{n\times n}$ is a diagonal matrix containing the eigenvalues and $\mathbf{X}_{n\times n}$ is a matrix whose columns are the eigenvectors. Note that the generalized eigenproblem is the same as the standard eigenproblem if $\mathbf{B} = \mathbf{I}$.
Example 16.1
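Compute the eigenpairs of the matrix $\begin{bmatrix} 2 & 1 \\ 2 & 6 \end{bmatrix}$.
Using Eqn. (16.1.2) we have
$$\det\begin{bmatrix} 2-\lambda & 1 \\ 2 & 6-\lambda \end{bmatrix} = \lambda^2 - 8\lambda + 10 = 0 \;\Rightarrow\; \lambda = \frac{8 \mp \sqrt{64 - 40}}{2}$$
The quadratic polynomial has the following roots: $\lambda_{1,2} = 1.55051,\ 6.44949$.
For $\lambda_1 = 1.55051$:
$$\begin{bmatrix} 2 & 1 \\ 2 & 6 \end{bmatrix}\begin{Bmatrix} x_1 \\ x_2 \end{Bmatrix} = 1.55051\begin{Bmatrix} x_1 \\ x_2 \end{Bmatrix} \;\Rightarrow\; x_1 = -2.22474\,x_2. \quad \text{Or, } \mathbf{x}^{(1)} = \begin{Bmatrix} 1 \\ -0.449491 \end{Bmatrix}$$
For $\lambda_2 = 6.44949$:
$$\begin{bmatrix} 2 & 1 \\ 2 & 6 \end{bmatrix}\begin{Bmatrix} x_1 \\ x_2 \end{Bmatrix} = 6.44949\begin{Bmatrix} x_1 \\ x_2 \end{Bmatrix} \;\Rightarrow\; x_1 = 0.224745\,x_2. \quad \text{Or, } \mathbf{x}^{(2)} = \begin{Bmatrix} 0.224745 \\ 1 \end{Bmatrix}$$
We have normalized both the eigenvectors by making the largest element in each eigenvector equal to 1.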
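As a check on Example 16.1, the sketch below evaluates the characteristic polynomial of a 2×2 matrix and normalizes each eigenvector so that its largest entry is 1. The function name `EigenPairs2x2` is hypothetical and is not part of the book's programs. It assumes the eigenvalues are real.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <stdexcept>

// One eigenpair of a 2x2 matrix: eigenvalue and eigenvector.
struct EigenPair2 {
    double lambda;
    std::array<double, 2> x;
};

// Eigenpairs of A = [a b; c d] from the characteristic polynomial
// det(A - lambda I) = lambda^2 - (a + d) lambda + (ad - bc) = 0, Eqn. (16.1.2).
// Assumes real eigenvalues. Each eigenvector is scaled so that its
// largest-magnitude entry equals 1, as in Example 16.1.
std::array<EigenPair2, 2> EigenPairs2x2(double a, double b, double c, double d)
{
    const double tr = a + d;
    const double det = a * d - b * c;
    const double disc = tr * tr - 4.0 * det;
    if (disc < 0.0) throw std::domain_error("complex eigenvalues");
    const double root = std::sqrt(disc);
    const double lams[2] = {0.5 * (tr - root), 0.5 * (tr + root)};

    std::array<EigenPair2, 2> result{};
    for (int k = 0; k < 2; ++k) {
        const double lam = lams[k];
        std::array<double, 2> x{};
        // Solve one row of (A - lambda I) x = 0, using the row with larger entries.
        if (std::fabs(a - lam) + std::fabs(b) >= std::fabs(c) + std::fabs(d - lam))
            x = {b, lam - a};          // satisfies (a - lam) x1 + b x2 = 0
        else
            x = {lam - d, c};          // satisfies c x1 + (d - lam) x2 = 0
        const double big = std::fabs(x[0]) >= std::fabs(x[1]) ? x[0] : x[1];
        if (big == 0.0) x = {1.0, 0.0};   // A = lam I: any vector is an eigenvector
        else { x[0] /= big; x[1] /= big; }
        assert(std::fabs(x[0]) <= 1.0 && std::fabs(x[1]) <= 1.0);
        result[k] = {lam, x};
    }
    return result;
}
```

For the matrix of Example 16.1, `EigenPairs2x2(2, 1, 2, 6)` returns the eigenvalues 1.55051 and 6.44949 with eigenvectors {1, -0.449491} and {0.224745, 1}.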
To better understand some of the engineering applications, consider the spring-mass system shown in Fig. 16.1.1 that is initially at rest and where the displacement is $u = u(t)$.
Fig. 16.1.1 System and its free-body diagram (mass $m$ attached to a spring of stiffness $k$; forces $ku$ and $m\ddot{u}$)
Let us assume that the system is perturbed somehow. From the FBD, using D'Alembert's Principle,
$$-ku - m\ddot{u} = 0 \tag{16.1.4}$$
or,
$$m\ddot{u} + ku = 0 \tag{16.1.5}$$
Let $\omega^2 = k/m$. Substituting in the above equation, we have
$$\ddot{u} + \omega^2 u = 0 \tag{16.1.6}$$
The solution to the above differential equation is of the form (sum of two harmonics)
$$u(t) = C_1\cos\omega t + C_2\sin\omega t \tag{16.1.7}$$
The term $\omega = \sqrt{k/m}$ is the angular frequency (expressed in rad/s), the term $f = \omega/2\pi$ is the natural frequency (expressed in Hz) and $T = 1/f$ is the natural period (expressed in s). To obtain the constants in Eqn. (16.1.7) we need the initial conditions. At $t = 0$, $u = u_0$, which is the initial displacement, and $\dot{u} = \dot{u}_0$, which is the initial velocity. Using these initial conditions, $C_1 = u_0$ and $C_2 = \dot{u}_0/\omega$. Hence,
$$u = u_0\cos\omega t + \frac{\dot{u}_0}{\omega}\sin\omega t \tag{16.1.8}$$
Or,
$$u = A\cos(\omega t - \phi) \tag{16.1.9}$$
where $A = \sqrt{u_0^2 + (\dot{u}_0/\omega)^2}$ is the amplitude of vibration and $\phi = \tan^{-1}\left(\dfrac{\dot{u}_0}{\omega u_0}\right)$ is the phase angle. Eqns. (16.1.8) and (16.1.9) represent the solution to free, undamped vibration.
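The quantities in Eqns. (16.1.7) to (16.1.9) can be evaluated in a few lines of C++. This is a minimal sketch, and the struct and function names are illustrative assumptions. It uses `std::atan2` for the phase angle so that the correct quadrant is chosen when $u_0 < 0$.

```cpp
#include <cassert>
#include <cmath>

// Free, undamped vibration of a single spring-mass system, Eqns. (16.1.6)-(16.1.9).
struct FreeVibration {
    double omega;   // angular frequency sqrt(k/m), rad/s
    double f;       // natural frequency omega/(2 pi), Hz
    double T;       // natural period 1/f, s
    double A;       // amplitude sqrt(u0^2 + (v0/omega)^2)
    double phi;     // phase angle, rad: A cos(phi) = u0, A sin(phi) = v0/omega
};

FreeVibration Analyze(double k, double m, double u0, double v0)
{
    assert(k > 0.0 && m > 0.0);
    const double pi = 3.14159265358979323846;
    FreeVibration r{};
    r.omega = std::sqrt(k / m);
    r.f = r.omega / (2.0 * pi);
    r.T = 1.0 / r.f;
    r.A = std::sqrt(u0 * u0 + (v0 / r.omega) * (v0 / r.omega));
    r.phi = std::atan2(v0 / r.omega, u0);   // atan2 picks the correct quadrant
    return r;
}

// Displacement from Eqn. (16.1.8); equivalently A cos(omega t - phi), Eqn. (16.1.9).
double Displacement(const FreeVibration& r, double u0, double v0, double t)
{
    return u0 * std::cos(r.omega * t) + (v0 / r.omega) * std::sin(r.omega * t);
}
```

For example, $k = 400$, $m = 4$, $u_0 = 0.03$ and $\dot{u}_0 = 0.4$ give $\omega = 10$ rad/s and $A = 0.05$.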
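Consider a harmonic forcing function
$$m\ddot{u} + ku = P\sin\Omega t \tag{16.1.10}$$
or,
$$\ddot{u} + \omega^2 u = p_m\sin\Omega t \tag{16.1.11}$$
where $p_m = P/m$ (force per unit mass). The total solution consists of a general solution for the homogeneous part and a particular solution that satisfies the whole equation. The particular solution is
$$u = C_3\sin\Omega t \tag{16.1.12}$$
Substituting into Eqn. (16.1.11), $C_3 = \dfrac{p_m}{\omega^2 - \Omega^2}$. With $\beta = \Omega/\omega$, the total solution is then
$$u(t) = C_1\cos\omega t + C_2\sin\omega t + \frac{1}{1 - \beta^2}\,\frac{P}{k}\,\sin\Omega t$$
In the last (steady-state) term, the first factor is the "magnification factor" $1/(1 - \beta^2)$ and the second, $P/k$, is the displacement due to the equivalent "static" load. This term represents the steady-state forced response shown in Fig. 16.1.2.
Fig. 16.1.2 Plot of the magnification factor $1/(1 - \beta^2)$ vs $\beta$ (resonance at $\beta = 1$)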
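The steady-state term is easy to tabulate. The sketch below assumes the undamped model of Eqn. (16.1.10), and the function names are hypothetical. Evaluating `MagnificationFactor` over a range of β reproduces the shape of Fig. 16.1.2, including the blow-up at β = 1.

```cpp
#include <cassert>
#include <cmath>

// Magnification factor 1/(1 - beta^2) with beta = Omega/omega.
// Undefined at resonance (beta = 1); negative values for beta > 1 mean the
// steady-state response is 180 degrees out of phase with the load.
double MagnificationFactor(double Omega, double omega)
{
    assert(omega > 0.0);
    const double beta = Omega / omega;
    return 1.0 / (1.0 - beta * beta);
}

// Steady-state response u_ss(t) = [1/(1 - beta^2)] (P/k) sin(Omega t).
double SteadyState(double P, double k, double m, double Omega, double t)
{
    const double omega = std::sqrt(k / m);
    return MagnificationFactor(Omega, omega) * (P / k) * std::sin(Omega * t);
}
```

This form is algebraically the same as $C_3\sin\Omega t$ with $C_3 = p_m/(\omega^2 - \Omega^2)$. For example, $P = 100$, $k = 400$, $m = 4$ and $\Omega = 5$ give $C_3 = 25/75 = 1/3$.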
¹ The motivation for changing the notation of the matrices from A, B to K, M is that K is normally associated with stiffness and M with mass, both of which arise in scientific and engineering problems.
$$\mathbf{x}_i^T\mathbf{M}\mathbf{x}_i = 1 \tag{16.2.6}$$
$$\mathbf{x}_i^T\mathbf{K}\mathbf{x}_i = \lambda_i \tag{16.2.7}$$
Other normalization schemes are possible, such as making the largest entry in the eigenvector equal to unity. We will now look at two methods to compute the eigenpairs. These methods have limitations in terms of the size of the problem that can be handled efficiently. It should be noted that solving an eigenproblem is computationally more expensive than solving a system of linear algebraic equations of the same size.
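The normalization of Eqn. (16.2.6) and the check of Eqn. (16.2.7) can be sketched as follows. The sketch assumes dense `std::vector` storage and a positive definite M. The helper names (`MatVec`, `Dot`, `MNormalize`) are illustrative and are not the book's matrix classes. If the vector is only approximately an eigenvector, the returned value $\mathbf{x}^T\mathbf{K}\mathbf{x}$ is the Rayleigh quotient.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

// y = A x for a dense square matrix stored row-wise
Vec MatVec(const Mat& A, const Vec& x)
{
    Vec y(A.size(), 0.0);
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t j = 0; j < x.size(); ++j) y[i] += A[i][j] * x[j];
    return y;
}

double Dot(const Vec& a, const Vec& b)
{
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Scale x in place so that x^T M x = 1 (Eqn. 16.2.6) and return x^T K x.
// If x is an eigenvector this equals its eigenvalue (Eqn. 16.2.7); for an
// approximate x it is the Rayleigh quotient, an estimate of the eigenvalue.
double MNormalize(const Mat& K, const Mat& M, Vec& x)
{
    const double xMx = Dot(x, MatVec(M, x));
    assert(xMx > 0.0);                    // M assumed positive definite
    const double s = 1.0 / std::sqrt(xMx);
    for (double& xi : x) xi *= s;
    return Dot(x, MatVec(K, x));
}
```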
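Step 6: Check for convergence using the tolerance $\varepsilon$:
$$\frac{|\lambda_k - \lambda_{k-1}|}{|\lambda_k|} \le \varepsilon$$
If this condition is satisfied, then $\mathbf{x}_k$ is the eigenvector (and $\lambda_k$ the eigenvalue). Otherwise go to Step 2.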
One should be careful in selecting the initial guess for the eigenvector, $\mathbf{x}_0$. Convergence to the desired eigenvector will not be possible if this vector is ($\mathbf{M}$-)orthogonal to it. Later on, we will see how it may be possible to generate the initial guess knowing the physics of the problem. Step 3 is the most expensive step in the procedure. However, if $\mathbf{K}$ is factored once using Cholesky Decomposition before the iterations begin, then each iteration needs only the forward and backward substitution phases for the new right-hand-side vector created in that iteration.
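The following is a minimal, self-contained sketch of inverse iteration for $\mathbf{K}\mathbf{x} = \lambda\mathbf{M}\mathbf{x}$. It is not the book's CInverseIteration class. Steps 2 to 5 are not reproduced here, so the update used below is an assumption based on a common form of the method:

1. Solve $\mathbf{K}\hat{\mathbf{x}}_k = \mathbf{M}\mathbf{x}_{k-1}$.
2. Estimate $\lambda_k$ with the Rayleigh quotient.
3. M-normalize the new vector.
4. Stop when the Step 6 test is satisfied.

Following the remark above, K is factored once by Cholesky, and each iteration performs only forward and backward substitution.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

// Cholesky factorization K = L L^T (K symmetric positive definite); returns L.
Mat Cholesky(const Mat& K)
{
    const std::size_t n = K.size();
    Mat L(n, Vec(n, 0.0));
    for (std::size_t j = 0; j < n; ++j) {
        double s = K[j][j];
        for (std::size_t p = 0; p < j; ++p) s -= L[j][p] * L[j][p];
        if (s <= 0.0) throw std::runtime_error("K is not positive definite");
        L[j][j] = std::sqrt(s);
        for (std::size_t i = j + 1; i < n; ++i) {
            double t = K[i][j];
            for (std::size_t p = 0; p < j; ++p) t -= L[i][p] * L[j][p];
            L[i][j] = t / L[j][j];
        }
    }
    return L;
}

// Solve L L^T x = b: forward substitution, then backward substitution.
Vec CholeskySolve(const Mat& L, const Vec& b)
{
    const std::size_t n = L.size();
    Vec z(n), x(n);
    for (std::size_t i = 0; i < n; ++i) {          // L z = b
        double s = b[i];
        for (std::size_t p = 0; p < i; ++p) s -= L[i][p] * z[p];
        z[i] = s / L[i][i];
    }
    for (std::size_t i = n; i-- > 0;) {            // L^T x = z
        double s = z[i];
        for (std::size_t p = i + 1; p < n; ++p) s -= L[p][i] * x[p];
        x[i] = s / L[i][i];
    }
    return x;
}

Vec MatVec(const Mat& A, const Vec& x)
{
    Vec y(A.size(), 0.0);
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t j = 0; j < x.size(); ++j) y[i] += A[i][j] * x[j];
    return y;
}

double Dot(const Vec& a, const Vec& b)
{
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Lowest eigenpair of K x = lambda M x. On entry x is the initial guess; on
// exit it is the eigenvector with x^T M x = 1. Returns the iteration count.
int InverseIteration(const Mat& K, const Mat& M, double& lambda, Vec& x,
                     int maxIter, double tol)
{
    assert(K.size() == x.size() && M.size() == x.size());
    const Mat L = Cholesky(K);            // factor once, outside the loop
    Vec y = MatVec(M, x);
    double lambdaOld = 0.0;
    for (int k = 1; k <= maxIter; ++k) {
        const Vec xhat = CholeskySolve(L, y);   // K xhat = M x_{k-1}
        const Vec yhat = MatVec(M, xhat);
        const double den = Dot(xhat, yhat);
        lambda = Dot(xhat, y) / den;            // Rayleigh quotient
        const double s = 1.0 / std::sqrt(den);
        for (std::size_t i = 0; i < x.size(); ++i) {
            x[i] = xhat[i] * s;                 // x_k, M-normalized
            y[i] = yhat[i] * s;                 // y_k = M x_k
        }
        if (k > 1 && std::fabs(lambda - lambdaOld) <= tol * std::fabs(lambda))
            return k;                           // Step 6 convergence test
        lambdaOld = lambda;
    }
    throw std::runtime_error("inverse iteration did not converge");
}
```

Consider the data of Example 16.3.1 with the initial guess {1, 1, 1}. The iteration converges to λ ≈ 0.307979. The eigenvector is returned with $\mathbf{x}^T\mathbf{M}\mathbf{x} = 1$, so its scaling differs from the vector listed in the example.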
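Example 16.3.1
Compute the lowest eigenpair of the problem $\mathbf{K}_{3\times3}\mathbf{X}_{3\times3} = \mathbf{M}_{3\times3}\mathbf{X}_{3\times3}\boldsymbol{\Lambda}_{3\times3}$ with
$$\mathbf{K} = \begin{bmatrix} 3 & 2 & 1 \\ 2 & 2 & 1 \\ 1 & 1 & 1 \end{bmatrix}, \qquad \mathbf{M} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
Iteration 1 ($k = 1$): starting from $\mathbf{x}_0$, compute $\hat{\mathbf{x}}_k$, $\hat{\mathbf{y}}_k$, $\lambda_k$ and $\mathbf{x}_k$ using the steps of the algorithm. Repeat until the convergence check in Step 6 is satisfied.
Hence the solution (lowest eigenpair) is
$$\lambda = 0.307979, \qquad \mathbf{x} = \begin{Bmatrix} 0.133737 \\ -0.30118 \\ 0.242203 \end{Bmatrix}$$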
Example Program 16.3.1 Inverse Iteration Method
Implement the Inverse Iteration Method and solve Example 16.3.1.
The algorithm presented earlier is implemented in the program. We start by examining the main program.
main.cpp
The public member function in the CInverseIteration class that computes the lowest eigenpair, GoCompute, requires 8 arguments:
- matrix K
- matrix M
- the lowest eigenvalue
- a vector that goes in as the initial guess for the lowest eigenvector and is returned as the lowest eigenvector
- the maximum number of iterations
- the convergence tolerance
- a variable (bVerbose) that is set to true if the intermediate calculations during the iterations are to be displayed
- the output stream to which the intermediate calculations are written

Note the way the K and M matrices are initialized in lines 22-27. The first two arguments are the number of rows and columns in the matrix, and the values are specified row-wise. The initial guess is specified in line 28, syntactically in a similar manner to the K and M matrices but without the size of the vector being specified.
At the end of the program, we check if the computed values are accurate enough by computing the residual vector
$$\mathbf{R} = \mathbf{K}\boldsymbol{\phi}_1 - \lambda_1\mathbf{M}\boldsymbol{\phi}_1 \tag{16.3.3}$$
whose elements should be close to zero. To facilitate these calculations, copies of the K and M matrices are made in lines 36 and 37. The residual vector is computed in line 46 and displayed in line 47.
Errors arising from the eigenpair calculations are thrown in the CInverseIteration class and caught in the main program. The program listing of the CInverseIteration class is not shown here. The program output is shown in Fig. 16.3.1.
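The residual check of Eqn. (16.3.3) does not depend on how the eigenpair was obtained. The following is a minimal sketch with dense storage; the function names are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

// Residual of Eqn. (16.3.3): R = K phi - lambda M phi. For an accurate
// eigenpair every entry of R should be close to zero.
Vec Residual(const Mat& K, const Mat& M, double lambda, const Vec& phi)
{
    assert(K.size() == phi.size() && M.size() == phi.size());
    Vec R(phi.size(), 0.0);
    for (std::size_t i = 0; i < phi.size(); ++i)
        for (std::size_t j = 0; j < phi.size(); ++j)
            R[i] += (K[i][j] - lambda * M[i][j]) * phi[j];
    return R;
}

// Largest absolute entry of R, a simple scalar measure of accuracy.
double MaxAbs(const Vec& R)
{
    double m = 0.0;
    for (double r : R) m = std::fmax(m, std::fabs(r));
    return m;
}
```

Because the residual scales with the eigenvector, compare it against the size of $\mathbf{K}\boldsymbol{\phi}$ or normalize $\boldsymbol{\phi}$ first before judging what counts as "close to zero".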
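$$\theta = \frac{1}{2}\tan^{-1}\left(\frac{2A_{ij}}{A_{ii} - A_{jj}}\right) \quad \text{or,} \quad \theta = \frac{\pi}{4} \ \text{ if } A_{ii} = A_{jj} \tag{16.4.4}$$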
The basic idea then is to apply the rotation matrix to each off-diagonal element in $\mathbf{A}$, making it zero one at a time, and to continue the process until all the off-diagonal elements are numerically small. The algorithm is as follows.
Step 1: With $k$ as the iteration counter, set $k = 0$. Let $\mathbf{P} = \mathbf{I}$. Set values for $\varepsilon$ and $k_{max}$.
Step 2: Scan $\mathbf{A}$ and locate the off-diagonal element with the largest magnitude, $A_{ij}$.
Step 3: Compute $\theta$ as per Eqn. (16.4.4). Set the four elements of the $\mathbf{R}$ matrix using Eqn. (16.4.2).
Step 4: Update $\mathbf{P}$ and $\mathbf{A}$ as $\mathbf{P} \leftarrow \mathbf{P}\mathbf{R}$ and $\mathbf{A} \leftarrow \mathbf{R}^T\mathbf{A}\mathbf{R}$.
Step 5: Convergence check: If $|A_{ij}|_{max} \le \varepsilon$, go to Step 6. Set $k \leftarrow k + 1$. If $k > k_{max}$, go to Step 6. Otherwise go to Step 2.
Step 6: The eigenvalues are the diagonal elements of $\mathbf{A}$ and the columns of $\mathbf{P}$ are the eigenvectors, which can then be scaled appropriately.
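A compact sketch of Steps 1 to 6 for a symmetric matrix follows. Eqn. (16.4.2) is not reproduced here, so the rotation sign convention is an assumption: $R_{ii} = R_{jj} = \cos\theta$, $R_{ij} = -\sin\theta$, $R_{ji} = \sin\theta$. With this convention, the angle of Eqn. (16.4.4) zeroes $A_{ij}$. `std::atan2` returns $\theta = \pm\pi/4$ automatically when $A_{ii} = A_{jj}$. Rather than forming $\mathbf{R}^T\mathbf{A}\mathbf{R}$ explicitly, only rows and columns $i$ and $j$ are updated.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

// Classical Jacobi method for a real symmetric matrix A. On exit the diagonal
// of A holds the eigenvalues and the columns of P the eigenvectors (unit length).
// Returns the number of rotations used (kmax if not converged).
int Jacobi(Mat& A, Mat& P, double tol, int kmax)
{
    const std::size_t n = A.size();
    P.assign(n, Vec(n, 0.0));
    for (std::size_t r = 0; r < n; ++r) P[r][r] = 1.0;         // Step 1: P = I
    for (int k = 0; k < kmax; ++k) {
        // Step 2: off-diagonal element with the largest magnitude
        std::size_t i = 0, j = 1;
        double big = 0.0;
        for (std::size_t r = 0; r < n; ++r)
            for (std::size_t c = r + 1; c < n; ++c)
                if (std::fabs(A[r][c]) > big) { big = std::fabs(A[r][c]); i = r; j = c; }
        if (big <= tol) return k;                               // Step 5 check
        // Step 3: rotation angle from Eqn. (16.4.4)
        const double theta = 0.5 * std::atan2(2.0 * A[i][j], A[i][i] - A[j][j]);
        const double c = std::cos(theta), s = std::sin(theta);
        // Step 4: A <- R^T A R and P <- P R; only rows/columns i and j change
        for (std::size_t r = 0; r < n; ++r) {                   // A R and P R
            const double ari = A[r][i], arj = A[r][j];
            A[r][i] = c * ari + s * arj;
            A[r][j] = -s * ari + c * arj;
            const double pri = P[r][i], prj = P[r][j];
            P[r][i] = c * pri + s * prj;
            P[r][j] = -s * pri + c * prj;
        }
        for (std::size_t col = 0; col < n; ++col) {             // R^T (A R)
            const double aic = A[i][col], ajc = A[j][col];
            A[i][col] = c * aic + s * ajc;
            A[j][col] = -s * aic + c * ajc;
        }
    }
    return kmax;
}
```

For the K matrix of Example 16.3.1, the smallest diagonal entry of the converged matrix is about 0.307979. This agrees with the lowest eigenvalue found by inverse iteration.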
The public member function in the CJacobiMethod class that computes all the eigenpairs, GoCompute, requires 7 arguments:
- matrix A
- a vector to store all the computed eigenvalues
- a matrix to store all the computed eigenvectors
- the convergence tolerance
- the maximum number of iterations
- a variable (bVerbose) that is set to true if the intermediate calculations during the iterations are to be displayed
- the output stream to which the intermediate calculations are written

Errors arising from the eigenpair calculations are thrown in the CJacobiMethod class and caught in the main program. The program listing of the CJacobiMethod class is not shown here. The program output is shown in Fig. 16.4.1.
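$$\hat{\mathbf{K}} = \mathbf{P}^T\mathbf{K}\mathbf{P} \tag{16.4.6}$$
$$\hat{\mathbf{M}} = \mathbf{P}^T\mathbf{M}\mathbf{P} \tag{16.4.7}$$
$$\mathbf{X} = \mathbf{P}\hat{\mathbf{M}}^{-1/2} \tag{16.4.8}$$
$$\boldsymbol{\Lambda} = \hat{\mathbf{M}}^{-1}\hat{\mathbf{K}} \tag{16.4.9}$$
where
$$\hat{\mathbf{M}}^{-1} = \begin{bmatrix} \hat{M}_{11}^{-1} & & & 0 \\ & \hat{M}_{22}^{-1} & & \\ & & \ddots & \\ 0 & & & \hat{M}_{nn}^{-1} \end{bmatrix} \tag{16.4.10}$$
$$\hat{\mathbf{M}}^{-1/2} = \begin{bmatrix} \hat{M}_{11}^{-1/2} & & & 0 \\ & \hat{M}_{22}^{-1/2} & & \\ & & \ddots & \\ 0 & & & \hat{M}_{nn}^{-1/2} \end{bmatrix} \tag{16.4.11}$$
It should be noted that as $k \to \infty$, $\hat{\mathbf{K}}$ and $\hat{\mathbf{M}}$ become diagonal, so that $\mathbf{X}^T\mathbf{K}\mathbf{X} \to \boldsymbol{\Lambda}$ and $\mathbf{X}^T\mathbf{M}\mathbf{X} \to \mathbf{I}$.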
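The final scaling step, Eqns. (16.4.8) to (16.4.11), is simple once $\hat{\mathbf{K}}$ and $\hat{\mathbf{M}}$ are diagonal. The sketch below assumes dense storage and reads only the diagonals. The function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

// Given diagonal Khat = P^T K P and Mhat = P^T M P (Eqns. 16.4.6-16.4.7),
// compute Lambda = Mhat^-1 Khat (Eqn. 16.4.9) and X = P Mhat^-1/2 (Eqn. 16.4.8).
void ScaleGeneralizedJacobi(const Mat& Khat, const Mat& Mhat, const Mat& P,
                            Vec& lambda, Mat& X)
{
    const std::size_t n = P.size();
    lambda.assign(n, 0.0);
    X = P;
    for (std::size_t c = 0; c < n; ++c) {
        assert(Mhat[c][c] > 0.0);                 // M assumed positive definite
        lambda[c] = Khat[c][c] / Mhat[c][c];       // diagonal entry of Mhat^-1 Khat
        const double scale = 1.0 / std::sqrt(Mhat[c][c]);
        for (std::size_t r = 0; r < n; ++r)
            X[r][c] *= scale;                      // column c of P Mhat^-1/2
    }
}
```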
One of the more practical methods is the threshold Jacobi Method. In this method, each off-diagonal element is first tested against a cutoff value to decide whether it needs to be zeroed out. Typically, the coupling between rows and columns $i$ and $j$ is tested. In other words, if
$$\left(\frac{K_{ij}^2}{K_{ii}K_{jj}}\right)^{1/2} < tol \tag{16.4.12}$$
is satisfied, then the rotation matrix is not generated and used. As a convergence check we use the following equations, where $10^{-s}$ is the tolerance.
$$\frac{\left|K_{ii}^{(k+1)} - K_{ii}^{(k)}\right|}{K_{ii}^{(k+1)}} \le 10^{-s}, \quad i = 1, 2, \ldots, n \tag{16.4.13}$$
$$\left(\frac{\left(K_{ij}^{(k+1)}\right)^2}{K_{ii}^{(k+1)}K_{jj}^{(k+1)}}\right)^{1/2} \le 10^{-s}, \quad \text{all } i, j;\ i < j \tag{16.4.14}$$
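The threshold and convergence tests of Eqns. (16.4.12) and (16.4.13) reduce to small scalar checks. This sketch assumes positive diagonal entries, as for a positive definite K. The function names are illustrative.

```cpp
#include <cassert>
#include <cmath>

// Coupling factor of Eqn. (16.4.12): sqrt(Kij^2 / (Kii Kjj)).
double CouplingFactor(double Kii, double Kjj, double Kij)
{
    assert(Kii > 0.0 && Kjj > 0.0);   // diagonal assumed positive (SPD K)
    return std::sqrt(Kij * Kij / (Kii * Kjj));
}

// A rotation for (i, j) is generated only if the coupling is not below tol.
bool NeedsRotation(double Kii, double Kjj, double Kij, double tol)
{
    return CouplingFactor(Kii, Kjj, Kij) >= tol;
}

// Diagonal convergence test of Eqn. (16.4.13) with tolerance 10^-s.
bool DiagonalConverged(double KiiNew, double KiiOld, int s)
{
    return std::fabs(KiiNew - KiiOld) / std::fabs(KiiNew) <= std::pow(10.0, -s);
}
```

Eqn. (16.4.14) is the same coupling factor as in Eqn. (16.4.12), compared against $10^{-s}$, so `CouplingFactor` serves both tests.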
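The rotation matrix $\mathbf{P}_k$ for the $k$th iteration to zero out the off-diagonal elements $K_{ij}$ and $M_{ij}$ is defined as follows.
$$\mathbf{P}_k = \begin{bmatrix} 1 & & & & & & \\ & \ddots & & & & & \\ & & 1 & & \alpha & & \\ & & & \ddots & & & \\ & & \gamma & & 1 & & \\ & & & & & \ddots & \\ & & & & & & 1 \end{bmatrix} \tag{16.4.15}$$
Here the entries $\alpha$ (row $i$, column $j$) and $\gamma$ (row $j$, column $i$) are chosen so that $K_{ij}$ and $M_{ij}$ are zeroed simultaneously.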
Lines 15-28 show statements similar to Example Program 16.3.1, except that a matrix (PHI) is declared to store the computed eigenvectors and a vector (LAMBDA) is declared to store the computed eigenvalues. The CGenJacobiMethod class object is declared in line 39 and the GoCompute member function is called in lines 40-41 to compute all the eigenvalues and eigenvectors.
The accuracy of the computations is evaluated in lines 51-61 by computing the residual (stored as a matrix in dMResid).
$$\mathbf{R} = \mathbf{K}\boldsymbol{\Phi} - \mathbf{M}\boldsymbol{\Phi}\boldsymbol{\Lambda} \tag{16.4.21}$$
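$$\int_{x_1}^{x_n}\alpha(x)\frac{d\phi_i(x)}{dx}\frac{d\tilde{u}}{dx}\,dx - \lambda\int_{x_1}^{x_n}\phi_i(x)\beta(x)\tilde{u}\,dx - \left[\alpha(x)\phi_i(x)\frac{d\tilde{u}}{dx}\right]_{x_1}^{x_n} = 0 \tag{16.5.4}$$
where $x_1$ and $x_n$ are the coordinates of the ends of the element. The last term must vanish since the boundary conditions are either essential or homogeneous (meaning zero-valued) natural BCs.
Step 3: Trial solution
Let the trial solution be represented as $\tilde{u}(x, \mathbf{a}) = \sum_{j=1}^{n} a_j\phi_j(x)$. Hence
$$\sum_{j=1}^{n}\left[\int_{x_1}^{x_n}\alpha(x)\frac{d\phi_i(x)}{dx}\frac{d\phi_j(x)}{dx}\,dx\right]a_j - \lambda\sum_{j=1}^{n}\left[\int_{x_1}^{x_n}\phi_i(x)\beta(x)\phi_j(x)\,dx\right]a_j = 0, \quad i = 1, 2, \ldots, n \tag{16.5.5}$$
Writing the above equation in a compact form
$$\mathbf{k}_{n\times n}\mathbf{a}_{n\times 1} - \lambda\,\mathbf{m}_{n\times n}\mathbf{a}_{n\times 1} = \mathbf{0} \tag{16.5.6}$$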
$$\phi_1(x) = \frac{x_2 - x}{x_2 - x_1} \quad \text{and} \quad \phi_2(x) = \frac{x - x_1}{x_2 - x_1} \tag{16.5.7}$$
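The terms in $\mathbf{k}$ were evaluated earlier. We will handle the mass matrix here, assuming $\beta$ is constant over the element of length $L = x_2 - x_1$.
$$m_{11} = \int_{x_1}^{x_2}\phi_1(x)\,\beta\,\phi_1(x)\,dx = \beta\int_{x_1}^{x_2}\left(\frac{x_2 - x}{x_2 - x_1}\right)^2 dx = \frac{\beta L}{3} = m_{22}$$
$$m_{12} = \int_{x_1}^{x_2}\phi_1(x)\,\beta\,\phi_2(x)\,dx = \frac{\beta L}{6} = m_{21}$$
Hence
$$\mathbf{m}_{2\times 2} = \frac{\beta L}{6}\begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$$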
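The sketch below shows Eqn. (16.5.8) in action. It assembles the system matrices from the linear-element $\mathbf{k}$ and $\mathbf{m}$ and finds the lowest eigenvalue by inverse iteration. Several choices are assumptions made for illustration:
- the model problem $-\frac{d}{dx}\left(\alpha\frac{du}{dx}\right) = \lambda\beta u$ on $[0, 1]$ with $u(0) = u(1) = 0$ and constant $\alpha = \beta = 1$, for which the exact $\lambda_1 = \pi^2$;
- the standard linear-element stiffness $\mathbf{k} = (\alpha/L)\begin{bmatrix}1 & -1\\ -1 & 1\end{bmatrix}$;
- tridiagonal storage with the Thomas algorithm in place of a general solver.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Tridiagonal matrix: a[i] = entry (i, i-1), b[i] = (i, i), c[i] = (i, i+1).
struct Tridiag { Vec a, b, c; };

// Assemble K and M for nel equal linear elements on [0, len] with constant
// alpha, beta; k = (alpha/L)[1 -1; -1 1], m = (beta L/6)[2 1; 1 2].
// The end nodes are removed (essential BCs), leaving nel - 1 unknowns.
void Assemble(int nel, double len, double alpha, double beta, Tridiag& K, Tridiag& M)
{
    assert(nel >= 2);
    const int nn = nel + 1;                              // total number of nodes
    const double L = len / nel;
    Tridiag Kg{Vec(nn, 0.0), Vec(nn, 0.0), Vec(nn, 0.0)}, Mg = Kg;
    for (int e = 0; e < nel; ++e) {                      // element e: nodes e, e+1
        Kg.b[e] += alpha / L;        Kg.b[e + 1] += alpha / L;
        Kg.c[e] -= alpha / L;        Kg.a[e + 1] -= alpha / L;
        Mg.b[e] += beta * L / 3.0;   Mg.b[e + 1] += beta * L / 3.0;
        Mg.c[e] += beta * L / 6.0;   Mg.a[e + 1] += beta * L / 6.0;
    }
    auto interior = [nn](const Vec& v) { return Vec(v.begin() + 1, v.begin() + (nn - 1)); };
    K = {interior(Kg.a), interior(Kg.b), interior(Kg.c)};
    M = {interior(Mg.a), interior(Mg.b), interior(Mg.c)};
}

Vec Mul(const Tridiag& T, const Vec& x)                  // y = T x
{
    const std::size_t n = x.size();
    Vec y(n);
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = T.b[i] * x[i];
        if (i > 0) y[i] += T.a[i] * x[i - 1];
        if (i + 1 < n) y[i] += T.c[i] * x[i + 1];
    }
    return y;
}

Vec Solve(const Tridiag& T, Vec d)                       // Thomas algorithm
{
    const std::size_t n = d.size();
    Vec cp(n, 0.0);
    double den = T.b[0];
    cp[0] = (n > 1) ? T.c[0] / den : 0.0;
    d[0] /= den;
    for (std::size_t i = 1; i < n; ++i) {                // forward elimination
        den = T.b[i] - T.a[i] * cp[i - 1];
        cp[i] = (i + 1 < n) ? T.c[i] / den : 0.0;
        d[i] = (d[i] - T.a[i] * d[i - 1]) / den;
    }
    for (std::size_t i = n - 1; i-- > 0;) d[i] -= cp[i] * d[i + 1];  // back substitution
    return d;
}

double Dot(const Vec& u, const Vec& v)
{
    double s = 0.0;
    for (std::size_t i = 0; i < u.size(); ++i) s += u[i] * v[i];
    return s;
}

// Lowest eigenvalue of K x = lambda M x by inverse iteration.
double LowestEigenvalue(const Tridiag& K, const Tridiag& M, int maxIter, double tol)
{
    Vec x(K.b.size(), 1.0);              // initial guess, not M-orthogonal to mode 1
    double lam = 0.0, lamOld = 0.0;
    for (int k = 1; k <= maxIter; ++k) {
        const Vec y = Mul(M, x);
        const Vec xhat = Solve(K, y);    // K xhat = M x
        const Vec yhat = Mul(M, xhat);
        const double den = Dot(xhat, yhat);
        lam = Dot(xhat, y) / den;        // Rayleigh quotient
        const double s = 1.0 / std::sqrt(den);
        for (std::size_t i = 0; i < x.size(); ++i) x[i] = xhat[i] * s;
        if (k > 1 && std::fabs(lam - lamOld) <= tol * std::fabs(lam)) break;
        lamOld = lam;
    }
    return lam;
}
```

With 10 elements the computed value is about 9.951, slightly above π² ≈ 9.870, as expected for a consistent mass matrix. It agrees with the closed-form discrete value $(6/h^2)(1 - \cos\pi h)/(2 + \cos\pi h)$.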
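Once the element equations are assembled into the system equations, we obtain the system eigenproblem as
$$\mathbf{K}\boldsymbol{\Phi} = \mathbf{M}\boldsymbol{\Phi}\boldsymbol{\Lambda} \tag{16.5.8}$$
that can then be solved for the eigenvalues $\boldsymbol{\Lambda}$ and the corresponding eigenvectors $\boldsymbol{\Phi}$.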
Summary
In this chapter, we extended the ideas from the previous chapter on partial differential equations to one-dimensional eigenproblems. We tied the solution of eigenproblems, i.e. computing eigenvalues and eigenvectors, to a number of numerical techniques: Inverse Iteration, Jacobi, and Generalized Jacobi. Larger eigenvalue problems in the area of finite element analysis are usually solved using the Subspace Iteration and Lanczos methods.
Exercises
Appetizers
Problem 16.1
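Compute all the eigenpairs of the following matrices.
$$\text{(a)}\ \mathbf{A}_{3\times3} = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \\ 3 & 4 & 5 \end{bmatrix} \qquad \text{(b)}\ \mathbf{A}_{3\times3} = \begin{bmatrix} 1 & 3 & 4 \\ 3 & 1 & 2 \\ 4 & 2 & 1 \end{bmatrix}$$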
Problem 16.2
Check and show that $\lambda = 0.195126$, $\mathbf{x} = \{0.0927686,\ 0.191219,\ 0.12226\}^T$ is the lowest eigenpair to the problem solved in Example 16.3.1.
Main Course
Problem 16.3
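Develop a class to implement the Inverse Iteration Method. Use the class to find the lowest eigenvalue and eigenvector for the following generalized eigenproblem.
$$\begin{bmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 1 \end{bmatrix}_{4\times4}\begin{Bmatrix}\phi_1\\ \phi_2\\ \phi_3\\ \phi_4\end{Bmatrix}_{4\times1} = \lambda\begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}_{4\times4}\begin{Bmatrix}\phi_1\\ \phi_2\\ \phi_3\\ \phi_4\end{Bmatrix}_{4\times1}$$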
Problem 16.4
Develop a class to implement the Generalized Jacobi Method. Use the class to find all the eigenvalues and eigenvectors for the
following generalized eigenproblem.
2 1 0 0.5
1 4 1 1
3 3 3 3 3 3
Chapter 17
Numerical Optimization
"Be happy for this moment. This moment is your life." Omar Khayyam
"Happiness lies in the joy of achievement and the thrill of creative effort." Franklin D. Roosevelt
Engineers and scientists are quite often interested in finding a solution to problems that are described by a central goal or
objective, and whose solution is restricted by constraints. Engineering design problems are examples of this scenario. When
these problems are defined in terms of a small number of variables, answers can be obtained using a paper-and-pencil approach.
However, for most practical problems, it is necessary to resort to numerical techniques. The area of numerical optimization is
vast. We will deal with introductory material in this chapter with particular emphasis on nonlinear programming (NLP)
problems.
Objectives
Become familiar with the language of design optimization.
Understand how to formulate and analytically solve simple unconstrained and constrained minimization problems.
Learn the basics of Genetic Algorithms (GA).
Learn how to formulate and solve nonlinear programming problems using GA.
Fig. 17.1.1 (minimum of f(x) at x = x*)
From an earlier calculus course, we know that for continuous, differentiable functions, the necessary condition for the minimum of a function $f(x)$ is $\dfrac{df}{dx} = 0$. This condition yields the stationary point(s). Therefore, $\dfrac{df}{dx} = 2(x - 10) = 0$. Solving, we have $x = 10$. The sufficient condition that this point, $x = x^*$, corresponds to a minimum is $\dfrac{d^2f(x = x^*)}{dx^2} > 0$. Checking, $\dfrac{d^2f(x = x^*)}{dx^2} = 2 > 0$. Hence, the optimal solution is at $x^* = 10$ and the lowest value of the objective function is $f(x^*) = 1$. Fig. 17.1.1 shows the problem and the solution graphically.
Fig. 17.1.2 (local maximum, local minimum, and local and global minimum of a function of x)
There are other characteristics of even simple problems that we must be aware of. For example, in Fig. 17.1.3, the function $f(x) = x^4 - 3x^2 + x$ has multiple local minima. Such a function is called a multimodal function (the function in Fig. 17.1.1 is a unimodal function). Each trough or valley contains a local minimum. Among all these local minima, one or more points have the lowest function value. These points are called the global minima, as identified in Fig. 17.1.2.
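A simple way to see the multimodal behavior numerically is to scan the interval on a grid, then refine each valley with Newton's method on $f'(x) = 0$. This is a sketch for the specific function of Fig. 17.1.3. The scan-then-refine approach is an illustrative choice, not a method from the text, and a grid that is too coarse can miss narrow valleys.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// f(x) = x^4 - 3x^2 + x and its first two derivatives
double f(double x) { return x * x * x * x - 3.0 * x * x + x; }
double df(double x) { return 4.0 * x * x * x - 6.0 * x + 1.0; }
double d2f(double x) { return 12.0 * x * x - 6.0; }

// Local minima of f on [a, b]: keep grid points lower than both neighbours,
// then refine each one with Newton's method on df(x) = 0.
std::vector<double> LocalMinima(double a, double b, int nGrid)
{
    std::vector<double> minima;
    const double h = (b - a) / nGrid;
    for (int i = 1; i < nGrid; ++i) {
        const double x = a + i * h;
        if (f(x) < f(x - h) && f(x) < f(x + h)) {
            double xs = x;
            for (int it = 0; it < 50; ++it) {          // Newton: x <- x - f'/f''
                const double step = df(xs) / d2f(xs);
                xs -= step;
                if (std::fabs(step) < 1e-14) break;
            }
            assert(d2f(xs) > 0.0);                      // second-order check
            minima.push_back(xs);
        }
    }
    return minima;
}
```

On $[-3, 3]$ with 600 intervals the sketch finds two local minima. The first, near $x \approx -1.3008$, is the global minimum with $f \approx -3.514$. The second, near $x \approx 1.1309$, has $f \approx -1.070$.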
Fig. 17.1.3 Plot of $f(x) = x^4 - 3x^2 + x$ for $-3 \le x \le 3$
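Find $x_1, x_2$
To minimize¹ $f(x_1, x_2) = 100(x_2 - x_1^2)^2 + (1 - x_1)^2$
The function is shown in Fig. 17.1.4. To find the minimum, we can use the first-order condition² as
$$\frac{\partial f}{\partial x_1} = -400x_1(x_2 - x_1^2) - 2(1 - x_1) = 0$$
$$\frac{\partial f}{\partial x_2} = 200(x_2 - x_1^2) = 0$$
Solving, $(x_1, x_2) = (1, 1)$. The second-order condition involves computing $\mathbf{H}$, the Hessian matrix that contains the second-order (partial) derivatives of $f(\mathbf{x})$.
$$\mathbf{H} = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1\partial x_2} \\ \dfrac{\partial^2 f}{\partial x_2\partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} \end{bmatrix} = \begin{bmatrix} 1200x_1^2 - 400x_2 + 2 & -400x_1 \\ -400x_1 & 200 \end{bmatrix}$$
At the point (1, 1)
$$\mathbf{H} = \begin{bmatrix} 802 & -400 \\ -400 & 200 \end{bmatrix}$$
This matrix is positive definite (all the eigenvalues are positive), implying that the point (1, 1) is a minimum point.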
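The first- and second-order checks above can be verified in code. This sketch hard-codes the analytical gradient and Hessian derived above. It then applies Sylvester's criterion: a symmetric 2×2 matrix is positive definite if and only if $H_{11} > 0$ and $\det\mathbf{H} > 0$, which is equivalent to both eigenvalues being positive. The function names are illustrative.

```cpp
#include <array>
#include <cassert>

using Vec2 = std::array<double, 2>;
using Mat2 = std::array<Vec2, 2>;

// f = 100 (x2 - x1^2)^2 + (1 - x1)^2
double Rosen(const Vec2& x)
{
    const double t = x[1] - x[0] * x[0];
    return 100.0 * t * t + (1.0 - x[0]) * (1.0 - x[0]);
}

Vec2 RosenGrad(const Vec2& x)
{
    return {-400.0 * x[0] * (x[1] - x[0] * x[0]) - 2.0 * (1.0 - x[0]),
            200.0 * (x[1] - x[0] * x[0])};
}

Mat2 RosenHessian(const Vec2& x)
{
    return {{{1200.0 * x[0] * x[0] - 400.0 * x[1] + 2.0, -400.0 * x[0]},
             {-400.0 * x[0], 200.0}}};
}

// Sylvester's criterion for a symmetric 2x2 matrix
bool IsPositiveDefinite(const Mat2& H)
{
    return H[0][0] > 0.0 && H[0][0] * H[1][1] - H[0][1] * H[1][0] > 0.0;
}
```

At (1, 1) the gradient vanishes, $H_{11} = 802$ and $\det\mathbf{H} = 802(200) - 400^2 = 400 > 0$.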
The graphical solution, while providing a visual picture of the design space, is difficult to use in locating the precise minimum. The purpose of this example is to show that the complexity of a problem grows rapidly with the dimension of the design space, and that obtaining an analytical solution is cumbersome, if not impractical.
It is difficult to formulate most engineering problems as unconstrained minimization problems. Typical engineering design
problems posed in the mathematical programming format are usually of the following form.
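$$\text{Find } \mathbf{x} \in R^n \tag{17.1-2a}$$
$$\text{To minimize } f(\mathbf{x}) \tag{17.1-2b}$$
$$\text{Subject to } g_i(\mathbf{x}) \le 0, \quad i = 1, 2, \ldots, l \tag{17.1-2c}$$
$$h_j(\mathbf{x}) = 0, \quad j = 1, 2, \ldots, m \tag{17.1-2d}$$
$$x_k^L \le x_k \le x_k^U, \quad k = 1, 2, \ldots, n \tag{17.1-2e}$$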
Performance requirements, manufacturing constraints or even the permissible range of values for the design variables can be
specified only through constraints. The constraints g i ( x ) are inequality constraints while h j ( x ) are equality constraints. The constraint
functions, similar to the objective function, can be linear or nonlinear, continuous, discontinuous or piecewise continuous,
differentiable or non-differentiable. Eqns. (17.1-2e) establish the lower and upper bounds on the permissible values of the n
design variables. These constraints are usually referred to as bound constraints or side constraints. A problem posed in the above form
is called a constrained minimization problem. An example of the design space for a two-variable constrained problem is shown in
Fig. 17.1.5.
The design variables are $x_1, x_2$. There are two inequality constraints, $g_1$ and $g_2$. The side constraints are $x_1 \ge 0$ and $x_2 \ge 0$. The hatch marks on the lines and curves representing these four constraints indicate the constraint boundary marking
the barrier between the feasible domain and the infeasible domain. A feasible domain contains all the design points that satisfy all
the constraints. In Fig. 17.1.5, the feasible domain is bounded by these four lines. The vertices of the feasible domain are labeled
A-B-C-D. The rest of the design space is infeasible. The objective function, f is represented in the figure as isocost contours
or curves that have a constant f value. As this example indicates, by sliding the isocost contours in the direction of decreasing
objective function value, we can locate the optimal solution. The constraint g 1 controls the design and the optimal solution is
indicated with an x. The objective function has a value c 2 at this point.
Fig. 17.1.5 Constrained design problem (constraints $g_1$, $g_2$, $x_1 = 0$ and $x_2 = 0$ bound the feasible domain A-B-C-D; isocost contours $f = c_1, c_2, c_3$ are drawn in the direction of decreasing $f$)
Some of the different techniques to solve simple constrained minimization problems are exhaustive search, graphical, trial and error, and 'constraint controlled' optimal design. Exhaustive search is too expensive for solving practical problems. The graphical technique can work at most for two-variable problems. The trial-and-error approach is tedious and unlikely to find even a local optimum. The 'constraint controlled' approach is ad hoc. We will formalize the approach in Section 17.3. Before we do so, we will look at different types of constrained minimization problems.
3 A Markovian property is encompassed in a process if the decisions for optimal return at a stage in the process depend only on the current
Special cases of this formulation include those where the design variables are integers (Integer Programming Problem), boolean (Zero-One Programming Problem) or discrete (Discrete Programming Problem). Perhaps the biggest differences between NLP and LP problems are the likelihood of multiple (local) solutions with NLP problems and the difficulty of finding the solution efficiently.
Our focus in this chapter is to look at NLP problems. Simple NLP problems can be solved ‘by hand’. However, most
engineering problems must be solved numerically.
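For example, let $f(\mathbf{x}) = x_1^2 + 2x_1x_2 + x_2^3$. Then $\dfrac{\partial f}{\partial x_1} = 2x_1 + 2x_2$ and $\dfrac{\partial f}{\partial x_2} = 2x_1 + 3x_2^2$, and we can write the gradient of $f(\mathbf{x})$ as $\nabla f(\mathbf{x})_{2\times1} = \begin{bmatrix} 2x_1 + 2x_2 & 2x_1 + 3x_2^2 \end{bmatrix}^T$.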
Consider the problem of minimizing $f(\mathbf{x})$ subject to $h_j(\mathbf{x}) = 0$. A point $\mathbf{x}^*$ is a regular point provided $\mathbf{h}(\mathbf{x}^*) = \mathbf{0}$ and the gradients of all the constraints at $\mathbf{x}^*$ are linearly independent. Linear independence means that no two gradients are parallel to each other and that no gradient can be written as a linear combination of two or more of the other gradients. For example, let $h_1(x_1, x_2) = 2x_1^2 + 3x_2$ and $h_2(x_1, x_2) = 12x_1^2 + 18x_2$. Then $\nabla h_1 = [4x_1\ \ 3]^T$ and $\nabla h_2 = [24x_1\ \ 18]^T$. Let $\mathbf{x}^* = \{2, 1\}^T$. Then $\nabla h_1(\mathbf{x}^*) = [8\ \ 3]^T$ and $\nabla h_2(\mathbf{x}^*) = [48\ \ 18]^T$. The two vectors $\nabla h_1$ and $\nabla h_2$ are parallel ($\nabla h_2 = 6\nabla h_1$), or, linearly dependent; hence $\mathbf{x}^*$ is not a regular point.
For the problem of minimizing $f(\mathbf{x})$ subject to $h_j(\mathbf{x}) = 0$, $j = 1, 2, \ldots, m$, the Lagrangian is defined as
$$L = f(\mathbf{x}) + \sum_{j=1}^{m}\lambda_j h_j(\mathbf{x}) \tag{17.3.1-2}$$
where $\lambda_j$ is called the Lagrange multiplier. The regular point $\mathbf{x}^*$ is a stationary point if
$$\frac{\partial L}{\partial x_i} = \frac{\partial f}{\partial x_i} + \sum_{j=1}^{m}\lambda_j\frac{\partial h_j(\mathbf{x})}{\partial x_i} = 0, \quad i = 1, 2, \ldots, n \tag{17.3.1-3}$$
and
$$h_j(\mathbf{x}) = 0, \quad j = 1, 2, \ldots, m \tag{17.3.1-4}$$
In other words, solving Eqns. (17.3.1-3) and (17.3.1-4) yields the candidate solutions (stationary points) of the original problem given by Eqns. (17.3.1-1a)-(17.3.1-1c). Note that there are $(m + n)$ unknowns in these $(m + n)$ equations. However, the form of these equations depends on the form of the objective function and the equality constraints, and the equations are usually nonlinear. These conditions are known as first-order necessary conditions. The second-order sufficient conditions establish whether $\mathbf{x}^*$ is a local minimum or not. Treatment of these second-order conditions is outside the scope of this text⁴.
Example 17.3-1 Constrained Minimization with Equality Constraint
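Find $x_1, x_2$
To minimize $4x_1 + x_2$
Subject to $2x_1^2 + x_2^2 = 1$
Solution
Step 1: Form the Lagrangian
$$L = 4x_1 + x_2 + \lambda_1\left(2x_1^2 + x_2^2 - 1\right)$$
Step 2: Using Eqn. (17.3.1-3)
$$\frac{\partial L}{\partial x_1} = 4 + 4\lambda_1x_1 = 0 \;\Rightarrow\; x_1 = -\frac{1}{\lambda_1}$$
$$\frac{\partial L}{\partial x_2} = 1 + 2\lambda_1x_2 = 0 \;\Rightarrow\; x_2 = -\frac{1}{2\lambda_1}$$
Substituting in Eqn. (17.3.1-4), i.e., the constraint equation, we have
$$2\left(-\frac{1}{\lambda_1}\right)^2 + \left(-\frac{1}{2\lambda_1}\right)^2 = 1$$
Solving, $\lambda_1 = \pm\frac{3}{2}$. For each root, we have
$\lambda_1 = \frac{3}{2}$: $x_1^* = -\frac{2}{3}$ and $x_2^* = -\frac{1}{3}$; $f(\mathbf{x}^*) = -3$
$\lambda_1 = -\frac{3}{2}$: $x_1^* = \frac{2}{3}$ and $x_2^* = \frac{1}{3}$; $f(\mathbf{x}^*) = 3$
Obviously, the minimum is at $x_1^* = -\frac{2}{3}$, $x_2^* = -\frac{1}{3}$, with $f(\mathbf{x}^*) = -3$.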
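The steps of Example 17.3-1 generalize to any linear objective $c_1x_1 + c_2x_2$ with the ellipse constraint $ax_1^2 + bx_2^2 = 1$ ($a, b > 0$). The sketch below is a hypothetical helper, not taken from the text. It forms both stationary points from the closed-form multiplier and evaluates the objective at each.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Candidate { double lambda, x1, x2, f; };

// Stationary points of L = c1 x1 + c2 x2 + lambda (a x1^2 + b x2^2 - 1).
// dL/dx1 = 0 -> x1 = -c1/(2 a lambda); dL/dx2 = 0 -> x2 = -c2/(2 b lambda);
// substituting into the constraint gives lambda^2 = (c1^2/a + c2^2/b)/4.
std::array<Candidate, 2> LagrangeCandidates(double c1, double c2, double a, double b)
{
    assert(a > 0.0 && b > 0.0);
    const double lam = 0.5 * std::sqrt(c1 * c1 / a + c2 * c2 / b);
    assert(lam > 0.0);                       // c1 = c2 = 0 has no stationary point
    const double roots[2] = {lam, -lam};
    std::array<Candidate, 2> out{};
    for (int k = 0; k < 2; ++k) {
        const double L = roots[k];
        const double x1 = -c1 / (2.0 * a * L), x2 = -c2 / (2.0 * b * L);
        out[k] = {L, x1, x2, c1 * x1 + c2 * x2};
    }
    return out;
}
```

Calling it with $c_1 = 4$, $c_2 = 1$, $a = 2$, $b = 1$ gives $\lambda_1 = \pm 3/2$. The candidate with the lower objective value is $(-2/3, -1/3)$, with $f = -3$.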
Now let us consider the following problem with equality and inequality constraints.
$$\text{Find } \mathbf{x} \in R^n \tag{17.3.1-5a}$$
⁴ See a book on optimal design. A comprehensive list is provided in the Bibliography at the end of the text.
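where $\lambda_j$ and $\mu_i$ are the Lagrange multipliers. The regular point $\mathbf{x}^*$ is a stationary point if
$$\frac{\partial L}{\partial x_k} = \frac{\partial f}{\partial x_k} + \sum_{j=1}^{m}\lambda_j\frac{\partial h_j(\mathbf{x})}{\partial x_k} + \sum_{i=1}^{l}\mu_i\frac{\partial g_i(\mathbf{x})}{\partial x_k} = 0, \quad k = 1, 2, \ldots, n \tag{17.3.1-7}$$
$$h_j(\mathbf{x}) = 0, \quad j = 1, 2, \ldots, m \tag{17.3.1-8}$$
$$g_i(\mathbf{x}) \le 0, \quad i = 1, 2, \ldots, l \tag{17.3.1-9}$$
$$\mu_i\,g_i(\mathbf{x}) = 0, \quad i = 1, 2, \ldots, l \tag{17.3.1-10}$$
$$\mu_i \ge 0, \quad i = 1, 2, \ldots, l \tag{17.3.1-11}$$
In other words, solving Eqns. (17.3.1-7)-(17.3.1-11) yields the candidate solutions of the original problem given by Eqns. (17.3.1-5a)-(17.3.1-5d). We will illustrate the usage of these conditions using an example.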
Example 17.3-2 Constrained Minimization with inequality constraint
Find $x_1, x_2$
To minimize $(x_1 - 3)^2 + (x_2 - 4)^2$
Subject to $x_1 + x_2 - 5 \le 0$
$x_1 \ge 0,\ x_2 \ge 0$
Solution
Step 1: We will not use the last two constraints requiring that the design variables be nonnegative, but will use them to select the optimal solution. Form the Lagrangian as
$$L = (x_1 - 3)^2 + (x_2 - 4)^2 + \mu_1(x_1 + x_2 - 5)$$
Step 2: Hence, the necessary conditions to be satisfied are
$$\frac{\partial L}{\partial x_1} = 2x_1 - 6 + \mu_1 = 0 \quad (1)$$
$$\frac{\partial L}{\partial x_2} = 2x_2 - 8 + \mu_1 = 0 \quad (2)$$
$$\mu_1(x_1 + x_2 - 5) = 0 \quad (3)$$
$$x_1 + x_2 - 5 \le 0 \quad (4)$$
$$\mu_1 \ge 0 \quad (5)$$
Step 3: The key to solving these conditions is to recognize the importance of Eqn. (3). Eqns. (17.3.1-10) represent the switching conditions since, from those equations, either $\mu_i = 0$ or $g_i = 0$. Using the switching conditions, the possibilities for this problem are discussed below.
Case 1: $\mu_1 = 0$ (meaning that the inequality constraint $g_1$ is not active)
From (1) and (2), $2x_1 - 6 = 0$ and $2x_2 - 8 = 0$, yielding $x_1 = 3$ and $x_2 = 4$. However, this solution does not satisfy Eqn. (4). Hence this is an unacceptable case.
Case 2: $g_1 = 0$ (meaning that the inequality constraint $g_1$ is active)
Hence,
$$2x_1 - 6 + \mu_1 = 0$$
$$2x_2 - 8 + \mu_1 = 0$$
$$x_1 + x_2 - 5 = 0$$
Solving these linear simultaneous equations, $x_1 = 2$, $x_2 = 3$ and $\mu_1 = 2$. These values satisfy Eqns. (4) and (5) and the requirements that $x_1 \ge 0$, $x_2 \ge 0$. Hence $\mathbf{x}^* = \{2, 3\}$. For these values, the objective function is $f(\mathbf{x}^*) = 2$.
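The switching logic of Example 17.3-2 can be written out for any problem of this form: minimize $(x_1 - p_1)^2 + (x_2 - p_2)^2$ subject to one linear constraint $a_1x_1 + a_2x_2 - b \le 0$. This sketch is a generalization for illustration. Like the example, it does not enforce the side constraints $x_1, x_2 \ge 0$, so those must be checked afterwards.

```cpp
#include <array>
#include <cassert>

using Vec2 = std::array<double, 2>;

struct KTResult { Vec2 x; double mu; double f; };

// Minimize (x1 - p1)^2 + (x2 - p2)^2 subject to a.x - b <= 0 using the
// switching condition mu g = 0.
KTResult SolveBySwitching(const Vec2& p, const Vec2& a, double b)
{
    const double ap = a[0] * p[0] + a[1] * p[1];
    // Case 1: mu = 0 (g inactive). Stationarity 2(x - p) = 0 gives x = p,
    // which is acceptable only if it satisfies g <= 0.
    if (ap - b <= 0.0) return {p, 0.0, 0.0};
    // Case 2: g = 0 (active). Stationarity 2(x - p) + mu a = 0 gives
    // x = p - (mu/2) a; substituting into a.x = b gives mu (> 0 here).
    const double aa = a[0] * a[0] + a[1] * a[1];
    assert(aa > 0.0);
    const double mu = 2.0 * (ap - b) / aa;
    const Vec2 x = {p[0] - 0.5 * mu * a[0], p[1] - 0.5 * mu * a[1]};
    const double f = (x[0] - p[0]) * (x[0] - p[0]) + (x[1] - p[1]) * (x[1] - p[1]);
    return {x, mu, f};
}
```

Take $\mathbf{p} = (3, 4)$, $\mathbf{a} = (1, 1)$ and $b = 5$. Case 1 is rejected, and Case 2 returns $\mathbf{x}^* = (2, 3)$, $\mu_1 = 2$ and $f = 2$.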
Example 17.3-3 Constrained Minimization with inequality constraints
Find $x_1, x_2$
To maximize $-4x_1^2 - 2x_2$
Subject to $x_1^2 + x_2 \le 4$
$x_1 + 2x_2 \ge 2$
$x_1 \ge 0,\ x_2 \ge 0$
Solution
Step 1: The first thing to notice about the problem is that it is not in the standard form. We must first convert the problem to a minimization problem. Maximizing $F(\mathbf{x})$ is equivalent to minimizing $-F(\mathbf{x})$. Hence, $f(\mathbf{x}) = 4x_1^2 + 2x_2$. Second, the second constraint must be transformed to a $\le 0$ constraint. Note that $g(\mathbf{x}) \ge a$ is equivalent to $-g(\mathbf{x}) \le -a$. Hence $g_2 = -x_1 - 2x_2 + 2 \le 0$. As with the previous problem, we will not use the last two constraints requiring that the design variables be nonnegative, but will use them to select the optimal solution. Form the Lagrangian as
$$L = 4x_1^2 + 2x_2 + \mu_1(x_1^2 + x_2 - 4) + \mu_2(-x_1 - 2x_2 + 2)$$
Step 2: Hence, the necessary conditions to be satisfied are
$$\frac{\partial L}{\partial x_1} = 8x_1 + 2\mu_1x_1 - \mu_2 = 0 \quad (1)$$
$$\frac{\partial L}{\partial x_2} = 2 + \mu_1 - 2\mu_2 = 0 \quad (2)$$
$$\mu_1(x_1^2 + x_2 - 4) = 0 \quad (3)$$
$$\mu_2(-x_1 - 2x_2 + 2) = 0 \quad (4)$$
$$x_1^2 + x_2 - 4 \le 0 \quad (5)$$
$$-x_1 - 2x_2 + 2 \le 0 \quad (6)$$
$$\mu_1 \ge 0,\ \mu_2 \ge 0 \quad (7)$$
Step 3: Using the switching conditions, there are four possibilities.
Case 1: $\mu_1 \ne 0$, $\mu_2 \ne 0$ (meaning that both inequality constraints are active, $g_1 = 0$ and $g_2 = 0$)
Hence,
$$x_1^2 + x_2 - 4 = 0, \quad -x_1 - 2x_2 + 2 = 0, \quad 8x_1 + 2\mu_1x_1 - \mu_2 = 0, \quad 2 + \mu_1 - 2\mu_2 = 0$$
From the first two equations, $x_1 = 2$ (with $x_2 = 0$) or $x_1 = -3/2$. The second root violates $x_1 \ge 0$, and $x_1 = 2$ gives $\mu_2 = -8/7 < 0$ from the last two equations. This is an unacceptable solution.
Case 2: $\mu_1 \ne 0$, $\mu_2 = 0$ (meaning that the inequality constraint $g_1$ is active, $g_1 = 0$)
Hence,
$$x_1^2 + x_2 - 4 = 0, \quad 8x_1 + 2\mu_1x_1 = 0, \quad 2 + \mu_1 = 0$$
From the last equation, $\mu_1 = -2 < 0$, which is an unacceptable solution.
Case 3: $\mu_1 = 0$, $\mu_2 = 0$ (meaning that both inequality constraints are inactive)
$$8x_1 = 0, \quad 2 = 0$$
The second equation expresses an invalid condition, and the solution is unacceptable.
Case 4: $\mu_1 = 0$, $\mu_2 \ne 0$ (meaning that the inequality constraint $g_2$ is active, $g_2 = 0$)
Hence,
$$-x_1 - 2x_2 + 2 = 0, \quad 8x_1 - \mu_2 = 0, \quad 2 - 2\mu_2 = 0$$
Solving, $x_1 = 1/8$, $x_2 = 15/16$ and $\mu_2 = 1$. The constraint $g_1$ is satisfied. Hence, $\mathbf{x}^* = \{1/8,\ 15/16\}$ and $f(\mathbf{x}^*) = 1.9375$, i.e., the maximum of the original objective is $-1.9375$.
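A candidate point can be checked against all the first-order conditions at once. This sketch hard-codes the standard form of Example 17.3-3, reconstructed here as $f = 4x_1^2 + 2x_2$, $g_1 = x_1^2 + x_2 - 4$ and $g_2 = -x_1 - 2x_2 + 2$. The function name and the tolerance handling are assumptions.

```cpp
#include <cassert>
#include <cmath>

// First-order K-T conditions (17.3.1-7)-(17.3.1-11) for Example 17.3-3.
bool IsKTPoint(double x1, double x2, double mu1, double mu2, double tol)
{
    assert(tol >= 0.0);
    const double g1 = x1 * x1 + x2 - 4.0;
    const double g2 = -x1 - 2.0 * x2 + 2.0;
    const double dLdx1 = 8.0 * x1 + 2.0 * mu1 * x1 - mu2;   // stationarity
    const double dLdx2 = 2.0 + mu1 - 2.0 * mu2;
    return std::fabs(dLdx1) <= tol && std::fabs(dLdx2) <= tol
        && g1 <= tol && g2 <= tol                             // feasibility
        && std::fabs(mu1 * g1) <= tol && std::fabs(mu2 * g2) <= tol  // switching
        && mu1 >= 0.0 && mu2 >= 0.0;                          // nonnegative multipliers
}
```

`IsKTPoint(0.125, 0.9375, 0.0, 1.0, 1e-12)` returns true for the Case 4 solution. Any candidate with a negative multiplier, such as Case 2, is rejected.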
The examples illustrate the strengths and weaknesses of the Kuhn-Tucker (K-T) conditions. Now, a few notes about the first-order K-T conditions.
(a) The K-T conditions are necessary conditions at regular points. A local minimum that is not a regular point need not satisfy the K-T conditions.
(b) Points that satisfy the K-T conditions are candidate local minimum points; second-order conditions (or convexity of the problem) are needed to confirm that they are minima. Conversely, as noted in (a), the K-T conditions may not be able to locate all local minimum points.
(c) It may not be possible to obtain an analytical solution to the K-T conditions. This is because the resulting equations may be nonlinear, requiring a numerical method.
Exercises
Appetizers
Problem 17.3.1
An aluminum can manufacturer needs to design a (closed) cylindrical soft drink can. The volume of the can must be 400 ml. The height h of the can must be at least twice the diameter d and cannot be more than three times the diameter. Packaging considerations restrict the height to at most 25 cm, and the can should use the least amount of sheet metal. Formulate the optimal design problem by clearly identifying the design variables, objective function and constraints.
Problem 17.3.2
A box frame is made of steel angle and hollow tube sections as shown in Fig. P17.3.2. It must be designed for least cost. The angle sections cost $200/m and the tube sections cost $500/m. The box frame must enclose a space of 1000 m³. A side of the box cannot be less than 2 m. Formulate the optimal design problem by clearly identifying the design variables, objective function and constraints.
Fig. P17.3.2 (box frame of dimensions l, b and h; vertical hollow tube members and horizontal angle section members)
Problem 17.3.3
A factory makes two products labeled A and B. The manufacturing of these products takes place in two stages – Stage One and
Stage Two. Departments 1 and 2 handle Stage One and Stage Two respectively. Product A requires 2 hours for Stage One and
2.25 hours for Stage Two. Product B requires 2 hours for Stage One and 1.75 hours for Stage Two. Each of the two Departments
can be operational for a total of 20 hours per day, seven days a week. The profit from the sale of product A is $1.35 per unit
and for product B is $1.2 per unit. How many units of A and B should be produced per week so as to maximize the profit?
Main Course
Problem 17.3.4
Solve Problem 17.3.1 using a graphical technique. Verify your answer using K-T conditions.
Problem 17.3.5
Solve Problem 17.3.2 using the K-T conditions.
Problem 17.3.6
Solve Problem 17.3.3 using a graphical technique. Verify your answer using K-T conditions.
Structural Concepts
Problem 17.3.7
Answer True or False. If False, state the reason(s) why.
(a) A feasible design is the one with the lowest objective function value.
(b) An optimal design problem must have constraints.
(c) The number of inequality constraints in a problem cannot be greater than the number of design variables for a problem to
be well-posed.
(d) Every optimal design problem has one or more optimum solutions.
(e) An equality constraint $h(\mathbf{x}) = 0$ can be expressed as two inequality constraints $h(\mathbf{x}) \le 0$ and $h(\mathbf{x}) \ge 0$.
Flowchart of a simple GA: initial randomly generated population → fitness evaluation → competition (fitter individuals survive) → mating pool (reproduction phase) → offspring (new generation) → fitness evaluation → stop? (if not, the cycle repeats)
The process of taking a decimal number and constructing its binary representation (not value) is called encoding. Decoding is
the inverse process of taking the binary encoded value and constructing its decimal equivalent.
Continuous Design Variables: A design variable $x_i$ lies between $x_i^L$ and $x_i^U$. Note that $x_i$ is a decimal number. If $m$ bits are
available to represent $x_i$, then the precision $p_i$ with which the number is represented is given by
$$p_i = \frac{x_i^U - x_i^L}{2^m - 1} \qquad (17.4.1\text{-}3)$$
To understand the term in the denominator, let us look at the example with 3 bits. The possible binary representations with 3
bits are 000, 001, 010, 011, 100, 101, 110 and 111. Or, 8 possible combinations. Similarly, with 4 bits we have 0000, 0001, 0010,
0011, 0100, 0101, 0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101, 1110 and 1111. Or, 16 possible combinations. In other
words, if there are $m$ bits then there are $2^m$ combinations or $2^m - 1$ intervals. The range of values between $x_i^L$ and $x_i^U$ is
divided into $2^m - 1$ intervals.
For example, if $x^L = 1$, $x^U = 8$ and $m = 3$, then $p = (8 - 1)/7 = 1$. The following table shows the relationship between the
binary representation and the decimal equivalents with this example.
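Binary Representation $(b_2b_1b_0)_2$    Decimal Equivalent $x = x^L + p\,(b_2b_1b_0)_2$
000    1.0
001    2.0
010    3.0
011    4.0
100    5.0
101    6.0
110    7.0
111    8.0
If instead $x^U = 10$ (with $x^L = 1$ and $m = 3$), then $p = 9/7 \approx 1.28571$ and the decimal equivalents are no longer integers.
Binary Representation $(b_2b_1b_0)_2$    Decimal Equivalent $x = x^L + p\,(b_2b_1b_0)_2$
000    1.0
001    2.28571
010    3.57143
011    4.85714
100    6.14286
101    7.42857
110    8.71429
111    10.0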
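The encoding and decoding steps can be sketched in C++ as follows. This is a minimal illustration, not the EDO-GUIWB implementation: the function names `precision`, `decode` and `encode` are hypothetical, and the bits are assumed to be stored as a `std::string` with the most significant bit first. Encoding maps a value to the nearest representable point; decoding applies $x = x^L + p\,(b)_2$ with $p$ from Eqn. (17.4.1-3).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>

// Precision of an m-bit representation of a variable in [xL, xU], Eqn. (17.4.1-3).
double precision(double xL, double xU, int m) {
    assert(xU > xL && m > 0 && m < 31);
    return (xU - xL) / static_cast<double>((1LL << m) - 1);
}

// Decoding: bit string (most significant bit first) -> x = xL + p * (b)_2.
double decode(const std::string& bits, double xL, double xU) {
    long long k = 0;
    for (char c : bits) {
        assert(c == '0' || c == '1');
        k = 2 * k + (c - '0');            // accumulate the binary integer (b)_2
    }
    return xL + precision(xL, xU, static_cast<int>(bits.size())) * static_cast<double>(k);
}

// Encoding (hypothetical helper): value -> m-bit string of the nearest representable value.
std::string encode(double x, double xL, double xU, int m) {
    const double p = precision(xL, xU, m);
    long long k = std::llround((x - xL) / p);
    const long long kmax = (1LL << m) - 1;
    if (k < 0) k = 0;                     // clamp values that fall outside [xL, xU]
    if (k > kmax) k = kmax;
    std::string bits(static_cast<std::size_t>(m), '0');
    for (int i = m - 1; i >= 0; --i) {    // fill from the least significant bit
        bits[static_cast<std::size_t>(i)] = static_cast<char>('0' + (k & 1));
        k >>= 1;
    }
    return bits;
}
```

For instance, `decode("001", 1.0, 10.0)` evaluates to 2.28571..., the second row of the second table, and `decode("111", 1.0, 8.0)` gives 8.0.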
Integer Design Variable: The above example illustrates the encoding and decoding problems when the range is not a multiple of
$2^m - 1$. With integer design variables, one approach is to apply Eqn. (17.4.1-3) with the precision $p$ being 1 and compute the
least number of bits required to achieve the precision. The number of bits obviously is an integer. For example, let $x^L = 1$,
$x^U = 10$ and $p = 1$. Using
$$\frac{x_i^U - x_i^L}{p} = 2^m - 1 \quad\Rightarrow\quad m = \frac{\log\left(\dfrac{x_i^U - x_i^L}{p} + 1\right)}{\log 2} \qquad (17.4.1\text{-}5)$$
we have $m = \log(10 - 1 + 1)/\log 2 = 3.32193$. We round 3.32193 up to the next integer, 4. Hence, the new precision
with 4 bits is $p = (10 - 1)/(2^4 - 1) = 0.6$. Once we compute the decimal equivalent, we can either truncate or round the value to an
integer.
Binary Representation $(b_3b_2b_1b_0)_2$    Decimal Equivalent $x = x^L + p\,(b_3b_2b_1b_0)_2$    Integer Equivalent (rounded value)
0000 1.0 1
0001 1.6 2
0010 2.2 2
0011 2.8 3
0100 3.4 3
0101 4.0 4
0110 4.6 5
0111 5.2 5
1000 5.8 6
1001 6.4 6
1010 7.0 7
1011 7.6 8
1100 8.2 8
1101 8.8 9
1110 9.4 9
1111 10.0 10
While problems exist with the procedure (note that the integers 2, 3, 5, 6, 8 and 9 each appear twice), experience has shown
that the procedure works quite well for most problems.
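The bit count and the rounded integer decoding in the table can be reproduced with the short sketch below. The names `bitsRequired` and `integerValue` are hypothetical. Instead of evaluating $\log(\cdot)/\log 2$ from Eqn. (17.4.1-5) and rounding up, the sketch searches for the smallest $m$ with $2^m - 1 \ge (x^U - x^L)/p$. This is an equivalent formulation that avoids floating-point round-off when the ratio is exactly a power of two.

```cpp
#include <cassert>
#include <cmath>

// Smallest m such that 2^m - 1 >= (xU - xL)/p, i.e. Eqn. (17.4.1-5) rounded up.
int bitsRequired(double xL, double xU, double p) {
    assert(xU > xL && p > 0.0);
    const double intervals = (xU - xL) / p;
    int m = 1;
    while (static_cast<double>((1LL << m) - 1) < intervals) {
        ++m;
        assert(m < 62);                    // guard against absurd precision requests
    }
    return m;
}

// Decodes the binary integer k (0 <= k <= 2^m - 1) and rounds to the nearest integer value.
long integerValue(long long k, int m, double xL, double xU) {
    assert(m > 0 && m < 62 && k >= 0 && k <= (1LL << m) - 1);
    const double p = (xU - xL) / static_cast<double>((1LL << m) - 1);
    return std::lround(xL + p * static_cast<double>(k));
}
```

With $x^L = 1$, $x^U = 10$ and $p = 1$, `bitsRequired` returns 4, and `integerValue(6, 4, 1.0, 10.0)` returns 5, the row 0110 of the table.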
Discrete Design Variable: The representation is similar to integer design variables with $x^L = 1$ and $x^U = q$, where there are $q$
possible discrete values. The discrete values are usually stored in a table in some sorted manner, and the integer value between 1
and $q$ is used as an index to obtain the corresponding value(s) from the table.
Zero-One (Binary) Design Variable: There is nothing special that needs to be done since we need exactly one bit to represent a zero-
one design variable.
Chromosome: To represent all the design variables in a problem, we need to create the chromosome for the problem. A
chromosome is a concatenated binary string of all the binary representations of the design variables. If there are $n$ design
variables with $m = 3$ bits used to represent each design variable, then the chromosome looks as shown in Fig. 17.4.1.1, with each x being 0 or
1.
xxx xxx xxx ...... xxx
x1 x2 x3 xn
Fig. 17.4.1.1 Possible chromosome (or, gene)
The number of bits does not have to be the same for all the design variables, nor do the design variables have to be ordered from 1 to $n$
in the chromosome.
The basic steps in the algorithm are discussed next.
Initial Population: The first step is to create the initial population. Unlike gradient-based methods where the search for the
optimal solution takes place by moving from one point to the next, in a GA the traits of a population (of members) are used to
move from one generation to the next. Fig. 17.4.1.2 shows an initial population consisting of z members. The initial population
is usually created randomly.
The selection probability is $p_i = \hat{f}_i / S$, where $S = \sum_{i=1}^{z} \hat{f}_i$. As can be seen from the table, the segment is longer for lower fitness values
than for larger fitness values.
5 To obtain a general procedure, the fitness values are transformed, if necessary, so that all the values are greater than zero.
Figure: selection segments laid out along the interval from 0 to 1.0.
For selecting an individual into the mating pool, a random number between 0 and 1.0 is generated. For example, if 7 random
numbers are generated as {0.79, 0.10, 0.33, 0.01, 0.99, 0.51, 0.83}, then the individuals selected are {8, 1, 3, 1, 13, 4, 9}.
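The segment-based (roulette-wheel) selection just described can be sketched as follows. This sketch assumes the fitness values have already been transformed into $\hat{f}_i > 0$, with larger $\hat{f}_i$ for fitter (lower-$f$) individuals. The transformation itself is not shown. The function names are hypothetical, and individuals are numbered from 1 as in the text.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Cumulative selection probabilities from transformed fitness values fhat_i > 0.
std::vector<double> cumulativeProbabilities(const std::vector<double>& fhat) {
    assert(!fhat.empty());
    const double S = std::accumulate(fhat.begin(), fhat.end(), 0.0);
    std::vector<double> cum(fhat.size());
    double running = 0.0;
    for (std::size_t i = 0; i < fhat.size(); ++i) {
        assert(fhat[i] > 0.0);
        running += fhat[i] / S;           // p_i = fhat_i / S
        cum[i] = running;                 // right end of segment i
    }
    cum.back() = 1.0;                     // guard against round-off in the last segment
    return cum;
}

// Returns the 1-based index of the individual whose segment contains r in [0, 1].
std::size_t spinWheel(const std::vector<double>& cum, double r) {
    auto it = std::upper_bound(cum.begin(), cum.end(), r);
    if (it == cum.end()) --it;            // r == 1.0 maps to the last individual
    return static_cast<std::size_t>(it - cum.begin()) + 1;
}
```

In a GA loop, `r` would typically come from `std::uniform_real_distribution<double>(0.0, 1.0)` driven by a `std::mt19937` engine. The wheel is spun $z$ times to fill the mating pool.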
In the tournament selection method, using a random number generator, two members of the population are selected. Their fitness
values are compared head-to-head and the one with the lower fitness value is put into the mating pool. This is done z times to
create the mating pool of size z . In a “double elimination” tournament selection method, all the individuals in the population
are placed in a bag. Two individuals are chosen at random. Their fitness values are compared head-to-head and the one with
the lower fitness value is put into the mating pool. These two individuals are then eliminated from the bag and the process is
repeated until the bag is empty. This will occur when the mating pool is half full. To complete the mating pool, the process is
repeated once again.
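A minimal sketch of the basic (single-elimination) tournament selection is shown below, for a minimization problem. The text does not say whether the same member may be drawn twice. This sketch assumes two distinct members are drawn, so the weakest member can never enter the pool. The function name is hypothetical, indices are 0-based, and the double-elimination variant is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Binary tournament selection (minimization): two distinct members are drawn at
// random and the one with the lower fitness value enters the mating pool.
// Repeated z times, where z is the population size.
std::vector<std::size_t> tournamentSelect(const std::vector<double>& fitness,
                                          std::mt19937& rng) {
    const std::size_t z = fitness.size();
    assert(z >= 2);
    std::uniform_int_distribution<std::size_t> pickFirst(0, z - 1);
    std::uniform_int_distribution<std::size_t> pickSecond(0, z - 2);
    std::vector<std::size_t> pool;
    pool.reserve(z);
    for (std::size_t n = 0; n < z; ++n) {
        const std::size_t i = pickFirst(rng);
        std::size_t j = pickSecond(rng);
        if (j >= i) ++j;                  // skip i so that j != i
        pool.push_back(fitness[i] <= fitness[j] ? i : j);
    }
    return pool;
}
```

Because each head-to-head comparison only ranks two members, this scheme does not need the fitness values to be positive, unlike the segment-based method above.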
In a simple GA, once the mating pool is constructed, two parents are selected, and the reproduction process is carried out using
the crossover and mutation operators.
Crossover: There are several types of crossover operators. We will illustrate the three most commonly discussed operators.
One-point crossover. Consider two chromosomes selected randomly from the mating pool. They are labeled Parent 1 and
Parent 2 in Fig. 17.4.1.3.
Parent 1 10001001
Parent 2 00110111
Fig. 17.4.1.3 Parents selected for the crossover operation
With a predetermined probability, a single crossover point is chosen. If the length of the chromosome is $n_c$ bits, then a
random integer between 1 and $n_c$ is generated. This location is used as the crossover point. Two offspring are formed
and they become part of the next generation. The first offspring is formed by taking the front or left section of Parent 1 and
the rear or right section of Parent 2. The second offspring is formed by taking the front or left section of Parent 2 and the rear
or right section of Parent 1. The results are shown in Fig. 17.4.1.4.
Parent 1 10001001
Parent 2 00110111
Offspring 1 10010111
Offspring 2 00101001
Fig. 17.4.1.4 Offspring resulting from one-point crossover operation occurring at location 3
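The one-point crossover of Fig. 17.4.1.4 can be reproduced with the sketch below. The function name is hypothetical. The location `k` is the number of leading bits each offspring keeps from its own parent. The sketch restricts `k` to 1 through $n_c - 1$ so that each offspring receives bits from both parents.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// One-point crossover at location k: offspring 1 = first k bits of parent 1
// followed by the remaining bits of parent 2; offspring 2 is the mirror image.
std::pair<std::string, std::string> onePointCrossover(const std::string& p1,
                                                      const std::string& p2,
                                                      std::size_t k) {
    assert(p1.size() == p2.size() && k >= 1 && k < p1.size());
    return { p1.substr(0, k) + p2.substr(k), p2.substr(0, k) + p1.substr(k) };
}
```

Calling `onePointCrossover("10001001", "00110111", 3)` yields the pair ("10010111", "00101001") shown in the figure.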
Two-point crossover. The idea of the single-point crossover can be extended to include multiple crossover locations. The
section from the first bit up to the first crossover point is not exchanged. Thereafter, the bits in alternate segments between
successive crossover points are exchanged between the two parents. This process is illustrated with a two-point crossover
example (Fig. 17.4.1.5).
Parent 1 10001001
Parent 2 00110111
Offspring 1 10110001
Offspring 2 00001111
Fig. 17.4.1.5 Offspring resulting from two-point crossover operation occurring at locations 2 and 5
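For the two-point case, only the bits between the two crossover locations change hands. The sketch below illustrates this with a hypothetical function name and the same string representation as before. Locations `k1 < k2` follow the figure's convention: bits `k1 + 1` through `k2` (1-based) are swapped.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Two-point crossover at locations k1 < k2: bits k1+1 .. k2 (1-based) are exchanged;
// the bits up to k1 and after k2 stay with their own parent.
std::pair<std::string, std::string> twoPointCrossover(const std::string& p1,
                                                      const std::string& p2,
                                                      std::size_t k1, std::size_t k2) {
    assert(p1.size() == p2.size() && k1 < k2 && k2 <= p1.size());
    std::string c1 = p1, c2 = p2;
    for (std::size_t i = k1; i < k2; ++i) {   // 0-based i = k1 .. k2-1
        std::swap(c1[i], c2[i]);
    }
    return {c1, c2};
}
```

With locations 2 and 5, the parents of Fig. 17.4.1.5 produce "10110001" and "00001111".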
Uniform crossover. In uniform crossover, every location is a potential crossover point. First, a crossover mask is created
randomly. This mask has the same length as the chromosome, and the bit value at each location is used to select which parent will supply
the offspring with the bit. If the mask value is 0, then the bit is taken from Parent 1; otherwise, the bit is taken from Parent
2, as shown in Fig. 17.4.1.6.
Parent 1 10001001
Parent 2 00110111
Mask 00101011
Inverse mask 11010100
Offspring 1 10100011
Offspring 2 00011101
Fig. 17.4.1.6 Example showing uniform crossover
If two offspring are needed, the mask is used with the parents to create the first offspring and the inverse of the mask is used
to create the second offspring.
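The uniform crossover can be sketched as below. Both function names are hypothetical. The mask is drawn with a 0.5 probability per bit; this probability is an assumption, since the text only says the mask is created randomly. The second offspring is formed with the inverse mask by swapping the roles of the two parents.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <utility>

// Random crossover mask: each bit is '0' or '1' with probability 0.5 (assumed).
std::string randomMask(std::size_t nc, std::mt19937& rng) {
    std::bernoulli_distribution coin(0.5);
    std::string mask(nc, '0');
    for (char& bit : mask) bit = coin(rng) ? '1' : '0';
    return mask;
}

// Uniform crossover: mask bit 0 -> offspring 1 takes the bit from parent 1,
// mask bit 1 -> from parent 2. Offspring 2 uses the inverse mask.
std::pair<std::string, std::string> uniformCrossover(const std::string& p1,
                                                     const std::string& p2,
                                                     const std::string& mask) {
    assert(p1.size() == p2.size() && mask.size() == p1.size());
    std::string c1(p1.size(), '0'), c2(p1.size(), '0');
    for (std::size_t i = 0; i < p1.size(); ++i) {
        const bool fromParent2 = (mask[i] == '1');
        c1[i] = fromParent2 ? p2[i] : p1[i];
        c2[i] = fromParent2 ? p1[i] : p2[i];   // inverse mask
    }
    return {c1, c2};
}
```

With the mask "00101011" of Fig. 17.4.1.6, the two offspring are "10100011" and "00011101".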
Mutation: This operator occurs much less frequently both in nature and in GA. Offspring are mutated by small
random changes applied with a low probability. The basic idea is to introduce some diversity into the population, in other words, to delay
the situation where the population becomes so homogeneous that no further improvement is possible. If the length of the
chromosome is $n_c$ bits, then a random integer between 1 and $n_c$ is generated, and the bit at that location is switched. An example
is shown in Fig. 17.4.1.7.
Before 10010111
After 10000111
Fig. 17.4.1.7 Example showing mutation taking place at location 4
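A sketch of the mutation operator is shown below. `flipBit` and `mutate` are hypothetical names. The mutation probability `pm` is an assumed parameter, and a typical GA would choose a small value for it. The random location is drawn uniformly between 1 and $n_c$, as described above.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>

// Flips the bit at a 1-based location (1 .. nc), as in Fig. 17.4.1.7.
std::string flipBit(std::string bits, std::size_t location) {
    assert(location >= 1 && location <= bits.size());
    char& b = bits[location - 1];
    b = (b == '0') ? '1' : '0';
    return bits;
}

// With probability pm, switches the bit at one randomly chosen location.
std::string mutate(const std::string& bits, double pm, std::mt19937& rng) {
    assert(pm >= 0.0 && pm <= 1.0);
    std::bernoulli_distribution happens(pm);
    if (bits.empty() || !happens(rng)) return bits;
    std::uniform_int_distribution<std::size_t> where(1, bits.size());
    return flipBit(bits, where(rng));
}
```

`flipBit("10010111", 4)` returns "10000111", the "After" string in the figure.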
Next Generation
The new generation is formed when sufficient offspring are generated in the reproduction phase. The whole process of fitness
evaluation and reproduction starts all over again with this new population. Obviously, somewhere along the evolutionary
procedure the iterative process is stopped. Typically, this is done if a predetermined number of iterations have been completed
or if the fitness function does not change appreciably. Unlike most gradient-based techniques, there is no convergence criterion
for the iterative process associated with the GA.
17.4.2 Problem Formulation
GAs were developed to tackle unconstrained optimization problems. However, as was mentioned before, most engineering
and structural design problems are constrained optimization problems. The standard approach is to transform the original
constrained problem to an unconstrained problem as follows.
Find $\mathbf{x}$
To minimize $\hat{f}(\mathbf{x}) = f(\mathbf{x}) + \sum_{i=1}^{l} c_i \max(0, g_i) + \sum_{j=1}^{m} c_j \lvert h_j \rvert$    (17.4.2-1)
Subject to $x_k^L \le x_k \le x_k^U, \quad k = 1, 2, \ldots, n$
where $c_i$ and $c_j$ are penalty parameters. The selection of appropriate penalty weights $c_i$ and $c_j$ is always problematic even in
traditional NLP schemes. Typically, a large value is used initially for these parameters. These values are then reduced as the
design iterations continue. This feature is implemented in the EDO-GUIWB program.
6 The EDO-GUIWB Tutorial and User’s Manual is in the manual folder of the directory where the program is installed.
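The penalized objective of Eqn. (17.4.2-1) can be evaluated as in the sketch below. The function name is hypothetical. The absolute value on $h_j$ is assumed, as in the usual exterior-penalty form. The schedule that EDO-GUIWB uses to adjust $c_i$ and $c_j$ between iterations is not reproduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Penalized objective, Eqn. (17.4.2-1):
// fhat = f + sum_i c_i * max(0, g_i) + sum_j c_j * |h_j|
double penalizedObjective(double f,
                          const std::vector<double>& g, const std::vector<double>& cg,
                          const std::vector<double>& h, const std::vector<double>& ch) {
    assert(g.size() == cg.size() && h.size() == ch.size());
    double fhat = f;
    for (std::size_t i = 0; i < g.size(); ++i)
        fhat += cg[i] * std::max(0.0, g[i]);   // only violated constraints (g_i > 0) add a penalty
    for (std::size_t j = 0; j < h.size(); ++j)
        fhat += ch[j] * std::fabs(h[j]);       // any nonzero h_j adds a penalty
    return fhat;
}
```

A satisfied inequality constraint ($g_i \le 0$) contributes nothing, so a feasible design is ranked on $f(\mathbf{x})$ alone.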
We will present a few guidelines for formulating and solving problems using GAs.
(1) It is a good idea to start solving a problem with as few design variables as possible. It is easier to debug the problem
formulation with a manageable number of design variables.
(2) The selection of the lower and upper bounds must be done with care. It is necessary to have some prior knowledge of the
possible range of values that the design variables can assume. One approach is to start with a wide range and obtain the
solution. Once a solution is obtained, one can reduce the range by increasing the lower bound or decreasing the upper
bound or both.
(3) The penalty approach to handling constrained optimization problems works best if the constraints are normalized. For
example, consider a problem where $0 \le x_1 \le 10$ and $-10 \le x_2 \le 5$. Instead of writing the following two constraints as
$g_1(\mathbf{x}) = x_1^2 + 4x_2^3 - 12000 \le 0$
1000
g 2 ( x ) 40 x 1 x 22 0
x 2 20
one can rewrite them as
$g_1(\mathbf{x}) = \dfrac{x_1^2 + 4x_2^3}{12000} - 1 \le 0$
g (x)
x x x
1
2
2 2 1 20
0
2 4
10 400
The basic idea is to avoid very large positive and negative values.
(4) Avoid using equality constraints. More often than not, equality constraints can be rewritten expressing one design variable
in terms of the others. In other words, a design variable can be eliminated from the problem. Consider a problem where
an equality constraint is
$24x_3 - 4x_4 - 36 = 0$
The constraint can be rewritten as
$x_4 = 6x_3 - 9$
and x 4 can be eliminated as a problem parameter.
(5) Changing the default GA parameters: For those occasions when the GA does not lead to a feasible solution or leads to an
unsatisfactory solution, it may be worthwhile changing the default values of the GA parameters.
Example 17.5.1 Constrained Minimization (Box Design Problem)
Find the optimal solution to the following problem.
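Find $x_1, x_2, x_3$
To minimize $f(\mathbf{x}) = -x_1 x_2 x_3$
Subject to $g_1(\mathbf{x}) = x_1 - 42 \le 0$
$g_2(\mathbf{x}) = x_2 - 42 \le 0$
$g_3(\mathbf{x}) = x_3 - 42 \le 0$
$g_4(\mathbf{x}) = -(x_1 + 2x_2 + 2x_3) \le 0$
$g_5(\mathbf{x}) = x_1 + 2x_2 + 2x_3 - 72 \le 0$
$0 \le x_i \le 100, \quad i = 1, 2, 3$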
Solution
Step 1: We will rewrite the problem as follows, normalizing the constraints.
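Find $x_1, x_2, x_3$
To minimize $f(\mathbf{x}) = -x_1 x_2 x_3$
Subject to $g_1(\mathbf{x}) = \dfrac{x_1}{42} - 1 \le 0$
$g_2(\mathbf{x}) = \dfrac{x_2}{42} - 1 \le 0$
$g_3(\mathbf{x}) = \dfrac{x_3}{42} - 1 \le 0$
$g_4(\mathbf{x}) = -\dfrac{x_1 + 2x_2 + 2x_3}{500} \le 0$
$g_5(\mathbf{x}) = \dfrac{x_1 + 2x_2 + 2x_3}{72} - 1 \le 0$
$0 \le x_i \le 100, \quad i = 1, 2, 3$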
Step 2: Based on the problem formulation, the following variables and functions are necessary.
Step 3: The result of executing the GA option in the EDO-GUIWB program is shown in Fig. E17.5.1(a) after 70 generations.
The obtained solution is
$\mathbf{x} = \{24.1954, 12.547, 11.7152\}$
The constraint values are
$\mathbf{g} = \{-0.423919, -0.701262, -0.721067, -0.14544, 0.00999722\}$
To see whether we can obtain a better solution, we will reduce the upper bound of all the design variables to 30.
Step 4: With the new upper bound, the result of executing the GA option in the EDO-GUIWB program is shown in Fig.
E17.5.1(b) after another 70 generations. The obtained solution is $f(\mathbf{x}) = -3559.7$. The values of the design variables are
$\mathbf{x} = \{24.188, 12.3136, 11.9516\}$
and the values of the constraints are
$\mathbf{g} = \{-0.424095, -0.706819, -0.715438, -0.145437, 0.00997778\}$
The optimal solution is $\mathbf{x} = \{24, 12, 12\}$ with $f(\mathbf{x}) = -3456$. The 5th constraint $g_5$ controls the design, i.e., it is active at the
optimum. Both GA solutions violate $g_5$ slightly ($g_5 \approx 0.01 > 0$), which is why their objective values are lower than the true optimum.
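The reported constraint values can be checked by re-evaluating the normalized formulation at the GA designs. The sketch below assumes the sign convention $g_i \le 0$ (feasible) and $f = -x_1 x_2 x_3$, both reconstructed so that they reproduce the listed values. The function and struct names are hypothetical.

```cpp
#include <array>
#include <cassert>

// Objective and normalized constraints of Example 17.5.1 (g_i <= 0 is feasible).
struct BoxEvaluation {
    double f;
    std::array<double, 5> g;
};

BoxEvaluation evaluateBox(double x1, double x2, double x3) {
    assert(x1 >= 0.0 && x2 >= 0.0 && x3 >= 0.0);
    const double s = x1 + 2.0 * x2 + 2.0 * x3;   // common term of g4 and g5
    return { -x1 * x2 * x3,
             { x1 / 42.0 - 1.0, x2 / 42.0 - 1.0, x3 / 42.0 - 1.0,
               -s / 500.0, s / 72.0 - 1.0 } };
}
```

Evaluated at the Step 4 design, this gives $f \approx -3559.7$ and $g_5 \approx 0.00998$. At $\mathbf{x} = \{24, 12, 12\}$ it gives $f = -3456$ with $g_5 = 0$ exactly, confirming that $g_5$ is the active constraint.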
Solution
Step 1: The design problem can be formulated as follows. Converting all quantities to cm and kN,
(i) the volume of material can be expressed as $200bh$ cm³,
(ii) the axial stress in the member is $\dfrac{100}{bh} \le 20$ kN/cm², and
(iii) the Euler buckling requirement can be stated as $P_{cr} = \dfrac{\pi^2 EI}{L^2} \ge 100$ kN.
Hence,
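Find $\mathbf{x} = \{b, h\}$
to minimize $f(\mathbf{x}) = 200bh$
subject to $g_1(\mathbf{x}) = \dfrac{5}{bh} - 1 \le 0$
$g_2(\mathbf{x}) = 1 - 2.056(10^{-5})\, b h^3 \le 0$
$g_3(\mathbf{x}) = \dfrac{b}{h} - 1 \le 0$
$b, h > 0$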
Note that all constraints are normalized.
Step 2: Based on the problem formulation, the following variables and functions are necessary.
The choice of lower and upper bounds should be based on some knowledge of the problem. In this example, the precision is
the smallest (or, finest) precision that the program will allow. One should also ask the question – When the column is fabricated
or constructed, what is the precision (or, tolerance) with which it will be made? The basic strategy is to solve the problem in
stages. If at the end of the first stage, a refined solution is needed, then you can increase the lower bound, or decrease the upper
bound or both. The net effect is that you can then reduce the precision value and (hopefully) obtain a better solution.
Step 3: The result of executing the GA option in the EDO-GUIWB program is shown in Fig. E17.5.2(b) after 70 generations.
The obtained solution is $b = 4.6$ cm and $h = 22$ cm with $f(\mathbf{x}) = 20240$ cm³. This solution is very close to the optimal solution.
Fig. E17.5.2(b)
The 2nd constraint g 2 controls the design, i.e., it is active at the optimum.
Fig. E17.5.3(a) Beam AB with a 15 ft span
Solution
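Step 1: Using lb and in as the units, the design problem can be formulated as follows.
Find $b, h$
To minimize $f(b, h) = 180bh$
Subject to $g_1(\mathbf{x}) = \dfrac{1.0085(10^3)}{bh^2} - 1 \le 0$
$g_2(\mathbf{x}) = \dfrac{112.05}{bh} - 1 \le 0$
$g_3(\mathbf{x}) = \dfrac{2b}{h} - 1 \le 0$
$7\ \text{in} \le h, b \le 15\ \text{in}$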
Step 2: Based on the problem formulation, the following variables and functions are necessary.
Step 3: The result of executing the GA option in the EDO-GUIWB program is shown in Fig. E17.5.3(b) after 50 generations.
Fig. E17.5.3(b)
The obtained solution is $b = 14.3$ in and $h = 7.79$ in with $f(\mathbf{x}) = 20051$ in³. The 2nd constraint $g_2$ and the 3rd constraint
$g_3$ control the design.
Figure: two-bar truss with joints A and B, dimensions of 120 in, and a 45° angle.
Solution
Step 1: Using lb and in as the problem units, the optimal design problem can be stated as follows.
Find $\mathbf{x} = \{a_{AB}, a_{BC}\}$
to minimize $f(\mathbf{x}) = V = 120(a_{AB}^2 + a_{BC}^2)$
subject to $g_1(\mathbf{x}) = \dfrac{P_{AB}}{10000\, a_{AB}^2} - 1 \le 0$
$g_2(\mathbf{x}) = \dfrac{P_{BC}}{10000\, a_{BC}^2} - 1 \le 0$
$g_3(\mathbf{x}) = 1 - \dfrac{(P_{cr})_{AB}}{P_{AB}} \le 0$
$g_4(\mathbf{x}) = 1 - \dfrac{(P_{cr})_{BC}}{P_{BC}} \le 0$
$a_{AB}, a_{BC} > 0$
where $a_{AB}, a_{BC}$ = cross-section sides for members AB and BC
$P_{AB}, P_{BC}$ = magnitude of the axial forces in members AB and BC
$P_{cr} = \dfrac{\pi^2 EI}{L^2}$ represents the Euler buckling capacity of the member in compression
Using the method of joints, $P_{AB} = 10{,}000$ lb and $P_{BC} = 14{,}150$ lb, with both members in compression.
Step 2: Based on the problem formulation, the following variables and functions are necessary.
This example illustrates how to use simple and derived variables to reduce the amount of hand-calculations necessary to
formulate the design problem. We could have used more derived variables than shown above.
Step 3: The result of executing the GA option in the EDO-GUIWB program is shown in Fig. E17.5.4(b) after 70 generations.
Fig. E17.5.4(b)
The obtained solution is $a_{AB} = 1.55$ in and $a_{BC} = 1.69$ in with $f(\mathbf{x}) = 631.8$ in³. The 3rd constraint $g_3$ and the 4th constraint
$g_4$ control the design.
Summary
Numerical optimization is increasingly becoming an integral part of engineering design software systems. Design engineers find
numerical optimization tools to be of immense help in the design of next generation phones, automobiles, aircraft,
transportation systems, and so on. In this chapter, we examined in great detail one of the population-based techniques to solve
NLP problems – Genetic Algorithm (GA). While GA can be used to solve a wide variety of problems, there are other numerical
techniques that are perhaps better at obtaining the solution to a narrower class of problems. For example, gradient-based
techniques are excellent in obtaining the local minima for problems that are continuous and differentiable.
Exercises
Appetizers
Solve the following problems using pencil and paper.
Problem 17.5.1
Fig. P17.5.1 shows a determinate planar beam. The cross-section is rectangular (height h and width w ). The allowable normal
stress is 12 MPa and the shear stress is 5 MPa. Design the minimum volume beam so that the height is not more than twice the
width.
Fig. P17.5.1 Beam with points A, B and C, segment lengths of 5 m, 5 m and 3 m, a 10 kN load and a 30° angle
Problem 17.5.2
How would you formulate and solve the previous problem if the weight density of the beam material was given as 6000 N/m³?
Main Course
Problem 17.5.3
A steel pipe is moved into place by a crane using the system shown in Fig. P17.5.3. The inner pipe diameter is 40 in and the wall
thickness is 0.375 in. The length of the pipe is 24 ft. The maximum axial load capacity of the cable is 200,000 lb. Find the
distance d between the lifting points to minimize the maximum bending stress in the pipe.
Fig. P17.5.3 Pipe of length 24 ft lifted at two points a distance d apart
Problem 17.5.4
Fig. P17.5.4 shows a planar two-bar truss. Support C is located directly above A. The allowable stress in member AB is 100
MPa and in member BC is 200 MPa. Find the cross-sectional areas of members AB and BC, and the distance between A and C, to design the
minimum volume truss.
Fig. P17.5.4 Two-bar truss with joints A, B and C, a 4 m dimension and a 10 kN load
C++ Concepts
Problem 17.5.5
Develop a class CSGA that implements a simple GA, using object-oriented concepts.
Solve the following problems using computer software such as EDO-GUIWB©.
Problem 17.5.6
It is required to design a support bracket as shown in Fig. P17.5.5. Member ADC is a W16x31. Member BD has a circular hollow
cross-section. Supports A and B are pin supports, and the connection at D is a pin connection. Design the lightest steel member
BD so that the normal stress in the member is less than 10 ksi and Euler buckling is prevented with a safety factor of 2. The
wall thickness of the pipe cannot be less than 15% of the inner radius. Also find the optimal values for $x$ and $d$.
Fig. P17.5.5 Support bracket with members ADC and BD, a 10 lb/in distributed load, and dimensions x, d and 10 ft
Chapter
Computer Graphics
“Ambition is a lust that is never quenched, but grows more inflamed and madder by enjoyment.” Thomas Otway
“Build a better mousetrap and the world will beat a path to your door.” Ralph Waldo Emerson
“It is not the greatness of a man's means that makes him independent, so much as the smallness of his wants.” William
Cobbett
Simply defined, computer graphics implies the use of computers in creating graphical images. For engineers and scientists, these
could be as simple as an x-y graph all the way to a rendition of virtual reality scenes. In this chapter we will study some of the
basics of computer graphics using the Microsoft Foundation Classes (MFC)1 and the Fast Light Toolkit (FLTK)2.
Objectives
To understand and practice the concepts associated with computer graphics.
To draw and manipulate three-dimensional wireframe images using MFC.
To draw using FLTK.
1 https://learn.microsoft.com/en-us/cpp/mfc/mfc-desktop-applications?view=msvc-170
2 https://www.fltk.org/
18.1 Introduction
Computer-generated images are of immense help to engineers and scientists as they help in the visualization and understanding
of concepts. A simple x-y graph shows the relationship between two variables. Fig. 18.1.1 shows the relationship between the
length of a cantilever beam and its tip displacement for a given tip load and beam.
Fig. 18.1.3 Rendition of 3D bracket using OpenGL™: (a) hidden lines removed; (b) shading with one light source
In the rest of the chapter, we will examine how to display wireframe images on a computer screen. Most computer screen
displays are either liquid crystal displays (LCDs) or organic light-emitting diode (OLED) displays. The resolution of these devices
(display resolution) is expressed in pixels and dot pitch. For example, high-end displays today can have a resolution of 3440
× 1440 pixels at 60 Hz, and a pixel pitch of 0.233 mm × 0.233 mm. These displays are driven by video cards that support the
display resolution and other characteristics.
Displaying text: A string of characters is displayed at a specified location $(x, y)$ using specified font characteristics.
These operations require several lower-level instructions to generate the required information.
Obtain Drawing Area Dimensions: Query the system to obtain the dimensions of the rectangular area in which the display will be
generated.
Select Pen: Define a pen with a drawing style (solid, dashed etc.), a width (number of pixels) and a color, and select it as the current
pen to use.
Move: The instruction is to move the current position to the point specified by its x and y coordinates.
Draw: The instruction is to draw a line from the current position to the point specified by its x and y coordinates using the
currently selected pen.
Output text: The instruction is to write a character string at the specified location using the currently selected font, background
color, and text color.
In the last section of this chapter, we will see how to use these functions from the Microsoft Foundation Classes.
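The Select Pen, Move and Draw instructions share one piece of state, the current position. The hypothetical recorder class below models that state without any GUI library. It is not part of MFC, although MFC's `CDC` class offers analogous `MoveTo` and `LineTo` members. The pen fields and the default white pen are assumptions for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical pen: drawing style code, width in pixels, and an RGB color.
struct Pen { int style; int width; unsigned color; };
// A recorded line from (x0, y0) to (x1, y1) drawn with a given pen.
struct Segment { int x0, y0, x1, y1; Pen pen; };

// Records Move/Draw instructions the way a device context would execute them.
class DrawingRecorder {
public:
    void selectPen(const Pen& p) {             // Select Pen: becomes the current pen
        assert(p.width >= 1);
        pen_ = p;
    }
    void moveTo(int x, int y) { x_ = x; y_ = y; }   // Move: only the current position changes
    void lineTo(int x, int y) {                     // Draw: line from the current position
        segments_.push_back({x_, y_, x, y, pen_});
        moveTo(x, y);                               // the end point becomes the current position
    }
    const std::vector<Segment>& segments() const { return segments_; }
private:
    Pen pen_{0, 1, 0xFFFFFFu};                 // assumed default: style 0, 1 pixel, white
    int x_ = 0, y_ = 0;
    std::vector<Segment> segments_;
};
```

Because each Draw updates the current position, a polyline needs one Move followed by a chain of Draw instructions.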
Translation
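Translation of a point $(x, y, z)$ through $(a, b, c)$ can be simply achieved by computing the new coordinates as
$$\begin{Bmatrix} x \\ y \\ z \end{Bmatrix}_{new} = \begin{Bmatrix} x + a \\ y + b \\ z + c \end{Bmatrix} \qquad (18.3.1)$$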
Fig. 18.4.2 Virtual coordinate system (square, extending to (1000, 1000))
Fig. 18.4.3 An example device coordinate system (1024 × 768 pixels)
Here are the key concepts as the display of the object moves from MCS to VCS to DCS. The range of x, y, z values must be
scaled so that the initial display of the object fits into the square VCS. This ensures that initially, the entire object will fit into the
screen with the center of the object centered in the VCS coordinate system. The final step is then to transform the (x, y)
coordinate of each key point in the VCS to its appropriate location in the DCS. Usually, the display is not a square and hence,
the shorter dimension (usually the height) is used in computing the scaling factor from VCS to DCS so that the object is centered
and fully visible on the screen. As we will show in the wireframe display algorithm, the scaling from VCS to DCS can be isotropic
so that there is no distortion (a square in the VCS remains a square in the DCS but there may be a large dead area on the left
and right sides of the display), or anisotropic (so that the object completely occupies the screen).
The algorithm for the wireframe display of a space truss is given below. Note that $A$ is the dimension of the VCS and $a$ is the
border on the four sides in which no drawing takes place. For the DCS, $x_d^{range}$ and $y_d^{range}$ represent the range of device coordinates
(pixels) available in the $x$ and $y$ directions.
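Algorithm
Step 1 Compute model limits in physical or model coordinates.
Loop through all nodes to compute $X_{min}, X_{max}, Y_{min}, Y_{max}, Z_{min}, Z_{max}$.
Apply the above formula for the 8 corners of the viewing box and compute $X^v_{min}, X^v_{max}, Y^v_{min}, Y^v_{max}, Z^v_{min}, Z^v_{max}$ using those 8 transformed coordinates.
Compute the coordinates of the center of the model as
$$\begin{Bmatrix} X_b \\ Y_b \\ Z_b \end{Bmatrix} = \frac{1}{2}\begin{Bmatrix} X^v_{min} + X^v_{max} \\ Y^v_{min} + Y^v_{max} \\ Z^v_{min} + Z^v_{max} \end{Bmatrix} \qquad (18.4.3)$$
Compute the scaling factor as
$$s = \min\left(\frac{A - a}{X^v_{max} - X^v_{min}}, \frac{A - a}{Y^v_{max} - Y^v_{min}}\right) \qquad (18.4.4)$$
For each node $i$, compute the VCS coordinates as
$$\begin{Bmatrix} x^v \\ y^v \end{Bmatrix} = s\begin{Bmatrix} X_i - X_b \\ Y_i - Y_b \end{Bmatrix} + \begin{Bmatrix} A/2 \\ A/2 \end{Bmatrix} \qquad (18.4.5)$$
Now compute the device coordinates using one of the following formulae:
Isotropic: $x_d = \lambda x^v + 0.5, \quad y_d = \lambda y^v + 0.5$
Anisotropic: $x_d = \lambda_x x^v + 0.5, \quad y_d = \lambda_y y^v + 0.5$    (18.4.6)
where
$\lambda_x = x_d^{range}/A, \quad \lambda_y = y_d^{range}/A$    (18.4.7a)
and $\lambda = \min(\lambda_x, \lambda_y)$    (18.4.7b)
Move to $(x_d, y_d)$.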
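The scaling steps of the algorithm can be sketched in C++ as follows. This is a minimal sketch under stated assumptions. It starts from node coordinates to which the viewing transformation has already been applied, because that formula is not part of this section. All names are hypothetical. Device coordinates are returned unrounded; the worked example that follows reports them rounded to whole pixels.

```cpp
#include <algorithm>
#include <cassert>

struct Point2 { double x; double y; };

// Scaling factor s, Eqn. (18.4.4): A = VCS size, a = border width.
double scaleFactor(double xvMin, double xvMax, double yvMin, double yvMax,
                   double A, double a) {
    assert(xvMax > xvMin && yvMax > yvMin && A > a);
    return std::min((A - a) / (xvMax - xvMin), (A - a) / (yvMax - yvMin));
}

// Viewing coordinates -> VCS: scale about the model center, then shift to (A/2, A/2).
Point2 toVCS(Point2 P, Point2 center, double s, double A) {
    return { s * (P.x - center.x) + A / 2.0, s * (P.y - center.y) + A / 2.0 };
}

// VCS -> DCS, Eqns. (18.4.6)-(18.4.7): isotropic uses lambda = min(lambda_x, lambda_y).
Point2 toDCS(Point2 v, double xdRange, double ydRange, double A, bool isotropic) {
    const double lx = xdRange / A, ly = ydRange / A;
    const double l = std::min(lx, ly);
    return isotropic ? Point2{ l * v.x + 0.5, l * v.y + 0.5 }
                     : Point2{ lx * v.x + 0.5, ly * v.y + 0.5 };
}
```

With $A = 1000$, $a = 100$, limits $(-3, 3)$ and $(-2, 2)$, and a $1024 \times 768$ device, these functions reproduce the scaling factor of 150 and the device coordinates used in the example that follows.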
Fig. E18.4.1.1 Planar truss with nodes 1, 2 and 3 (two 3 m spans and a height of 4 m)
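Solution
Step 1: Here are the results.
$(X_{min}, X_{max}) = (-3, 3)$, $(Y_{min}, Y_{max}) = (0, 4)$ and $(Z_{min}, Z_{max}) = (0, 0)$.
Hence, $(X^v_{min}, X^v_{max}) = (-3, 3)$, $(Y^v_{min}, Y^v_{max}) = (-2, 2)$ and $(Z^v_{min}, Z^v_{max}) = (0, 0)$. Also, $(X_b, Y_b, Z_b) = (0, 0, 0)$. The scaling
factor can then be computed as
$s = \min\left(\dfrac{1000 - 100}{3 - (-3)}, \dfrac{1000 - 100}{2 - (-2)}\right) = \min(150, 225) = 150$
With $(x_d^{range}, y_d^{range}) = (1024, 768)$, $\lambda_x = 1.024$, $\lambda_y = 0.768$ and $\lambda = 0.768$. Each device coordinate below is rounded to the nearest pixel.
Node 1, $(x^v, y^v) = (50, 200)$:
Isotropic Mapping
$x_d = \lambda x^v + 0.5 = 0.768(50) + 0.5 = 38.9 \approx 39$
$y_d = \lambda y^v + 0.5 = 0.768(200) + 0.5 = 154.1 \approx 154$
Anisotropic Mapping
$x_d = \lambda_x x^v + 0.5 = 1.024(50) + 0.5 = 51.7 \approx 52$
$y_d = \lambda_y y^v + 0.5 = 0.768(200) + 0.5 = 154.1 \approx 154$
Node 2, $(x^v, y^v) = (500, 800)$:
Isotropic Mapping
$x_d = \lambda x^v + 0.5 = 0.768(500) + 0.5 = 384.5 \approx 385$
$y_d = \lambda y^v + 0.5 = 0.768(800) + 0.5 = 614.9 \approx 615$
Anisotropic Mapping
$x_d = \lambda_x x^v + 0.5 = 1.024(500) + 0.5 = 512.5 \approx 513$
$y_d = \lambda_y y^v + 0.5 = 0.768(800) + 0.5 = 614.9 \approx 615$
Node 3, $(x^v, y^v) = (950, 200)$:
Isotropic Mapping
$x_d = \lambda x^v + 0.5 = 0.768(950) + 0.5 = 730.1 \approx 730$
$y_d = \lambda y^v + 0.5 = 0.768(200) + 0.5 = 154.1 \approx 154$
Anisotropic Mapping
$x_d = \lambda_x x^v + 0.5 = 1.024(950) + 0.5 = 973.3 \approx 973$
$y_d = \lambda_y y^v + 0.5 = 0.768(200) + 0.5 = 154.1 \approx 154$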
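Figure: the truss drawn with isotropic mapping (nodes 1, 2 and 3 at (39, 154), (385, 615) and (730, 154)) and with anisotropic mapping (nodes at (52, 154), (513, 615) and (973, 154)); device origin at (0, 0).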
Actions involving the window, such as clicking the left mouse button, clicking the right mouse button or typing on the keyboard, generate a message
that is sent to the program. The program responds to these events through message handlers. One of the key functions in any
CView-derived class is the OnDraw function. The OnDraw function is called whenever the program window needs to be redrawn,
e.g., if the window is resized, if the underlying object that is being displayed has changed, or if a window covering the program
window has closed.
Now, click File, then Open, and select the file star.dat. This will read the contents of the file and display the object defined in the
file star.dat (Fig. 18.5.1(b)). Use the Visual Studio editor to examine the contents of the file. You will see that the file defines the
number of points and lines, followed by the (x, y) coordinates of the five points (in pixels) and the (start point, end point) of
the five lines. If you are wondering where the file is read and the data is stored, look at the function void
CGraphicsView::OnFileOpen().
Fig. 18.5.1 Program screen after adjusting the window height and width a few times: (a) before the input file is read;
(b) after the file star.dat is read
Let’s look at the source code for the void CGraphicsView::OnDraw(CDC* pDCX) function (open the file graphicsView.cpp) in
bits and pieces. The first part is shown in Fig. 18.5.2.
The first part of the function obtains the rectangular dimensions of the left pane in Fig. 18.5.1. The background is set to black in line 125. In lines 128-129 a white pen
is created. The current pen is replaced with the white pen in lines 132-134. The text color is also set to white (line 137).
Fig. 18.6.1 Schematic diagram showing how a typical FLTK program displays widgets (display screen, Program Window and Popup Windows)
3 F. Costantini, D. Gibson, M. Melcher, A. Schlosser, B. Spitzak, and M. Sweet, FLTK 1.3.8 Programming Manual, Rev. 9.8, 2021.
The Program Window is created when the program execution starts. From within the program, multiple Popup Windows can
be launched. Within each Popup Window, the widgets supported by FLTK can be displayed, as we will show in Example
18.6.1. Note that the origin of the Popup Window coordinate system is at the top left, similar to what is shown in Fig. 18.4.3.
Fig. 18.6.2 Source code in main () showing how the first Popup Window is displayed
Lines 168-169 are used to obtain the height and the width of the available screen using Fl::h() and Fl::w(). The display strategy
is to launch the Popup Windows with their initial size being half the screen width and height. In line 175, the first Popup
Window is defined. The term Fl_Double_Window denotes a class where double-buffering is implemented to minimize screen
flickering. In lines 178-179 the first widget is defined. The first widget is a triangle. The DrawTriangle object is used to define a
triangle with vertices at (50, 10), (250, 10), and (150, 300). Similarly, the DrawCircle object is used to define a circle with the
center at (400, 400) and a radius of 150. Note that the values are in integer device coordinates. The cautionary note on line 170
indicates that the (x, y) coordinates cannot take arbitrary values. Lines 180 and 185 are used to make the Popup Window
resizable, so that its height and width can be changed with the mouse. Finally, line 188 is used to show the widgets drawn in
the first Popup Window on the screen.
Fig. 18.6.3 Source code in main () showing how the second Popup Window is displayed
Line 195 is used to define the second Popup Window (Fig. 18.6.3). Eleven (x, y) data points are defined and stored in lines 197-
198. In lines 199-200, the DrawXY object is used to define a series of straight lines that connect these eleven points with the first
point being at (0, 0) and the last point at (1000, 225). The entire window is used, and the color of the line is red. The default line
type and thickness are used. To change the line type and thickness, the following function needs to be called after setting the
color.
void fl_line_style (int style, int width=0, char* dashes=0);
Finally, Fig. 18.6.4 shows the source code to display the third window. The graph of $\sin(t)$, $0 \le t \le 15$, is created with an
increment of 0.5 (line 216) and is scaled by a factor of 125.0 (line 218) so that the entire range of values is between −125.0 and
125.0 (line 212).
Line 229 shows the call to Fl::run() function that puts the program into a loop waiting for user input either from the keyboard
or from the mouse. In this example program, execution can be terminated simply by closing the Program Window.
Fig. 18.6.4 Source code in main () showing how the third Popup Window is displayed
The actual drawing and display of the widgets takes place in the three classes – DrawTriangle that draws a triangle, DrawCircle
that draws a circle, and DrawXY that draws a series of straight lines connecting n points. Each of these classes is publicly derived
from Fl_Widget and has two important functions: the overloaded constructor and the draw() function. The reader is urged
to review the code, e.g., lines 40-79 where the triangle is defined and drawn on demand. Finally, the overall screen display is
shown in Fig. 18.6.5.
Summary
The basics of computer graphics were covered in this chapter. Functionalities in the Microsoft Foundation Classes were used
to display the wireframe image of a three-dimensional truss. The examples in this chapter illustrate the basic ideas behind a
GUI-program. MFC is an older technology that is still supported by Microsoft for a variety of reasons. There are excellent
toolboxes available that support MFC, the most popular one being the one sold by BCGSoft4. In addition, we also looked
at the open-source library Fast Light Toolkit (FLTK), which provides excellent integration with C++ source code.
4 https://www.bcgsoft.com/
Exercises
Appetizers
Problem 18.1
Use the truss shown in Example 18.4.1 and rotate the display by 30° about the z-axis. Compute the new screen coordinates and
draw the truss.
Main Course
Problem 18.2
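Consider the space truss shown in Fig. P18.2. Draw the initial view of the truss assuming that A = 1000, a = 100 and
(x_range^d, y_range^d) = (1024, 768).
[Fig. P18.2: space truss; the figure labels nodes and members numbered 1 to 4, dimensions of 7', 6', 3' and 9', and the x and y axes.]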
C++ Concepts
Problem 18.3
Develop an MFC program to display an x-y graph given a set of (x, y) data from an external file.
Problem 18.4
Modify the program ASUTruss© by adding the following capabilities.
(1) Carry out finite element analysis of the truss.
(2) Display both the undeformed and deformed truss using a specified magnification factor.
(3) Support rotation using the mouse left button.
(4) Support zoom in and zoom out using the mouse left button.
References
C. Pokorny, Computer Graphics – An Object-Oriented Approach to the Art and Science, Franklin, Beedle & Associates, Inc., Wilsonville, OR, 1994.
J. Prosise, Programming Windows with MFC, 2nd Edition, Microsoft Press, 1999.
E. Kain, The MFC Answer Book: Solutions for Effective Visual C++ Applications, Addison-Wesley Professional, 1998.
Bibliography
C++ Programming Language
Gary Bronson, C++ for Engineers and Scientists, Brooks/Cole Publishing Company, 1999.
Gary Bronson, Program Development and Design Using C++, Brooks/Cole, 2000.
Deitel and Deitel, C++ How to Program, 3rd Edition, Prentice-Hall, 2000.
Stanley Lippman, Josée Lajoie and Barbara Moo, C++ Primer, Addison-Wesley, 2005.
Rick Mercer, Computing Fundamentals with C++, 2nd Edition, Franklin, Beedle and Associates, 1998.
Data Structures
Gilberg and Forouzan, Data Structures: A Pseudocode Approach with C++, Brooks/Cole, 2001.
Kruse and Ryba, Data Structures and Program Design in C++, Prentice Hall, 1999.
Numerical Analysis
Press, Flannery, Teukolsky and Vetterling, Numerical Recipes in C, Cambridge University Press, 1988.
Schilling and Harris, Applied Numerical Methods for Engineers Using Matlab and C, Brooks/Cole, 2000.
Rao, Applied Numerical Methods for Engineers and Scientists, Prentice Hall, 2002.
Cockburn, Writing Effective Use Cases (The Crystal Collection for Software Professionals), Addison Wesley, 2000.
Bergin, Data Abstraction – The Object-Oriented Approach Using C++, McGraw-Hill, 1994.
Schach, Classical and Object-Oriented Software Engineering with UML and C++, McGraw-Hill, 1999.
Lee and Tepfenhart, UML and C++: A Practical Guide to Object-Oriented Development, Prentice-Hall, 2001.
Appendix
1.0 Introduction
Microsoft Visual Studio (MSVS) is an Integrated Development Environment (IDE) that programmers use to develop (create,
debug, and maintain) and launch computer programs written in a number of high-level languages. In this document we will see
how two types of computer programs, console applications and Microsoft Foundation Classes (MFC) applications, can be
developed using C++. We will also see how to use the debugging features of MSVS.
Console applications are programs in which user input is text-based, typically via a keyboard and/or an external file, and output
is also text-based, typically to the console (computer monitor) and/or an external file. As we saw in Chapter 18, MFC (Microsoft
Foundation Classes) is a collection of classes provided with VS 2022 to help developers build graphical-user interface (GUI)
programs for the various flavors of the Windows OS.
There are several differences between a console application and an MFC application that the reader must be aware of. Here is
a short but important list.
(1) The usual std::cout and std::cin will not work with an MFC application.
(2) There is no main program – MFC applications are event-driven.
(3) MFC provides resources normally associated with Windows applications – toolbar, menu, status bar etc.
(4) MFC applications use precompiled headers.
Click File, New and then Project. Make sure that C++, Windows and Empty Project are selected as shown in Fig. A2.1.2(a).
Fig. A2.1.5 VS2022 user interface showing the system ready for program development. Note that the current
version is Debug and x64 (64-bit application). Both these options can be changed by a different selection from the
dropdown combo-box menu
editor window. Line numbers are shown in the editor pane when that option is turned on (Tools-Options-Text Editor-All
Languages-General, then check Line numbers).
Fig. A2.1.7 Program interface showing C++ editor ready to accept creation of file called main.cpp
Type in the C++ statements as shown in Fig. A2.1.8. Note that we have not saved the file yet (Hint: main.cpp* indicates that
the file contents have changed and that the file has not been saved).
We will use the default options for the rest of the selection. Click the Finish button and the MFC project is created (Fig. A2.2.5).
Output Window
As we have seen before, the output window normally appears at the bottom right corner. It is in this window that compile, link,
and, possibly, error messages appear.
Breakpoints
Breakpoints mark the locations where execution should be stopped so that the user can examine the program code and variables,
and then make changes, continue execution, or terminate execution. A breakpoint can be toggled on and off with the F9 key.
The debugging stage requires that at least one breakpoint be set in the code.
Action: Build the program as we discussed in Section 2.1. First, edit the files main.cpp, showmessage.h and showmessage.cpp
as shown in Fig. A3.1. Then, build the solution.
Action: Position the cursor at line 9 in the file main.cpp and press the F9 key. A red dot appears in the grey border at the
beginning of the line as shown in Fig. A3.2.
Action: Press the F5 key. The debugger launches our program, and the execution is suspended once execution reaches line 9.
The resulting screen is shown in Fig. A3.4. Note that at any stage pressing the F5 key continues the debugging process until the
next breakpoint is reached, or an error is encountered, or the program terminates.
Call Stack
This window shows all the functions in the stack starting with the function where execution is currently suspended.
Autos Window
The Autos window (Debug -> Windows -> Autos) displays variables and expressions from the current line of code and the
preceding line of code. Note the values of the variables nA, nB and nSum (nSum is still uninitialized!).
Locals Window
You can open the Locals window from the Debug menu (Debug -> Windows -> Locals). The Locals window automatically displays all
local variables in the current block of code. If you are inside a method, the Locals window displays the method parameters
and the locally defined variables.
Watch Window
The Watch window is used to examine user-specified variables and expressions while debugging your program. You can also
modify the value of a variable using this window: double-click the value and type in a new value.
Action: We will step into the AddThem function. “Step into” means that the debugging execution will step into the function
and pause before the first statement in that function is executed. Press the F11 key.
The resulting screen is shown in Fig. A3.5.
Appendix
“It is a miracle that curiosity survives formal education.” Albert Einstein
“Teaching is a very noble profession that shapes the character, caliber, and future of an individual. If the people remember
me as a good teacher, that will be the biggest honour for me.” A.P.J. Abdul Kalam
1.0 Introduction
Standard Template Library, or STL for short, is a library of data structures and generic algorithms. The initial ideas behind STL
were developed by Alexander Stepanov as far back as 1979. Over the intervening years, Stepanov collaborated with a number of
researchers at various locations, including AT&T Bell Labs and HP Research Labs. Finally, working with David Musser and Meng Lee,
Stepanov presented the first draft of the STL to the ANSI/ISO committee for C++ standardization in 1993. The STL was subsequently
incorporated into the C++ Standard Library, so it is available with virtually every C++ compiler. STL functionalities can be divided
into three parts: containers, iterators, and algorithms. We discuss these three elements in more detail next.
Containers
A container is an object that stores data. There are four major categories of containers in STL: sequence containers, container
adaptors, associative containers, and unordered associative containers.
Sequence Container: A sequence container contains arbitrary elements (int, float, user-defined objects, etc.) that are ordered by
position, i.e., an element's location in the container depends on when and where it was inserted, not on its value. Examples of
sequence containers include vector, deque, and list.
Associative Container: An associative container contains arbitrary elements that are sorted, i.e., their location in the container is a
function of their value. Examples of associative containers include set, multiset, map, and multimap.
Unordered Associative Container: An unordered associative container is like an associative container except that, as the name implies,
the elements are not kept in sorted order; they are stored in a hash table. Examples of unordered associative containers include
unordered_set, unordered_multiset, unordered_map, and unordered_multimap.
Container Adaptor: A container adaptor provides a different interface to an existing container. For example, the stack container
adaptor, which by default is built on a deque, provides last-in, first-out behavior.
All container classes have member functions that allow them to work with iterators.
begin() Returns an iterator that points to the beginning of the container.
end() Returns an iterator that points past the end of the elements in the container.
Iterators
An iterator is an object that can be used to move through all or part of the elements in a container and represents a specific
position in the container. A pointer is the simplest example of an iterator. The following operations are supported.
Operator *
Returns the element at the current position. If the element is an object, operator -> can also be used to access its members.
Operator ++
Advances the iterator to the next element.
Operator == and !=
These are boolean operators used to check whether two iterators represent the same position.
Operator =
Assigns a position to an iterator.
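For example, to traverse all the elements of a deque object dqAny that stores float values, the following loop can be used.
std::deque<float>::const_iterator dqitAny;
for (dqitAny = dqAny.cbegin(); dqitAny != dqAny.cend(); ++dqitAny)
{
    std::cout << *dqitAny << ", ";
}
Using != rather than < in the loop condition makes the same loop work for every container, since < is defined only for
random-access iterators.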
Algorithms
An algorithm, as we saw first in Chapter 1, is a set of clearly defined steps that describes how a particular problem can be solved
and one that can be readily translated to a computer program. Data stored in containers can be processed via several useful
algorithms such as copying, modifying, reordering, searching, and sorting. Since these algorithms are provided as function
templates that operate on iterators, the same algorithm can be applied to many different containers and data types. As we will
see in Section 4, sorting the elements in a vector can be carried out simply by
std::sort (x.begin(), x.end(), std::greater<int>());
where x is a vector<int> and std::greater<int> is a binary function object that returns true if its first argument is greater than its
second, i.e., it compares the two arguments with the > operator. A function object is an instance of a class in which operator () is
defined as a member function. No additional code needs to be written if the data type is a C++ standard data type such as int, float,
or double. For other data types, such as user-defined classes, a custom function object can be written (or operator > can be
overloaded so that std::greater can be used).
2.0 Deque
An example of a sequence container is deque (pronounced "deck"), which is short for double-ended queue. Its size can be
varied dynamically by expanding or contracting at either its front or its back end. While a deque appears conceptually to be like a
vector, the internal implementations of the two data structures are very different. The biggest difference is that, unlike in a vector,
elements in a deque are not guaranteed to occupy contiguous memory locations. Inserting or removing an element at either end of
a deque takes constant time, whereas insertions and deletions in the middle are slower because other elements must be moved.
Here is a list of functionalities associated with the deque sequence container template.
Example STL_DQ_Advanced
This example illustrates most of the functionalities of the deque container. Note that #include <deque> is needed to use this
template. Snippets of the example code are shown and discussed below.
The objects, a deque container storing float values and an iterator to refer to the deque object, are declared in lines 61-62.
Initially, 4 values (1.1, 2.0, 3.3, 4.4) are stored at the end of the deque by using the push_back function in lines 65-68. The function
display_deque is then called to show the current contents.
Note the use of a const iterator to iterate through the contents and the dereferencing of the iterator to access the value (line 54).
The size member function is used to display the current size of the deque object. Several operations are shown below to
manipulate the contents.
The [] operator can be used to access an element at a known position (line 73).
New values can be inserted at specified locations using the insert member function (line 78). This function returns an iterator
that points to the first of the newly inserted element(s).
3.0 List
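So far, we have seen two containers, vector and deque. Unlike these two, a list container does not allow random access (there is no
[] operator), but insertions and deletions at any position are very fast: they take constant time once the position is known, and no
other elements need to be moved. Here is a list of functionalities associated with the list container template.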
Example STL_List_Example
This example illustrates some of the functionalities of the list container. Note that #include <list> is needed to use this
template. Snippets of the example code are shown and discussed below.
In the example, two integer lists are constructed. They are then merged into a single list and sorted. A copy of the merged list is
created. One of the instances of the list is then checked and all the duplicate values are removed. The main program is shown
below.
The function to help sort the values in descending order is shown below.
4.0 Sorting
There are several algorithms supported in STL, and we will look at one of the most widely used ones, sorting. Sorting
can be done on C++ data types or on user-defined objects. In the example shown below, we look at sorting a C++
data type, integers.
Example STL_Sorting
This example shows how the sorting functionality within STL can be used. Note that #include <algorithm> and #include
<functional> are needed.
5.0 Map
A map is a sorted associative container in which each unique key is associated with a "value", so that values can be looked up by
key much as array elements are looked up by index. For example, in the US, the social security number (SSN) is a 9-digit
identification number, and no two people in the US have the same number. Hence if the problem is to store information such as
name, age, date of birth, gender and so on, associated with each person, the SSN can be used as the key and the associated
information as the "value". Here is a list of functionalities associated with the map container template.
Example STL_MAP_Example
There are two examples in the program, but we will focus only on Example 1. Note that #include <map> is needed to use the
map container. The data handled in the program deals with storing and manipulating the maximum temperature in various cities on
a given day. The interactive program is used to store, delete, find, and list the <city, temperature> data. The main function
associated with this example is shown below.
Line 17 shows the declaration of the map container.
Initially CityData contains no data. The AddData function is called to obtain the name of the city and its maximum temperature.
Lines 35-36 show how the name of the city and the temperature value are obtained.
The reader is urged to look at Example 2 in the program for an enlightening usage of the map container in the context of a
finite element program.
6.0 Multimap
The multimap is an associative container in which duplicate keys are allowed. Each element is formed by a combination of a key
value and a mapped value. For example, one could store the names of all the members of a family tree where the last name is
the key and the first name is the value. Here is a list of functionalities associated with the multimap container template.
Example STL_MultiMap_Example
Note that #include <map> is needed to use the multimap container (std::multimap is declared in the <map> header). In this
example program, we expand on the map example discussed in the previous section by storing the temperature at various
locations in a given city. The underlying data is shown below.
The struct tl_data is used to store the temperature at a
specified location (intersection of two roads).
References
Nicolai M. Josuttis, The C++ Standard Library – A Tutorial and Reference, Addison-Wesley, 1999.
Herbert Schildt, STL Programming from the Ground Up, Osborne/McGraw-Hill, 1999.
David Musser, Gillmer Derge and Atul Saini, STL Tutorial and Reference Guide, 2nd Edition, Addison-Wesley, 2001.
Appendix
The list below is a comprehensive but not complete list of the major C++ keywords. Each entry gives the keyword, remarks, and,
in parentheses, the chapter in which the keyword is first introduced and discussed in the book. A keyword without a chapter
number is not discussed in this version of the book simply because there is no need for it in the context of the material in the text.
Keyword: Remarks (Introduced in Chapter)
and: Same as &&.
asm: Declares that a block of code is to be passed to the assembler.
auto: Declares a variable whose type is deduced from its initializer (before C++11, a storage class specifier for objects defined in a block). (Chapter 4)
bitand: Same as &.
bitor: Same as |.
bool: Boolean type that can hold either of the literals false or true. (Chapter 2)
break: Terminates a switch statement or a loop. (Chapter 3)
case: Used within a switch statement to specify a value to be matched against the statement's expression. (Chapter 3)
catch: Specifies the actions taken when an exception occurs. (Chapter 4)
char: Fundamental data type that defines character objects. (Chapter 2)
class: Declares a user-defined type that encapsulates data members and member functions. (Chapter 7)
const: Defines objects whose value cannot be changed after initialization. (Chapter 2)
consteval: Declares an immediate function, i.e., a function whose every call must be evaluated at compile time (C++20).
constexpr: Declares that a variable or function can be evaluated at compile time and used in constant expressions.
const_cast: Casts away the constness of a variable.
constinit: Requires a variable with static or thread storage duration to be initialized at compile time (C++20).
continue: Transfers control to the next iteration of a loop. (Chapter 3)
decltype: Asks the compiler to determine the type of a specified expression.
default: Handles expression values in a switch statement that are not matched by any case. (Chapter 3)
delete: Memory deallocation operator. (Chapter 8)
do: Starts a do-while statement, in which the sub-statement is executed repeatedly until the value of the expression is logical-false. (Chapter 3)
double: Fundamental floating-point type, typically double precision. (Chapter 2)
dynamic_cast: Casts a datum from one pointer or reference type to another, performing a runtime check to ensure the validity of the cast. (Chapter 13)
else: Used in an if-else statement. (Chapter 3)
enum: Declares a user-defined enumeration data type. (Chapter 3)
explicit: Prevents a constructor (or conversion function) from being used for implicit conversions.
export: Since C++20, exports declarations from a module (its original use for exported templates was removed in C++11).
extern: Declares that an identifier has external linkage, i.e., its definition may be in another translation unit. (Chapter 4)
false: Boolean literal; converts to the integer value 0. (Chapter 2)
float: Fundamental floating-point type, typically single precision. (Chapter 2)
for: Starts a for statement used for repetitive control. (Chapter 3)
Most of the C++ keywords not appearing in this text will be covered in the follow up text tentatively titled Software Development
for Engineers using C++ that is currently under preparation.