Introduction to Modern Fortran for the Earth
System Sciences
Dragos B. Chirila Gerrit Lohmann
Dragos B. Chirila
Gerrit Lohmann
Climate Sciences, Paleo-climate Dynamics
Alfred-Wegener-Institute
Bremerhaven
Germany
“Consistently separating words by spaces became a general custom about the tenth century
A.D., and lasted until about 1957, when FORTRAN abandoned the practice.”
Since the beginning of the computing age, researchers have been developing
numerical Earth system models. Such tools, which are now used for the study of
climate dynamics on decadal- to multi-millennial timescales, provide a virtual
laboratory for the numerical simulation of past, present, and future climate transitions
and ecosystems. In a way, the models bridge the gap between theoretical
science (where simplifications are necessary to make the equations tractable) and
experimental science (where the full complexity of nature manifests itself, as
multiple phenomena often interact in nonlinear ways, to form the final signal
measured by the apparatus). Models provide intermediate subdivisions between
these two extremes, allowing the scientist to choose a level of detail that (ideally)
strikes a balance between accuracy and computational effort.
The development of models has accelerated in the last 50 years, largely due to
the decreasing cost of computing hardware and the emergence of programming languages
accessible to the non-specialist. Fortran, in particular, was the first such language
targeting scientists and engineers, so it is not surprising that many models
were written using this technology. To many, however, this long history also causes
Fortran to be associated with the punched cards of yesteryear and obsolete software
practices (hence the quotation above). A programming language, however, evolves
to meet the demands of its community, and such was also the case with Fortran:
object-oriented and generic programming, a rich array language, standardized
interoperability with the C-language, free-format (!), and many more features are
now available to Fortran programmers who are willing to take notice.
Unfortunately, many of the newer features and software engineering practices
that we consider important are only discussed in advanced books or in specialized
reference documentation. We believe this unnecessarily limits (or delays) the
exposure of beginning scientific programmers to tools that were ultimately
designed to make their work more manageable. This observation motivated us to
write the present book, which provides a short “getting started” guide to modern
Fortran, hopefully useful to newcomers to the field of numerical computing within
Earth system science (ESS) (although we believe that the discussion and code
examples can also be followed by practitioners from other fields). At the same time,
we hope that readers familiar with other programming languages (or with earlier
revisions of the Fortran-standard) will find here useful answers for the “How do I do
X in modern Fortran?” types of questions.
Chapters Outline
In Chap. 1, we start with a brief history of Fortran, and succinctly describe the basic
tools necessary for working with this book. In Chap. 2, we expose the fundamental
elements of programming in Fortran (variables, I/O, flow-control constructs, the
Fortran array language, and some useful intrinsic procedures). In Chap. 3, we
discuss the two main approaches supported by modern Fortran for structuring code:
structured programming (SP) and object-oriented programming (OOP). The latter
in particular is a relative newcomer in the Fortran world.
The example programs (of which there are many in the book) accompanying the
first three chapters are intentionally simple (but hopefully still not completely uninteresting),
to avoid obfuscating the basic language elements. After practicing with
these, the reader should be well equipped to follow Chap. 4, where we illustrate how
the techniques from the previous chapters may be used for writing more complex
applications. Although restricted to elementary numerical methods, the case studies
therein should resemble more of what can be encountered in actual ESS models.
Finally, in Chap. 5 we present additional techniques, which are especially relevant
in ESS. Some of these (e.g., namelists, interoperability with C, interacting
with the operating system (OS)) are Fortran features. Other topics (I/O with
NETwork Common Data Format (netCDF), shared-memory parallelization, build
systems, etc.) are outside the scope of the Fortran language-standard, but nonetheless
essential to any Fortran programmer (netCDF itself being specific to ESS).
Language-Standards Covered
The core of the book is based on Fortran 95.¹ Building upon this basis, we also
introduce many newer additions (from Fortran 2003 and Fortran 2008²), which
complete the discussion or are simply “too good to miss”, for example OOP,
1 This was, at the time of writing, the most recent version with ubiquitous compiler support.
2 Many compilers nowadays have complete or nearly complete support for these newer
language-standard revisions.
Disclaimers
• Given the wide range of topics covered and the aim to keep our text brief, it
is obvious that we cannot claim to be comprehensive. Indeed, good
monographs exist for many topics, which we only superficially mention
(many further references are cited in this text).
• Finally, we often provide advice related to what we consider good software
practices. This selection is, of course, subjective, and influenced by our
background and experiences. Specific project conventions may require the
reader to adapt/ignore some of our recommendations.
Being primarily a compact guide to modern Fortran for beginners, this book is
intended to be read from start to finish. However, one cannot learn to program
effectively in a new language just by reading a text—as in any other “craft”,
practice is the best way to improve. In programming, this implies reading and
writing/testing as much code as possible. We hope the reader will start applying this
philosophy while reading this book, by typing, compiling, and extending the code
samples provided.³
Readers with programming experience may also use “random access,” to select
the topics that interest them most—the chapters are largely independent, with the
exception of Chap. 5, where several techniques are demonstrated by extending
examples from Chap. 4.
Due to the “breadth” of the book, many technical aspects are covered only
superficially. To keep the main text brief, we opted to provide as footnotes suggestions
for further exploration. Unfortunately, this led to a significant number of
footnotes at times; the reader is encouraged to ignore these, at least during a first
reading, if they prove to be a distraction.
3 Nonetheless, the programs are also available for download from SpringerLink. The authors
also provide a code repository on GitHub: assuming a working installation of the git version-
control system is available, the code repository can be “cloned” with the command:
.
Acknowledgments
The idea of writing this book crystallized in the spring of 2012. Almost 2 years
later, we have the final manuscript in front of us. Contributions from many people
were essential during this period. They all helped in various ways, through discussions
about the book and related topics, requests for clarifications, ideas for
topics to include, and corrections of our English and of other mistakes, greatly
improving the end result. In particular, we acknowledge the help of many (past and
present) colleagues from the Climate Sciences division at Alfred-Wegener-Institut,
Helmholtz-Zentrum für Polar- und Meeresforschung (AWI)—especially Manfred
Mudelsee, Malte Thoma, Tilman Hesse, Veronika Emetc, Sebastian Hinck,
Christian Stepanek, Dirk Barbi, Mathias van Caspel, Sergey Danilov, and Dmitry
Sidorenko. We thank Stefanie Klebe for a very thorough reading of the final draft,
which significantly improved the quality of the book.
In addition to our AWI colleagues, we received valuable feedback from Li-Shi
Luo, Miguel A. Bermejo, and Dag Lohmann.
Our editors from Springer were very helpful during the writing of this book. In
particular, we thank Marion Schneider, Johanna Schwarz, Carlo Schneider, Marcus
Arul Johny, Ashok Arumairaj, Janet Sterritt, Agata Oelschlaeger, Dhanusha M. and
Janani J. for kindly answering our questions and for their support.
Finally, we would like to thank our families and friends, who contributed with
encouragement, support, and patience while we worked on this project.
Contents
1 General Concepts
1.1 History and Evolution of the Language
1.2 Essential Toolkit (Compilers)
1.3 Basic Programming Workflow
References
2 Fortran Basics
2.1 Program Layout
2.2 Keywords, Identifiers and Code Formatting
2.3 Scalar Values and Constants
2.3.1 Declarations for Scalars of Numeric Types
2.3.2 Representation of Numbers and Limitations of Computer Arithmetic
2.3.3 Working with Scalars of Numeric Types
2.3.4 The Kind type-parameter
2.3.5 Some Numeric Intrinsic Functions
2.3.6 Scalars of Non-numeric Types
2.4 Input/Output (I/O)
2.4.1 List-Directed Formatted I/O to Screen/from Keyboard
2.4.2 Customizing Format-Specifications
2.4.3 Information Pathways: Customizing I/O Channels
2.4.4 The Need for More Advanced I/O Facilities
2.5 Program Flow-Control Elements (if, case, Loops, etc.)
2.5.1 if Construct
2.5.2 case Construct
2.5.3 do Construct
2.6 Arrays and Array Notation
2.6.1 Declaring Arrays
2.6.2 Layout of Elements in Memory
2.6.3 Selecting Array Elements
4 Applications
4.1 Heat Diffusion
4.1.1 Formulation in the Dimensionless System
4.1.2 Numerical Discretization of the Problem
4.1.3 Implementation (Using OOP)
4.2 Climate Box Model
4.2.1 Numerical Discretization
4.2.2 Implementation (OOP/SP Hybrid)
4.3 Rayleigh-Bénard (RB) Convection in 2D
4.3.1 Governing Equations
4.3.2 Problem Formulation in Dimensionless Form
4.3.3 Numerical Algorithm Using the Lattice Boltzmann Method (LBM)
Acronyms
Fortran Compilers
gfortran GNU Fortran Compiler (see entry on gcc)
ifort Intel Fortran Compiler® (http://software.intel.com/en-us/fortran-compilers)
Profiling Tools
gprof GNU Profiler (part of binutils) (http://www.gnu.org/software/binutils/)
VTune Intel VTune Amplifier XE 2013 (https://software.intel.com/en-us/intel-vtune-amplifier-xe)
Cygwin http://www.cygwin.com/index.html
gcc GNU Compiler Collection (http://gcc.gnu.org)
ld GNU linker (http://www.gnu.org/software/binutils)
gmake GNU Make (http://www.gnu.org/software/make)
MinGW Minimalist GNU for Windows (http://www.mingw.org)
SCons Software Construction tool (http://www.scons.org)
Visualization/Post-processing Tools
CDO Climate Data Operators (https://code.zmaw.de/projects/cdo)
GMT Generic Mapping Tools (http://gmt.soest.hawaii.edu)
gnuplot http://www.gnuplot.info
NCO netCDF Operators (http://nco.sourceforge.net)
ParaView Parallel Visualization Application (http://www.paraview.org)
Operating Systems
AIX IBM Advanced Interactive eXecutive
Linux GNU/Linux
OSX Mac OS X®
Windows Microsoft Windows®
Unix Unix® (http://www.unix.org)
Text Editors
Emacs GNU Emacs text editor (http://www.gnu.org/software/emacs)
gedit Gedit text editor (http://projects.gnome.org/gedit)
joe Joe’s Own Editor (http://joe-editor.sourceforge.net)
Kate Kate text editor (http://kate-editor.org)
Vim Vim text editor (http://www.vim.org)
Software Libraries
ACML AMD Core Math Library (http://developer.amd.com/tools/cpu-development/amd-core-math-library-acml)
ATLAS Automatically Tuned Linear Algebra Software (http://math-atlas.sourceforge.net)
BLAS Basic Linear Algebra Subprograms
Boost.Program_Options http://www.boost.org/libs/program_options
ESSL Engineering and Scientific Subroutine Library
4 We choose to typeset Fortran keywords with lowercase letters, although the language is case-
insensitive everywhere except inside character strings (so PROGRAM, program and PrOgRaM are
all the same to the compiler).
It should be easy to infer from the context what these angle bracket expressions
should be replaced with.
• Combinations of optional and mandatory items are sometimes highlighted by
nesting of square and angle brackets, to distinguish the fact that including some
items may unlock additional possible combinations.
3. With the exception of small snippets, code listings are accompanied by a caption,
indicating the corresponding file in the source code tree available for
download. Line numbers are only shown when they are specifically referenced
in the text.
4. Where interaction with the Operating System (OS) is illustrated, we describe the
process for the GNU/Linux (Linux) platform, using Bourne-again shell
(bash), since this environment is easily accessible. Commands corresponding
to such tasks are marked by a leading $-character (default shell-prompt in
Linux); only the part after this marker should be typed.
5. Exercises are typeset on a dark-gray background, to distinguish them from the
rest of the text.
6. Several notes appear as framed boxes on a light gray background.
7. Naming conventions: It is usually considered good practice to adopt some rules
for naming entities that are part of the program code. Although different
developers may prefer a different set of such rules, it is generally a good idea to
use a single convention consistently within a project, to reduce the effort
required for understanding the code. Our particular conventions are explained
below.
• Variables that are part of a user-defined type follow the same rules as above,
except that the first letter is always a lowercase “m”:
• Constants are written in uppercase, and when they are composed of multiple
words they are separated by underscores:
• Normal procedures (i.e., those which are not bound to a specific user-defined
type) look similar to usual variables, except that they contain verbs, to emphasize
the function of the procedure:
• Procedures that are bound to a specific type follow the rules above, but also have
the name of the type at the end:
This rule is introduced mostly to avoid naming collisions, when the same type-
bound procedure name makes sense for several types (but their implementation
differs). For simplifying the calling of these procedures, we usually define shorter
aliases (which omit the type-name), as explained in Chap. 3.
• Fortran modules that also encapsulate a user-defined type are named after the
type, with the added prefix :
When these are placed in distinct files, the filename is composed of the module
name, with the added extension . For example, the modules above would
be placed in files and
Chapter 1
General Concepts
This chapter introduces the Fortran programming language in the context of numerical
modeling, and in relation to other languages that the reader may have experience
with. Also, we discuss some technical requirements for making the best use of this
book, and provide a brief overview of the typical workflow for writing programs in
Fortran.
In the 1950s, a team from International Business Machines Corporation (IBM) labs led by
John Backus created the Fortran (“mathematical FORmula TRANslation system”)¹
language. This was the first high-level language (HLL) to become popular, especially
in the domain of numerical and scientific computing, for which it was primarily
designed. Prior to this development, most computer systems were programmed in
assembly languages, which only add a thin wrapper on top of raw machine language
(generally leading to software which is not portable and more difficult to maintain).
Fortran was widely adopted due to its increased level of abstraction, which made
Fortran programs orders of magnitude more compact than corresponding assembly
programs. This popularity, combined with intentional simplifications of the language
(for example, lack of pointer type in earlier versions), encouraged the development
of excellent optimizing compilers, making Fortran the language of choice for many
demanding scientific applications.
This is also the case for Earth system science (ESS), where Fortran is to date
the most used programming language. The reasons are simple: there is a huge body
of tested Fortran routines, and the language is very suitable for coding physical
1 The reader may sometimes encounter the name of the language spelled in all capitals (as in
FORTRAN), usually referring to the early versions of the language, which officially supported only
uppercase letters in programs. This shortcoming was corrected by the later revisions,
with which we are concerned in the present text.
© Springer-Verlag Berlin Heidelberg 2015
D.B. Chirila and G. Lohmann, Introduction to Modern Fortran
for the Earth System Sciences, DOI 10.1007/978-3-642-37009-0_1
equations. Early model implementations based on Fortran started in the middle of the last
century (see e.g. Bryan [1], Platzman [4], Lynch [3] and references therein). The
models predicted how changes in the natural factors that control climate, such as
ocean and atmospheric currents and temperature, could lead to climate change. Climate
models are intended to provide a user-friendly and powerful framework for
simulating real or idealized flows over wide-ranging scales and boundary conditions.
With its good support for modular programming, Fortran proved to be well
suited for these tasks.
Certainly, many other languages were introduced over the last 60 years (such as
COBOL, Pascal, C, C++, and Java), some offering innovative facilities for
expressing algorithm abstractions (such as object-oriented or generic programming).
Interestingly, these languages did not supersede Fortran (at least not in the ESS
community); instead, they inspired the Fortran language-standardization committee
to incorporate such facilities through incremental revisions (Fortran 90, Fortran 95,
Fortran 2003, and Fortran 2008 at the time of writing).
2 For most supercomputers, the compilers are usually provided by the hardware vendor, which
allows better tuning of the code to the features of the underlying machine.
1.3 Basic Programming Workflow
[Figure 1.1: flow diagram. Text Editor ("EDIT") produces myProgram.f90; Compiler ("COMPILE") produces myProgram.o (myProgram.obj); Linker ("LINK") combines this with library code and start-up code to produce myProgram (myProgram.exe).]
Fig. 1.1 Schematic of programming workflow in Fortran. Files are represented as white rounded
boxes, and external programs as green boxes
From a low-level perspective (i.e. leaving more abstract issues such as program design
aside), development of Fortran programs³ is represented schematically in Fig. 1.1.
3 The terms “program” and “(source) code” are used interchangeably within this book; however,
strictly speaking, “code” can also refer to program sub-modules, such as functions, while “program”
usually refers to a complete application, which yields an executable file when processed by a
compiler.
In the figure, the utilities are shown as green boxes. The process starts with a
text editor,⁴ where the user enters the program code.⁵ Then, the compiler is invoked,
passing the created file as an argument. In Linux, using gfortran, this would be
achieved by typing the following command in a terminal window:
$ gfortran -c myProgram.f90
At this point, an additional file (myProgram.o) will be created. This contains
machine code generated from myProgram.f90, which does not, however, contain
any code for libraries that may be needed by your program. It is the job of the linker
to find the missing pieces and to produce the final, executable file. In Linux, the
GNU linker (ld) is normally used for this purpose. For simplicity, it is better to
perform the linking stage also through the compiler, which will call the linker with
the appropriate options in the background:
$ gfortran -o myProgram myProgram.o
(in Windows, replace myProgram.o with myProgram.obj, and myProgram
with myProgram.exe).
This step will create the executable program, which can be run with the command:
$ ./myProgram
The entire workflow seems deceptively simple.⁶ In reality, problems can appear
at any stage (especially in nontrivial programs), which trigger the need to revise the
program. These iterative improvements of the code are suggested by the dashed lines
in Fig. 1.1. First, the compiler may refuse to produce object-code if the program
does not follow the syntax of the language. Then, the linker may be unable to find
the appropriate libraries to include. Finally, the program may crash, or it may run
but produce unacceptable results. The beginner will usually encounter problems
at all of these stages. Fortunately, with some practice, the frequency of the (less
interesting) compilation/linking errors decreases.
Compiling and linking in one step. So far, we separated the two phases for producing
the program executable, to make the reader aware of the distinction (when
4 Word processors are a poor choice here, since they focus on features like advanced formatting,
which the compiler does not understand anyway. Instead, we recommend a “bare bones” text editor
with programming-related features like syntax highlighting and auto-completion:
gedit text editor (gedit) or Kate text editor (Kate) are good starting points, while Vim text
editor (Vim), GNU Emacs text editor (Emacs) or Joe’s Own Editor (joe) are more advanced
choices that may pay off in the long run.
5 Files containing modern Fortran source code usually have the extension .f90, but the reader
may also encounter extensions .f77, .f, or .for, which correspond to older standards; likewise,
some developers may use the extensions .f95, .f03, or .f08, to highlight use of features
present in the latest revisions of the language—but this practice is discouraged by some authors
(e.g. Lionel [2]). To avoid problems, filenames should also not contain whitespace.
6 Indeed, this resembles the Feynman problem-solving algorithm: (a) write down the problem,
(b) think very hard, and (c) write down the answer.
References
1. Bryan, K.: A numerical method for the study of the circulation of the world ocean. J. Comput.
Phys. 4(3), 347–376 (1969)
2. Lionel, S.: Doctor Fortran in “Source Form Just Wants to be Free” (2013).
http://software.intel.com/en-us/blogs/2013/01/11/doctor-fortran-in-source-form-just-wants-to-be-free
3. Lynch, P.: The origins of computer weather prediction and climate modeling. J. Comput. Phys.
227(7), 3431–3444 (2008)
4. Platzman, G.W.: The ENIAC computations of 1950—gateway to numerical weather prediction.
Bull. Am. Meteorol. Soc. 60(4), 302–312 (1979)
Chapter 2
Fortran Basics
Every programming language imposes some precise syntax rules, and Fortran is no
exception. These rules are formally grouped in what is denoted as a “context-free
grammar”,¹ which precisely defines what represents a valid program. This helps the
compiler to unambiguously interpret the programmer’s source code,² and to detect
sections of source code which do not follow the rules of the language. For readability,
we will illustrate some of these rules through code examples instead of the formal
notation.
Below, we show the basic layout of a single-file Fortran program, with no proce-
dures (these will be discussed later):
program [program name]
  implicit none
  [variable declarations [initializations]]
  [code for the program]
end program [program name]
Any respectable language tutorial needs the classical “Hello World” example.
Here is the Fortran equivalent:
program hello_world
  implicit none
  print*, "Hello, world of Modern Fortran!"
end program hello_world
Listing 2.1 src/Chapter2/hello_world.f90
This should be self-explanatory, except maybe for the implicit none entry,
which instructs the compiler to ensure all used variables are of an explicitly defined
type. It is strongly recommended to include this statement at the beginning of each
program.³ The same advice will apply to modules and procedures (discussed later).
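To make the risk concrete, here is a minimal sketch of what can happen when implicit none is left out (the program and variable names are ours, chosen only for illustration). Under the legacy rules, undeclared names starting with the letters i to n are taken to be integers, and all other names are taken to be reals:

```fortran
program implicit_typing_demo
  ! NOTE: "implicit none" is deliberately omitted here, to expose the
  ! legacy implicit-typing rules. Do not do this in real code.
  ratio  = 0.75   ! implicitly of type real (name starts with 'r')
  nratio = 0.75   ! implicitly of type integer (name starts with 'n'),
                  ! so the value is silently truncated to 0
  print*, ratio, nratio
end program implicit_typing_demo
```

The program compiles without errors, yet the second variable holds 0 instead of 0.75. With implicit none in place, the compiler would reject both assignments, because neither variable has been declared.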
Exercise 1 (Testing your setup) Use the instructions from Sect. 1.3 (adapting
commands and compiler flags as necessary for your system) to edit, compile
and execute the program above. Try separate compilation and linking first, then
combine the two stages.
All Fortran programs consist of several types of tokens: keywords (reserved words
of the language), special characters,⁴ identifiers and constant literals (i.e. numbers,
characters, or character strings). We will encounter some of the keywords soon, as
we discuss basic program constructs. Identifiers are the names we assign to variables
or constants. The first character of an identifier must be a letter (the rest can be
letters, digits or underscores _ ). The length of the identifiers should not exceed 63
characters (Fortran 2003 or newer).⁵
3 This is related to a legacy feature, which could lead to insidious bugs. The take-home message
for new programmers is to always use implicit none. The -fimplicit-none
flag can be used, in principle, in gfortran, but this is also discouraged because it introduces an
unnecessary dependency on compiler behavior.
4 The special characters are (framed by boxes): = , + , - , * , / , ( , ) , , , . , $ , ’ , : ,
Comments: Commenting the nontrivial aspects of your code is highly recommended.⁶
In Fortran, this is achieved by typing an exclamation mark ( ! ), either
on a line of its own, or within another line which also contains program code. In
either case, an ! will cause the compiler/preprocessor to ignore the rest of the line.⁷
Multi-line statements: Unlike languages from the C-family, in Fortran the semicolon
; for marking the end of a statement is optional (although it is still used sometimes,
to pack several short statements on the same line). By default, the end of the line
is also considered to be the end of the statement. A line of code in Fortran should
be at most 132 characters long. If a statement is so long that this is not sufficient
(for example, a long formula for evaluating derivatives in finite-difference numerical
schemes), we can choose to continue it on the following line(s), by inserting an
ampersand & at the end of each line that is continued. Since Fortran 2003, up to
255⁸ continuation lines are allowed for any statement.
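As a small illustration of the optional semicolon, the following sketch (with variable names of our own choosing) packs three short statements on one line:

```fortran
program semicolon_demo
  implicit none
  integer :: a, b, c

  ! Three short statements on a single line, separated by semicolons.
  a = 1; b = 2; c = a + b

  ! No semicolon is needed here: the end of the line ends the statement.
  print*, c
end program semicolon_demo
```

The program prints 3. We would use this style sparingly, since one statement per line is usually easier to read and to debug.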
It can happen (although it should be avoided when possible) that the line break in
a multi-line statement occurs in the middle of a token. In that case, using a single &
will probably not give the expected result. This can be overcome by typing another
& as the first character on the continued line, which contains the remainder of the
divided token.
The two possible uses of continuation lines are shown in the example below:
 1 program continuation_lines
 2   implicit none
 3   integer :: seconds_in_a_day = 0
 4
 5   ! Normal continuation-lines
 6   seconds_in_a_day = &
 7        24*60*60 ! 86400
 8
 9   print*, seconds_in_a_day
10
11   ! Continuation-lines with a split integer-literal token
12   seconds_in_a_day = 2&
13        &4*60*60 ! still 86400. In this case, splitting the '24'
14   ! is unwise, because it makes code unreadable.
15   ! However, for long character strings this can be
16   ! useful (see below).
17   print*, seconds_in_a_day
18
19   ! Continuation-lines with a split string token.
20   print*, "This is a really long string, that normal&
21        &ly would not fit on a single line."
22 end program continuation_lines
ments that communicate additional information to the compiler; examples will be shown in Sect. 5.3,
when we will discuss how to specify, using the Open MultiProcessing (OpenMP) extensions, which
portions of the code should be attempted to be run in parallel.
8 The previous limit (according to the Fortran 95 standard) was 39 continuation lines.
Spaces and indentation: Whitespace can be freely used to separate program tokens,
without changing the meaning of the program. For example, as far as the compiler
is concerned, line 3 in the previous listing could also have been written as:
integer ::seconds_in_a_day = 0
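The sketch below (with hypothetical variable names) shows that the amount of whitespace between tokens does not matter: both declarations are equally valid and define variables of the same type and value. Whitespace is not allowed inside a token, however, so the two colons of :: must stay together.

```fortran
program whitespace_demo
  implicit none
  ! Same kind of declaration, written with very different spacing.
  integer::compact=86400
  integer   ::   spaced   =   86400

  ! The comparison yields .true. (printed as T).
  print*, compact == spaced
end program whitespace_demo
```

In practice, we suggest picking one spacing style and applying it consistently, since the freedom shown here exists for readability rather than for variety.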
9 Note that some text editors feature automatic indentation, which makes this easier.
10 Other languages, such as Matrix Laboratory (MATLAB) or The R Project for Statistical
Computing (R), support dynamic typing, so the type of a variable can change during the
execution of the program.
11 It is also possible to define custom types, enabling data-encapsulation techniques similar to C++
NOTES
• Position of declarations in (sub)programs: All declarations for constants
and variables need to be included at the beginning of the (sub)program,
before any executable statement. However, as of Fortran 2008 it is possible
to overcome this limitation, by surrounding variable declarations with a
block ... end block construct, as follows:
  ! variable declaration(s)
  integer :: length
  ! executable statements (normally, not possible to specify additional
  ! variable declarations after the first such executable statement)
  length = 10
  block
    ! block-construct (Fortran 2008+) enables us to overcome that
    ! limitation
    real :: x
  end block
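As a minimal sketch, the fragment above could be embedded in a complete program as follows. The program name and the statements inside the block are our own illustrative assumptions; a compiler supporting Fortran 2008 is required.

```fortran
program block_construct_sketch
  implicit none
  integer :: length   ! declared in the specification part

  length = 10         ! first executable statement

  block
    ! declarations are allowed again at the top of a block (Fortran 2008+)
    real :: x
    x = 0.5 * real(length)
    print*, "inside block: x =", x
  end block

  ! x is local to the block, so it is not accessible here
  print*, "after block: length =", length
end program block_construct_sketch
```

Note that the variable declared inside the block ceases to exist at the end block statement, which also helps to keep the scope of temporary variables small.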
with an initialization. These two steps can be merged into a one-liner (see examples
below). Finally, we also show how to define constants of each type.
integer type: valid values of this type are, e.g. −42, 24, 123. In general,
any integer is accepted, as long as it resides within a certain range. The length of the
range is determined by the kind parameter (if that is explicitly specified), or by the
machine architecture (32 or 64 bit) and compiler (if no kind is specified, as in our
present examples). Example declarations:
  integer i                     ! plain declaration...
  i = 10                        ! ...with corresponding initialization
                                ! (would be in the executable section of
                                ! the (sub)program)
  integer :: j = 20             ! declaration with initialization
  integer, parameter :: K = 30  ! constant (initialization mandatory)
complex type: complex numbers are often needed in scientific and engineer-
ing applications, thus Fortran supports them natively. They can be specified as a pair
of integer or real values (however, even if both components are specified as
integers, they will be stored as a pair of reals, of default kind). Example declarations:
  complex c1                             ! simple declaration
  complex :: c2 = (1.0, 7.19e23)         ! declaration with initialization
  complex, parameter :: C3 = (2., 3.37)  ! constant
While internally all data is stored by computers as a sequence of bits (zeroes and ones),
the concept of types (also known as “data types”) binds the byte sequences to specific
interpretations and manipulation rules. For example, addition of integer-
versus that of real-numbers is very different at the bit-level. The number of bits
used for a value of each type is particularly important: the more bits are used, the
Modern Fortran has a very convenient mechanism for specifying the numerical
requirements of a program in a portable way, without forcing developers (or, worse,
users) to study each CPU in-depth. We discuss this feature in Sect. 2.3.4.
It is important that programmers keep in mind the limitations of the internal
representations, since these are an endless source of bugs. A tutorial on these issues
is outside the scope of our text (a very readable introduction to these issues and their
implications is Overton [11]). For example, some of the facts to keep in mind for the
integer and real types are:
13 This is not a “hard” rule, however, because many factors enter the performance equation—
e.g. specialized hardware units within the central processing unit (CPU), the memory hierarchy,
vectorization, etc.
14 An intuitive approach would be to reserve one bit for the sign, and use the rest for the modulus.
However, to reduce hardware complexity most systems use another convention (“two’s comple-
ment”).
15 Note that enabling such options will most probably make the program slower too, so they are not
numbers using a bit-field of finite size. This causes most nontrivial calculations
with real-values in Fortran to be approximate (so there is always some “noise”
in the numerical results).
To complicate matters more, note that many numbers which are exactly rep-
resentable in the familiar decimal floating-point notation cannot be represented
exactly when translated to one of the binary floating-point formats. A common
example is the number 0.1, which on our system becomes 0.100000001490116
when translated to 32 bit floating-point and back to decimal, and
0.100000000000000005551115123125783 when 64 bit floating-point is used. This can lead
to subtle bugs—for example, when two variables which were both assigned the
value 0.1 are compared for equality, the result may be false if the two variables
are of different floating-point type. In this case, the compiler will promote the
lower-precision value to the higher-precision type (so that it can perform the com-
parison). However, this cannot bring back the bits that were discarded when 0.1
was converted to fit inside the lower-precision type. For this reason, it is often
a good idea to avoid such comparisons as long as possible, or to include some
tolerances when this operation is necessary nonetheless. For more information on
floating-point arithmetic and advice for good practices, the reader can also consult
Goldberg [4], as well as Overton [11].
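The comparison problem described above can be sketched with a short program. The variable names and the particular tolerance (a small multiple of the single-precision machine epsilon) are illustrative assumptions on our part, not a general recommendation.

```fortran
program compare_with_tolerance
  implicit none
  real             :: a = 0.1      ! default real (usually 32 bit)
  double precision :: b = 0.1d0    ! usually 64 bit
  double precision :: tol

  ! 'a' is promoted to double precision before the comparison; the bits
  ! lost when 0.1 was stored in 'a' are not recovered by the promotion
  print*, "a == b:", (a == b)

  ! hypothetical relative tolerance, based on the less precise type
  tol = 10.0d0 * epsilon(a)
  print*, "|a-b| <= tol*|b|:", (abs(a - b) <= tol * abs(b))
end program compare_with_tolerance
```

On a typical platform we would expect the direct comparison to yield F and the tolerance-based one to yield T, although the exact behavior depends on the floating-point formats used by the compiler.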
The three numeric types share some characteristics, so it makes sense to discuss
their usage simultaneously, highlighting any exceptions. This is the purpose of this
section.
Standalone numerical expressions do not make much sense (hence the language does
not allow them): what we actually want is to assign the result of the expressions to
variables (with the = assignment operator16 ), or to pass the result to some function
(e.g. to display it). This is another point where loss of precision can occur, if the
expression is of a stronger type/kind than the variable to which it is assigned:
  integer :: i = 0
  real :: m = 3.14, n = 2.0
  i = m / n      ! i will become 1, NOT 1.57 (rounding towards 0)
  m = -m         ! negate m with unary operator
  i = m / n      ! i will become -1 (rounding also towards 0)
  print*, m / n  ! expression passed to 'print' statement
For applications that need to work with data of complex type, note that it is possible
(since Fortran 2008) to conveniently refer to the real and imaginary components:
  complex :: z1 = (1.0, 2.0)
  z1%im = 3.0  ! modify the imaginary part
  print*, "real part of z1 =", z1%re
Most of the numerical algorithms encountered in ESS need some assumptions regard-
ing properties of the types used to represent the quantities they manipulate. For exam-
ple, if integers are used to represent simulation time in seconds, we need to ensure
the type can support the maximum number of seconds the model will be run for. The
16 Not to be confused with an equivalence in the mathematical sense. In Fortran, that is represented
demands are more complex for reals, which are always stored with finite precision:
since each result needs to be truncated to fit the representation, numerical “noise” is
ever-present, and needs to be constrained.
One way to improve17 the situation is to increase the accuracy of the represen-
tation, by using more bits to represent each value. In older versions of Fortran, the
double precision type (real-variant) was introduced exactly for this. The
problem, however, lies in the fact that the actual number of bits is still system- and
compiler-dependent: switching hardware vendors and compilers is normal, and sur-
prises due to improper number representation (which can often go unnoticed) are
better avoided when possible.
The concept of kind is the modern Fortran response to this problem, and it
supersedes the double precision type in new code. kind acts as a parameter to the type,
allowing the programmer to select a specific type variant from the multitude that may
be supported by the platform.18 Even better, the programmer need not be concerned
with the lower-level details, since two special intrinsic functions (discussed shortly)
allow querying for the most economic types that meet some natural requirements.
We only discuss kind for numeric types, although the concept also applies to non-
numeric types (for storing characters of non-European languages and for efficiently
packing arrays of logical type—for details consult, e.g. Metcalf et al. [10]).
Kinds are indexed with positive integer values. If we know these indices for
selecting numbers with the desired properties on the current platform, they can be
used to parameterize types, as in:
  integer(kind=4) :: i
  real(kind=16) :: x
However, this feature alone does not solve the portability problem, since the index
values themselves are not standardized. The intended usage, instead, is through two
intrinsic functions which return the appropriate kind-values, given some constraints
requested by the developer:
1. selected_int_kind(requestedExponentRange), where requestedExponentRange
is an integer, returns the appropriate kind-parameter for
representing integer numbers n in the range:
$-10^{r} < n < 10^{r}$, with $r$ = requestedExponentRange.
For example,19
  integer, parameter :: LARGE_INT = selected_int_kind(18)
  integer(kind=LARGE_INT) :: t = -123456_LARGE_INT
17 This is not to be seen as a “silver bullet”, since numerical noise will still corrupt the results, if
the algorithm is inherently unstable.
18 Compilers are required to provide a default kind for each of the 5 intrinsic types, but they may
will guarantee that the compiler selects a suitable type of integer to fit values
of t in the interval $(-10^{18}, 10^{18})$.
2. selected_real_kind(requestedPrecision, requestedExponentRange),
where both arguments are integers, returns the appropriate kind-parameter
for representing numbers with a decimal exponent range of at least
requestedExponentRange, and a decimal precision of at least
requestedPrecision significant decimal digits.20
Example:
  integer, parameter :: MY_REAL = selected_real_kind(18,200)
  real(kind=MY_REAL) :: x = 1.7_MY_REAL
Note that, since the exact data type needs to be revealed to the compiler, results of
the kind-inquiries need to be stored into constants (which are initialized at compile-
time).
By increasing the values of the requestedExponentRange and/or
requestedPrecision parameters, it is easily possible to ask for numbers
beyond the limits of the platform (you will get the chance to test this in Exercise 7).
In such situations, the inquiry functions will return a negative number. This fits with
the way kind type-parameters are used, since trying to specify a negative kind
value will cause the compilation to fail:
  integer, parameter :: NONSENSE_KIND = -1
  integer(kind=NONSENSE_KIND) :: s ! will fail to compile
  integer, parameter :: UNREASONABLE_RANGE = selected_int_kind(30000)
  ! will also fail to compile (at least, in 2013), because a too ambitious range
  ! of values was requested, causing the intrinsic function to
  ! return a negative number
  integer(kind=UNREASONABLE_RANGE) :: u
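To see what the inquiry functions return on a given platform, a small test program along the following lines may help. This is only a sketch: the constant names and the requested values are our own assumptions. The result of the unreasonable request is merely printed (never used as a kind-parameter), so the program should still compile.

```fortran
program kind_inquiry_sketch
  implicit none
  integer, parameter :: I18 = selected_int_kind(18)
  integer, parameter :: R15 = selected_real_kind(15, 300)
  ! request that is expected to be unsatisfiable; only printed below,
  ! never used as a kind-parameter
  integer, parameter :: TOO_BIG = selected_int_kind(30000)

  integer(kind=I18) :: t
  real(kind=R15)    :: x

  ! huge, precision and range are intrinsic inquiry functions
  print*, "integer kind:", I18, " largest value:", huge(t)
  print*, "real kind:", R15, " precision:", precision(x), &
       " exponent range:", range(x)
  print*, "result for an unreasonable request:", TOO_BIG
end program kind_inquiry_sketch
```

The kind values printed are compiler-specific, which is precisely why they should be obtained through the inquiry functions instead of being hard-coded.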
In closing of our discussion on kind, we have to admit that inferring the type-
parameters in each (sub)program, while viable for simple examples, can become
tedious and, worse, leads to much duplication of code. An elegant solution to
this problem is to package this logic inside a module, which is then included in
(sub)programs.21 We defer the discussion of this mechanism to Sect. 3.2.7, after
covering the concept of modules.
20 The situation is more complex for this type, because some values which are exactly representable
using the decimal floating-point notation can only be approximated in the binary floating-point
notation.
21 We encountered this mechanism in the Fortran distribution of the popular Numerical Recipes
logical type: allows variables to take only two values: .true. or .false.
(dots are mandatory). They can be declared similarly to the other types:
  logical activated                        ! plain declaration...
  activated = .true.                       ! ...with corresponding initialization
  logical :: conditionSatisfied = .false.  ! declaration with init
  logical, parameter :: ON = .true.        ! constant (init mandatory)
logical expressions: As for numeric types, logical values can be used, together
with specific operators (unary: .not. ; binary: .and. , .or. , .eqv. (equality)
and .neqv. ), to construct expressions, as in (using the previous declarations):
  .not. conditionSatisfied     ! .eqv. .true.
  conditionSatisfied .and. ON  ! .eqv. .false.
character type: variables and constants of this type are used for storing text
characters, similar to other programming languages. In Fortran, characters and character
strings are marked by a pair of single quotes (as in 'some text'), or a pair of
double quotes (as in "some more text"). These can be used interchangeably,
both for single- and multi-character values.
A text character is said to belong to a character set. A very influential such character
set is ASCII, which can be used to represent English-language text. Ours being an
English text, we devote more space to this character set.
Many modern Fortran implementations currently use ASCII by default. For
example, this is the case on our test system (64bit Linux, gfortran v4.8.2),
when we declare variables such as:
  character char1          ! plain declaration (to be initialized later)
  character :: char2 = 'a' ! declaration with immediate initialization
We discussed earlier (in the context of numeric types) the concept of type-
parameters. The character type actually accepts two such parameters: len
(for controlling the length of the string) and kind (for selecting the character set).
Let us focus on the first parameter (len) for now. It exists because, most of
the time, developers want to store sequences of characters (strings). If (as in the
previous listing) len is not explicitly mentioned, it implicitly has the value
“1” (reserving space for just one ASCII-character). To store longer strings, we can
declare a sufficiently large value for len, e.g.:
  character(len=100) myName ! fixed-size string
Note the type-parameter len=* (line 4), which causes the string to have what
is known as assumed-length.
Another common scenario is when the strings to operate on are not constant, with
their lengths only becoming known during the execution of the program. This is the
case, for example, if we want to make the previous listing more flexible, by asking
the user to provide a filename.22 For such a situation, we can use deferred-length
strings, which are marked by the type-parameter len=: , in conjunction with the
specifier allocatable . For example:
program deferred_length_strng
  implicit none

  character(len=256) :: buffer               ! fixed-length buffer
  character(len=:), allocatable :: filename  ! deferred-length string

  print*, 'Please enter filename (less than 256 characters):'
  read*, buffer  ! place user-input into fixed buffer

  filename = &                ! copy from buffer to dynamic-size string
       trim(adjustl(buffer))  ! 'trim' and 'adjustl' explained later

  print*, filename  ! some feedback...
end program deferred_length_strng
Listing 2.4 src/Chapter2/deferred_length_strng.f90
22 This approach is more convenient, in the sense that the user does not have to re-compile the
program every time the filename changes. For real-world software, we prefer to minimize interaction
with users, and allow specification of filenames (e.g. model input) at the invocation command line
instead (Sect. 5.5.1), which facilitates unattended runs.
23 If you try this example, you may notice that an additional space is printed at the beginning of
every line. This is the default behavior, related to some legacy output devices. We will discuss how
to avoid this in Sect. 2.4.
In ESS models, characters and strings are often secondary to the core numerics.
They are, however, useful for manipulating model-related metadata. To cater for
such needs, Fortran provides several intrinsic functions that take string arguments
(see Table 2.1 for a basic selection, or Metcalf et al. [10] for detailed information).
At the end of Sect. 2.5, after introducing more language elements, we use some
of these intrinsic functions, to solve a common pattern in ESS (creation of unique
filenames for transient-phenomena model output, based on the index of the time step).
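To give a first impression of such functions, the following sketch applies a few standard string intrinsics (len, len_trim, trim, adjustl and index) to a padded string. The variable name and its value are hypothetical, and we do not claim that this selection coincides with the one in Table 2.1.

```fortran
program string_intrinsics_sketch
  implicit none
  character(len=20) :: label = '  temperature'

  print*, "len      :", len(label)          ! declared length (20)
  print*, "len_trim :", len_trim(label)     ! length without trailing blanks
  print*, "[", trim(adjustl(label)), "]"    ! leading/trailing blanks removed
  print*, "index    :", index(label, 'per') ! start of first match (0 if absent)
end program string_intrinsics_sketch
```

Here, adjustl shifts the leading blanks to the end of the string, so that the subsequent trim can remove them together with the original padding.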
The I/O system is an essential part of any programming language, as it defines ways
in which our programs can interact with other programs and with users.
For example, models in ESS typically read files (input) for setting-up the geometry
of the problem and/or for loading initial conditions. Then, as the prognostic variables
are calculated for the subsequent time step, the new model state is regularly written
to other files (output). Frequently, the input files are created in an automatic fashion,
using other programs; likewise, the output of the model may be passed to post-
processing/visualization tools, for analysis.24
External files are not the only medium for performing I/O; other interfaces include
the usual interaction with the user via the terminal, or communication with the oper-
ating system (OS) (which allows the program to become aware of command line
arguments passed to it, and of environment variables; see Sect. 5.5 for some examples).
It is also possible to construct graphical user-interface (GUI)-based I/O applications,
using third-party libraries,25 but, in ESS, models providing such features26
are still the exceptions rather than the rule.27
We already used some simple I/O-constructs, in the code samples presented so far.
In this section, we provide the background for these constructs, and also discuss other
aspects of formatted 28 I/O (such as controlling the I/O commands, or working with
files). Finally, we provide a hierarchical overview of the I/O facilities used in ESS.
NOTE
A distinguishing characteristic of Fortran is that, by default, its I/O subsystem
is record-based (unlike languages like C or C++, which treat I/O as a stream
of bytes^a).
a This difference can cause problems while exchanging files across languages. Such problems
can be avoided by using portable formats like NETwork Common Data Format (netCDF)
(Sect. 5.2.2) or, when the file format cannot be changed, by using the new stream I/O capa-
bilities of Fortran 2003 (see Metcalf et al. [10]).
24 The complete network of tasks for obtaining the final data products can become quite complex.
In such cases, it often pays off to automatize the entire process, using shell scripts (see Sect. 5.6.1
for a brief overview of the options available, and some suggestions for further reading).
25 See, for example, Java Application Programming Interface (JAPI) for an open-source solution;
et al. [7]).
27 A lack of graphical interfaces does not imply obsolete software practices: textual, command line
interfaces can be readily used to automate complete workflows. This paradigm is suitable for ESS
models, which usually need a long time to run. However, GUI-based systems are often suitable for
steering operations which complete very fast, such as low-resolution models or tools in exploratory
data analysis.
28 In Fortran, formatted I/O means ASCII-text form; conversely, un-formatted I/O means binary
form. We do not cover binary I/O in this text, even if it is more space-efficient, due to possible
portability issues (we highlight an alternative form of efficient I/O in Sect. 5.2.2).
character strings recognized by the terminal. The programmer would often want to
control this conversion process, to achieve the desired formatting.29 However, for
testing purposes, the read* and print* forms can be used, known as list-directed
I/O. These are demonstrated in the following program, which expects the user to
enter a name and date of birth (year, month, day), and returns the corresponding day
of the week:
program birthday_day_of_week
  implicit none
  character(len=20) :: name
  integer :: birthDate(3), year, month, day, dayOfWeek
  integer, dimension(12) :: t = &
       [ 0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4 ]

  print*, "Enter name (inside apostrophes/quotes): "
  read*, name
  print*, "Now, enter your birth date (year, month, day): "
  read*, birthDate

  year = birthDate(1); month = birthDate(2); day = birthDate(3)

  if( month < 3 ) then
    year = year - 1
  end if
  ! Formula of Tomohiko Sakamoto (1993)
  ! Interpretation of result: Sunday = 0, Monday = 1, ...
  dayOfWeek = &
       mod( (year + year/4 - year/100 + year/400 + t(month) + day), 7)

  print*, name, " was born on a "
  select case(dayOfWeek)
  case(0)
    print*, "Sunday"
  case(1)
    print*, "Monday"
  case(2)
    print*, "Tuesday"
  case(3)
    print*, "Wednesday"
  case(4)
    print*, "Thursday"
  case(5)
    print*, "Friday"
  case(6)
    print*, "Saturday"
  end select
end program birthday_day_of_week
Listing 2.6 src/Chapter2/birthday_day_of_week.f90
The part of the I/O statements following the comma is called an I/O list. For input,
this needs to consist of variables (also arrays), while for output any expression can
be used.
Previously, we mentioned the record metaphor used by Fortran; this needs to be
considered while feeding input at the terminal for a read-statement. Each
read*-statement consumes at least one distinct line (= record): before any subsequent
read*-statement is executed, the file “cursor” is advanced to the next record, so
the input for adjacent read*-statements cannot be entered on a single line.
29 This is discussed later, in Sect. 2.4.2; the process is controlled via an edit descriptor, which is
embedded in a format specification.
providing as input:
we need to split the data for the three variables over three lines (records), as in:
’Bremerhaven/Germany’<Enter>
125<Enter>
8<Enter>
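A program consuming three records like the ones above could look as follows. This is a sketch under our own assumptions: the program and variable names are hypothetical, and only the shape of the input (one quoted string, followed by two integers) is taken from the example.

```fortran
program read_records_sketch
  implicit none
  character(len=30) :: location
  integer :: a, b

  read*, location  ! consumes the first record (line)
  read*, a         ! starts reading at the next record
  read*, b         ! ...and so does this statement

  print*, trim(location), a, b
end program read_records_sketch
```

The quotes around the string are needed in this case, because a slash terminates list-directed input when it appears outside of a quoted string.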
As previously mentioned, this form of I/O is not recommended for anything but
quick testing, because it is limited from two points of view:
1. system-dependent format: the system will ensure that all data is visible, but the
outcome is frequently not satisfying, due to generous whitespace-padding, which
may often decrease readability; we discuss how to resolve this issue in Sect. 2.4.2.
2. fixed I/O-channels: input is only accepted from the keyboard, and output will
be re-directed to the screen.30 This becomes counter-productive as soon as the
volume of I/O increases; we discuss how to route the I/O-channels to files in
Sect. 2.4.3.
30 For C/C++ programmers: this is the Fortran equivalent to the stdin, stdout and stderr
streams.
where $\sigma = 5.67 \times 10^{-8}\,\mathrm{W\,m^{-2}\,K^{-4}}$ is the Stefan-Boltzmann constant and, for
present-day, the average Earth albedo is $\alpha = 0.3$ and the annually-averaged flux
of solar energy incident on the Earth is $S_0 = 1367\,\mathrm{W\,m^{-2}}$.
Write a program which evaluates this equation, computing $T_e$. How does
the result change if $S_0$ is 30 % lower? What about increasing $\alpha$ by 30 %?
Fortran allows precise control on how data is converted between the internal rep-
resentation and the textual representation used for formatted I/O. This is achieved
by specifying a format specification. In fact, the language provides three ways of
specifying the format:
1. asterisk ( * ): this is what we have used so far. The effective format is
platform-dependent.
2. a character string expression (of default kind): this consists of a list of edit descrip-
tors, as will be explained in this section.
3. a statement label: this allows writing the format on a separate, labeled statement—
a technique that may be useful for structuring I/O statements better. However, we
do not emphasize this option, since the same effect can be obtained with character
strings.
The basic form of the output statement is:
  print <format> [, <I/O list>]
with the format (when given as a character string) having the form
  '(edit_descriptor_1, edit_descriptor_2, ...)'
OR
  "(edit_descriptor_1, edit_descriptor_2, ...)"
where each edit descriptor in the comma-separated list corresponds to one or
more31 item(s) in the I/O list of the statement.
The task of the edit descriptor is to precisely specify how data is to be converted
from the internal representation to the character representation external to the pro-
gram (or the other way around). Fortran supports three types of edit descriptors,
which can be combined freely: data, character string, and control.
Data edit descriptors: This is the most important category, since it refers to the
actual data-conversion process. Such edit descriptors are composed of combinations
of characters and positive integers, as discussed shortly. In general, the numbers
represent the lengths of the different components in the text representation on the
external device side. For output of numeric types, a set of asterisks is printed if these
lengths are too small for representing the value.
Fortran provides different types of edit descriptors, for each of the intrinsic types.32
We present them below, using monospace-font for characters that need to be typed
literally, and italic-font for variables to be replaced by integer values. Note that char-
acters like − (negation), . (decimal point) and e or E (marker for exponent),
when they appear, are also accounted for in the values of the various field-width
variables.
• integer: either iw or iw.m may be used, where w specifies the width of
the field, and m specifies that, on output, at least m digits should be printed even
if they are leading zeroes (on input, the two forms are equivalent). Example:
  integer :: id = 0, year=2012, month=9, day=1
  integer, dimension(40) :: mask = 10
  print*, "Enter ID (integer < 1000):"
  read '(i3)', id
  ! echo id (with leading zeroes if < 100)
  print '(i3.3)', id
  ! using multiple edit descriptors
  print '(i4, i2.2, i2.2)', year, month, day
When the magnitude of the integers to be written is not known in advance, it is
convenient to use the i0 edit descriptor, which automatically sets the field-width
to the minimum value that can still contain the output value33 :
31 It is possible, and sometimes useful, to have less edit descriptors than elements in the I/O list. In
such situations, the edit descriptors are reused, while switching to a new record (for examples, see
Sect. 2.6.5).
32 Special facilities also exist for arrays and derived types. We discuss the former in Sect. 2.6.5,
after introducing the corresponding language elements. For the latter, see Metcalf et al. [10].
33 This form is highly recommended, as it relieves the programmer from bugs associated with
manually selecting the field width (corrupted, asterisks-filled fields can occur if the number of
digits in the number exceeds the expected field width). However, this makes the formatting of
values variable, and may not be appropriate for applications where precise control of alignment is
important (like compatibility with other programs, or for improving the clarity of the output). Also,
note that this approach does not work for input, where the standard requires a nonzero field
width (on some systems, i0 would cause any input value to be set to zero).
  print '(i0)', testInt ! works correctly for any value
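The differences between the integer edit descriptors discussed so far can be summarized in a small sketch (the program name and the test values are our own, chosen only for illustration):

```fortran
program integer_edit_descriptors
  implicit none
  integer :: small = 7, large = 12345

  print '(i3)',   small  ! width 3, padded with blanks: "  7"
  print '(i3.3)', small  ! width 3, at least 3 digits:  "007"
  print '(i3)',   large  ! value does not fit:          "***"
  print '(i0)',   large  ! minimal sufficient width:    "12345"
end program integer_edit_descriptors
```

The third statement illustrates the asterisk-filled field mentioned above, which i0 avoids at the price of a variable field width.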
Binary, octal, and hexadecimal (hex) integers: For some applications, it can be
useful to read/write integer-values in a non-decimal numeral system (bases 2, 8,
and 16 being the most frequent). This is easily achieved in Fortran, by replacing the
i with b (for binary), o (for octal) and z (for hexadecimal) respectively. The field-
width can also be specified or auto-determined, just like when using the decimal
base. The following program uses such edit descriptors to convert decimal values
to other bases (some new elements in the program will be covered later):
program int_dec_2_other_bases
  implicit none
  integer :: inputInteger

  ! elements of this will be clarified later
  write(*, '(a)', advance='no') "Enter an integer: "
  ! get number (field width needs to be manually-specified)
  read '(i20)', inputInteger

  ! (string in format discussed later) print...
  print '("binary: ", b0)', inputInteger  ! ...min-width binary
  print '("octal: ", o0)', inputInteger   ! ...min-width octal
  print '("hex: ", z0)', inputInteger     ! ...min-width hex
end program int_dec_2_other_bases
Listing 2.9 src/Chapter2/int_dec_2_other_bases.f90
• real: no less than seven types of edit descriptors are available for this type
(reflecting Fortran’s focus on numerical computing): fw.d , ew.d , ew.dee ,
esw.d , esw.dee , enw.d , enw.dee , where w denotes the total width of
the field, d the number of digits to appear after the decimal point, and e (when
present) the number of digits to appear in the exponent field.
The first type of edit descriptor (based on f) is appropriate when the domain of
the values includes the origin, and does not span too many orders of magnitude (say,
$0.01 < x < 1000$). Otherwise, the e-variants, which use exponential notation,
are usually preferred. The different e-variants were introduced for supporting
various conventions for representing floating-point values used in different fields.
The distinction lies mainly in the way they scale the exponent, which correlates to
the range of the significand (= the rest of the number, after excluding the exponent).
This is summarized in Table 2.2 below.
Table 2.2 Prefixes for exponential notation in edit descriptors for real

  Prefix               Resulting range for absolute value of significand
  e                    [0.1, 1.0)
  en (“engineering”)   [1, 1000)
  es (“scientific”)    [1, 10)
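To compare these variants in practice, one can print the same value with each of them, as in the sketch below. The value and the field widths are arbitrary choices of ours, and the character strings inside the format are only labels.

```fortran
program real_edit_descriptors
  implicit none
  real :: x = 12345.678

  print '("f : ", f12.3)',  x  ! fixed-point notation
  print '("e : ", e12.4)',  x  ! significand in [0.1, 1.0)
  print '("es: ", es12.4)', x  ! significand in [1, 10)
  print '("en: ", en12.4)', x  ! significand in [1, 1000)
end program real_edit_descriptors
```

With the en prefix the exponent is always a multiple of three, which matches the usual prefixes for units (kilo, mega, etc.).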
• complex: can be formatted using pairs of edit descriptors for real values.
• logical: supports the lw edit descriptor, where w denotes the width of the
field (if w = 1, T or F are supported, while w = 7 enables support for the
expanded notation of boolean values, i.e., .true. and .false. ). According
to the language standard, the width is mandatory.
• character strings: can be used with the a or aw edit descriptors, where the
first form automatically determines the necessary width to contain the string in
the I/O list. The second form allows manual specification of the width but, unlike
the similar mechanism for numbers, the value is not invalidated with asterisks if
the string in the I/O list is larger than w. Instead, the non-fitting part of the string
on the right-hand side is simply truncated. Alternatively, if w is larger than the
length of the string in the I/O list, the string will be right-justified.
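The behavior of the l and a edit descriptors on output can be checked with a sketch like the following (names and values are hypothetical; the brackets are only there to make the field boundaries visible):

```fortran
program logical_char_descriptors
  implicit none
  logical :: flag = .true.
  character(len=5) :: word = 'ocean'

  print '(l1)', flag             ! prints "T"
  print '("[", a, "]")', word    ! width taken from the string: "[ocean]"
  print '("[", a3, "]")', word   ! too narrow, truncated:       "[oce]"
  print '("[", a8, "]")', word   ! too wide, right-justified:   "[   ocean]"
end program logical_char_descriptors
```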
All data edit descriptors can be preceded by a positive integer, when more values
for which the same format is appropriate appear in the I/O list. This is particularly
useful when working with arrays, as we will illustrate in Sect. 2.6.5.
Control edit descriptors: these do not assist in data I/O, but allow instructing the
I/O system to perform other operations related to the alignment of output. We only
discuss how to insert spaces and start a new line here (see Metcalf et al. [10] for other
details).
To insert spaces in output, use the nx edit descriptor, where n represents the
number of spaces to be inserted. Similarly, to start a new record (line) without issuing
another output statement, use the n/ edit descriptor, where n represents the number
34 However, unlike integer, the value of d remains important even in this case, since truncation
is usually inevitable when converting floating-point binary numbers to the decimal base.
NOTE
The idea of counts (also known as “repeat counts”) in front of edit descriptors is
actually more general, since these can also appear in front of data edit descriptors
(e.g. ’(10i0)’), or even in front of groups of edit descriptors, surrounded
by parentheses (e.g. ’(5(f8.2, 1x))’). These are useful mostly when working
with arrays, therefore we discuss them in more detail in Sect. 2.6.5.
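To make the control edit descriptors and the repeat counts more concrete, the sketch below combines them in three output statements. The variable names and values are assumptions chosen for illustration only.

```fortran
program control_edit_descriptors_sketch
  implicit none
  integer :: anInt = 42        ! hypothetical test values
  real :: aReal = 3.14159

  ! 3x inserts three blanks between the two fields
  print '(i0, 3x, f0.2)', anInt, aReal
  ! the slash ends the current record, so the real value starts on a new line
  print '("An integer: ", i0, /, "A real: ", f0.2)', anInt, aReal
  ! repeat count in front of a parenthesized group of edit descriptors
  print '(3(f6.2, 1x))', 1.0, 2.5, 4.75
end program control_edit_descriptors_sketch
```

The second statement produces two lines of output from a single print, which is the typical use of the slash edit descriptor.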
35 Note that, if the current record is not empty, the number of empty records inserted by such an
edit descriptor is n − 1.
36 In C, the equivalent statement would be: printf("An integer: %3d\nA real: %0.2f\n", anInt, aFloat);
30 2 Fortran Basics
• when some facts about the format are not known until actual program execution
(here, the string constant would impose switching between various hard-coded
formats)
Fortunately, format specifications can also be non-constant strings, constructed
dynamically at runtime. This can be used to address both issues above.37 The fol-
lowing program illustrates how such a specification can be used for multiple output
statements:
program string_variable_as_format_spec
  implicit none
  integer :: a = 1, b = 2, c = 3
  real :: d = 3.1, e = 2.2, f = 1.3
  ! format-specifier to be reused (could also use deferred-length)
  character(len=*), parameter :: outputFormat = '(i0, 3x, f0.10)'
  print outputFormat, a, d
  print outputFormat, b, e
  print outputFormat, c, f
end program string_variable_as_format_spec
Listing 2.13 src/Chapter2/string_variable_as_format_spec.f90
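The listing above reuses a constant string. For the second situation (facts about the format only known at runtime), the string can be assembled during execution. The sketch below is one possible way to do this, under the assumption that the number of columns (numCols, a name we made up) is the runtime information. It relies on writing into a character variable, a technique (internal files) that is described later in this section.

```fortran
program runtime_format_sketch
  implicit none
  integer :: numCols = 4            ! assumption: imagine this is only known at runtime
  character(len=32) :: fmtString    ! buffer that will hold the format specification

  ! assemble the string '(4(f8.2, 1x))' by writing into the character variable
  write(fmtString, '(a, i0, a)') "(", numCols, "(f8.2, 1x))"
  print '(2a)', "format used: ", trim(fmtString)

  ! use the assembled string as format specification
  print fmtString, 1.5, 2.25, 3.0, 4.75
end program runtime_format_sketch
```

The trailing blanks in fmtString are harmless, since everything after the closing parenthesis of a format specification is ignored.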
The I/O statements discussed in the previous sections used the standard I/O channels:
we always assumed that input is directed from the keyboard, and output is appearing
on the screen. However, Fortran also allows the use of other channels (files or even
character-strings), as will be discussed in this section.
Any I/O-channel (e.g. keyboard, screen, or a file on disk) is mapped to a unit. To
distinguish between the various channels, each unit is identified by an integer
unit-number, which is either
• selected by the platform (usually “5” represents standard-input from keyboard,
and “6” standard-output to screen), or
• specified by the programmer (examples of this shown later).
General I/O-statements: The simplified forms of the I/O statements discussed pre-
viously (print and read) do not support customization of I/O channels. To gain
more control, the general I/O statements (write and read38 ) need to be used,
which we introduce below:
! general form of input statement
read([unit=]u, [fmt=]fm1 [, iostat=statCode] [, err=lbl1] [, end=lbl2]) &
  [inputList]
! general form of output statement
write([unit=]u, [fmt=]fm1 [, iostat=statCode] [, err=lbl1]) [outputList]
37 There is also the option to use format-statements, as we mentioned previously. However, their
usefulness is limited to the first issue, which is why we chose not to describe them—see Metcalf
et al. [10] for details.
38 The general input statement has the same name as the simplified form, but observe the other
differences.
As usual, the square brackets denote optional items. The unit-number (u) and
the format specification ( fm1) are the only mandatory items (optionally, they can be
preceded by unit= and fmt= respectively, to improve readability). Both of these
items can be set to ∗ , to recover the particular forms of I/O we already presented:
program general_can_recover_special_io
  implicit none
  integer :: anInteger
  ! special forms, default formatting...
  read *, anInteger ! input
  print *, "You entered: ", anInteger ! output
  ! ...equivalent general forms, default formatting
  read(*, *) anInteger ! input
  write(*, *) "You entered: ", anInteger ! output
  ! special forms, custom formatting...
  read '(i20)', anInteger ! input
  print '("You entered: ", i0)', anInteger ! output
  ! ...equivalent general forms, custom formatting
  read(*, '(i20)') anInteger ! input
  write(*, '("You entered: ", i0)') anInteger ! output
end program general_can_recover_special_io
Listing 2.14 src/Chapter2/general_can_recover_special_io.f90
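Since the unit-numbers of the standard channels are selected by the platform (the values 5 and 6 mentioned earlier are common, but not guaranteed), it can be safer not to hard-code them. A minimal sketch, assuming a compiler that supports Fortran 2003, is to take them from the intrinsic module iso_fortran_env:

```fortran
program standard_units_sketch
  ! named constants for the pre-connected units (Fortran 2003 and later)
  use, intrinsic :: iso_fortran_env, only: input_unit, output_unit, error_unit
  implicit none

  write(output_unit, '(a, i0)') "unit-number of standard input : ", input_unit
  write(output_unit, '(a, i0)') "unit-number of standard output: ", output_unit
  ! messages written to error_unit are not mixed with the regular output
  write(error_unit,  '(a, i0)') "unit-number of standard error : ", error_unit
end program standard_units_sketch
```

The actual numbers printed depend on the compiler and platform, which is precisely why the named constants are convenient.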
39 In Fortran, every statement can be given a label, which is simply a positive integer (of at most
5 digits), written before the statement. These provide “bookmarks” within the code, allowing the
program to “jump” to that statement when necessary—either transparently to the user (when the
jump results from error handling), or explicitly (using the controversial go to statement). Please
note that explicit jumps with go to are strongly discouraged, as they can quickly make programs
difficult to understand!
The following program illustrates how these extra arguments may be used:
program read_with_error_recovery
  implicit none
  integer :: statCode=0, x
  ! The safeguarded READ-statement
  read(unit=*, fmt=*, iostat=statCode, err=123, end=124) x
  print '(a, 1x, i0)', "Received number", x
  ! Normal program termination-point, when no exceptions occur
  stop
  123 write(*, '(a, 1x, i0)') &
       "READ encountered an ERROR! iostat =", statCode
  ! can insert here code to recover from error, if possible...
  stop
  124 write(*, '(a, 1x, i0)') &
       "READ encountered an end-of-file! iostat =", statCode
  ! can insert here code to recover from error, if possible...
  stop
end program read_with_error_recovery
Listing 2.15 src/Chapter2/read_with_error_recovery.f90
Exercise 3 (Testing error recovery) Compile the program listed above, and try
providing different types of input data, to test how the error-handling mecha-
nism works.
Hints: try providing (a) a valid integer-value, (b) a string and (c) an end-
of-file character (on Unix: type CTRL+d).
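The same kind of error recovery can also be written without statement labels, by inspecting only the iostat value after the read. The following is a sketch of this alternative, not a replacement for the listing above. It assumes a Fortran 2003 compiler (for iostat_end and iomsg), uses variable names of our own choosing, and borrows the if-construct, which is only introduced later in this chapter.

```fortran
program read_with_iostat_sketch
  use, intrinsic :: iso_fortran_env, only: iostat_end
  implicit none
  integer :: statCode, x
  character(len=200) :: errMsg   ! receives a compiler-dependent explanation

  read(*, *, iostat=statCode, iomsg=errMsg) x

  if (statCode == 0) then
     print '(a, 1x, i0)', "Received number", x
  else if (statCode == iostat_end) then
     print '(a)', "READ encountered an end-of-file"
  else
     print '(a, i0, 2a)', "READ failed, iostat = ", statCode, ": ", trim(errMsg)
  end if
end program read_with_iostat_sketch
```

The three inputs suggested in the exercise above should exercise the three branches of this program as well.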
The three phases of I/O: Working with external data channels in Fortran implies
the following sequence of phases:
1. establishing the link: before the I/O system can use a unit, a link needs to
be established and a unique unit-number assigned. For standard I/O (keyboard/
screen), the channels are pre-connected by the Fortran runtime system, without
any intervention from the programmer.
However, for all other cases the link has to be established explicitly, with the
open-statement. From the programmer’s point of view, the most important effect
of this statement is to associate a unit-number to the actual data channel. This
number is necessary for the next steps (e.g. when the actual I/O takes place).
Currently, there are two methods for performing this association:
a. Until Fortran 2003, the programmer was responsible for explicitly selecting
a positive integer-value for the unit-number. For working with ASCII
files,40 the open-statement would then commonly look like:
40Creating “binary” files is also possible, but we avoid discussing this, in favor of another format
which is more appropriate in ESS, i.e., netCDF (see Sect. 5.2.2).
open([unit=]unitNum [, file=fileName] &
     [, status=statusString] [, action=actionString] &
     [, iostat=statCode] [, err=labelErrorHandling] &
)
where:
• unitNum is a positive integer variable/constant, assigned by the
programmer. This will be used by the actual I/O statements.
• fileName is a character-string, representing the actual name of the
file in the file system.41 This can be omitted only when statusString
=="scratch" (which is useful for creating temporary files, managed by
the system, and usually deleted when the program terminates).
• statusString is one of the following strings: "old", "new", "replace",
"scratch" or "unknown" (= default). This can be used to enforce some
assumptions related to the status of the file prior to opening it.
• actionString is one of the strings: "read", "write" or "readwrite".
This is useful for limiting the set of I/O statements that can be used with
the unit, which can help prevent bugs.
• statCode and labelErrorHandling have the same roles as
statCode and lbl1 in the preceding discussion on error-handling.
The following listing presents some examples:
integer :: statCode
real :: windUx=1.0, windUy=2.0, pressure=3.0

! assuming file "wind.dat" exists, open it for reading, selecting
! the value of 20 as unit-id; no error-handling
open(unit=20, file="wind.dat", status="old", action="read")

! open file "pressure.dat" for writing (creating it if it does not
! exist, or deleting and re-creating it if it exists), selecting
! the value of 21 as unit-id; place in variable 'statCode' the
! result of the open-operation
open(unit=21, file="pressure.dat", status="replace", &
     action="write", iostat=statCode)

! open a scratch-file, for storing some intermediate-result (which
! we need to read later), that would be too large to keep in memory;
! no error-handling
open(unit=22, status="scratch", action="readwrite")
41 Note that there might be some system-dependent restrictions on what constitutes a valid filename.
b. Starting with Fortran 2008, the open-statement also accepts the newunit
specifier, which lets the runtime system select an unused unit-number
automatically:
open([newunit=]unitVariable [, file=fileName] &
     [, status=statusString] [, action=actionString] &
     [, iostat=statCode] [, err=labelErrorHandling] &
)
Note that, with this new method, it is not possible anymore to use constants
for the newunit-value—only integer variables are accepted. This is
because, when the open-statement is invoked, the runtime system will need
to update unitVariable.42
With this new method, the examples presented above can be re-written as:
integer :: statCode, windFileID, pressureFileID, scratchFileID
real :: windUx=1.0, windUy=2.0, pressure=3.0
! assuming file "wind.dat" exists, open it for reading, and store an
! (automatically-acquired) unit-number in variable 'windFileID'; no
! error-handling
open(newunit=windFileID, file="wind.dat", status="old", &
     action="read")

! open file "pressure.dat" for writing (creating it if it does not
! exist, or deleting and re-creating it if it exists), while storing
! the (automatically-acquired) unit-number in variable 'pressureFileID';
! place in variable 'statCode' the result of the open-operation
open(newunit=pressureFileID, file="pressure.dat", status="replace", &
     action="write", iostat=statCode)

! open a scratch-file, storing the (automatically-acquired) unit-number
! in variable 'scratchFileID'; no error-handling
open(newunit=scratchFileID, status="scratch", action="readwrite")
Good practice
Due to its convenience, we recommend using this second method (with
newunit) when opening files. We also rely on this technique in the later
examples in this book (especially in Chap. 4).
2. actual I/O calls: the second phase corresponds to issuing the actual I/O-
statements, for the data we want to read or write. We discussed this in the previous
sections; the only change necessary for file I/O is that the ∗ used until now for
the unit-id needs to be replaced by the appropriate variable, as associated in
advance within the open-statement. For example (continuing the example from
the previous listing):
! ... some code to compute pressure ...
read(windFileID, *) windUx, windUy

! display on-screen the values read from the "wind.dat"-file
write(*, '("windUx =", 1x, f0.8, 2x, "windUy =", 1x, f0.8)') &
     windUx, windUy
42The standard specifies that a negative value (but different from −1 , which signals an error) will
be chosen for unitVariable, to avoid clashes with any existing code that uses the previous
method of assigning unit-numbers, where positive numbers had to be used.
! write to scratch-file (here, only for illustration-purpose; this makes
! more sense if 'pressure' is a large array, which we would want to modify,
! or deallocate afterwards, to save memory)
write(scratchFileID, '(f10.6)') pressure ! write to scratch
! re-position file cursor at beginning of the scratch-file
rewind scratchFileID
! ... after some time, re-load the 'pressure'-data from the scratch-file
read(scratchFileID, '(f10.6)') pressure

! write final data to "pressure.dat"-file
write(pressureFileID, '(f10.6)') pressure*2
3. closing the link: unlike the first phase (establishing the link), the system will
automatically close the link to any active unit, if the program completes nor-
mally. It is, however, still recommended for the programmer to perform this step
manually, to avoid losing data in case an exception occurs.43 To terminate the
link to a unit, the close-statement can be used:
close([unit=]unitNum [, status=statusString] &
      [, iostat=statCode] [, err=labelErrorHandling] &
)
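The three phases can be put together in a small self-contained program. The sketch below is only an illustration under our own assumptions: the file name three_phases_demo.dat and the variable names are hypothetical, the if-statement used for checking statCode is introduced later in this chapter, and a Fortran 2008 compiler is assumed (for newunit).

```fortran
program three_phases_sketch
  implicit none
  integer :: fileID, statCode
  real :: pressureOut = 1013.25, pressureIn

  ! phase 1: establish the link (for writing)
  open(newunit=fileID, file="three_phases_demo.dat", status="replace", &
       action="write", iostat=statCode)
  if (statCode /= 0) stop "could not open file for writing"
  ! phase 2: actual I/O
  write(fileID, '(f10.4)') pressureOut
  ! phase 3: close the link (buffered data is written to disk)
  close(fileID)

  ! same three phases again, this time for reading the value back
  open(newunit=fileID, file="three_phases_demo.dat", status="old", &
       action="read", iostat=statCode)
  if (statCode /= 0) stop "could not open file for reading"
  read(fileID, *) pressureIn
  close(fileID, status="delete")   ! remove the demo file when closing

  print '(a, f0.2)', "value read back: ", pressureIn
end program three_phases_sketch
```

The status="delete" option of close is used here only to clean up after the demonstration. For regular output files, the default (keeping the file) is what we normally want.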
Internal files: In addition to units, the general I/O statements in Fortran can also
operate on internal files (which are simply buffers, stored as strings or arrays of
strings).44
Internal files are similar, in a sense, to the scratch files that we described earlier,
since they are normally used for temporarily holding data which need to be manipu-
lated at a later stage of the program’s execution. However, because they are resident in
43 Such data loss can occur when writing to files, since most platforms use buffering mechanisms
for temporarily storing output data, to compensate for the slow speed of the permanent storage
devices (e.g. disks).
44 Strictly speaking, these do not form true I/O operations (the buffers are still memory areas
associated with the program, so no external system is involved), but it is convenient to treat them
as such (as done for the equivalent stringstream class in C++).
memory, they are usable only for smaller amounts of data. One application of internal
files is type conversion between numbers and strings—for example, to dynamically
construct names for the output files of an iterative model, at each time step.45 One
approach to achieve this is shown in the listing below:
program timestep_filename_construction
  implicit none
  character(40) :: auxString ! internal file (= string)
  integer :: i, numTimesteps = 10, speedFileID

  ! do is for looping over an integer interval (discussed soon)
  do i=1, numTimesteps
    ! write timestep into auxString
    write(auxString, '(i0)') i
    ! open file for writing, with custom filename
    open(newunit=speedFileID, &
         file="speed_" // trim(adjustl(auxString)) // ".dat", &
         action="write")

    ! here, we would have model-code, for computing the speed and writing
    ! it to file...

    close(speedFileID)
  end do
end program timestep_filename_construction
Listing 2.20 src/Chapter2/timestep_filename_construction.f90
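Internal files work in both directions. The sketch below shows a conversion from a string to numbers (read from an internal file), and a variation of the file name construction in which the time step is padded with zeros, so that the resulting files sort naturally in a directory listing. The names and values are our own assumptions, chosen for illustration.

```fortran
program internal_file_conversions_sketch
  implicit none
  character(len=20) :: record = "12  3.75"   ! hypothetical text to be converted
  character(len=40) :: fileName
  integer :: numLevels, statCode
  real :: depth

  ! string -> numbers: read from the string as if it were a file
  read(record, *, iostat=statCode) numLevels, depth
  if (statCode == 0) then
     print '(a, i0, a, f0.2)', "numLevels = ", numLevels, ", depth = ", depth
  end if

  ! number -> string: i4.4 forces 4 digits, padding with leading zeros
  write(fileName, '(a, i4.4, a)') "speed_", 7, ".dat"
  print '(a)', trim(fileName)   ! prints speed_0007.dat
end program internal_file_conversions_sketch
```

With the zero-padded variant, the whole file name is produced by a single write, so the calls to trim and adjustl on the auxiliary string are no longer needed.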
Non-advancing I/O: We illustrated towards the end of Sect. 2.4.1 how, unlike other
languages, Fortran automatically advances the file-position with each I/O statement,
to the beginning of the next record. However, this can be turned off for a particular
I/O-statement, by setting the optional control specification advance to "no" (default
value is "yes"). This is often used when data is requested from the user, in which case
it is desirable to have the prompt and the user input on the same line. We already
used this technique, in Listings 2.9 and 2.10.
So far, we discussed some basic forms of I/O, which are useful in common practice.
However, these approaches do not scale well to the data throughput of state-of-the-art
ESS models (currently, in the terabyte range for high-resolution models with
global coverage). Text (“formatted”) files are ineffective for handling such amounts
of data, since each character in the file still occupies a full byte. If we imagine a very
simple file which only contains the number 13, the ASCII-representation will occupy
2 bytes = 16 bits. In addition, to mark the end of each record, a newline character
(Unix) or carriage-return + newline (Windows) needs to be added for every row in
the file. Thus, the total space requirement for storing our number in a file will be
3 bytes on Unix, and 4 bytes on Windows systems, respectively.
45 Here, we imply there is one output file for each time step, to illustrate the idea. Note, however,
that this may not always be a good approach. In particular, when the number of time steps is large, it
is more convenient to write several time steps in each file (this is supported by the netCDF-format,
which we will describe in Sect. 5.2.2).
Alternatively, if we choose to store the data directly in binary form, 4 bits would
already be sufficient in theory to represent the number 13 (however, this is half of
the smallest unit of storage—on most systems, the file would finally occupy 1 byte).
These calculations illustrate that there is a large potential for reducing the final size
of the files, even without advanced compression algorithms, just by storing data in
the binary format instead of the ASCII representation. Other advantages include:
• less CPU-time spent for I/O operations: the conversion to/from ASCII also
increases the execution time of the program, by an amount that can become com-
parable to the time spent for actual computations
• fewer approximation errors: especially when working with floating-point data, approximation
errors can be introduced each time a conversion between binary and ASCII
representations takes place, which binary storage avoids
While the benefits of binary storage are significant, it does have the problem that
interpretation of data is made more difficult.46 The importance of this cannot be
overstated, which is why it is not recommended to use the binary format directly
in most cases: a much more convenient solution in ESS is to use the netCDF data
format, which allows efficient storage in a platform-independent way. We cover this
topic later, in Sect. 5.2.2, after introducing some more language features.
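The size estimates for the number 13 can be checked experimentally. The sketch below is meant only to support that calculation, not as a recommendation to use raw binary files. It assumes a Fortran 2008 compiler which provides a one-byte integer kind (int8), and the file names are hypothetical.

```fortran
program text_vs_binary_size_sketch
  use, intrinsic :: iso_fortran_env, only: int8
  implicit none
  integer :: fileID, textSize, binarySize
  integer(int8) :: smallNumber = 13_int8

  ! formatted (text) file: the characters "1" and "3", plus the record terminator
  open(newunit=fileID, file="size_demo.txt", status="replace", action="write")
  write(fileID, '(i0)') smallNumber
  close(fileID)
  inquire(file="size_demo.txt", size=textSize)

  ! unformatted stream file: only the raw byte(s) of the value
  open(newunit=fileID, file="size_demo.bin", status="replace", action="write", &
       access="stream", form="unformatted")
  write(fileID) smallNumber
  close(fileID)
  inquire(file="size_demo.bin", size=binarySize)

  print '(a, i0, a)', "text file:   ", textSize, " bytes"
  print '(a, i0, a)', "binary file: ", binarySize, " bytes"
end program text_vs_binary_size_sketch
```

On a typical Unix system we would expect this to report 3 bytes for the text file and 1 byte for the binary file, in line with the estimates above. The sizes are reported in file storage units, which are bytes on most platforms.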
Most programs shown so far consisted of instructions that were executed in sequence.
However, in real applications it is often necessary to break this ordering, as some
blocks of instructions may need to be executed (once or several times) only when
certain conditions are met. The generic name for such constructs is (program) flow-
control, and Fortran has several of them, as we discuss in this section.
Style recommendation: In the examples below, we indent each block of program
instructions, to clearly reflect situations when their execution is conditioned by a
certain flow-control construct. Indentation is not required by the language (the com-
piler eventually removes whitespace anyway), but it greatly improves the clarity of
the code, especially when multiple flow-control constructs are nested. We highly
recommend this practice.
2.5.1 if Construct
The simplest form of flow-control can be achieved with the if-statement which,
in its most basic form, executes a block of code only when a certain scalar logical
condition is satisfied. This is illustrated by the following program, which asks for a
number, and informs the user in case it is odd:
program number_is_odd
  implicit none
  integer :: inputNum
  write(*, '(a)', advance="no") "Enter an integer number: "
  read(*, *) inputNum
  ! NOTE: mod is an intrinsic function, returning the remainder
  ! of dividing first argument by the second one (both integers)
  if( mod(inputNum, 2) /= 0 ) then
    write(*, '(i0, a)') inputNum, " is odd"
  end if
end program number_is_odd
Listing 2.21 src/Chapter2/number_is_odd.f90
In this case (when there is only one branch in the if), the corresponding code
can be made even more compact, on a single line47 :
if( mod(num, 2) /= 0 ) write(*, '(i0, a)') num, " is odd"
We may wish to extend the previous example, such that a message is printed also
when the number is even. This can also be achieved with if, which supports an
(optional) else-branch:
if( mod(num, 2) /= 0 ) then
  write(*, '(i0, a)') num, " is odd"
else
  write(*, '(i0, a)') num, " is even"
end if
Additional conditions can be tested with else if-branches, which are evaluated in
order, until one of them is satisfied. To illustrate, assume that we need to extend
our previous example such that, when the number is even, we inform the user if it
is zero. This can be implemented as in:
if( mod(num, 2) /= 0 ) then
  write(*, '(i0, a)') num, " is odd"
! num is even, now check if it is zero
else if( num == 0 ) then
  write(*, '(i0, a)') num, " is zero"
! default, "catch-all" branch, if all tests fail
else
  write(*, '(i0, a)') num, " is non-zero and even"
end if
47 Note that the keywords then and end if do not appear in the compact form.
2.5 Program Flow-Control Elements (if, case, Loops, etc.) 39
Other constructs (including other if-statements) can appear within each of the
branches of the conditional.48 It is recommended to moderate this practice (since it
can easily lead to code that is hard to follow), but sometimes it cannot be avoided. In
such cases, proper indentation becomes crucial. Also helpful is the fact that Fortran
allows ifs (as well as the rest of the flow-control constructs) to be named, to make it
clear to which construct a certain branch belongs; when names are used, the branches
need to bear the same name as the parent construct. This is illustrated in the following
(artificial and a little extreme) example, which asks the user for a 3-letter string, and
then reports the corresponding northern hemisphere season49 :
program season_many_nested_ifs
  implicit none
  character(len=30) :: line
  write(*, '(a)', advance="no") "Enter 3-letter season acronym: "
  read(*, '(a)') line
  if( len_trim(line) == 3 ) then
    winter: if( trim(line) == "djf" ) then
      write(*, '(a)') "Season is: winter"
    else if( trim(line) == "DJF" ) then winter
      write(*, '(a)') "Season is: winter"
    else winter
      spring: if( trim(line) == "mam" ) then
        write(*, '(a)') "Season is: spring"
      else if( trim(line) == "MAM" ) then spring
        write(*, '(a)') "Season is: spring"
      else spring
        summer: if( trim(line) == "jja" ) then
          write(*, '(a)') "Season is: summer"
        else if( trim(line) == "JJA" ) then summer
          write(*, '(a)') "Season is: summer"
        else summer
          autumn: if( trim(line) == "son" ) then
            write(*, '(a)') "Season is: autumn"
          else if( trim(line) == "SON" ) then autumn
            write(*, '(a)') "Season is: autumn"
          else autumn
            write(*, '(5a)') &
                 '"', trim(line), '"', " is not a valid acronym", &
                 " for a season!"
          end if autumn
        end if summer
      end if spring
    end if winter
  else
    write(*, '(5a)') &
         '"', trim(line), '"', " cannot be a valid acronym", &
         " for a season, because it does not have 3 characters!"
  end if
end program season_many_nested_ifs
Listing 2.22 src/Chapter2/season_many_nested_ifs.f90
Note that, while indentation and naming of constructs are helpful, the resulting
code still looks complex, which is why we do not recommend including such extreme
forms of nesting in real applications. For the current example, there is a way to
simplify the logic using the case-construct, discussed next.
Note on spacing: In Fortran, several keywords (especially for marking the termina-
tion of a flow-control construct) can be written with or, equivalently, without spaces
48 The process is called nesting. When used, nesting has to be complete, in the sense that the
“parent”-construct must include the “child”-construct entirely (it is not allowed to have only partial
overlap between the two).
49 This is a common convention in ESS, where DJF = winter, MAM = spring, JJA = summer,
and SON = autumn (for the northern hemisphere). The acronyms are obtained by joining the first
letters of the months in each season.
in between. For example, endif is equivalent to end if, and enddo (discussed
later)—to end do. This is more a matter of developer preferences.
For specifying ranges of values, it is even allowed to omit the lower or the higher
bound (but not both), which allows ranges to extend to the smallest (negative) and
largest (positive) representable integer-value.50 This is used in the next code
listing, which asks the user to enter an integer value, and checks if the number is a
valid index for a calendar month:
program check_month_index_select_case_partial_ranges
  implicit none
  integer :: month
  write(*, '(a)', advance="no") "Enter an integer-value: "
  read(*, *) month
  ! check if month is valid month-index, with partial
  ! (semi-open) ranges in a select-case construct
  select case( month )
  case( :0, 13: )
    write(*, '(a, i0, a)') "error: ", &
         month, " is not a valid month-index"
  case default
    write(*, '(a, i0, a)') "ok: ", month, &
         " is a valid month-index"
  end select
end program check_month_index_select_case_partial_ranges
Listing 2.24 src/Chapter2/check_month_index_select_case_partial_ranges.f90
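Single values, closed ranges and the catch-all default branch can be mixed within the same construct. As a further sketch (our own example, with hypothetical names, and ignoring leap years for brevity), the following program reports the number of days for a given month-index:

```fortran
program days_in_month_sketch
  implicit none
  integer :: month

  write(*, '(a)', advance="no") "Enter month-index: "
  read(*, *) month

  select case (month)
  case (1, 3, 5, 7:8, 10, 12)   ! single values and a closed range in one list
     write(*, '(a)') "31 days"
  case (4, 6, 9, 11)
     write(*, '(a)') "30 days"
  case (2)
     write(*, '(a)') "28 days (29 in leap years)"
  case default                  ! everything else, i.e. the ranges (:0) and (13:)
     write(*, '(a, i0, a)') "error: ", month, " is not a valid month-index"
  end select
end program days_in_month_sketch
```

Here, the default branch plays the role of the two semi-open ranges from the previous listing, since the other branches already cover all valid indices.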
Using the case-construct can lead to great simplifications of what would otherwise
be complex, nested if-contraptions. For example, the season-acronym matching
program could be re-written as:
program season_select_case
  implicit none
  character(len=30) :: line
  write(*, '(a)', advance="no") "Enter 3-letter season acronym: "
  read(*, '(a)') line
  if( len_trim(line) == 3 ) then
    season_match: select case( trim(line) )
    case( "djf", "DJF" ) season_match
      write(*, '(a)') "Season is: winter"
    case( "mam", "MAM" ) season_match
      write(*, '(a)') "Season is: spring"
    case( "jja", "JJA" ) season_match
      write(*, '(a)') "Season is: summer"
    case( "son", "SON" ) season_match
      write(*, '(a)') "Season is: autumn"
    case default season_match
      write(*, '(5a)') &
           '"', trim(line), '"', " is not a valid acronym", &
           " for a season!"
    end select season_match
  else
    write(*, '(5a)') &
         '"', trim(line), '"', " cannot be a valid acronym", &
         " for a season, because it does not have 3 characters!"
  end if
end program season_select_case
Listing 2.25 src/Chapter2/season_select_case.f90
where we also demonstrated how to assign a name (in this example: season_
match) to the case-construct.
2.5.3 do Construct
The flow-control constructs discussed so far (if and case) allow us to deter-
mine whether blocks of code need to be executed or not. Another pattern, which
is extremely important in modeling, is to execute certain blocks of code repeatedly,
until some termination criterion is satisfied. This pattern (also known as iteration) is
supported in Fortran through the do-construct, which we describe in this section.
The simplest form of iteration uses an integer-counter, as in the following
example:
integer :: i
do i = -15, 10
  ! block of statements, to be executed for each iteration
  write(*, '(i0)') i
end do
Here, the variable i is also known as the loop counter, and needs to be of integer
type. The numbers on line 2 represent the lower (−15) and upper bound (10). For
each value in this range, the block of statements within the do-loop will be executed.
Within this block, the value of i can be read (e.g. it can appear in expressions), but
it cannot be modified.
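As a small complete example of a counted loop, the sketch below (with names of our own choosing) accumulates the sum of the integers from 1 to 100. The loop counter is only read inside the loop body, never assigned to.

```fortran
program sum_with_do_sketch
  implicit none
  integer :: i, total

  total = 0
  do i = 1, 100
     ! the counter i appears in an expression, but is not modified
     total = total + i
  end do
  print '(a, i0)', "sum of 1..100 = ", total   ! prints 5050
end program sum_with_do_sketch
```

The result agrees with the closed-form expression 100 * 101 / 2 = 5050, which provides a simple check for the loop bounds.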
By default, the loop counter is incremented by one at the end of each iteration. Fortran
also allows specifying a different increment, as a third number at the beginning of
the do-construct. This allows, for example, incrementing the loop counter in larger
steps, or even decrementing it, to scan the range of numbers backwards. For example:
! iterate from 0 to 100, in steps of 25
do i=0, 100, 25
  ! block of statements
end do
! iterate backward, from 8 to -8, in steps of 2
do i=8, -8, -2
  ! block of statements
end do
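The number of iterations of such a loop is fixed before the first iteration starts: for start-, end- and increment-values m1, m2 and m3, the language rules prescribe max(0, (m2 - m1 + m3) / m3) iterations (with integer division). The sketch below compares this prediction with an explicit count for the two loops above, and for a loop that is never executed. The helper function tripCount is our own, hypothetical name, written as an internal function (a feature discussed later in the book).

```fortran
program loop_trip_count_sketch
  implicit none
  integer :: i, numTrips

  numTrips = 0
  do i = 0, 100, 25          ! visits 0, 25, 50, 75, 100
     numTrips = numTrips + 1
  end do
  print '(2(a, i0))', "forward : counted ", numTrips, ", predicted ", tripCount(0, 100, 25)

  numTrips = 0
  do i = 8, -8, -2           ! visits 8, 6, ..., -6, -8
     numTrips = numTrips + 1
  end do
  print '(2(a, i0))', "backward: counted ", numTrips, ", predicted ", tripCount(8, -8, -2)

  numTrips = 0
  do i = 5, 1                ! start beyond end, with positive step: body never runs
     numTrips = numTrips + 1
  end do
  print '(2(a, i0))', "empty   : counted ", numTrips, ", predicted ", tripCount(5, 1, 1)

contains

  ! number of iterations, determined before the loop starts
  pure integer function tripCount(first, last, step)
    integer, intent(in) :: first, last, step
    tripCount = max(0, (last - first + step) / step)
  end function tripCount
end program loop_trip_count_sketch
```

The counted and predicted values should agree in all three cases (5, 9 and 0 iterations, respectively).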
In our examples so far, we always used integer literals for the start-, end-, and
increment-values of the loop counter. However, the language also allows these to
be integer-variables, or even more complex expressions involving variables. In
such cases, the variables can be altered within the loop, but this has no influence
whatsoever on the progress of the loop, since only the initial values are used for
“planning” the loop. For example, in the following listing, the assignments on lines
6 and 7 have no impact on the loop:
 1 program do_specified_with_expressions
 2   implicit none
 3   integer :: timeMax = 10, step = 1, i, numLoopTrips = 0
 4
 5   do i=1, timeMax, step
 6     timeMax = timeMax / 2
 7     step = step * 2
 8     numLoopTrips = numLoopTrips + 1
 9     write(*, '(a, i0, a, /, 3(a, i0, /))') &
10          "Loop body executed ", numLoopTrips, " times", &
11          "i = ", i, &
12          "timeMax = ", timeMax, &
13          "step = ", step
14   end do
15 end program do_specified_with_expressions
where both longitudes (λ_{E,W}) and latitudes (φ_{S,N}) are given in radians.
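\[
\sigma_\theta(y,z) = 0.9184\left\{\left[-G(y,z)+1\right]^2 + 0.9184\,\arccos^2\!\left[\frac{1}{\sqrt{G(y,z)}}\left(2-\frac{y}{H}\right)\right]\right\} + 26.57\ \mathrm{kg/m^3} \tag{2.1}
\]
\[
G(y,z) = \left(2-\frac{y}{H}\right)^2 + \left(0.1+\frac{z}{H}\right)^2 \tag{2.2}
\]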
Fig. 2.1 Idealized profile of potential density (σθ ), based on Eqs. (2.1)–(2.2)
where:
• y ∈ [0, L], with: L = 1000 km
• z ∈ [0, H ], with: H = 4 km
This can be viewed as an idealized profile of the density structure in some
part of the ocean (Fig. 2.1).
Assuming the extent along the x-axis (perpendicular to the figure) is of
100 km, compute the fraction of total volume occupied by water whose poten-
tial density matches the range typical for upper Labrador Sea Water (uLSW),
which is:
$\sigma_\theta^{\mathrm{uLSW}} \in [27.68, 27.74]\ \mathrm{kg\,m^{-3}}$
do
  ! block of statements
end do
This form truly has the tendency to run endlessly,52 and it is the responsibility of
the programmer to devise a suitable termination criterion, and to end the execution
of the loop with the exit-statement. This is illustrated in the following listing,
which demonstrates a way to solve the file-reading problem described above, where
a suitable loop-termination criterion is that the end-of-file was reached while trying
to read-in data:
program mean_and_standard_deviation_from_file
  implicit none
  integer :: statCode, numVals=0, inFileID
  real :: mean=0.0, variance=0.0, sd=0.0, newValue, &
       sumVals=0.0, sumValsSqr=0.0

  ! open file for reading
  open(newunit=inFileID, file="time_series.dat", action="read")

  ! "infinite" DO-loop, to read an unknown amount of data-values
  data_reading_loop: do
    read(inFileID, *, iostat=statCode) newValue
    ! check if exception was raised during read-operation
    if( statCode /= 0 ) then ! ** TERMINATION-CRITERION for DO-loop **
      exit data_reading_loop
    else ! datum read successful
      numVals = numVals + 1
      sumVals = sumVals + newValue
      sumValsSqr = sumValsSqr + newValue**2
    end if
  end do data_reading_loop

  ! close file
  close(inFileID)

  ! evaluate mean (avoiding division by zero, when file is empty)
  if( numVals > 0 ) mean = sumVals / numVals
  ! evaluate 2nd central-moment (variance); at least two values are
  ! needed, otherwise the denominator (numVals - 1) would be zero
  if( numVals > 1 ) variance = (sumValsSqr - numVals*mean**2) / (numVals - 1)
  ! evaluate standard-deviation from variance
  sd = sqrt(variance)

  write(*, '(2(a, f10.6))') "mean = ", mean, &
       ", sd = ", sd
end program mean_and_standard_deviation_from_file
Listing 2.27 src/Chapter2/mean_and_standard_deviation_from_file.f90
where we used the fact that:
\[
s\{X\} \equiv \operatorname{var}\{X\} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N-1} = \cdots = \frac{1}{N-1}\left[\sum_{i=1}^{N} x_i^2 - N\bar{x}^2\right]
\]
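For readers who want to see the omitted intermediate steps, the numerator can be expanded as follows (a short sketch, which only uses the definition of the mean, $\sum_{i=1}^{N} x_i = N\bar{x}$):

```latex
\begin{align*}
\sum_{i=1}^{N} (x_i - \bar{x})^2
  &= \sum_{i=1}^{N} x_i^2 - 2\bar{x}\sum_{i=1}^{N} x_i + N\bar{x}^2 \\
  &= \sum_{i=1}^{N} x_i^2 - 2N\bar{x}^2 + N\bar{x}^2 \\
  &= \sum_{i=1}^{N} x_i^2 - N\bar{x}^2
\end{align*}
```

This form is convenient here because both sums can be accumulated in a single pass over the file. Note, however, that it subtracts two numbers of similar magnitude, so it can lose accuracy when the spread of the data is small compared to the mean.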
Another pattern that occurs sometimes while working with loops is skipping over
parts of the code within the loop’s body, when certain conditions are met, without
leaving the loop. For example, assume we are writing a program which converts
a given number of seconds into a hierarchical representation (weeks, days, hours,
minutes, and seconds). Clearly, the number of seconds provided by the user should be
positive for the algorithm to work. If the user provides a negative integer, it does not
make sense to try to find a hierarchical representation of the period; instead, it would
be more useful to skip the rest of the code within the loop, and proceed to the next loop
iteration directly, where the user has the opportunity to provide another input value.
This type of behavior is supported in Fortran, using the cycle [loop_name] 53
command, as illustrated in the following example:
program do_loop_using_cycle
  implicit none
  integer, parameter :: SEC_IN_MIN = 60, &
       SEC_IN_HOUR = 60*SEC_IN_MIN, & ! 60 minutes in hour
       SEC_IN_DAY = 24*SEC_IN_HOUR, & ! 24 hours in a day
       SEC_IN_WEEK = 7*SEC_IN_DAY ! 7 days in a week
  integer :: secIn, weeks, days, hours, minutes, sec
  do
    write(*, '(/, a)', advance="no") & ! '/' adds newline, for separation
         "Enter number of seconds (or 0 to exit the program): "
    read(*, *) secIn
    if( secIn == 0 ) then ! loop-termination criterion
      exit
    else if( secIn < 0 ) then ! skipping criterion
      write(*, '(a)') "Error: number of seconds should be " // &
           "positive. Try again!"
      cycle ! ** calculation skipped with CYCLE **
    end if
    ! calculation using the value
    sec = secIn ! backup value
    weeks = sec / SEC_IN_WEEK; sec = mod(sec, SEC_IN_WEEK)
    days = sec / SEC_IN_DAY; sec = mod(sec, SEC_IN_DAY)
    hours = sec / SEC_IN_HOUR; sec = mod(sec, SEC_IN_HOUR)
    minutes = sec / SEC_IN_MIN; sec = mod(sec, SEC_IN_MIN)
    ! display final hierarchy
    write(*, '(6(i0, a))') secIn, "s = { ", &
         weeks, " weeks, ", days, " days, ", &
         hours, " hours, ", minutes, " minutes, ", &
         sec, " seconds }"
  end do
end program do_loop_using_cycle
Listing 2.28 src/Chapter2/do_loop_using_cycle.f90
53 loop_name is an optional name, which makes it clear to which loop the cycle-
command should be applied, in case of multiple nested do-loops.
Exercise 8 (Working with another platform) Use the program developed for the
previous exercise to test the kind-values for a different platform (hardware
and/or compiler). Compare the results with those obtained in Exercise 7.
Fig. 2.2 integer kind indices as a function of requested exponent range (platform: Linux,
64 bit, gfortran compiler). [Plot: kind-index versus requested exponent range]
Fig. 2.3 real kind indices as a function of requested exponent range and requested precision
(platform: Linux, 64 bit, gfortran compiler). [Plot: requested precision versus requested
exponent range; legend (kind-index): -3, -2, -1, 4, 8, 10, 16]
So far, we used mostly scalar variables for representing entities in our example pro-
grams. This was sufficient, since the number of quantities was rather limited. How-
ever, in most applications (and ESS models in particular), the number of variables
easily exceeds several millions, which is clearly not something that can be managed
with scalars. There is, in fact, a distinct branch in computer science, dealing with data
Before working with arrays, we need to create them. This needs to be done explicitly
in Fortran, and it implies declaring and initializing the arrays we want to use (second
step is mandatory for constants, but highly recommended for modifiable arrays too).
In normal usage, there are two ways for declaring arrays, both of which require
specification of the array shape. The first method uses the dimension-keyword,
as in:
54 Because the merits of a data structure can only be proven in the context of the algorithms
applied on them, most references unify these two aspects (e.g. Mehlhorn and Sanders [9] or Cormen
et al. [2]).
55 At the risk of stating the obvious: this should not be confused with dimensionality of the physical
space (if we store the components of a 3D-vector in an array, that array will have rank==1).
56 So an entity with a more irregular shape, such as the set of non-zero elements of a lower-triangular
! both X & Y are rank=1 arrays, with 16 elements each
real, dimension(16) :: X, Y
! A is a rank=3 array, with 520^3 elements
! up to rank=15 is allowed in Fortran 2008 (was 7 in Fortran 90)
integer, dimension(520, 520, 520) :: A
The second declaration method is to specify the shape of the array after the variable
name, as in:
! X is still a rank=1 array, but Y is a scalar real
real :: X(16), Y
! same effect as in previous declaration of A
integer :: A(520, 520, 520)
The numbers inside the shape specification actually represent the upper bounds
for the indices along each dimension. An interesting feature in Fortran is that one
can also specify lower bounds, to bring the code closer to the problem-domain:
real, dimension(-100:100) :: Z ! rank=1 array, with 201 elements
Notes
• Unlike programming languages from the C-family, the value to which the lower
bound defaults (when it is not specified) is 1 (not 0)!
• Although in the examples here we often specify the shape of the arrays using hard-
coded integer values, it is highly recommended to use named integer constants57
for this in real applications, which saves a lot of work when the size of the arrays
needs to be changed (since only the value of the constant would need to be edited).
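The two notes above can be checked with the intrinsic inquiry functions lbound, ubound and size. The following minimal sketch (the program name and the constant HALF_WIDTH are our own, introduced only for illustration) declares one array with the default lower bound and one with an explicit lower bound, using a named constant for the shape:

```fortran
program array_bounds_sketch
  implicit none
  ! hypothetical named constant, used instead of a hard-coded size
  integer, parameter :: HALF_WIDTH = 100
  real, dimension(16) :: X                      ! default lower bound (1)
  real, dimension(-HALF_WIDTH:HALF_WIDTH) :: Z  ! explicit lower bound

  ! inquiry functions only look at the declaration, not at the values
  write(*, '(a, 3(1x, i0))') "X (lbound, ubound, size):", &
       lbound(X, 1), ubound(X, 1), size(X)
  write(*, '(a, 3(1x, i0))') "Z (lbound, ubound, size):", &
       lbound(Z, 1), ubound(Z, 1), size(Z)
end program array_bounds_sketch
```

The program reports the bounds 1 and 16 (16 elements) for X, and -100 and 100 (201 elements) for Z. Changing the size of Z only requires editing HALF_WIDTH.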
We now turn our attention to a seemingly low-level detail which is, however, crucial
for parts of our subsequent discussion: given one of the array declarations above,
how are the array elements actually arranged in the system’s memory58 ?
The memory can be viewed as a very large 1D sequence of bytes, where all the
variables associated to our program are stored. For 1D-arrays, it is only natural to
store the elements of the array contiguously in memory. Things are more complex for
arrays of rank > 1 , where an ordering convention (also known as “array element
order”) for the array elements needs to be adopted (effectively, defining a mapping
from the tuple of coordinates in the array to a linear index in memory).
A(1,1,1), ..., A(520,1,1), A(1,2,1), ..., A(520,2,1), ..., A(1,520,1), ..., A(520,520,1),
A(1,1,2), ..., A(520,1,2), A(1,2,2), ..., A(520,2,2), ..., A(1,520,2), ..., A(520,520,2), ...
Fig. 2.4 Illustration of element ordering for a 3D array in Fortran. The dashed horizontal black
line represents incrementing in the first dimension, the black vertical lines incrementing in the
second dimension, and the vertical green lines incrementing in the third dimension. The blue line
represents the logical ordering of bytes in memory. The figure was split into multiple rows, to fit in
the page
NOTE
In Fortran, the array element order for elements of a multi-dimensional array
is such that the earlier dimensions vary faster than the later ones.a
This is exactly opposite to the corresponding convention in C and C++, pro-
viding opportunities for bugs to appear while porting applications!
a An alternative way to remember this is relative to how a matrix is stored: since the elements
within a column are adjacent, Fortran (along with other languages like MATLAB and GNU
Octave (octave)) is said to use column-major order (C and C++ use row-major order).
For example, the elements of the A-array declared earlier could be arranged in
memory similarly to Fig. 2.4.
The array element order is important for understanding how several facilities of the
language work with multi-dimensional arrays. It is also very relevant for application
performance,59 as illustrated in Exercise 9.
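The array element order can be made visible with a small experiment. In the following sketch (the program name, the sizes NX, NY, NZ and the encoding of the indices into the values are our own choices, for illustration only), each element stores its own index triple, and the whole array is then written out, which lists the elements in array element order:

```fortran
program array_element_order_sketch
  implicit none
  ! small, hypothetical sizes, so that the output fits on one line
  integer, parameter :: NX = 2, NY = 3, NZ = 2
  integer :: A(NX, NY, NZ)
  integer :: i, j, k

  ! the innermost loop runs over the FIRST index, so that memory
  ! is traversed in the same order in which it is laid out
  do k = 1, NZ
    do j = 1, NY
      do i = 1, NX
        A(i, j, k) = 100*i + 10*j + k   ! encode (i,j,k) in the value
      end do
    end do
  end do

  ! a whole array in an output list is expanded in array element order
  write(*, '(12(1x, i0))') A
end program array_element_order_sketch
```

The output starts with 111 211 121 221, i.e. the first index changes fastest, followed by the second, and then the third.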
Since arrays group multiple elements, a crucial feature when working with them is
the ability to select elements based on some pattern, which is usually dictated by
a subtask of the algorithm to be implemented. Fortran supports many methods for
59 This relates to the memory-hierarchy within modern systems. There are usually several layers
of cache-memory (very fast, but with small capacity) between the CPU and RAM, to hide the
relatively high latency for fetching data from RAM. Most caches implement a pre-fetching policy,
and higher performance is achieved when the order in which array elements are processed is close
to the array element order. Note that more details need to be considered, for performance-critical
(sub)programs (for more information, see Hager and Wellein [5]).
outlining such selections. We illustrate these via examples below, assuming we want
to overwrite some parts of an array. However, the same techniques apply for reading
parts of an array, of course.
Given an array declaration like:
integer, parameter :: SZ_X=40, SZ_Y=80
! Note the use of named integer constants for specifying
! the shape of the array (recommended practice).
real, dimension(-SZ_X:SZ_X, -SZ_Y:SZ_Y) :: temperature
Fortran allows us to select:
• the entire array: by simply specifying the array’s name:
temperature = 0. ! scalar written to selection (= whole array)
• a single element: by specifying the array’s name, followed, within parentheses,
by a list of n indices60 (where n is the rank of the array):
! scalar written to element (i=1, j=2)
temperature(1, 2) = 10.
• a sub-array: by specifying the array’s name followed, within parentheses, by a
list of n ranges (n = rank of the array, as before). A range, in this context, is an
integer interval, with an optional step,61 as in:
! scalar written to sub-array (i from -SZ_X to 0, every second j)
temperature(-SZ_X:0, -SZ_Y:SZ_Y:2) = 20.
• a list of elements: by specifying the array’s name followed, within parentheses,
by one or more array(s) of rank==1 (we call these selection arrays). Each
selection array represents a list of values for a corresponding dimension (so only
one selection array is necessary when the source array is 1D, two when the source
array is 2D, etc.). The elements of the source array which eventually become
selected are those with the coordinate-tuples within the Cartesian product of the
sets represented by the selection arrays. The next listing uses this procedure to
select the corners of the 2D-array temperature:
! only 4 elements are selected (Cartesian product):
! (-SZ_X, -SZ_Y), (-SZ_X, SZ_Y), (SZ_X, SZ_Y), (SZ_X, -SZ_Y)
temperature([-SZ_X, SZ_X], [-SZ_Y, SZ_Y]) = 30.
where we used the [ and ] tokens, to create arrays inline.62 We will present
more uses of this technique in the next section.
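To see how many elements each kind of selection touches, the four assignments can be combined into one small program. This is only a sketch: the program name is ours, and we assume much smaller values for SZ_X and SZ_Y than in the declaration above, so that the counts can be verified by hand:

```fortran
program array_selection_sketch
  implicit none
  ! reduced sizes (assumption, for illustration): 5 x 7 = 35 elements
  integer, parameter :: SZ_X = 2, SZ_Y = 3
  real, dimension(-SZ_X:SZ_X, -SZ_Y:SZ_Y) :: temperature

  temperature = 0.                                  ! whole array
  temperature(1, 2) = 10.                           ! single element
  temperature(-SZ_X:0, -SZ_Y:SZ_Y:2) = 20.          ! sub-array (3 x 4)
  temperature([-SZ_X, SZ_X], [-SZ_Y, SZ_Y]) = 30.   ! the 4 corners

  ! count how many elements hold each of the assigned values
  write(*, '(4(a, i0))') "n(0) = ", count(temperature == 0.), &
       ", n(10) = ", count(temperature == 10.), &
       ", n(20) = ", count(temperature == 20.), &
       ", n(30) = ", count(temperature == 30.)
end program array_selection_sketch
```

The sub-array selection covers 12 elements, but two of them are corners, which are overwritten by the last assignment. Hence 10 elements end up with the value 20, 4 with the value 30, 1 with the value 10, and the remaining 20 keep the value 0.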
NOTE
When an array selection is used for writing to an array, it is not recommended
to have, in the selection arrays, elements which are repeated, since this can
lead to attempts to write more than one value to the same array element.a
a Some compilers may allow this without warnings, although the standard declares these
as illegal. In any case, the behavior in such situations is likely platform-dependent, and the
recommendation holds.
As soon as an array is declared, a first concern, before using the values of the array
elements in other statements, is to initialize those values. Unlike other languages,
the Fortran standard does not make any guarantee regarding data initialization (such
as setting them to zero), so explicit action is required from the programmer in this
respect.
Values can be assigned to array elements using several mechanisms, to fit various
scenarios. Just as for scalar variables, these assignments can be combined63 with
the declaration line, as a compact method of initialization (therefore, the techniques
shown in this section apply to initialization, as well as to assignment).
An important notion when writing data to an array is conformability: two data enti-
ties are said to be conformable if they are arrays with the same shape, or if at least one
of them is a scalar. When one entity is assigned to another one, they need to be con-
formable (this is also necessary when forming array expressions, as discussed later).
One of the simplest write operations is to assign a scalar value to an entire array (or
an array section), in which case all elements (selected elements) will be set to that
value:
! either: declaration, followed by assignment
! before the values are used
real, dimension(0:20) :: velocity
velocity = 0.
! or: initialization directly at declaration-time
real, dimension(0:20) :: velocity = 0.
Another form of writing into an array is the “lower-level” fashion, using element-
based assignments, (optionally) combined with loops. This is the most flexible
method and, perhaps, also the most intuitive. As a simple example, here is a more
verbose (but logically equivalent) version of the assignment for the velocity array
from the previous listing:
integer :: i
! element-based assignment (equivalent to: velocity = 0.)
do i=0,20
  velocity(i) = 0.
end do
An array (or array-section) can also be assigned to another array (or section), as long
as the two entities are conformable. For example:
integer :: array1(-10:10), array2(0:20)
! ... some code to compute array2 ...
array1 = array2 ! whole-array assignment
Note that the arrays are conformable even if the lower and upper bounds of the
array indices differ between the two arrays, as was the case here (only the shape
matters): after the assignment, array1(-10) == array2(0), array1(-9) == array2(1),
..., array1(10) == array2(20).
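The element-by-element correspondence can be verified with a short test. In this sketch (the program name and the data written to array2, the squares of 0 to 20, are arbitrary choices for illustration), the first and last elements of array1 are printed after the whole-array assignment:

```fortran
program conformable_assignment_sketch
  implicit none
  integer :: array1(-10:10), array2(0:20)
  integer :: i

  array2 = [ (i*i, i=0,20) ]   ! hypothetical data: squares of 0..20
  array1 = array2              ! same shape (21 elements), different bounds

  write(*, '(2(a, i0))') "array1(-10) = ", array1(-10), &
       ", array1(10) = ", array1(10)
  ! == compares the arrays element by element, position by position
  write(*, '(a, l1)') "element-wise equal: ", all(array1 == array2)
end program conformable_assignment_sketch
```

Here array1(-10) receives the value of array2(0), which is 0, and array1(10) receives the value of array2(20), which is 400.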
The use of array sections is illustrated in the following listing, which swaps the
value of each odd element with that of the next even element64 :
integer :: array3(1:20), tmpArray(1:10)
! ... some code to initialize array3 ...
tmpArray = array3(1:20:2)
array3(1:20:2) = array3(2:20:2)
array3(2:20:2) = tmpArray
64 This assumes the lower bound for the index is odd, and that the upper one is even.
We already mentioned that arrays can be initialized based on other arrays, but then
one could ask how are the latter arrays to be initialized. Fortran has a special facility
for this problem—the array constructor. This consists of a list of values, surrounded
by square brackets.65 A common use of this is to define a constant array (with the
parameter-keyword), as in:
integer, dimension(3), parameter :: meshSize = [ 213, 170, 10 ]
real, dimension(0:8), parameter :: weights = [ 4./9., &
     1./9., 1./36., 1./9., 1./36., &
     1./9., 1./36., 1./9., 1./36. ]
The arrays defined with the constructor syntax can also be used directly in expres-
sions (as long as they are conformable with the other components of the expression),
as any other array, for example:
integer, dimension(10) :: xRange
xRange = [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ]
A drawback of the weights and xRange examples above (using constructor syn-
tax) is that they tend to be quite verbose. The implied-do loops were introduced in
Fortran to solve this problem, when the values follow a well-defined pattern. They
act as a convenient shorthand notation, with the general form:
! Note: the expression below needs to be embedded into an
! actual array constructor (see next examples).
( expr1, expr2, ..., indexVar = exprA, exprB [, exprC] )
where:
• indexVar is a named scalar variable of type integer (usually named i, j,
etc.); note that the scope of this variable is restricted to the implied-do loop, so it
will not affect the value of the variable if used in other parts of the program
• expr1, expr2, …are expressions (not necessarily of integer type), which
may or may not depend on indexVar
• exprA, exprB, and exprC are scalar expressions (of integer type), denoting
the lower bound, upper bound and (optional) increment step for indexVar
To illustrate the implied-do loops, we use them to re-write the operations above
(for weights and xRange) in a more compact (but otherwise equivalent) form:
! index variable for implied-do still needs to be declared
integer :: i
xRange = [ (i, i=1,10) ] ! uses declaration above
real, dimension(0:8), parameter :: weights = &
     [ 4./9., (1./9., 1./36., i=1,4) ]
The implied-do loop is eventually expanded, such that the list { expr1,
expr2,..., } is repeated for each value of the indexVar, using the appro-
priate value of the index variable for each repetition. For instance, in our second
example above, the list {1./9., 1./36.} is repeated 4 times (and the value of the index
variable is not used for computing any component).
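The expansion of implied-do loops can be checked by printing the resulting arrays. The sketch below (the program name and the additional constructor with 2*i are our own, for illustration) collects the declarations in a valid order inside a complete program:

```fortran
program implied_do_sketch
  implicit none
  integer :: i
  integer, dimension(10) :: xRange
  ! the list {1./9., 1./36.} is repeated 4 times after the first value
  real, dimension(0:8), parameter :: weights = &
       [ 4./9., (1./9., 1./36., i=1,4) ]

  xRange = [ (i, i=1,10) ]
  write(*, '(10(1x, i0))') xRange
  ! the expressions may also depend on the index variable
  write(*, '(10(1x, i0))') [ (2*i, i=1,5) ]
  write(*, '(a, i0, a, f7.5)') "size(weights) = ", size(weights), &
       ", sum(weights) = ", sum(weights)
end program implied_do_sketch
```

The first two lines of output list the integers from 1 to 10 and the even numbers from 2 to 10. The weights array has 9 elements, and its elements add up to 1 (within rounding errors), since 4/9 + 4*(1/9 + 1/36) = 1.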
So far, we only used array constructors for building 1D arrays. It is also possible,
however, to construct multi-dimensional arrays, with a two-step procedure:
1. construct a 1D-array tmpArray
2. pass tmpArray to the intrinsic function reshape, to obtain a multi-dimen-
sional array
In practice, the two steps are commonly combined into a single statement. The
following example illustrates this, for constructing a 10 × 20 matrix, where each
element $a_{i,j} = i \cdot j$:
real, dimension(10, 20) :: a = reshape( &
     source = [ ((i*j, i=1,10), j=1,20) ], &
     shape = [ 10, 20 ] &
     )
where we also demonstrated the way in which implied-do loops can be nested
(essentially, by replacing one or more of the expressions expr1, expr2, …,
discussed above by another implied-do loop).
In its basic form,66 the reshape intrinsic function takes two arguments (denoted
by optional keywords source and shape), both of them being 1D arrays, and
where shape should have a constant size, and non-negative elements.
The elements are read, in array element order, from the source-array, and writ-
ten, also in array element order, to the result array.
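Because both the reading and the writing follow the array element order, the result is filled column by column. The following sketch (the program name, the array m and its 2 × 3 shape are our own choices, for illustration) makes this visible by printing the rows of the reshaped array:

```fortran
program reshape_sketch
  implicit none
  integer :: i, j
  integer, dimension(2, 3) :: m

  ! the source holds 1..6; the result is filled column by column
  m = reshape( source = [ (i, i=1,6) ], shape = [ 2, 3 ] )

  do i = 1, 2
    write(*, '(3(1x, i0))') (m(i, j), j=1,3)   ! print row i
  end do
end program reshape_sketch
```

The first row of m contains 1, 3 and 5, and the second row contains 2, 4 and 6: consecutive values of the source end up below each other, not next to each other.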
Just as we demonstrated in Sect. 2.4 for scalar variables, it is also essential to read/
write (parts of) arrays from/to external devices. In principle, the same ideas could
66 Additional arguments are supported, although not discussed here—see, e.g. Metcalf et al. [10].
Just as for scalar variables, it is possible to let the system choose a default format,
as in:
1 integer :: i, j ! dummy indices
2 integer, dimension(2,3) :: inArray = 0
3
4 write(*, '(a)') "Enter array (2x3 values):"
5 read(*,*) inArray
6 write(*, '(a)') "You entered:"
7 write(*,*) inArray
The input (provided for line 5 in the listing above) can be provided over multiple
records—the system will keep reading new records, until the elements in the I/O-list
(whole array in our case) are satisfied.
The appearance of the output (generated by line 7) is, as in the case of scalars,
platform-dependent. This was merely an aesthetic issue for scalars, but in the case of
arrays it actually poses a serious problem, since the topological information of the
array is effectively lost 67 (the lines in the output will not correspond, in most cases,
to recognizable features of the array, such as rows and columns for 2D arrays). In
the particular case of the previous listing the 6 array elements would normally fit on
a single line of output.
In the remainder of this section, we discuss several methods for producing higher-
quality output. Related to this, we also illustrate several methods for specifying the
format specification, ranging from verbose to compact.
A first problem with the write-statement at line 7 in the previous listing is that, when
an array appears in the I/O-list, the I/O-system will effectively expand it internally
to a list of array elements, taken in the array element order. We know, based on
the discussion at the beginning of this section, that for a 2D array this order is the
transpose of what would be needed to output the elements (given that Fortran I/O is
record-based). This can be solved by modifying the I/O-list, so that it contains an
implied-do loop instead of the array, as follows:
write(*, *) ( (inArray(i,j), j=1,3), i=1,2 )
67 Strictly speaking, it is still possible to deduce the coordinates of a specific element in the output list,
by counting its position, and then comparing this with the expected array element order; however,
this can hardly be called productive use of the programmer’s time.
The previous listing causes the two rows of the array to be written on the same line. To
separate them, we need to control the appearance of the output, using a customized
format specifier, as we illustrated before for scalars. A first option to achieve this is
to specify a verbose list of edit descriptors, as in:
write(*, '(x, i0, x, i0, x, i0, /, x, i0, x, i0, x, i0)') &
     ( (inArray(i,j), j=1,3), i=1,2 )
The previous statement causes the two rows of the matrix to appear on separate
lines, as intended. However, the format specifier is quite verbose, and it would be
impractical to write in this form if the matrix were larger. We mentioned earlier
that Fortran allows repeat counts to be placed in front of edit descriptors, or groups
of edit descriptors within parentheses. In the current case, this can be used to make
the format descriptor more compact, by factoring the x, i0 pattern:
write(*, '(3(x, i0), /, 3(x, i0))') &
     ( (inArray(i,j), j=1,3), i=1,2 )
Finally, we notice that Fortran has a mechanism for “recycling” edit descriptors, so
that there can be more elements in the I/O-list than edit descriptors in the output
format. When the I/O-subsystem “runs out” of edit descriptors, a new line of output
is started, and the format specifier is re-used for the next elements in the I/O-list.
This is perfect for our current purposes, as the output format can be further simplified
using this feature:
write(*, '(3(x, i0))') &
     ( (inArray(i,j), j=1,3), i=1,2 )
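The effect of the implied-do loop and of the re-used format can be tried out without typing any input. In the following sketch (the program name and the data are our own: each value is 10*row + column, so that its position can be read off directly), the array is first written in array element order, and then row by row. We write the blank as 1x here, with an explicit count:

```fortran
program array_output_sketch
  implicit none
  integer :: i, j
  integer, dimension(2, 3) :: inArray

  ! hypothetical data: value = 10*row + column
  inArray = reshape( [ 11, 21, 12, 22, 13, 23 ], [ 2, 3 ] )

  write(*, '(a)') "array element order:"
  write(*, '(6(1x, i0))') inArray
  write(*, '(a)') "row by row (format re-used for each row):"
  write(*, '(3(1x, i0))') ( (inArray(i, j), j=1,3), i=1,2 )
end program array_output_sketch
```

The first write-statement produces 11 21 12 22 13 23 on a single line, while the second one produces the two rows 11 12 13 and 21 22 23 on separate lines, because the format only holds three edit descriptors and is re-used after the first three values.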
We emphasized above the usefulness of working with whole arrays and array sec-
tions, instead of manually iterating through the array elements with loops. Fortran
allows a similar high level of abstraction for representing computations, with array
expressions. Specifically, most unary intrinsic functions and operators can take a
whole array (or an array selection) as an argument, producing another array, with
the same shape, through element-wise application of the operation. The same idea
applies to binary operators, as long as the arguments are conformable. The following
program uses these techniques to evaluate the functions sin(x) and sin(x)+cos(x)/2
on a regular grid, spanning the interval [−π, π ]:
 1 program array_expressions1
 2   implicit none
 3   integer, parameter :: N=100
 4   real, parameter :: PI = 3.1415
 5   integer :: i
 6   real, dimension(-N:N) :: &
 7     xAxis = [ (i*(pi/N), i=-N,N) ], &
 8     a = 0, b = 0
 9
10   ! Compact array-expressions, using elemental functions.
11   ! a(i) == sin( xAxis(i) )
12   a = sin(xAxis)
13   ! b(i) == sin( xAxis(i) ) + cos( xAxis(i) )/2.
14   b = sin(xAxis) + cos(xAxis)/2.
15
16   write(*, '(f8.4, 2x, f8.4, 2x, f8.4)') &
17     [ (xAxis(i), a(i), b(i), i=-N,N) ]
18 end program array_expressions1
Listing 2.29 src/Chapter2/array_expressions1.f90
Note that the standard does not impose a specific order in which the elements
of the result array for the expression are to be created. This allows compilers to
apply hardware-specific optimizations (e.g. vectorization/parallelization). For this
to be possible, all array expressions are completely evaluated, before the result is
assigned to any variable. This makes array expressions behave differently from do-
loop constructs which superficially seem equivalent to the array expression (so one
needs to carefully examine any data dependencies between the different iterations of
the do-loops when translating between the two forms of syntax). This was not the case
for the two array expression examples above (lines 12 and 14 in the listing), which
could have also been written equivalently with a do-loop (although we recommend
the previous, compact version):
do i=-N, N
  a(i) = sin( xAxis(i) )
  b(i) = sin( xAxis(i) ) + cos( xAxis(i) )/2.
enddo
By contrast, an array assignment such as
a(-(N-1):(N-1)) = ( a(-N:N-2) + a(-(N-2):N) )/2.
which assigns to each interior element of a an average value computed using its left
and right neighbours, is not equivalent to the loop:
do i=-(N-1), (N-1)
  a(i) = ( a(i-1) + a(i+1) )/2.
enddo
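The difference between the two forms is easy to observe with a small array. In the following sketch (the program name, the value N = 2 and the data are our own choices, for illustration), the neighbour-averaging is applied once as an array expression on a, and once as a do-loop on a copy b:

```fortran
program array_expression_vs_loop_sketch
  implicit none
  integer, parameter :: N = 2
  integer :: i
  real, dimension(-N:N) :: a, b

  a = [ 1., 2., 4., 8., 16. ]   ! hypothetical data
  b = a

  ! array expression: the right-hand side is completely evaluated
  ! from the OLD values, before anything is stored
  a(-(N-1):(N-1)) = ( a(-N:N-2) + a(-(N-2):N) )/2.

  ! do-loop: each iteration already sees the UPDATED left neighbour
  do i = -(N-1), (N-1)
    b(i) = ( b(i-1) + b(i+1) )/2.
  end do

  write(*, '(a, 5(1x, f7.3))') "array expression:", a
  write(*, '(a, 5(1x, f7.3))') "do-loop:         ", b
end program array_expression_vs_loop_sketch
```

The array expression yields the values 1, 2.5, 5, 10 and 16, whereas the loop yields 1, 2.5, 5.25, 10.625 and 16: from the second interior element onwards, the loop uses a left neighbour which was already overwritten.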
We demonstrated above that some intrinsic functions ( sin , cos , etc.) accept
a scalar, as well as a whole array, as their argument.68 Such functions are known
in Fortran as elemental, and can also be defined by the programmer, for derived
types, or for specific types of arrays. We provide a brief example for this, in Sect. 3.4.
The where construct can be used to restrict an array assignment only to elements
which satisfy a given criterion. It is also known as masked array assignment. In many
ways, it is the array-oriented equivalent of the if-construct, discussed for scalars.
In its basic form, the syntax of where reads:
where( <logicalArray> )
  array1 = <array_expression1>
  array2 = <array_expression2>
  ...
end where
where logicalArray, array1, array2, etc., must have the same shape, and
logicalArray may also be a logical expression (for example, comparing array
elements to some scalar value).
For example, assume we have two arrays a and b, and that we want to copy inside
b the a-values69 that are greater than some scalar value threshold. This can be
easily achieved with the where construct, as follows:
program where_construct1
  implicit none
  integer, parameter :: N = 7
  character(len=100) :: outFormat
  integer :: i, j
  real :: a(N,N) = 0, b(N,N) = 0, threshold = 0.5, &
       c(N,N) = 0, d(N,N) = 0 ! used in next examples
  ! write some values in a
  call random_number( a )
68 Programmers familiar with C++ can think of this as a restricted form of function overloading.
69 random_number is an intrinsic subroutine, described in Sect. 2.7.2.
Next, suppose we also want to copy over to array c the values of a that are
smaller than half the threshold. We can extend the where-construct with an
elsewhere(logicalArray) construct, similar to the elseif-branches we
showed for if:
where( a > threshold )
  b = a
elsewhere( a < threshold/2. )
  c = a
end where
As a final extension of our example, let us assume that we want to copy over
to array d the remaining values of a, which satisfy neither of the criteria (like the
else-branch of if). This is achieved again with an elsewhere-branch, which
does not have a logicalArray associated, as in:
where( a > threshold )
  b = a
elsewhere( a < threshold/2. )
  c = a
elsewhere
  d = a
end where
The logical arrays which define the masks (for the where- or elsewhere-
branches) are first evaluated, and then the array assignments are performed in
sequence, masked by the logical arrays (i.e. no assignment is performed for ele-
ments where the mask is .false. ). This implies that, even if some assignments
would alter the data used for evaluating the mask array,70 such changes will not affect
the remainder of the where-construct, for which the initially evaluated mask will
be used.
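To check which branch each element takes, it helps to replace the random numbers by known values. The following sketch (the program name is ours, and we assume a rank-1 array holding the values 0.1, 0.2, ..., 0.9 as a deterministic stand-in for random_number) counts how many elements are copied by each branch:

```fortran
program where_construct_sketch
  implicit none
  integer :: i
  real, dimension(9) :: a, b = 0., c = 0., d = 0.
  real :: threshold = 0.5

  ! deterministic stand-in for random_number: 0.1, 0.2, ..., 0.9
  a = [ (i/10., i=1,9) ]

  where( a > threshold )
    b = a
  elsewhere( a < threshold/2. )
    c = a
  elsewhere
    d = a
  end where

  ! b, c and d were initialized to zero, so non-zero entries were copied
  write(*, '(3(a, i0))') "in b: ", count(b /= 0.), &
       ", in c: ", count(c /= 0.), ", in d: ", count(d /= 0.)
end program where_construct_sketch
```

Four values (0.6 to 0.9) exceed the threshold and are copied to b, two values (0.1 and 0.2) are below half the threshold and are copied to c, and the remaining three (0.3, 0.4 and 0.5) end up in d. Every element of a is handled by exactly one branch.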
The do concurrent construct (introduced in Fortran 2008) can also be used for
improving the performance and conciseness of array operations. Strictly speaking,
the construct is more general, as it can also be used to work with scalar data. However,
we discuss it here, as it is particularly useful for arrays, and also because it effectively
supersedes another array-oriented construct (forall), which we do not cover in
this text.71
We begin our brief discussion of this construct with a warning: as for many
Fortran 2008 features, support for do concurrent was, at the time of writing, still
incipient.72
The syntax of the construct is as follows:
do concurrent( [type_spec ::] list_of_indices_with_ranges &
     [, scalar_mask_expression] )
  statement1
  statement2
  . . .
end do
where list_of_indices_with_ranges can be an index range specifica-
tion (as would appear after a normal do-loop), or a comma-separated list of
such specifications (in which case, the construct is equivalent to a set of nested
loops). We discuss the optional type_spec at the end of this section. The
scalar_mask_expression, when present, is useful for restricting the state-
ment application only to values of indices for which the expression evaluates to
.true. . This is illustrated in the following example, where elements of matrix a
which belong to a checkerboard pattern are copied to matrix b:
 1 program do_concurrent_checkerboard_selection
 2   implicit none
 3   integer, parameter :: DOUBLE_REAL = selected_real_kind(15, 307)
 4   integer, parameter :: N = 5 ! side-length of the matrices
 5   integer :: i, j ! dummy-indices
 6   real(kind=DOUBLE_REAL), dimension(N,N) :: a, b ! the matrices
 7   character(len=100) :: outFormat
 8
 9   ! Create dynamic format, using internal file
10   write(outFormat, *) "(", N, "(x, f8.2))"
11   ! Initialize matrix a to some random values
12   call random_number( a )
13
14   ! Pattern-selection with do concurrent
15   do concurrent( i=1:N, j=1:N, mod(i+j, 2)==1 )
16     b(i,j) = a(i,j)
17   end do
18
19   ! Print matrix b
support this construct, with the exception of the type specification. Check the
documentation of your compiler for any flags that may need to be added to enable this
feature (e.g. -ftree-parallelize-loops=n, with n being the number of parallel threads,
for gfortran, or -parallel for ifort).
2.6 Arrays and Array Notation 63
Syntactically, the construct in lines 15–17 in the previous listing could have been
written using nested do-loops and an if, as in:
do i=1, N
  do j=1, N
    if( mod(i+j, 2)==1 ) then
      b(i,j) = a(i,j)
    end if
  end do
end do
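The equivalence of the two forms can be verified with a short, self-contained program. This is a sketch under our own assumptions: the matrix size, the deterministic values in a, and the names bConcurrent and bLoops are illustrative and do not come from the listing above. Note that the equivalence only holds because the iterations are independent of each other, which is what do concurrent requires.

```fortran
program do_concurrent_vs_nested_loops
  implicit none
  integer, parameter :: N = 4   ! side-length (arbitrary choice)
  integer :: i, j
  real :: a(N,N), bConcurrent(N,N), bLoops(N,N)

  ! deterministic, non-zero test values 1., 2., ..., N*N
  a = reshape( [ (real(i), i=1, N*N) ], [N, N] )
  bConcurrent = 0.
  bLoops = 0.

  ! checkerboard selection with do concurrent and a scalar mask
  do concurrent( i=1:N, j=1:N, mod(i+j, 2) == 1 )
    bConcurrent(i,j) = a(i,j)
  end do

  ! same selection, written with nested do-loops and an if
  do j = 1, N
    do i = 1, N
      if( mod(i+j, 2) == 1 ) bLoops(i,j) = a(i,j)
    end do
  end do

  print*, "identical results:", all( bConcurrent == bLoops )
  print*, "number of copied elements:", count( bConcurrent /= 0. )
end program do_concurrent_vs_nested_loops
```

For N = 4, half of the 16 elements satisfy the mask, so the program should report identical results and 8 copied elements.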
73 Therefore, the program may successfully compile, but still contain bugs, if some of these implied
In the examples so far, we only showed how to work with arrays whose shape is
known at compile-time. This is often not the case in real applications, where this
information may be the result of some computations, or may even be provided by the
user at runtime. If this were a book about C++, now would definitely be the place
to discuss pointers. In Fortran, however, this is not necessary75 for dynamic-size
arrays, which are supported through a simpler (and faster) mechanism, discussed in
this section.
We often use the terms static and dynamic when discussing how memory is
reserved for data entities. Generally speaking, memory for static objects is auto-
matically managed by the OS. Examples of static entities are static global variables
(defined through the module-facility, discussed later), variables local to a procedure,
and procedure arguments (also covered later). Contrarily, dynamic objects require the
programmer to explicitly make requests for acquiring and releasing regions of mem-
ory. Therefore, whereas for working with normal (static) arrays only a declaration is
necessary, the workflow for dynamic arrays involves three steps:
1. declaration: Dynamic arrays are declared similarly to normal arrays. For exam-
ple, a dynamic version of array bigArray (see Sect. 2.6.1) is given below:
integer, dimension(:,:,:), allocatable :: bigArray
75 Pointers are still useful in many contexts, like for constructing more advanced data structures.
They too are supported in Fortran, via the pointer-attribute (but Fortran pointers carry more
information and restrictions than their C/C++ counterparts). We do not discuss this issue in this
text—see, e.g. Metcalf et al. [10] or Chapman [1].
Note that there are two notable differences in the dynamic version:
a. the shape of the array is not specified; instead, only the rank is declared
(encoded as the number of : -characters in the list within the parentheses)
b. the allocatable-attribute needs to be added, to clarify that this is a
dynamic array
2. allocation: Before working with array elements is allowed, memory has to be
allocated, so that the exact shape of the array is specified. This is done with the
allocate-statement, which has the form:
allocate( list_of_objects_with_shapes [, stat=statCode] )
where the optional statCode is an integer variable, which is set to zero if the
allocation succeeded (and to a positive value otherwise). For example:
allocate( xArray(16), bigArray(520,520,520), zArray(-100:100), stat=statCode )
After allocation, one can work with these arrays normally, as discussed before
for the static case.
3. deallocation: A last concern related to dynamic arrays is to release the memory
to the system, as soon as it is not needed by the program anymore. This is a
highly recommended practice, both for performance reasons (because it reduces
the amount of bookkeeping at runtime), and for increasing the readability of
the programs (to signal the fact that the data is not used in other parts of the
program). This step is achieved with the deallocate-statement, which has the
syntax:
deallocate( list_of_objects [, stat=statCode] )
where statCode has the same error-signalling role as before, and list_of_objects
is a list of arrays. For example, the following statement releases the
memory allocated above, for the arrays xArray, bigArray, and zArray:
deallocate( xArray, bigArray, zArray, stat=statCode )
array may become difficult to track in larger programs, especially if the array is part
of the global data and used by many procedures. The allocated intrinsic function
can be used in such cases. For example:
allocated ( xArray )
will return .false. before the allocate-call above, and after the deallocate-call;
it will return .true. , however, between these two calls. Interestingly, since
Fortran 2003, it is not necessary [13] to allocate explicitly (or to check the allocation
status first) when we want to assign another array (or array expression) to the
allocatable array: in that case, allocation to the correct shape is automatically
done by the Fortran runtime.
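The three steps, together with the allocated intrinsic and the automatic allocation on assignment, can be combined in a short program. This is a minimal sketch: the array names, bounds and values are our own assumptions, and some compilers may need a flag to enable the Fortran 2003 behaviour for yArray.

```fortran
program allocatable_workflow
  implicit none
  ! step 1: declaration (only the rank is fixed)
  real, dimension(:), allocatable :: xArray, yArray
  integer :: statCode, i

  print*, "before allocate:", allocated( xArray )

  ! step 2: allocation, with error checking via stat
  allocate( xArray(-2:2), stat=statCode )
  if( statCode /= 0 ) then
    print*, "allocation failed, statCode =", statCode
    stop 1
  end if
  print*, "after allocate:", allocated( xArray ), &
    ", bounds:", lbound(xArray, dim=1), ubound(xArray, dim=1)

  xArray = [ (real(i), i=-2, 2) ]   ! same shape, so no re-allocation

  ! Fortran 2003: yArray was never allocated explicitly; the
  ! assignment allocates it to the shape of the right-hand side
  yArray = 2. * xArray
  print*, "yArray allocated:", allocated( yArray ), ", size:", size( yArray )

  ! step 3: deallocation
  deallocate( xArray, yArray, stat=statCode )
  print*, "after deallocate:", allocated( xArray )
end program allocatable_workflow
```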
do k=1,N
do j=1,N
do i=1,N
a(i,j,k) = a(i,j,k) + b(i,j,k)
enddo
enddo
enddo
Hints:
• The length of the cube’s side (N) should be large enough to be representative
of real-world scenarios (i.e. the arrays as a whole should not fit in the cache).
For example, take N = 813, and 32bit real array elements. It is easier to use
allocatable arrays.a
• To improve the accuracy of the result, wrap the code above within another
loop, so that the operations are performed, say, Nrepetitions = 30 times.b
• It is also instructive to test the programs with several compilers, because
some highly-optimizing compilers (like ifort) may recognize perfor-
mance “bugs” like these in simple programs, and correct the problem
internally (but this can fail in more complex scenarios, so learning about
these issues is still valuable). Also, compilers can simply “optimize away”
code when the computation results are not used, so try to print some elements
of a at the end of the computation.
a Most systems have some limits for the size of static data (“stack size”). Therefore, large
static arrays would require adjusting these limits and, possibly, adjusting the “memory
model” through compiler flags.
b This reduces the effect of system noise, and it also provides a “poor man’s” solution for
reducing the relative importance of the (de)allocation overhead—a more accurate approach
is to benchmark the computational parts exclusively, using techniques discussed later, in
Sect. 2.7.
In the course of our discussion so far, we have already mentioned some of the many
intrinsic procedures offered by Fortran. In this section, we describe a few additional
ones, which would not easily fit into the previous sections, but are nonetheless com-
mon practice. We discuss later (in Chap. 3) how to define custom procedures.
Some ESS applications need to be concerned with the current date and time. The
date_and_time intrinsic subroutine is appropriate for this. When calling this, one
can pass (as an argument) an integer-array, of size 8 or more. The Fortran-runtime
will then fill the components with integer-values, as described in Table 2.3.
A very common application is timing a certain portion of code, as a quick way
for profiling parts of a program. In principle, using date_and_time before and
after the part of the algorithm to be profiled could be used, but this limits the time
resolution that can be achieved. Fortran also has the cpu_time intrinsic for such
purposes, which provides microsecond precision on many platforms.
A complete program, demonstrating these functions, is given below:
program working_with_date_and_time
  implicit none
  ! for date_and_time-call
  integer :: dateAndTimeArray(8)
  ! for cpu_time-call
  real :: timeStart, timeEnd
  ! variables for expensive loop
  integer :: mySum=0, i
  call date_and_time( values=dateAndTimeArray )
  print*, "dateAndTimeArray =", dateAndTimeArray
  call cpu_time( time=timeStart )
  ! expensive loop
  do i=1, 1000000000
    mySum = mySum + mySum/i
  end do
  call cpu_time( time=timeEnd )
  print*, "Time for expensive loop =", timeEnd-timeStart, "seconds", &
    ", mySum =", mySum
end program working_with_date_and_time
Listing 2.32 src/Chapter2/working_with_date_and_time.f90
Statistical methods form the basis of many powerful algorithms in ESS. For example,
stochastic parameterizations are commonly used in models, to simulate the effects of
processes at smaller spatial scales (clouds, convection, etc.), which are not resolved
by the (usually severely coarsened) model mesh. A basic necessity for many such
algorithms is the ability to generate sequences of random numbers. This may seem
2.7 More Intrinsic Procedures 69
76 This is fundamentally different from randomness in the physical sense, which is driven by
the quantum-probabilistic processes at the atomic scale. These effects are then amplified at the
mesoscopic scales, due to the large number of degrees of freedom of the system (e.g. climate
system, see Hasselmann [6]).
77 In situations where perfect reproducibility of results is necessary, the seeding step could be
References
1. Chapman, S.J.: Fortran 95/2003 for Scientists and Engineers. McGraw-Hill Science/Engineer-
ing/Math, New York (2007)
2. Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms. MIT Press,
Cambridge (2009)
3. Fraedrich, K., Jansen, H., Kirk, E., Luksch, U., Lunkeit, F.: The planet simulator: towards a
user friendly model. Meteorol. Z. 14(3), 299–304 (2005)
4. Goldberg, D.: What every computer scientist should know about floating-point arithmetic.
ACM Comput. Surv. 23(1), 5–48 (1991)
5. Hager, G., Wellein, G.: Introduction to High Performance Computing for Scientists and Engi-
neers. CRC Press, Boca Raton (2010)
6. Hasselmann, K.: Stochastic climate models Part I. Theory. Tellus 28A(6), 473–485 (1976)
7. Kirk, E., Fraedrich, K., Lunkeit, F., Ulmen, C.: The planet simulator: a coupled system of
climate modules with real time visualization. Technical Report 45(7), Linköping University
(2009)
8. Marshall, J., Plumb, R.A.: Atmosphere, Ocean and Climate Dynamics: An Introductory Text,
1st edn. Academic Press, Boston (2007)
9. Mehlhorn, K., Sanders, P.: Algorithms and Data Structures: The Basic Toolbox. Springer,
Berlin (2010)
10. Metcalf, M., Reid, J., Cohen, M.: Modern Fortran Explained. Oxford University Press, Oxford
(2011)
11. Overton, M.L.: Numerical Computing with IEEE Floating Point Arithmetic. Society for Indus-
trial and Applied Mathematics, Philadelphia (2001)
12. Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P.: The art of parallel scientific
computing. Numerical Recipes in Fortran 90, vol. 2, 2nd edn. Cambridge University Press,
Cambridge (1996). Also available as http://apps.nrbook.com/fortran/index.html
13. Reid, J.: The new features of Fortran 2003. ACM SIGPLAN Fortran Forum 26(1), 10–33
(2007)
Chapter 3
Elements of Software Engineering
3.1 Motivation
1 Even in the example programs presented so far we used modularization, in the form of intrinsic
procedures, such as for I/O operations.
© Springer-Verlag Berlin Heidelberg 2015 71
D.B. Chirila and G. Lohmann, Introduction to Modern Fortran
for the Earth System Sciences, DOI 10.1007/978-3-642-37009-0_3
72 3 Elements of Software Engineering
often provide hints on how the modularization could be performed. Some specific
reasons include:
• Modularization introduces “data partitions”, so that only some parts of the program
are allowed to change some variables (otherwise, all data would be “global”, which
is a discouraged practice). Such “fences” are useful, because they reduce the
number of entities that the programmer needs to keep in mind at any given time.
Also, they can significantly reduce the effort for parallelizing the application.
• Subtasks of a program which are general enough (like computing the norm of
a vector, vorticity of a vector field, or displaying a result on the screen) can be
re-used in other applications, eliminating the need to start from scratch every
time. Such subprograms can then be collected into libraries, shared between many
applications. This practice also increases the probability (and decreases the effort)
for having the subprograms thoroughly tested, which is often difficult to do for a
monolithic program.
• A more mundane aspect is that Fortran requires variable declarations to precede
executable statements,2 which in a monolithic program would force the program-
mer to frequently alternate between the region where variables are declared and
the regions where they are used. It is clearly more convenient to have both regions
fit on-screen at the same time, which is why a good practice is to aim for programs
and procedures which are not too long.
Modularization of software can be approached from different points of view. In SP,
which is currently the norm in Fortran, this is done with a focus on the subtasks of the
program. Contrarily, in OOP, which only later received extensive language support,
the focus is on types and their associated operations. Another approach for keeping
the complexity of software manageable is GP, where the focus is on formulating
algorithms in more general forms, so that they can be applied to the entire set of
data structures which satisfy the algorithm’s requirements.3 However, we devote
very little space to this approach, since Fortran does not support it extensively at this
stage.
2 There is actually a new Fortran 2008 feature (block/end block—see Metcalf et al. [8]),
which enables variable declarations in other parts of the code too (e.g. local variables inside a loop).
However, we do not cover this feature, as extensive compiler support was still missing at the time
of this writing.
3 For example, a sorting algorithm should work with elements of integer, real, or even
user-defined types, as long as a suitable binary operator (like “less than”) is defined on any two
elements of the type. The ideal of GP is to write the algorithm only once, reducing duplication of
code (the need to maintain a different implementation of sort for each type).
3.2 Structured Programming (SP) in Fortran 73
to be done forms a unit that can be easily mapped onto concrete statements in the
programming language.4 Several language features support this breaking down of
tasks, which we describe in this section.
4 The workflow is somewhat analogous to Richardson’s (Richardson and Lynch [9]) energy cascade
in turbulence theory (replace energy with “work still to be done”, and viscous dissipation with writing
code).
5 We use the terms subprogram and procedure interchangeably in this text.
6 Strictly speaking, the term SP also covers the use of flow-control constructs (ifs, loops, etc.),
not only of subprograms. However, in this text we reserve the term for referring to the aspect of
subprogram-based program design, since the use of flow-control constructs (instead of goto-
statements) is nowadays taken for granted.
7 In particular, internal subprograms have direct access to the data (and other internal subprograms)
of their host. This form of unstructured data access may be tempting on the short-term, but generally
has a negative effect on the readability of the software. Also, internal subprograms are fundamentally
tied to their host, which makes them difficult to re-use in other (sub)programs (since the internal
subprograms would need to be converted into normal subprograms first; however, this process may
be nontrivial, especially if the above form of data access was (ab)used by the programmer).
8 Packaging a lot of data inside modules can increase the probability of bugs, such as accidentally
modifying the data from procedures that should not modify it (in contrast, passing data through the
procedure interface allows more control on allowed operations). Also, subprograms which rely on
much module data are generally more difficult to understand, and cannot be easily re-used.
To illustrate the use of procedures, we provide in the next listings two possible
ways of printing the prime numbers up to some value (in the code, 100).
The program primes_with_func1a (in Listing 3.1) uses the function
isPrimeFunc1a, which takes an integer argument, and returns a logical value,
depending on whether the number is prime or not (the actual algorithm for testing
whether a number is prime is not important in this context). Similarly, the program
primes_with_sub (in Listing 3.2) uses the subroutine isPrimeSub, for
the same effect.
 1 logical function isPrimeFunc1a( nr )
 2   implicit none
 3   ! data-declarations (for interface)
 4   integer, intent(in) :: nr
 5   ! data-declarations (local variables)
 6   integer :: i, squareRoot
 7
 8   if( nr <= 1 ) then
 9     isPrimeFunc1a = .false.; return
10   elseif( nr == 2 ) then
11     isPrimeFunc1a = .true.; return
12   elseif( mod(nr, 2) == 0 ) then
13     isPrimeFunc1a = .false.; return
14   else
15     squareRoot = int( sqrt(real(nr)) )
16     do i=3, squareRoot, 2
17       if( mod(nr, i) == 0 ) then
18         isPrimeFunc1a = .false.; return
19       endif
20     enddo
21   endif
22   isPrimeFunc1a = .true.
23 end function isPrimeFunc1a
24
25 program primes_with_func1a
26   implicit none
27   integer, parameter :: N_MAX=100
28   integer :: n
29   ! declaration for function
30   ! NOT the recommended approach
31   logical isPrimeFunc1a
32
33   do n=2, N_MAX
34     if( isPrimeFunc1a(n) ) print*, n
35   enddo
36 end program primes_with_func1a
Listing 3.1 src/Chapter3/primes_with_func1a.f90
9 Derived Data Types (DTs), alternatively named “abstract” types, are discussed in Sect. 3.3.2.
 1 subroutine isPrimeSub( nr, isPrime )
 2   implicit none
 3   ! data-declarations (for interface)
 4   integer, intent(in) :: nr
 5   logical, intent(out) :: isPrime
 6   ! data-declarations (local variables)
 7   integer :: i, squareRoot
 8
 9   if( nr <= 1 ) then
10     isPrime = .false.; return
11   elseif( nr == 2 ) then
12     isPrime = .true.; return
13   elseif( mod(nr, 2) == 0 ) then
14     isPrime = .false.; return
15   else
16     squareRoot = int( sqrt(real(nr)) )
17     do i=3, squareRoot, 2
18       if( mod(nr, i) == 0 ) then
19         isPrime = .false.; return
20       endif
21     enddo
22   endif
23   isPrime = .true.
24 end subroutine isPrimeSub
25
26 program primes_with_sub
27   implicit none
28   integer, parameter :: N_MAX=100
29   integer :: n
30   logical :: stat
31
32   do n=2, N_MAX
33     call isPrimeSub( n, stat )
34     if( stat ) print*, n
35   enddo
36 end program primes_with_sub
Listing 3.2 src/Chapter3/primes_with_sub.f90
Let us analyze the new constructs. First of all, both isPrimeFunc1a and
isPrimeSub appear as blocks above the corresponding main-programs. On line
1/Listing 3.1 (respectively line 1/Listing 3.2), we have the function(subroutine)-
statement (corresponding to the function header in C/C++ terminology). The
implicit none statements have exactly the same role as in a main-program (to
prevent the language from implicitly associating variables with data types); as for
main-programs, it is recommended to use this statement at the beginning of all
procedures (as well as modules—discussed later).
On line 4/Listing 3.1 (respectively 4-5/Listing 3.2), several variables are declared.
These are not normal variable declarations, however. Rather, these lines define parts
of the function (subroutine) interfaces—a fact marked by the use of the intent-
attribute. Possible choices for this attribute are:
• in when a value only needs to be read within the procedure (as is the case in
isPrimeFunc1a)
• out when a value is overwritten by the procedure, without being accessed before-
hand (as is the case for the returned value in isPrimeSub)
• inout when the value needs to be both read and written by the procedure (not
used in the examples above)
Such variables, which appear in the list of arguments of the procedure and have the
intent-attribute,10 are known as dummy arguments. In essence, such arguments
are placeholders, waiting to be replaced by actual arguments, when the procedures
are invoked.
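Since the listings above only use in and out, the following sketch shows all three intent choices in one procedure. The module and subroutine names (intent_demo, updateMean) are hypothetical, and the procedure is placed in a module (a facility discussed later) only so that callers see its full interface.

```fortran
module intent_demo
  implicit none
contains
  ! adds one sample to a running sum, and returns the updated mean
  subroutine updateMean( newValue, runningSum, numSamples, mean )
    real, intent(in) :: newValue          ! only read
    real, intent(inout) :: runningSum     ! read, then updated
    integer, intent(inout) :: numSamples  ! read, then updated
    real, intent(out) :: mean             ! written, never read beforehand

    runningSum = runningSum + newValue
    numSamples = numSamples + 1
    mean = runningSum / numSamples
    ! newValue = 0.   ! would be rejected by the compiler: intent(in)
  end subroutine updateMean
end module intent_demo
```

A caller would add use intent_demo, initialize its sum and counter to zero, and then invoke, for example, call updateMean( 2., total, n, mean ) once per sample.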
In Listing 3.1, there is also an implicit variable declaration, which completes the
interface of the function. This corresponds to the value returned by the function to
the calling program. By default, this value has the same name as the function (in our
example isPrimeFunc1a, of logical type—as specified within the function
statement, on line 1).
There are actually two other equivalent methods for defining a function, which
may be encountered in practice:
1. The first one, which makes the declaration explicit, is:
1 function isPrimeFunc1b( nr )
2   implicit none
3   ! data-declarations (for interface)
4   integer, intent(in) :: nr
5   logical :: isPrimeFunc1b ! NOTE: return-type defined here; no 'intent'
6                            ! allowed (it is effectively "out")
7   ! data-declarations (local variables)
8   integer :: i, squareRoot
Note that, unlike the first variant, the return type of the function is specified in a
separate line within the function body (line 5). This should not have an intent-
attribute (since that is effectively set to out by the language).
2. The second alternative also changes the name of the function’s result, to make
this distinct from the function’s name:
1 function isPrimeFunc1c( nr ) result( primStat )
2   implicit none
3   ! data-declarations (for interface)
4   integer, intent(in) :: nr
5   logical :: primStat ! NOTE: return-type declared here
6   ! data-declarations (local variables)
7   integer :: i, squareRoot
Listing 3.4 src/Chapter3/primes_with_func1c.f90 (excerpt)
Here, the return type of the function is also specified separately and without
an intent-attribute (line 5). In addition, however, we also use the result-
keyword (line 1), to change the name of the result to something different from
the function’s name. This can be useful when the function name is long and
inconvenient to use in expressions. Also, it is mandatory when writing recursive
functions (a topic not discussed in this book).
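Although recursion is beyond the scope of this text, a very small sketch may help to see why a separate result name is useful there. The module and function names below are our own, and the function is only meant for small arguments (default integers overflow for n > 12).

```fortran
module result_keyword_demo
  implicit none
contains
  ! factorial of n, for 0 <= n <= 12; because the result is called "res",
  ! the name "factorial" inside the body denotes a (recursive) call
  recursive function factorial( n ) result( res )
    integer, intent(in) :: n
    integer :: res   ! return-type declared here, without 'intent'

    if( n <= 1 ) then
      res = 1
    else
      res = n * factorial( n-1 )
    end if
  end function factorial
end module result_keyword_demo
```

Without the result-clause, the name factorial would refer to the result variable inside the function body, so it could not also be used to invoke the function.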
Returning to Listings 3.1 and 3.2, notice that two more variables are declared
(i and squareRoot). Since they are introduced within the scope of the procedures
10 Although some compilers may still accept procedure-argument declarations without the
intent-attribute, it is always recommended to specify these attributes, to make it clear how
each of the arguments is supposed to be used (good documentation); additionally, the compiler
can then detect some frequent mistakes (such as accidentally overwriting a variable that is only
supposed to be read).
(thus only being accessible while the procedures are executing), these are called local
variables. We will discuss several issues related to such variables in Sect. 3.2.4.
Finally, the executable parts of the procedures (lines 8–22 in Listing 3.1, respec-
tively lines 9–23 in Listing 3.2) can contain assignments, flow-control constructs,
calls to other procedures, etc., similarly to the executable portions of the example
programs presented previously. There are, however, two notable differences:
1. First of all, propagation of information outside the procedures is achieved
by assigning the desired result value to the variables isPrimeFunc1a and
isPrime, respectively.
2. Second, note the return-statements, which cause the procedures to stop, and
execution to continue in the calling (sub)program. This is especially useful for
skipping part of the procedure’s code, for example when the result can be determined
early on. In our case, for example, if nr <= 1, it does not make sense to
perform any additional tests, since such numbers are, by definition, not prime.11
11 In this case, using return improves performance for large values of N_MAX. However, for
some simple procedures there might also be a performance penalty, as having multiple exit points
from a procedure may prevent some compiler optimizations, e.g., auto-vectorization.
execution between caller and callee is usually generated at link-time. However, check-
ing that the procedure was correctly invoked (with the right types of arguments and
in the correct order) is usually the responsibility of the compiler. For the compiler to
be able to perform this task, it needs knowledge about both the call site and the dec-
larations in the procedure (so that actual arguments can be matched against dummy
arguments). Depending on how much of this information is actually available to the
compiler, the interface is said to be implicit or explicit:
• implicit When only the types at the call site are available to the compiler
(and this much is always known when the program unit of the caller is parsed),
the interface is said to be implicit. Compilation can succeed in this situation (with
only a type-declaration for functions, and “as is” for subroutines). However, relying
on implicit interfaces is dangerous, since many useful compiler checks are thus
effectively turned off by the programmer. To illustrate how easily this may lead to
bugs, replace line 34 (Listing 3.1) with:
! bug
if( isPrimeFunc1a(n*1.0) ) print*, n
Listing 3.5 src/Chapter3/primes_with_func_bug.f90 (excerpt)
Note the multiplication by 1.0, which leads to a result of type real.12 The program
still compiles13 ; however, when executed, it does not report any prime numbers
anymore. The reason can be found by analyzing (e.g. with a simple write-
statement) what data the function isPrimeFunc1a actually receives. On our
platform, the first 3 numbers were:
(instead of the expected 2, 3, 4). The perplexing numbers occur because the com-
piler tries to interpret a real-number as an integer.
• explicit When the compiler has access to both the types at the call site, as
well as to the correct types of the procedure arguments, the interface is said to be
explicit. Proper checking of interfaces can then take place, so that bugs of the type
discussed above (and others) are easily detected automatically. Clearly, this is a
desirable situation. It can be achieved through three main mechanisms:
1. Compilers usually interpret each program unit as a whole, so the interface will
automatically be explicit, without any additional programmer effort, when both
the caller and the callee are within the same program unit.
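One way to satisfy this condition is to place the function after a contains-statement inside the main program, as an internal procedure. The sketch below is our own illustration (the names primes_with_internal_func and isPrime are hypothetical, and the primality test is a deliberately simple trial division); the buggy call from above is left as a comment, since a compiler that checks the explicit interface is expected to reject it.

```fortran
program primes_with_internal_func
  implicit none
  integer, parameter :: N_MAX = 20
  integer :: n

  do n = 2, N_MAX
    if( isPrime(n) ) print*, n
    ! if( isPrime(n*1.0) ) print*, n   ! type mismatch, caught at compile-time
  end do

contains

  ! internal procedure: its interface is explicit within the host program
  logical function isPrime( nr )
    integer, intent(in) :: nr
    integer :: i

    isPrime = ( nr >= 2 )
    do i = 2, nr-1
      if( mod(nr, i) == 0 ) then
        isPrime = .false.
        return
      end if
    end do
  end function isPrime
end program primes_with_internal_func
```

Note that no separate type-declaration for isPrime is needed in the main program, unlike in Listing 3.1.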
12 Granted, this change is a little artificial in this context, but attempting to call procedures with the
program are still in the same file. If they are separated, however, this would not occur.
the results for a small set of input-angles (e.g. 0°, 30°, 45°, 60° and 90°). Does
“pipelining” the function and the subroutine give back the same degree-value
with which you started? If not, explain why.
Since most applications in ESS operate on arrays, it is important to know how these
can be passed as procedure arguments. In modern Fortran, there are two recom-
mended approaches for achieving this: explicit-shape and assumed-shape arrays.
Both approaches allow the modern features of Fortran arrays, which we mentioned
in Chap. 2, to be used (array sections, array expressions, etc.).14
Explicit-shape dummy arrays In this case, the programmer needs to explicitly pass
the bounds for each dimension of the used array(s) to the procedure, via additional
procedure arguments. This is shown in the following example, which takes as an
argument a 2D-array of temperature values (in °C), measured at numSites stations,
and computes the overall average temperature:
real function calcAvgTempV1( inArray, startTime, endTime, numSites )
  implicit none
  integer, intent(in) :: startTime, endTime, numSites
  ! explicit-shape dummy array
  real, dimension(startTime:endTime, numSites), intent(in) :: inArray ! Celsius

  calcAvgTempV1 = sum( inArray, mask=(inArray > -273.15) ) / &
    count( mask=(inArray > -273.15) )
end function calcAvgTempV1
14 There is also a third approach (assumed-size arrays), which is however strongly discouraged
(and not covered in this text), since it provides little information about the array to the compiler,
effectively disabling those high-level array features—see for example Chapman [3] if working with
legacy code that uses this feature.
One fact to keep in mind when using assumed-shape dummy arrays is that the
extents of the actual array along each dimension are passed to the procedure, but not
the lower and upper bounds. For example, if we called the function above as in:
real :: functionResult, sampleData(100:365, 20)
! ... write data into sampleData array ...
functionResult = calcAvgTempV2( sampleData ) ! call
the dummy array inArray (within the function) will assume the bounds (1:266,
1:20). This was not a problem for our function, where the result is not influenced
by this “shifting” of the bounds. However, in applications where this is important, the
loss of “metadata” can be prevented by specifying a lower bound in the declaration
for the dummy array; this bound may either be a constant (when there is a natural
choice for this), or another argument. We illustrate the second approach below:
real function calcAvgTempV3( inArray, startTime )
  implicit none
  ! explicit lower-bound, to preserve array-shape
  integer, intent(in) :: startTime
  ! assumed-shape dummy array, with explicit lower-bound
  real, dimension(startTime:, :), intent(in) :: inArray ! Celsius

  calcAvgTempV3 = sum( inArray, mask=(inArray > -273.15) ) / &
       count( mask=(inArray > -273.15) )
end function calcAvgTempV3
Listing 3.9 src/Chapter3/function_assumed_shape_array.f90
(excerpt)
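The effect of the explicit lower bound can be checked with the lbound and ubound intrinsic functions. The following is only a minimal sketch: the program and the two helper subroutines (showBoundsDefault and showBoundsShifted) are hypothetical names introduced for illustration, and they are written as internal procedures so that their interfaces are explicit.

```fortran
program assumed_shape_bounds_demo
  implicit none
  real :: sampleData(100:365, 20)

  sampleData = 0.0
  call showBoundsDefault( sampleData )      ! prints: 1 266 1 20
  call showBoundsShifted( sampleData, 100 ) ! prints: 100 365 1 20
contains
  ! Plain assumed-shape dummy array: only the extents are inherited,
  ! so the lower bounds default to 1.
  subroutine showBoundsDefault( inArray )
    real, dimension(:, :), intent(in) :: inArray
    write(*, '(4(i0,1x))') lbound(inArray, 1), ubound(inArray, 1), &
         lbound(inArray, 2), ubound(inArray, 2)
  end subroutine showBoundsDefault

  ! Assumed-shape dummy array with an explicit lower bound for the
  ! first dimension, supplied by the caller.
  subroutine showBoundsShifted( inArray, startTime )
    integer, intent(in) :: startTime
    real, dimension(startTime:, :), intent(in) :: inArray
    write(*, '(4(i0,1x))') lbound(inArray, 1), ubound(inArray, 1), &
         lbound(inArray, 2), ubound(inArray, 2)
  end subroutine showBoundsShifted
end program assumed_shape_bounds_demo
```

Note that the second dimension still starts at 1 in both cases, since no lower bound was specified for it.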
end function countVowels
Although we would usually try to keep the number of arguments in procedures small,
this may not always be possible (for example, when subprograms from a library are
used). In such situations, it is all too easy to make the mistake of passing arguments in
the wrong order (especially if adjacent dummy arguments in the procedure prototype
have the same type—in which case the compiler would not catch the semantic error).
A very useful Fortran feature for avoiding such problems is that the names given to
the dummy arguments can actually be used as keywords (tags). With this technique,
the order in which arguments are specified at the call site is not important. To give
an example, the following subroutine samples the function
$z(x, y) = \cos(x^2 + xy)\, e^{-0.05(x^2 + y^2)}$, with a resolution res, along a
rectangular plane section $(x, y) \in [x_{\min}, x_{\max}] \times [y_{\min}, y_{\max}]$:
end subroutine sampleFunctionToFileV1
Using keywords, the following call still produces the intended result,15 even if
the order of the arguments does not coincide with that in the function header:
call sampleFunctionToFileV1( outFileName="test_func_sample.dat", &
     xMin=-5., yMin=-5., xMax=10., yMax=10., res=200 )
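To see the mechanism in isolation, here is a small self-contained sketch. The function rectArea is a hypothetical example, not part of the book's source code; it is an internal procedure, so its interface is explicit, which is a prerequisite for using keyword arguments.

```fortran
program keyword_args_demo
  implicit none

  ! Positional call: the order must match the dummy-argument list.
  write(*, '(f0.2)') rectArea( 0.0, 2.0, 0.0, 3.0 )
  ! Keyword call: same result, although the order is shuffled.
  write(*, '(f0.2)') rectArea( yMax=3.0, xMax=2.0, yMin=0.0, xMin=0.0 )
contains
  ! All four dummy arguments have the same type, so swapping two of them
  ! in a positional call would go unnoticed by the compiler.
  real function rectArea( xMin, xMax, yMin, yMax )
    real, intent(in) :: xMin, xMax, yMin, yMax
    rectArea = (xMax - xMin) * (yMax - yMin)
  end function rectArea
end program keyword_args_demo
```

Both calls evaluate the area of the same rectangle (6.00); positional and keyword arguments can also be mixed, as long as all positional ones come first.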
Another way to make custom procedures easier to use is to make (some of) the
arguments optional. This makes sense when sensible default values, appropriate
most of the time, can be chosen for some arguments, while still allowing
advanced users to tune them. In Fortran, the corresponding dummy arguments need
to be declared with the additional optional-attribute. Then, within the
executable part of the procedure, it is possible to check (with the present
intrinsic function) whether the optional argument was actually specified at the
call site.
15 The resulting data file can be easily visualized, for example in gnuplot, using the command:
splot 'sampling_test1.dat' using 1:2:3 with pm3d
To provide an example, let us re-write the previous procedure, so that some default
values are chosen for the res and outFileName arguments, when they are not
specified at the call site:
subroutine sampleFunctionToFileV2( xMin, xMax, yMin, yMax, res, outFileName )
  implicit none
  real, intent(in) :: xMin, xMax, yMin, yMax
  integer, optional, intent(in) :: res
  character(len=*), optional, intent(in) :: outFileName
  integer :: i, j, outFileID
  real :: x, y, a, b, c, d
  ! default values for optional arguments
  integer, parameter :: defaultRes = 300
  character(len=*), parameter :: defaultOutFileName = "test_func_sample.dat"
  ! local vars for optional-args
  integer :: actualRes
  character(len=256) :: actualOutFileName ! need to specify length

  ! initialize local vars corresponding to optional-args. If the caller
  ! actually provided values for these args, we copy them; otherwise, we use
  ! the default values...
  ! ...res
  if( present(res) ) then
    actualRes = res
  else
    actualRes = defaultRes
  endif
  ! ...outFileName
  if( present(outFileName) ) then
    actualOutFileName = outFileName
  else
    actualOutFileName = defaultOutFileName
  endif

  ! ensure 'actualRes' value is valid (should be >=2)
  if( actualRes < 2 ) then
    write(*, '(a,1x,i0,1x,a)') "Error: res =", res, "is invalid! Aborting."; stop
  end if

  ! open output-file
  open( newunit=outFileID, file=trim(adjustl(actualOutFileName)), status="replace" )
  ! evaluate scaling-coefficients
  a=(xMax-xMin)/(actualRes-1); b=(actualRes*xMin-xMax)/(actualRes-1)
  c=(yMax-yMin)/(actualRes-1); d=(actualRes*yMin-yMax)/(actualRes-1)

  do i=1, actualRes
    do j=1, actualRes
      x = a*i + b; y = c*j + d ! scale to real
      write(outFileID, '(3(f16.8))') x, y, cos( x*(x+y) )*exp( -0.05*(x**2+y**2) )
    end do
    write(outFileID, *) ! newline for GnuPlot
  end do
  close(outFileID)
end subroutine sampleFunctionToFileV2
With this new version of the subroutine, the following call becomes valid:
! call which does not specify the res-argument
call sampleFunctionToFileV2( xMin=-5., xMax=5., yMin=-10., yMax=10., &
     outFileName="test_func_sample_lowres.dat" )
In fact, we already illustrated previously another excellent use for optional argu-
ments, when we discussed error-handling for some of the intrinsic procedures. This
is implemented with the optional stat argument (which can also be used as a key-
word), which the subroutine updates, to mark if any error condition occurred during
its execution. This is preferable to the alternative mechanism (whereby the subrou-
tine simply causes the program to crash if an error condition occurred) since it allows
the caller to take control over the error-recovery process (perhaps there is a method
to recover from the error or, if not, maybe some operations are necessary, such as
saving data generated up to that point, closing files, etc.). Hence, the use of optional
arguments for this type of error-handling is considered good practice, especially for
libraries to be used by other programmers. A complete discussion is outside the scope
of this book—see, e.g., Chapman [3] for details.
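The same pattern can be applied to custom procedures. The sketch below is a hypothetical illustration (the module SafeMath, the subroutine safeSqrt and the meaning of the error code are our assumptions, not an established API): when the caller passes stat, errors are reported through it; otherwise, the subroutine falls back to aborting the program.

```fortran
module SafeMath
  implicit none
contains
  ! Computes sqrt(x). On invalid input, reports the error through the
  ! optional 'stat' argument if present, and aborts otherwise.
  subroutine safeSqrt( x, res, stat )
    real, intent(in) :: x
    real, intent(out) :: res
    integer, optional, intent(out) :: stat

    if( present(stat) ) stat = 0 ! assume success
    if( x < 0.0 ) then
      res = 0.0
      if( present(stat) ) then
        stat = 1 ! hypothetical error code: invalid argument
        return
      else
        write(*, '(a)') "Error: negative argument in safeSqrt! Aborting."
        stop 1
      end if
    end if
    res = sqrt( x )
  end subroutine safeSqrt
end module SafeMath

program optional_stat_demo
  use SafeMath, only: safeSqrt
  implicit none
  real :: r
  integer :: ierr

  call safeSqrt( 4.0, r, stat=ierr )
  write(*, '(a,i0,a,f0.3)') "stat=", ierr, " result=", r

  ! The caller decides how to react to the error.
  call safeSqrt( -1.0, r, stat=ierr )
  if( ierr /= 0 ) write(*, '(a)') "safeSqrt failed; recovering and continuing"
end program optional_stat_demo
```

With this design, simple scripts can omit stat and accept the abort, while more careful callers can save data or close files before deciding how to proceed.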
The attentive reader may have noticed a practical issue with our function-sampling
example so far: it can only sample a specific function, which is hard-coded. Clearly,
with the exception of the single line where the expression for the function to be
sampled actually appears, the rest of the subroutine is generic enough to apply to any
function which takes two real values and returns another real (mathematically,
$f : \mathbb{R}^2 \to \mathbb{R}$). It would be tedious (and a major source of code duplication) if we had
to write a different version of the subroutine for each function of two real variables we
want to sample. Luckily, Fortran allows procedures to be passed as arguments to
other procedures, via a mechanism that works similarly to the C/C++ function pointers
or the C++ function objects. The next version of our function-sampling16 subroutine
illustrates this:
subroutine sampleFunctionToFileV3( xMin, xMax, yMin, yMax, func, outFileName )
  implicit none
  real, intent(in) :: xMin, xMax, yMin, yMax
  ! IFACE for procedure-argument
  interface
    real function func( x, y )
      real, intent(in) :: x, y
    end function func
  end interface
  character(len=*), intent(in) :: outFileName
  integer :: i, j, outFileID
  real :: x, y, a, b, c, d
  ! hard-coded resolution; a named constant avoids the implicit save-attribute
  ! that an initialization on the declaration line would give to a variable
  integer, parameter :: res = 300

  open( newunit=outFileID, file=outFileName, status="replace" )
  ! evaluate scaling-coefficients
  a=(xMax-xMin)/(res-1); b=(res*xMin-xMax)/(res-1)
  c=(yMax-yMin)/(res-1); d=(res*yMin-yMax)/(res-1)

  do i=1, res
    do j=1, res
      x = a*i + b; y = c*j + d ! scale to real
      write(outFileID, '(3(f16.8))') x, y, func(x, y)
    end do
    write(outFileID, *)
  end do
  close(outFileID)
end subroutine sampleFunctionToFileV3
16 The resolution was hard-coded here, for brevity. However, the reader can find a more complete
implementation, which re-introduces adjustable resolution and also demonstrates error-handling,
in the program sample_any_surface_with_error_recovery.f90, in the
source code repository.
17 For convenience, we anticipate the discussion of modules—we cover these shortly, in
Sect. 3.2.7.
module TestFunctions2D
contains
  real function evalFunc1( x, y )
    real, intent(in) :: x, y
    evalFunc1 = cos( x*(x+y) )*exp( -0.05*(x**2+y**2) )
  end function evalFunc1

  real function evalFunc2( x, y )
    real, intent(in) :: x, y
    evalFunc2 = cos( x+y )
  end function evalFunc2
end module TestFunctions2D
Listing 3.16 src/Chapter3/sample_any_surface.f90 (excerpt)
Finally, the sampling subroutine may be called as in:
program sample_any_surface
  use TestFunctions2D
  implicit none
  interface
    subroutine sampleFunctionToFileV3( xMin, xMax, yMin, yMax, func, &
         outFileName )
      real, intent(in) :: xMin, xMax, yMin, yMax
      interface
        real function func( x, y )
          real, intent(in) :: x, y
        end function func
      end interface
      character(len=*), intent(in) :: outFileName
    end subroutine sampleFunctionToFileV3
  end interface

  ! sample function 1
  call sampleFunctionToFileV3( xMin=-5., xMax=5., yMin=-10., yMax=10., &
       func=evalFunc1, outFileName="sampling_func1.dat" )

  ! sample function 2
  call sampleFunctionToFileV3( xMin=-5., xMax=5., yMin=-10., yMax=10., &
       func=evalFunc2, outFileName="sampling_func2.dat" )
end program sample_any_surface
Listing 3.17 src/Chapter3/sample_any_surface.f90 (excerpt)
For readers unfamiliar with these techniques, it is interesting to pause and think
about the succession of procedure calls taking place: program sample_any_surface
calls the subroutine sampleFunctionToFileV3 which, in turn, calls
whatever function was passed to it as an argument by the user (above, evalFunc1
and evalFunc2). In practice, procedures such as sampleFunctionToFileV3
are often part of libraries for some domain-specific problem (function integration,
minimization, etc.). Passing procedures as arguments is then essential for library
authors, who cannot anticipate all the functions which library users may wish to
combine with the library.
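As an illustration of this library scenario, the following sketch integrates a user-supplied function with the midpoint rule. It uses a different way of declaring the procedure argument than the interface-block shown above: an abstract interface combined with a procedure-declaration (both Fortran 2003 features). All names here (Quadrature, func1D, integrateMidpoint, UserFunctions, square) are hypothetical and chosen only for this example.

```fortran
module Quadrature ! plays the role of the "library"
  implicit none
  ! Signature that user-supplied functions need to match.
  abstract interface
    real function func1D( x )
      real, intent(in) :: x
    end function func1D
  end interface
contains
  ! Approximates the integral of f over [a, b], using n midpoint samples.
  real function integrateMidpoint( f, a, b, n )
    procedure(func1D) :: f
    real, intent(in) :: a, b
    integer, intent(in) :: n
    real :: h
    integer :: i

    h = (b - a) / n
    integrateMidpoint = 0.0
    do i = 1, n
      integrateMidpoint = integrateMidpoint + f( a + (i - 0.5)*h )
    end do
    integrateMidpoint = h * integrateMidpoint
  end function integrateMidpoint
end module Quadrature

module UserFunctions ! written by the library user
  implicit none
contains
  real function square( x )
    real, intent(in) :: x
    square = x**2
  end function square
end module UserFunctions

program procedure_argument_demo
  use Quadrature, only: integrateMidpoint
  use UserFunctions, only: square
  implicit none

  ! The exact value of the integral is 1/3.
  write(*, '(a,f0.4)') "integral of x**2 over [0,1] ~ ", &
       integrateMidpoint( square, 0.0, 1.0, 100 )
end program procedure_argument_demo
```

Because both procedures live in modules, their interfaces are explicit, and no interface-block is needed in the main program.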
Note on performance
Some of the most useful places where calls to procedure arguments could be
made are within (nested) loops. In these cases, using this technique as illus-
trated above can degrade performance (especially when individual invocations
of the user-supplied function are relatively inexpensive). The reason is that, to
maximize efficiency in this case (inexpensive functions), the compiler should
be able to simply copy the body of the user-supplied function inside the code
generated for the caller (a process named function inlining). When this opti-
mization occurs, explicit function calls are avoided, which can significantly
Data entities have an attached scope, consisting of the places (in the code of the appli-
cation) where the data can be accessed and—for variables—modified. Restricting
the scope of entities is useful, since it allows the programmer to minimize unwanted
interactions between different program units (therefore making the code more main-
tainable). In general, entities defined within a program unit (main-program, external
procedure, or module), are not accessible from other program units. This rule, how-
ever, does not prevent access within a given program unit. For example, a module
procedure can access all the data entities within the same module (as discussed in
more detail in Sect. 3.2.7). Also, internal procedures can access data of their host, as
well as call other internal procedures of the same host (“host association”).
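A minimal sketch of host association is given below; the program and procedure names are hypothetical. The two internal procedures read and modify the array temperatures of their host (the main program) without receiving it as an argument, while the variable total remains local to the function.

```fortran
program host_association_demo
  implicit none
  real :: temperatures(3) ! data of the host

  temperatures = [ 12.5, 14.0, 9.5 ]
  write(*, '(a,f0.2)') "mean = ", meanTemperature()             ! 12.00
  call shiftTemperatures( 1.0 )
  write(*, '(a,f0.2)') "mean after shift = ", meanTemperature() ! 13.00
contains
  ! Reads the host's array through host association.
  real function meanTemperature()
    real :: total ! local to this function; not visible in the host
    total = sum( temperatures )
    meanTemperature = total / size( temperatures )
  end function meanTemperature

  ! Modifies the host's array through host association.
  subroutine shiftTemperatures( offset )
    real, intent(in) :: offset
    temperatures = temperatures + offset
  end subroutine shiftTemperatures
end program host_association_demo
```

This convenience comes at a price: since the dependency on temperatures does not appear in the argument lists, it is easy to overlook, so passing data explicitly is often the clearer choice for larger procedures.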
We discussed previously, in Sect. 2.6.8, how to reserve memory for arrays whose
shape is only known at runtime. Such arrays are very useful for storing the large
data structures in applications, which may need to be read and modified by many
different procedures. A pitfall of such arrays, however, is that they require explicit
memory-management effort from the programmer, who should remember to allocate
the arrays before use, and de-allocate them as soon as they are no longer needed.
In the context of procedures, however, automatic arrays provide a convenient,
procedure-local alternative to allocatable arrays. They are especially suited for cre-
ating arrays whose shape also becomes known only at runtime, but which are not
needed outside the procedure. With this restriction, the programmer can simply
declare these as normal arrays (except that the bounds of the dimensions are vari-
ables); the Fortran runtime system then takes care of the lower-level (de)allocation
details in the background. For example, the following function uses the automatic
array workArray to implement the counting sort algorithm, which takes an array of
n integers in [0, k] and returns another array with the sorted integers19:
function countingSort( inArray, k )
  implicit none
  integer, dimension(:), intent(in) :: inArray
  integer, intent(in) :: k ! max possible element-value in 'inArray'
  integer, dimension(size(inArray)) :: countingSort ! func result
  ! NOTE: automatic array (shape depends on function's arguments)
  integer :: workArray(0:k)
  integer :: i

  ! Automatic arrays cannot be initialized on declaration-line
  workArray = 0
  ! Place histogram of input inside workArray.
  do i=1, size(inArray)
    workArray( inArray(i) ) = workArray( inArray(i) ) + 1
  end do
  ! Accumulate in workArray(i) the # of elements less than or
  ! equal to i.
  do i=1, k
    workArray(i) = workArray(i) + workArray(i-1)
  end do
  ! Place elements at appropriate position in output-array.
  do i=size(inArray), 1, -1
    countingSort( workArray( inArray(i) ) ) = inArray(i)
    workArray( inArray(i) ) = workArray( inArray(i) ) - 1
  end do
end function countingSort
Listing 3.20 src/Chapter3/function_with_automatic_array.f90 (excerpt)
There are several restrictions on the use of automatic objects:
• they may not be initialized in a type-declaration statement (which is why we also
initialize workArray in the executable section of the function in the previous
example)
• they may not have the save-attribute, i.e., an automatic array cannot persist across
multiple calls to the same procedure; indeed, because the shape of the array would
probably be different for different calls to the procedure, the very concept of
persistence does not make sense here
• they may not be used in namelist-groups (a topic we discuss later, in Sect. 5.2.1).
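To make the first restriction more concrete, here is a smaller sketch with an automatic array; the function reversed and the array scratch are hypothetical names, unrelated to the counting-sort listing. The scratch array is sized from the dummy argument, assigned in the executable part, and released automatically when the function returns.

```fortran
program automatic_array_demo
  implicit none

  write(*, '(5(i0,1x))') reversed( [ 1, 2, 3, 4, 5 ] ) ! prints: 5 4 3 2 1
contains
  function reversed( inArray )
    integer, dimension(:), intent(in) :: inArray
    integer, dimension(size(inArray)) :: reversed ! function result
    ! Automatic array: its extent is only known when the function is
    ! called, so an initialization such as "= 0" is not allowed here.
    integer :: scratch( size(inArray) )
    integer :: i, n

    n = size( inArray )
    scratch = 0 ! initialization happens in the executable part
    do i = 1, n
      scratch(n - i + 1) = inArray(i)
    end do
    reversed = scratch
  end function reversed
end program automatic_array_demo
```

No allocate or deallocate statements appear anywhere, which is the main convenience compared to a local allocatable array.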
19 For our purpose here, the details of the algorithm are not important (but we refer the interested
readers to Cormen et al. [6] for more details).
20 The procedure is said to gain access to the module’s data through use association in this case.
21 Data access is through what is known as host association then.
Given the large number of intrinsic procedures in Fortran,22 and that large applica-
tions usually consist of many custom-written procedures themselves, name clashes
are a real concern (whereby a programmer defines a procedure with the same name
as one of the intrinsics). The behavior in such cases (i.e. which version is selected
by the compiler) can become a source of confusion, which we attempt to summarily
clarify here.23
If the intention of the programmer is to select the custom procedure instead of the
intrinsic one, it is enough to make the interface of the custom procedure explicit, as
was described in Sect. 3.2.3. It is also possible to select the custom procedure when the
interface is implicit, by adding an external procedure1, procedure2, ...
specification statement. Note, however, that since this does not add any information
about the interface, the compiler still cannot perform proper checking of the arguments
(therefore this method is not recommended).
22 In addition, compiler vendors are allowed to provide additional intrinsic subroutines, not specified
If, however, the intention of the programmer is to select the intrinsic procedure,
this can be achieved with the intrinsic-statement. This needs to appear before
the executable statements in the (sub)program, as in:
! make it clear that we refer to the intrinsic procedure
! (assuming a name clash with a custom procedure could occur)
intrinsic min ! list of intrinsic procedures
Although sometimes useful in handling name clashes, this practice is not routinely
used for specifying all the intrinsic procedures used in the application, which would
be too tedious. However, a very good use of this feature is for simplifying the porting
of applications to a different platform, when the application uses vendor-dependent
intrinsic functions for some well-defined reason. A different compiler, which does
not have the non-standard extension, would then issue a more useful error message.
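Both selection mechanisms can be seen side by side in the following sketch. It is a contrived, hypothetical example: the module ClashDemo deliberately defines a function named dim, which clashes with the intrinsic function dim (the positive difference of its two arguments). Some compilers may issue a warning about the shadowed intrinsic.

```fortran
module ClashDemo
  implicit none
contains
  ! Custom function which deliberately re-uses the name of an intrinsic.
  ! Here, it returns the number of elements of a rank-1 array.
  integer function dim( a )
    integer, intent(in) :: a(:)
    dim = size( a )
  end function dim
end module ClashDemo

program name_clash_demo
  implicit none

  call useCustomVersion()
  call useIntrinsicVersion()
contains
  subroutine useCustomVersion()
    ! The module function has an explicit interface, so it is selected
    ! instead of the intrinsic with the same name.
    use ClashDemo, only: dim
    write(*, '(a,i0)') "custom dim: ", dim( [ 10, 20, 30 ] ) ! 3
  end subroutine useCustomVersion

  subroutine useIntrinsicVersion()
    ! Document that the intrinsic procedure is meant in this scope.
    intrinsic :: dim
    write(*, '(a,i0)') "intrinsic dim: ", dim( 9, 4 ) ! 5
  end subroutine useIntrinsicVersion
end program name_clash_demo
```

In real code, the simplest remedy is of course to avoid such names for custom procedures whenever possible.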
3.2.7 Modules
Another type of program unit, which is very useful for structuring nontrivial applica-
tions better, is the module. In particular, these can be used to group items related to
a particular task (which is also why most Fortran software libraries are conveniently
exposed as a module24 ). Items which can be packaged in a module are:
• global data: constants needed across multiple (sub)programs; variables may also
need to be shared for efficiency sometimes, although in general they are best
avoided;
• subprograms: this practice has the additional benefit of making the interface
of the subprogram explicit automatically, leading to shorter programs due to
elimination of interface-blocks (for example, in the program sample_any_surface
and many others from previous sections of this chapter);
• interface-blocks: these are relevant for external procedures, not defined inline
within the module, and also for many OOP techniques, discussed in Sect. 3.3;
• namelist-groups: this use is covered in Sect. 5.2.1;
• derived types and corresponding operations: these are essential in OOP—see
Sect. 3.3.2.
A module can contain a specification part in the beginning, although this is
optional. After that, it can include (also optionally) a procedures part, in which case
it is necessary to include the contains-keyword on a separate line, to mark that
what follows are procedure definitions. This structure is shown below:
24 The library implementation may actually use a hierarchy of modules internally, but often a single
module needs to be presented to the user.
module ModuleName
  ! 'use'-statements, to include other modules (optional)
  implicit none
  ! specification statements, for example:
  ! * global variables/constants,
  ! * interface blocks,
  ! * namelist groups, or
  ! * declarations for derived types
contains
  ! procedure definitions (functions or subroutines)
end module ModuleName
Note the implicit none line, which has the same role as discussed earlier
(enforcing explicit type declarations for variables).25
Entities declared within a module can be made available to (sub)programs (or
even another module), with the use-statement, as in:
use ModuleName [, only: moduleEntity1, moduleEntity2, ... ]
where the portion inside brackets indicates that it is possible to select individual
entities from a module (otherwise, all public26 entities of the module will become
available). Such a restriction is useful when only a few entities are needed from
a large module, or to document the source of specific entities for developers of
the program/module,27 when several modules are used. The use-statement, when
present, should be the first to appear in the body of the subroutine or module (even
before the implicit none statement).
It is also possible to create a local alias when including a specific entity from
a module, to improve clarity or to avoid name clashes. This is also done in the
use-statement, as shown below:
use ModuleName [, only: localAlias1 => moduleEntity1, ... ]
As a first module-example, let us package in a more convenient form the code
for obtaining portable precision for the real type (first discussed in Sect. 2.3.4):
module RealKinds
  implicit none
  ! KIND-parameters for real-values
  integer, parameter :: &
       R_SP = selected_real_kind( 6, 37 ), &
       R_DP = selected_real_kind( 15, 307 ), &
       R_QP = selected_real_kind( 33, 4931 )
  ! Edit-descriptors for real-values
  character(len=*), parameter :: R_SP_FMT = "f0.6", &
       R_DP_FMT = "f0.15", R_QP_FMT = "f0.33"

contains
  ! Module-subprogram.
  subroutine printSupportedRealKinds()
    write(*, '(a)') "** START: printSupportedRealKinds **"
    if( R_SP > 0 ) then
      write(*, '(a,i0,a)') "single-prec. supported (kind=", R_SP, ")"
    else
      write(*, '(a,i0,a)') "single-prec. MISSING! (kind=", R_SP, ")"
    end if

25 Interestingly, by adding such a line at the beginning of the module, it is not necessary to include
it inside the procedure declarations (if there are any)—although it also does not hurt to keep that
habit.
26 Access control for modules will be discussed shortly.
27 Who would be spared the effort to read through all of the used module to find a specific
data/procedure definition.
The module can then be used in (sub)programs, as shown in the next listing.
Note that, because of the only keyword, the constants R_SP and R_SP_FMT will
not be available to the program. Also, a module-procedure alias is defined, such that
printSupportedRealKinds (defined in the module) can be used under the
name showFloatingPointDiagnostics.
program portable_real_kinds
  use RealKinds, only: R_DP, R_QP, &
       R_DP_FMT, R_QP_FMT, &
       showFloatingPointDiagnostics => printSupportedRealKinds
  implicit none

  real(R_DP) a
  real(R_QP) b

  a = sqrt(2.0_R_DP); b = sqrt(2.0_R_QP)

  call showFloatingPointDiagnostics()

  write(*, '(a,1x,' // R_DP_FMT // ')') "sqrt(2) in double-precision is", a
  write(*, '(a,1x,' // R_QP_FMT // ')') "sqrt(2) in quadruple-precision is", b
end program portable_real_kinds
Note that when the module and the (sub)program/other module which uses it
are in the same file (as is the case in our example), most compilers require the
module to appear before the point where it is actually used. Packaging both entities
in the same file is, however, not recommended (except for small tests). Indeed, in
the present application the RealKinds-module can only be used by (sub)programs
and other modules in the same file, which is clearly too restrictive. We present a
better approach later, while covering build systems such as GNU Make (gmake)
(see Sect. 5.1). However, for conciseness we use mostly the “single-file” approach
throughout this chapter.
Persistence of data within a module As of Fortran 2008, variables declared within
a module implicitly have the save-attribute (so it is not necessary to include
this keyword on the declaration line, as was required by previous iterations of the
standard).
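The persistence of module data can be observed with a short sketch such as the one below, where the module CallCounter and its contents are hypothetical names. The module variable numCalls is assigned in the executable part of the program, rather than initialized on its declaration line, so that its persistence relies only on the rule described above.

```fortran
module CallCounter
  implicit none
  integer :: numCalls ! module variable: keeps its value between calls
contains
  subroutine registerCall()
    numCalls = numCalls + 1
  end subroutine registerCall
end module CallCounter

program module_persistence_demo
  use CallCounter, only: numCalls, registerCall
  implicit none
  integer :: i

  numCalls = 0
  do i = 1, 3
    call registerCall()
  end do
  ! The value accumulated inside the module is visible here.
  write(*, '(a,i0)') "numCalls = ", numCalls ! 3
end program module_persistence_demo
```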
details of the library from the programs which use it, so that access to the func-
tionality only proceeds through a well-defined interface28 (and not, for example, by
reading and/or writing directly internal data structures of the library). This restric-
tion is beneficial, since library developers are free to improve the library through
re-structuring the internal implementation, and the users of the library do not need
to modify their programs (assuming the interface was kept invariant). Also, if there
are more libraries with the same interface, the users can also painlessly switch to
another library (a good example is Basic Linear Algebra Subprograms (BLAS)—see
Sect. 5.6.2).
Information hiding is well supported in Fortran via modules, where one can
specify an access-control attribute, to set the visibility of the entity from program
units where the module is used. When an entity is visible to another program unit,
it is furthermore possible to restrict its uses to read-only, with the additional attribute
protected. Thus, in order of increasing rights, entities can be:
• private : not visible outside the module. This should be used for any data
and procedures only relevant to the module implementation (but irrelevant to the
users).
• public, protected 29 : visible outside the module, but only as read-only. This
is useful for exposing things like internal counters of the module, which may be
needed by users, but should not be modified by them. Note that protected only
complements public, so both are necessary (unless the latter attribute is gained
through a module-wide statement, as described below).
• public : entity is visible outside the module, and can be both read and written
in the program units which use the module. This is necessary for procedures
relevant to the users, but one should seek to minimize the number of variables with
this attribute, to enforce information hiding.
All types of entities which can be packaged in a module can be augmented
with such access specifiers (except interface-blocks, which are public). By
default, if no access-control is used, the entities are given the public-status, so
everything is visible to the outside code. To change the default policy to private
for a specific module, the private-keyword can be included (on a line of its own),
in the specification part of the module. Finally, these attributes can also be used in
the form of statements (appearing in the specification part of the module); this can
increase readability, by listing the public interface in a single place. Usage of these
access-control features is demonstrated below:
module TestModule
  implicit none
  private ! Change to restrictive default-access.
  integer, public, protected :: countA=0, countB=0
  integer :: countC=0

  public executeTaskA, executeTaskB ! Specify public-interface of the module.
contains
  subroutine executeTaskA()
    call executeTaskC()
    countA = countA + 1 ! increment debug counter
  end subroutine executeTaskA

  subroutine executeTaskB()
    countB = countB + 1 ! increment debug counter
  end subroutine executeTaskB

  subroutine executeTaskC()
    countC = countC + 1 ! increment debug counter
  end subroutine executeTaskC
end module TestModule
28 In this context, “interface” represents the entire set of library procedures that can be called by
the program.
29 C++ programmers should note that “protected” here is not the same notion as in that language,
There, we defined three subroutines (one private to the module), and some vari-
ables (countA, countB and countC) to keep track of the number of invocations
of these subroutines (for debugging). The module can then be used by programs,
for example:
program test_access_control_in_modules
  use TestModule
  implicit none

  call executeTaskA() ! Some calls
  call executeTaskB() ! to
  call executeTaskA() ! module-subroutines.
  ! Compilation-error if enabled (subroutine not visible, because it is made
  ! 'private' in the module)
  ! call executeTaskC()

  ! Display debugging-counters.
  write(*, '(a,1x,i0,1x,a)') '"executeTaskA" was called', countA, 'times'
  write(*, '(a,1x,i0,1x,a)') '"executeTaskB" was called', countB, 'times'
  ! Compilation-error if enabled (module-variable not visible, because it is
  ! made 'private' in the module)
  ! write(*, '(a,1x,i0,1x,a)') '"executeTaskC" was called', countC, 'times'
end program test_access_control_in_modules
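The access-control attributes can also be given entirely in statement form, which collects the public interface of a module in one place. The sketch below assumes a hypothetical module SimulationClock (all names are ours): currentStep can be read but not modified by users, maxSteps can be tuned by them, and numAdvanceCalls stays hidden.

```fortran
module SimulationClock
  implicit none
  private ! restrictive default-access

  integer :: currentStep = 0     ! readable by users (see statements below)
  integer :: maxSteps = 10       ! users may change this
  integer :: numAdvanceCalls = 0 ! implementation detail; stays private

  ! Access-control in statement form: the public interface in one place.
  public :: advanceClock, currentStep, maxSteps
  protected :: currentStep
contains
  subroutine advanceClock()
    numAdvanceCalls = numAdvanceCalls + 1
    if( currentStep < maxSteps ) currentStep = currentStep + 1
  end subroutine advanceClock
end module SimulationClock

program access_statements_demo
  use SimulationClock
  implicit none
  integer :: i

  maxSteps = 2 ! allowed: public variable
  do i = 1, 5
    call advanceClock()
  end do
  write(*, '(a,i0)') "currentStep = ", currentStep ! 2
  ! Compilation-error if enabled (variable is 'protected', i.e. read-only here)
  ! currentStep = 0
end program access_statements_demo
```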
DISCLAIMER
Many of the features described in this section require a compiler with (at
least partial) support for Fortran 2003. To ensure that you have the appropri-
ate compiler see, for example, Chivers and Sleightholme [4] (or consult the
documentation of your compiler for up-to-date information).
30 Note that this example is meant only for illustrative purposes—for linear algebra there is already
a wealth of good software available (see Sect. 5.6.2).
31 This does not mean that knowing OOP in one language guarantees a smooth transition—
unfortunately, there are no strict one-to-one mappings of terminology from traditional OOP lan-
guages (like C++ or Java) to equivalent Fortran constructs, hence confusion can occur.
3.3 Elements of Object-Oriented Programming (OOP) 99
even to replace large parts of the applications more easily (in ESS—to use a different
ocean model for example).32
Such partitions are intuitive, so OOP is not a revolution in software engineer-
ing, but rather an evolutionary step, which allows for more powerful management
of abstractions.33 The Fortran language-standardization committee acknowledged
these developments, by including many features of OOP into the modern revisions
(especially Fortran 2003).
In modeling problems, we often encounter entities which are more complex than
what can be described by a variable or array of homogeneous, intrinsic type. Fortran
accommodates such situations by allowing user-defined types (also known as Derived
Data Types (DTs) or abstract data types (ADTs)). These provide the means to pack-
age entities of different types (scalars, arrays, other DTs, etc.) into a single logical
unit; they are the closest correspondent to traditional OOP classes,34 and provide the
basic vehicle for encapsulation.
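As a first, minimal sketch of this idea, the hypothetical type WeatherStation below groups a name, two coordinates, a counter and an array of readings into one unit. The type, its components and all the values are made up purely for illustration.

```fortran
program derived_type_demo
  implicit none

  ! A DT packaging components of different types into one logical unit.
  type :: WeatherStation
    character(len=32) :: name = "unnamed"
    real :: latitude = 0.0, longitude = 0.0 ! degrees
    integer :: numReadings = 0
    real :: readings(4) = 0.0               ! Celsius
  end type WeatherStation

  type(WeatherStation) :: station
  real :: meanValue

  ! Structure-constructor: component values are listed in declaration order.
  station = WeatherStation( "StationA", 50.0, 10.0, 3, [ 4.5, 6.0, 7.5, 0.0 ] )

  ! Components are accessed with the %-operator.
  meanValue = sum( station%readings(1:station%numReadings) ) / station%numReadings
  write(*, '(a,a,f0.2)') trim( station%name ), ": mean reading = ", meanValue ! 6.00
end program derived_type_demo
```

A single variable of this type can then be passed to procedures, instead of five separate arguments.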
32 All this assumes that the interfaces between the modules are invariant.
33 Such evolution phenomena reflect the attempts of the software community to keep up with
the large leaps in the capabilities of the underlying hardware and in user expectations. Assembly
language was an evolution from machine opcodes, which took place when the hardware became too
complex to manage directly in terms of opcodes. Similarly, high-level SP-languages appeared as a
second evolutionary step, when assembly was not sufficient anymore to handle the software- and
requirements-complexity. Nowadays, we have OOP, but functional programming is also gaining
more ground.
34 Pioneering efforts in OOP using Fortran (e.g. Akin [1]) used modules to emulate classes,
since they can encapsulate both data and procedures. However, since there is no concept of multiple
instances of a module, only “class-wide” data is supported (corresponding to static class
members in C++). With these tools, it was still possible to emulate “usual” classes by making
the static data an array, which held all “instances” of the class. However, this condemned the
programmer of the module to handle tedious memory management for that array; since there is a
more convenient alternative in Fortran 2003, we do not describe this practice in detail, and instead
view modules largely as C++ namespaces.
10     real :: mU = 0., mV = 0.
11   contains ! Below: declarations for type-bound procedures
12     procedure :: getMagnitude => getMagnitudeVec2D
13   end type Vec2D
14
15 contains
16   real function getMagnitudeVec2D( this )
17     class(Vec2D), intent(in) :: this
18     getMagnitudeVec2D = sqrt( this%mU**2 + this%mV**2 )
19   end function getMagnitudeVec2D
20 end module Vec2D_class
Listing 3.27 src/Chapter3/dt_basic_demo.f90 (excerpt)
To encourage their re-use, each DT-definition is usually placed in a module
(ideally one module for each DT, which is why some authors, including us here,
customarily add the suffix _class).
In many ways, a DT resembles a module. First, there is a specification part
(only line 10 in the example above), where the data components are specified. In
our case, each instance of Vec2D will have two real-variables. Assuming myVec
is a variable of type Vec2D, we can access the components as in: myVec%mU and
myVec%mV.35
Second, separated by a contains-statement, we have the optional procedures
part (line 12 above). This is however not exactly the same as for a module, since
only declarations appear—the actual code for the procedures is elsewhere. Support
for such procedures (named type-bound procedures or methods) was introduced in
Fortran 2003. The interface for them needs to be explicit (so they can be either module
procedures, or external procedures with an interface-block). Our initial version
of Vec2D has a single method—the function getMagnitude, which is in fact an
alias to the function getMagnitudeVec2D, at lines 16–19 in the host module.
That looks like a normal function definition, except the dummy argument (this) is
declared differently: in the position where we used intrinsic types until now, we need
to use class(<DtName>) (line 17), to tell the compiler that we refer to a derived
type. This argument, named passed-object dummy argument, will correspond to the
object for which the method is called, when it is bound to the DT. The binding of
this dummy argument is triggered by line 12 in our example. The general syntax for
such bindings is:
procedure [(interfaceName)] [ListOfBindAttrs ::] bindName [=> procedureName]
where:
• interfaceName,36 if specified, can be used to implement the Fortran equivalent
of abstract base classes.37 However, this is a topic outside our short tutorial here
(see, e.g., Clerman and Spector [5] for details).
35 Of course, this is allowed only if the data is public. This is the default policy, which we
leverage here for brevity (we will soon discuss alternatives more consistent with the information
hiding principle).
36 When this argument is present, it needs to be surrounded by round brackets.
37 These are special DTs, which are relevant in inheritance-hierarchies, for fixating the interface
for DTs in such a hierarchy, but deferring the actual implementation of methods to the leaf-DTs.
The module hosting the DT can be used by programs, to declare variables and
constants of the new type, as in:
22 program test_driver_a
23   use Vec2D_class
24   implicit none
25
26   type(Vec2D) :: A ! Implicit initialization
27   type(Vec2D) :: B = Vec2D(mU=1.1, mV=9.4) ! can use mU & mV as keywords
28   type(Vec2D), parameter :: C = Vec2D(1.0, 3.2)
29
30   ! Accessing components of a data-type.
31   write(*, '(3(a,1x,f0.3))') &
32     "A%U =", A%mU, ", A%V =", A%mV, ", A%magnitude =", A%getMagnitude(), &
33     "B%U =", B%mU, ", B%V =", B%mV, ", B%magnitude =", B%getMagnitude(), &
34     "C%U =", C%mU, ", C%V =", C%mV, ", C%magnitude =", C%getMagnitude()
35 end program test_driver_a
For declarations (lines 26–28), the type of data is specified with
type(<DtName>)-constructs (instead of, e.g., integer). Regarding initialization, it
is possible to initialize directly on the declaration line (B); as usual, this is required
for constants (C). However, note that we did not explicitly initialize A. This is to
demonstrate a mechanism which is available for DTs but not for intrinsic types:
default values. Whereas there is no standard method for assigning a conventional
default value (e.g. 0) to variables of intrinsic types, for DTs we can specify such
values (mU = mV = 0; see line 10 in Listing 3.27, where the DT was defined).
To make DT-initializations possible, Fortran provides implicit constructors
behind-the-scenes. These look like function calls, where the names of the data mem-
bers of the DT can be used as keywords, to improve readability (line 27 above). It is
possible to write custom constructors, if the default ones are not sufficient (but with
some important observations, discussed below).
Similarly, the analogue of destructors in other languages are final-procedures.
These should be written when pointers are used, or when special actions are necessary
when the DT ceases to exist. The finalizers are also specified after the contains-
statement in the DT definition (although, strictly speaking, they are not type-bound
procedures). The syntax for them is:
final :: ListOfProcedures
We give an example for a case when such procedures are useful later (Sect. 5.2.2),
while discussing netCDF-output.
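Until then, the mechanism can be illustrated with a minimal sketch. The names below (Buffer_class, Buffer, allocateBuffer, destroyBuffer) are hypothetical and not part of the book's code samples; the example assumes a compiler which implements Fortran 2003 finalization. The DT owns memory through a pointer-component, which the finalizer releases when a local object goes out of scope:

```fortran
module Buffer_class
  implicit none
  private
  public :: allocateBuffer

  type, public :: Buffer
    real, pointer :: mData(:) => null()
  contains
    final :: destroyBuffer    ! finalizer, not a type-bound procedure
  end type Buffer

contains
  subroutine allocateBuffer( this, n )
    type(Buffer), intent(inout) :: this
    integer, intent(in) :: n
    allocate( this%mData(n) )
    this%mData = 0.
  end subroutine allocateBuffer

  ! The dummy argument of a finalizer is declared with type(...),
  ! not class(...), i.e. it must not be polymorphic.
  subroutine destroyBuffer( this )
    type(Buffer), intent(inout) :: this
    if( associated(this%mData) ) then
      deallocate( this%mData )
      write(*, '(a)') "Buffer finalized"
    end if
  end subroutine destroyBuffer
end module Buffer_class

program finalizer_sketch
  use Buffer_class
  implicit none
  call work()
contains
  subroutine work()
    type(Buffer) :: buf   ! local (non-saved) object
    call allocateBuffer( buf, 100 )
    ! 'buf' ceases to exist when this subroutine returns, which is
    ! when the compiler invokes destroyBuffer on it.
  end subroutine work
end program finalizer_sketch
```

The object is declared inside a subroutine on purpose: variables declared in the main program are not finalized when the program terminates.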
Methods of the DT can be invoked38 in a similar way as one would reference a
data member, with the name of the object, followed by % , and then by the name of
the method with arguments in brackets. For example, in lines 32–34 of Listing 3.28,
we call the getMagnitude method of Vec2D. We can apply here the previous
discussion on passed-object dummy arguments: although no arguments seem to be
specified to the method in this example, we know from the definition of the method
that there should be one argument—this is silently added by the compiler (receiving
A, B, and C as an actual argument—see lines 32, 33, and 34 respectively).
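The equivalence between the method call and an ordinary function call can be checked with a small self-contained sketch. The module name Vec2D_sketch is an assumption (chosen to avoid clashing with the book's Vec2D_class); the DT itself follows Listing 3.27, with public data and a public module function:

```fortran
module Vec2D_sketch
  implicit none

  type :: Vec2D
    real :: mU = 0., mV = 0.
  contains
    procedure :: getMagnitude => getMagnitudeVec2D
  end type Vec2D

contains
  real function getMagnitudeVec2D( this )
    class(Vec2D), intent(in) :: this
    getMagnitudeVec2D = sqrt( this%mU**2 + this%mV**2 )
  end function getMagnitudeVec2D
end module Vec2D_sketch

program passed_object_sketch
  use Vec2D_sketch
  implicit none
  type(Vec2D) :: B = Vec2D(mU=3.0, mV=4.0)

  ! Method call: B is passed silently as the 'this' argument ...
  write(*, '(a,f0.3)') "B%getMagnitude()     = ", B%getMagnitude()
  ! ... so it yields the same value as calling the module function directly.
  write(*, '(a,f0.3)') "getMagnitudeVec2D(B) = ", getMagnitudeVec2D(B)
end program passed_object_sketch
```

The second form only works here because the module function was left public; with stricter access control, the method call is the only route available to users of the DT.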
For demonstration purposes, we left the internal data of the type Vec2D above acces-
sible from the main-program. However, in doing so we violated the data hiding and
encapsulation principles of OOP, which undermines many of the benefits of the par-
adigm: for example, if the maintainers of the Vec2D DT decide that a representation
in polar coordinates (r, θ ) would be more efficient than (x, y), they cannot simply
make this change without considering that all users would also need to modify their
programs.39 To remedy this problem, it is best to fine-tune exactly what is visible
38 In OOP jargon, method invocations are also referred to as “sending a message” to the object.
39 For our simplified example, this would not be a big problem. However, in large projects, where
the DT is used by many developers, such disruptive changes can cause significant friction.
to other program units, with judicious use of the private and public keywords.
For our example, we could adopt the following DT-definition:
! NOTE: This DT declaration is too restrictive (see following discussion).
type, public :: Vec2D ! DT explicitly declared "public"
  private ! Make internal data "private" by default.
  real :: mU = 0., mV = 0.
contains
  private ! Make methods "private" by default.
          ! (good practice for the case when we have
          ! implementation-specific methods, that the user
          ! does not need to know about).
  procedure, public :: getMagnitude
end type Vec2D
40 Making methods private can be useful, for example, when some of them are implementation-
when relying on custom constructors for large objects (e.g. those encapsulating model
arrays in ESS), or when objects need to be repeatedly re-initialized, within time-
consuming loops. We make use of both approaches in the code samples for the rest
of the book.
A second problem with Listing 3.29, not solved by the updated DT-definition
in Listing 3.30, was the lack of a mechanism for accessing the components of the
vector. We can easily solve this by adding two functions43 (type-bound, this time),
as shown below. These are also called accessor-methods (or getters,
since their name is typically formed by concatenating “get” and the name of the
component). Also, we re-introduce the getMagnitude-function:
module Vec2d_class
  implicit none
  private ! Make module-entities "private" by default.

  type, public :: Vec2d ! DT explicitly declared "public"
    private ! Make internal data "private" by default.
    real :: mU = 0., mV = 0.
  contains
    private ! Make methods "private" by default.
    procedure, public :: init => initVec2d
    procedure, public :: getU => getUVec2d
    procedure, public :: getV => getVVec2d
    procedure, public :: getMagnitude => getMagnitudeVec2d
  end type Vec2d

  ! Generic IFACE, for type-overloading
  ! (to implement user-defined CTOR)
  interface Vec2d
    module procedure createVec2d
  end interface Vec2d

contains
  type(Vec2d) function createVec2d( u, v ) ! CTOR
    real, intent(in) :: u, v
    createVec2d%mU = u
    createVec2d%mV = v
  end function createVec2d

  subroutine initVec2d( this, u, v ) ! init-subroutine
    class(Vec2d), intent(inout) :: this
    real, intent(in) :: u, v
    ! copy-over data inside the object
    this%mU = u
    this%mV = v
  end subroutine initVec2d

  real function getUVec2d( this ) ! accessor-method (GETter)
    class(Vec2d), intent(in) :: this
    getUVec2d = this%mU ! direct-access IS allowed here
  end function getUVec2d

  real function getVVec2d( this ) ! accessor-method (GETter)
    class(Vec2d), intent(in) :: this
    getVVec2d = this%mV
  end function getVVec2d

  real function getMagnitudeVec2d( this ) result(mag)
    class(Vec2d), intent(in) :: this
    mag = sqrt( this%mU**2 + this%mV**2 )
  end function getMagnitudeVec2d
end module Vec2d_class
Listing 3.32 src/Chapter3/dt_accessors.f90 (excerpt)
The DT can now be used similarly to the public-version (but note the change
from A%mU to A%getU(), and likewise from A%mV to A%getV()):
43 Optimizing compilers should “see through” this intermediate layer, and inline the functions, so
that they do not affect performance (although this needs to be verified through benchmarks, as
usual).
! Accessing components of DT through methods (type-bound procedures).
write(*, '(3(a,1x,f0.3))') "A%U =", A%getU(), &
  ", A%V =", A%getV(), ", A%magnitude =", A%getMagnitude()
We mentioned in the beginning of Sect. 3.3 that the OOP paradigm usually leads to
a hierarchy of types. Two mechanisms are at the disposal of the Fortran programmer
to construct these hierarchies: inheritance and aggregation. We briefly discuss these
in this section.
As a simple showcase example, we will look at how to extend the DT from the
previous section (Vec2d), to represent 3D-vectors.44
3.3.3.1 Inheritance
44 Of course, for such a simple DT, it would be easier (and potentially also more efficient) to write the
class for 3D-vectors from scratch. However, we implement it here based on Vec2d, to illustrate
the techniques in a simple setting.
module Vec3d_class
  use Vec2d_class
  implicit none
  private

  type, public, extends(Vec2d) :: Vec3d
    private
    real :: mW = 0.
  contains
    private
    procedure, public :: getW => getWVec3d
    procedure, public :: getMagnitude => getMagnitudeVec3d
  end type Vec3d

  interface Vec3d
    module procedure createVec3d
  end interface Vec3d

contains
  ! Custom CTOR for the child-type.
  type(Vec3d) function createVec3d( u, v, w )
    real, intent(in) :: u, v, w
    createVec3d%Vec2d = Vec2d( u, v ) ! Call CTOR of parent.
    createVec3d%mW = w
  end function createVec3d

  ! Override method of parent-type.
  ! (to compute magnitude, considering 'w' too)
  real function getMagnitudeVec3d( this ) result(mag)
    class(Vec3d), intent(in) :: this
    ! this%Vec2d%getU() is equivalent, here, with this%getU()
    mag = sqrt( this%Vec2d%getU()**2 + this%getV()**2 + this%mW**2 )
  end function getMagnitudeVec3d

  ! Method specific to the child-type.
  ! (GETter for new component).
  real function getWVec3d( this )
    class(Vec3d), intent(in) :: this
    getWVec3d = this%mW
  end function getWVec3d
end module Vec3d_class
Listing 3.34 src/Chapter3/dt_composition_inheritance.f90 (excerpt)
45 If the data of the parent was public, an implicit constructor would have been created for the
child type, which would accept as arguments first the components of the parent type (in sequence),
followed by the additional components of the child type (also in sequence).
46 Unless those methods have the non_overridable-specifier in their binding attribute list.
In closing our quick coverage of inheritance note that, in Fortran jargon, the
class-keyword indicates “class of types” (or inheritance hierarchy). This is differ-
ent from other OOP languages, where “class” means a data type (type in Fortran).
Also, unlike other languages, Fortran does not allow multiple inheritance (Metcalf
et al. [8]).
3.3.3.2 Aggregation
A second mechanism for implementing hierarchies of types, which may come more
naturally to some programmers, is aggregation, which models a “has a” relationship
between the types. We could also use this approach to implement another version of
our Vec3d-class:
54   type, public :: Vec3d
55     private
56     type(Vec2d) :: mVec2d ! DT-aggregation
57     real :: mW = 0.
58   contains
59     private
60     procedure, public :: getU => getUVec3d
61     procedure, public :: getV => getVVec3d
62     procedure, public :: getW => getWVec3d
63     procedure, public :: getMagnitude => getMagnitudeVec3d
64   end type Vec3d
Listing 3.36 src/Chapter3/dt_composition_aggregation.f90 (excerpt)
This is nothing else than simply using the less complex type as a component (line
56). The usual access-control mechanisms specify what data and methods of Vec2d
can be referenced in the implementation of Vec3d (except that we now have to use
the component’s name, mVec2d, to get access). Since the implementation has no
other remarkable features, we omit discussion of the methods here.
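For readers who nonetheless want to see the shape of such methods, the following is a minimal sketch under our own assumptions: it is not the contents of the book's source file, Vec2d_class is reduced to the few methods needed, and an init-method replaces the custom constructors. The point to notice is that each method of Vec3d which concerns the 2D part simply forwards the request to the mVec2d component ("wrapper methods"):

```fortran
module Vec2d_class
  implicit none
  private
  type, public :: Vec2d
    private
    real :: mU = 0., mV = 0.
  contains
    procedure, public :: init => initVec2d
    procedure, public :: getU => getUVec2d
    procedure, public :: getV => getVVec2d
  end type Vec2d
contains
  subroutine initVec2d( this, u, v )
    class(Vec2d), intent(inout) :: this
    real, intent(in) :: u, v
    this%mU = u;  this%mV = v
  end subroutine initVec2d
  real function getUVec2d( this )
    class(Vec2d), intent(in) :: this
    getUVec2d = this%mU
  end function getUVec2d
  real function getVVec2d( this )
    class(Vec2d), intent(in) :: this
    getVVec2d = this%mV
  end function getVVec2d
end module Vec2d_class

module Vec3d_class
  use Vec2d_class
  implicit none
  private
  type, public :: Vec3d
    private
    type(Vec2d) :: mVec2d   ! DT-aggregation
    real :: mW = 0.
  contains
    procedure, public :: init => initVec3d
    procedure, public :: getU => getUVec3d
    procedure, public :: getMagnitude => getMagnitudeVec3d
  end type Vec3d
contains
  subroutine initVec3d( this, u, v, w )
    class(Vec3d), intent(inout) :: this
    real, intent(in) :: u, v, w
    call this%mVec2d%init( u, v )   ! delegate to the component
    this%mW = w
  end subroutine initVec3d
  real function getUVec3d( this )
    class(Vec3d), intent(in) :: this
    getUVec3d = this%mVec2d%getU()  ! wrapper method
  end function getUVec3d
  real function getMagnitudeVec3d( this ) result(mag)
    class(Vec3d), intent(in) :: this
    mag = sqrt( this%mVec2d%getU()**2 + this%mVec2d%getV()**2 + this%mW**2 )
  end function getMagnitudeVec3d
end module Vec3d_class

program aggregation_sketch
  use Vec3d_class
  implicit none
  type(Vec3d) :: X
  call X%init( 1., 2., 2. )
  ! Mathematically, the magnitude is sqrt(1 + 4 + 4) = 3.
  write(*, '(a,f0.3)') "X%magnitude = ", X%getMagnitude()
end program aggregation_sketch
```

Compared with the inheritance version, nothing is inherited automatically: every parent method that should remain available (such as getU) needs its own wrapper.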
The attentive reader may notice that the distinction of “is a” and “has a” relationships
between DTs can sometimes be subjective. Indeed, to follow our previous example,
the same type Vec3d was implemented with the same functionality based on either
approach. This can make it confusing to select between the two in practice. A rough
rule of thumb is to use inheritance if there is an obvious hierarchy of types in the
problem, which will make children’s direct inheritance of parent methods beneficial
(no need to re-implement them, or to define “wrapper methods”). If, however, children
would routinely need to override methods of the parent (or, worse, if parent methods
do not make sense for children types!), aggregation is preferred as a composition
method (see Rouson et al. [10]).
47 This is essentially what we referred to as the “interface”, without the return type, since most
languages (including Fortran) do not look at this type when distinguishing overloads.
48 This is also known as “type overloading”, since the name of the generic interface was that of the
type.
49 Note that they serve a different purpose than the unnamed interface-blocks, shown in the
beginning of this chapter (which were demonstrated for making the interface of an external
procedure explicit).
    ! 'module procedure'-statement.
    module procedure swapInteger
  end interface swap
contains
  ! Module-procedure.
  subroutine swapInteger( a, b )
    integer, intent(inout) :: a, b
    integer :: tmp
    tmp = a; a = b; b = tmp
  end subroutine swapInteger
end module Utilities
Listing 3.37 src/Chapter3/overload_normal_procedures.f90 (excerpt)
The user of the module Utilities can then swap both integers and reals,
using the same syntax:
program test_util_a
  use Utilities
  implicit none
  integer :: i1 = 1, i2 = 3
  real :: r1 = 9.2, r2 = 5.6

  write(*, '("Initial state:",1x,2(a,i0,1x), 2(a,f0.2,1x))') &
    "i1=", i1, ", i2=", i2, ", r1=", r1, ", r2=", r2
  call swap( i1, i2 )
  call swap( r1, r2 )
  write(*, '("State after swaps:",1x,2(a,i0,1x), 2(a,f0.2,1x))') &
    "i1=", i1, ", i2=", i2, ", r1=", r1, ", r2=", r2
end program test_util_a
Listing 3.38 src/Chapter3/overload_normal_procedures.f90 (excerpt)
Note that we can still access swapReal (even if it is private), through the
generic interface (which is public).
In addition to the requirements that the overloads should have distinct signatures,
note that they should also be all functions or all subroutines. Finally, it is also
worth noting that there is an additional overloading mechanism for types, using what
are known as “generic type-bound procedures”. This is beneficial especially when
the only-modifier is present at the place where modules are included (to import
only selected entities). A mistake which can easily occur then is forgetting to include
a generic interface, which can cause implicit functions (such as the assignment oper-
ator) to be called instead of the intended overloads in the module. We do not develop
this issue (see Metcalf et al. [8] for details, if you encounter this scenario).
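For orientation only, a generic type-bound procedure could look as in the sketch below. All names (Vec2d_generic_sketch, rescale, scaleByReal, scaleByInteger) are hypothetical and were chosen for this illustration. Because the generic name is bound to the type, it becomes available wherever the type is, even when the module is imported with an only-list that mentions just the type:

```fortran
module Vec2d_generic_sketch
  implicit none
  private

  type, public :: Vec2d
    real :: mU = 0., mV = 0.
  contains
    ! Specific bindings are kept private ...
    procedure, private :: scaleByReal
    procedure, private :: scaleByInteger
    ! ... and only the generic name is exposed.
    generic, public :: rescale => scaleByReal, scaleByInteger
  end type Vec2d

contains
  subroutine scaleByReal( this, factor )
    class(Vec2d), intent(inout) :: this
    real, intent(in) :: factor
    this%mU = factor * this%mU
    this%mV = factor * this%mV
  end subroutine scaleByReal

  subroutine scaleByInteger( this, factor )
    class(Vec2d), intent(inout) :: this
    integer, intent(in) :: factor
    call this%scaleByReal( real(factor) )
  end subroutine scaleByInteger
end module Vec2d_generic_sketch

program generic_tbp_sketch
  ! Only the type is named here; 'rescale' travels with it.
  use Vec2d_generic_sketch, only: Vec2d
  implicit none
  type(Vec2d) :: A = Vec2d(1., 2.)

  call A%rescale( 2 )     ! resolved to scaleByInteger
  call A%rescale( 0.5 )   ! resolved to scaleByReal
  write(*, '(2(a,f0.2))') "A%mU = ", A%mU, ", A%mV = ", A%mV
end program generic_tbp_sketch
```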
Operator overloading It is interesting to note that operators (like the unary .not.
or the binary + ) are also procedures, only with special support from the language,
to allow a more convenient notation (infix notation)—so the idea of overloading
should apply to them as well. Indeed, Fortran (and other languages) allows developers
to overload these functions for non-intrinsic types. We can simply achieve this by
replacing the name of the generic interface (“swap” in our previous example) by
operator(<operatorName>) , where operatorName is one of the intrinsic
operators. This is demonstrated below:
module Vec3d_class
  implicit none

  type, public :: Vec3d
    real :: mU = 0., mV = 0., mW = 0. ! Make 'private' in practice!
  contains
    procedure :: display ! Convenience output-method.
  end type Vec3d

  ! Generic interface, for operator-overloading.
  interface operator(-)
    module procedure negate ! unary-minus
    module procedure subtract ! binary-subtraction
  end interface operator(-)

contains
  type(Vec3d) function negate( inVec )
    class(Vec3d), intent(in) :: inVec
    negate%mU = -inVec%mU
    negate%mV = -inVec%mV
    negate%mW = -inVec%mW
  end function negate

  ! NOTE: it is also possible to overload binary operators with heterogeneous
  ! data-types. In our case, we could define two more overloads for
  ! binary-'-', to support subtraction when inVec1 or inVec2 is a scalar. In that
  ! case, only the type of inVec1 or inVec2 needs to change, and the code inside
  ! the function to be adapted.
  type(Vec3d) function subtract( inVec1, inVec2 )
    class(Vec3d), intent(in) :: inVec1, inVec2
    subtract%mU = inVec1%mU - inVec2%mU
    subtract%mV = inVec1%mV - inVec2%mV
    subtract%mW = inVec1%mW - inVec2%mW
  end function subtract

  ! Utility-method, for more convenient display of 'Vec3d'-elements.
  ! NOTE: A better solution is to use I/O for derived-types (see Metcalf2011).
  subroutine display( this, nameString )
    class(Vec3d), intent(in) :: this
    character(len=*), intent(in) :: nameString
    write(*, '(2a,3(f0.2,2x),a)') &
      trim(nameString), " = ( ", this%mU, this%mV, this%mW, ")"
  end subroutine display
end module Vec3d_class
The new operators can then be used to form expressions with our DT, as in:
program test_overload_intrinsic_operators
  use Vec3d_class
  implicit none
  type(Vec3d) :: A = Vec3d(2., 4., 6.), B = Vec3d(1., 2., 3.)

  write(*, '(/,a)') "initial-state:"
  call A%display("A"); call B%display("B")

  A = -A
  write(*, '(/,a)') 'after operation "A = -A":'
  call A%display("A"); call B%display("B")

  A = A - B
  write(*, '(/,a)') 'after operations "A = A - B":'
  call A%display("A"); call B%display("B")
end program test_overload_intrinsic_operators
Listing 3.40 src/Chapter3/overload_intrinsic_operators.f90 (excerpt)
This powerful technique can lead to more readable code, by raising the level of
abstraction, as in:
C = A .cross. B
Listing 3.42 src/Chapter3/overload_custom_operator.f90 (excerpt)
Related to precedence, user-defined unary operators have higher priority than
all other operators, while user-defined binary operators are the opposite (lowest
priority—intrinsic operators included in both cases). However, it is easy (and often
clearer) to override the order of evaluations with brackets, as usual.
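A user-defined operator such as .cross. is declared like an overloaded intrinsic operator, except that the name between the dots is chosen by the developer. The following self-contained sketch is our own illustration (the module name Vec3d_cross_sketch and the function name crossProduct are assumptions, and the data is left public for brevity):

```fortran
module Vec3d_cross_sketch
  implicit none
  private
  public :: operator(.cross.)

  type, public :: Vec3d
    real :: mU = 0., mV = 0., mW = 0.
  end type Vec3d

  ! Generic interface, introducing the custom binary operator.
  interface operator(.cross.)
    module procedure crossProduct
  end interface operator(.cross.)

contains
  ! Operator functions take intent(in) arguments: two for a binary
  ! operator, one for a unary operator.
  type(Vec3d) function crossProduct( a, b )
    type(Vec3d), intent(in) :: a, b
    crossProduct%mU = a%mV*b%mW - a%mW*b%mV
    crossProduct%mV = a%mW*b%mU - a%mU*b%mW
    crossProduct%mW = a%mU*b%mV - a%mV*b%mU
  end function crossProduct
end module Vec3d_cross_sketch

program custom_operator_sketch
  use Vec3d_cross_sketch
  implicit none
  type(Vec3d) :: A = Vec3d(1., 0., 0.), B = Vec3d(0., 1., 0.), C

  C = A .cross. B   ! unit vectors: x cross y gives z
  write(*, '(a,3(f5.1,1x))') "C = ", C%mU, C%mV, C%mW
end program custom_operator_sketch
```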
Finally, another operator which can be overloaded is the assignment ( = ).50 This
is relevant only when the DT has a pointer-component, which is a topic outside
the scope of this text.51
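Purely as an illustration of the syntax, a defined assignment for a DT with a pointer-component might be sketched as follows. The names Series_class, Series, and copySeries are hypothetical, and the example assumes that the pointer-component, when associated, always refers to memory obtained with allocate:

```fortran
module Series_class
  implicit none
  private
  public :: assignment(=)

  type, public :: Series
    real, pointer :: mData(:) => null()
  end type Series

  ! Generic interface for the defined assignment.
  interface assignment(=)
    module procedure copySeries
  end interface assignment(=)

contains
  ! Deep copy: the left-hand side receives its own copy of the data.
  subroutine copySeries( lhs, rhs )
    type(Series), intent(inout) :: lhs
    type(Series), intent(in)    :: rhs
    ! Self-assignment: both sides already share the data, nothing to do.
    if( associated(lhs%mData, rhs%mData) ) return
    if( associated(lhs%mData) ) deallocate( lhs%mData )
    if( associated(rhs%mData) ) then
      allocate( lhs%mData(size(rhs%mData)) )
      lhs%mData = rhs%mData
    end if
  end subroutine copySeries
end module Series_class

program defined_assignment_sketch
  use Series_class
  implicit none
  type(Series) :: a, b

  allocate( a%mData(3) )
  a%mData = [1., 2., 3.]

  b = a              ! invokes copySeries (deep copy)
  a%mData(1) = -1.   ! modifies 'a' only, since 'b' owns a separate copy

  write(*, '(a,f5.1)') "a%mData(1) = ", a%mData(1)
  write(*, '(a,f5.1)') "b%mData(1) = ", b%mData(1)

  deallocate( a%mData, b%mData )
end program defined_assignment_sketch
```

Without the interface block, the statement b = a would only copy the pointer, so that both objects would refer to the same array.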
3.3.5 Polymorphism
50 This type of overloading is named “defined assignment”. It is marked by replacing the name of the
generic interface with assignment(=), and is implemented by a subroutine which takes
two arguments (the first with intent(out) or intent(inout), the second with intent(in)).
51 In that case, the implicit assignment implemented by the compiler would only perform a shallow
copy of the object, without duplicating the data accessed by the pointer. However, when
pointers are not used, the implicit assignment will perform a proper deep copy of the object, even
when the DT has allocatable arrays as data members.
52 To be precise, the concept we are referring to here is also known as “subtype polymorphism”, to
distinguish it from other methodologies which are also named “polymorphism” sometimes—e.g.,
overloading (“ad-hoc polymorphism”) and generic programming (“parametric polymorphism”).
When variables are defined with type class(*), they can be assigned values of
any DT (including intrinsic ones).
Due to their dynamic nature, polymorphic variables need to be allocatable,
dummy arguments, or pointers.
• polymorphic procedures: These may operate on data of different types during the
execution of the program. The advantage is that the code for such procedures
can be written in generic terms, calling methods for variables of different DTs.
As long as the DTs satisfy some interface conventions (the calls made by the
polymorphic procedure need to actually exist in the callee’s DT), the runtime
system will dynamically determine the method of which DT needs to be called.
In Fortran, polymorphic procedures are supported by using polymorphic variables
(see above) as dummy arguments. It is also possible to take different actions based
on the type of the actual arguments, using the select type-construct (which
then supports matching a specific DT, or a class of DTs).
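These two ideas can be combined in a short sketch. It is our own illustration rather than an excerpt from the book's sources: the module name Vectors_sketch and the subroutine report are assumptions, and the data is left public so that the implicit constructors can be used. The polymorphic dummy argument v accepts objects of type Vec2d and of any type extending it; the call v%getMagnitude() is dispatched at runtime, while select type branches explicitly on the dynamic type:

```fortran
module Vectors_sketch
  implicit none

  type :: Vec2d
    real :: mU = 0., mV = 0.
  contains
    procedure :: getMagnitude => getMagnitudeVec2d
  end type Vec2d

  type, extends(Vec2d) :: Vec3d
    real :: mW = 0.
  contains
    procedure :: getMagnitude => getMagnitudeVec3d  ! override
  end type Vec3d

contains
  real function getMagnitudeVec2d( this ) result(mag)
    class(Vec2d), intent(in) :: this
    mag = sqrt( this%mU**2 + this%mV**2 )
  end function getMagnitudeVec2d

  real function getMagnitudeVec3d( this ) result(mag)
    class(Vec3d), intent(in) :: this
    mag = sqrt( this%mU**2 + this%mV**2 + this%mW**2 )
  end function getMagnitudeVec3d

  ! Polymorphic procedure: 'v' may be a Vec2d or any extension of it.
  subroutine report( v )
    class(Vec2d), intent(in) :: v
    select type( v )
    type is( Vec2d )
      write(*, '(a)', advance='no') "2D vector"
    type is( Vec3d )
      write(*, '(a)', advance='no') "3D vector"
    class default
      write(*, '(a)', advance='no') "other extension of Vec2d"
    end select
    ! Dynamic dispatch: the method of the actual (dynamic) type is called.
    write(*, '(a,f0.3)') ", magnitude = ", v%getMagnitude()
  end subroutine report
end module Vectors_sketch

program polymorphism_sketch
  use Vectors_sketch
  implicit none
  class(Vec2d), allocatable :: p   ! polymorphic variable

  call report( Vec2d(3., 4.) )
  call report( Vec3d(1., 2., 2.) )

  allocate( p, source=Vec3d(2., 3., 6.) )  ! dynamic type chosen at runtime
  call report( p )
end program polymorphism_sketch
```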
A more complete description of the mechanisms of polymorphism is outside the
scope of this book. For more information, see Metcalf et al. [8] or Clerman and
Spector [5].
Languages like C++ also support GP, whereby procedures are written once, in terms
of types that are specified later—see, e.g., Stepanov and McJones [11]. These can sig-
nificantly reduce duplication of code; for example, a single swap-procedure can be
written, from which the compiler may instantiate versions to swap data of integer,
real, or user-defined type. Currently, Fortran also supports some of these ideas, but
in a more limited sense.53
elemental procedures First, procedures can be made generic with respect to
their rank, by making them elemental. Such functions take an array of any rank
(including rank 0, so they also support scalars), and return an array of the same
shape, but where each element in the output array contains the result of the function
application to the corresponding element in the input array. When such an element-
wise application makes sense, it can bring a significant reduction in code size (since
it is not necessary to write specific versions of the procedure, for each array shape
that may be used in our application). The following example demonstrates how this
may be used with a Vec3d type, to implement vector normalization54 :
module Vec3d_class
  implicit none
  private
  public :: normalize ! Expose the elemental function.

  type, public :: Vec3d
    real :: mU = 0., mV = 0., mW = 0.
  end type Vec3d

contains
  type(Vec3d) elemental function normalize( this )
    type(Vec3d), intent(in) :: this
    ! Local variable (note that the 'getMagnitude'-method could also be called,
    ! but we do not have it implemented here, for brevity).
    real :: magnitude
    magnitude = sqrt( this%mU**2 + this%mV**2 + this%mW**2 )
    normalize%mU = this%mU / magnitude
    normalize%mV = this%mV / magnitude
    normalize%mW = this%mW / magnitude
  end function normalize
end module Vec3d_class

program test_elemental
  use Vec3d_class
  implicit none

  type(Vec3d) :: scalarIn, array1In(10), array2In(15, 20)
  type(Vec3d) :: scalarOut, array1Out(10), array2Out(15, 20)

  ! Place some values in the 'in'-variables ...
  scalarOut = normalize( scalarIn ) ! Apply normalize to scalar
  array1Out = normalize( array1In ) ! Apply normalize to rank-1 array
  array2Out = normalize( array2In ) ! Apply normalize to rank-2 array
end program test_elemental
Writing procedures as elemental not only makes them generic, but can also
improve performance. The latter is due to the fact that elemental procedures are
also required to be pure (a topic we described in Sect. 3.2.5); with this restriction
satisfied, it is guaranteed that the correct result will be obtained, no matter in which
order (serial/parallel) the function is applied to the input elements. Many intrinsic
procedures were designed to be elemental.
Parameterized types It is also possible55 in Fortran to parameterize data types based
on integer-values. Specific values for these parameters can then be assigned either
at compile-time (also known as kind-like parameters, since they can be used to
change the precision for the intrinsic types56 ), or at runtime (also known as len-like
parameters, to highlight the connection with character strings of length assigned at
runtime). For a discussion of this more advanced feature see e.g. Metcalf et al. [8].
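To give at least an impression of the syntax, the sketch below declares a hypothetical DT (Profile, with component mValues) that has one kind-like and one len-like parameter. It is standard Fortran 2003, but given the uneven compiler support mentioned in the footnote, it should be treated as an assumption that a particular compiler accepts it:

```fortran
program pdt_sketch
  implicit none

  ! 'k' is a kind-like parameter (must be known at compile-time),
  ! 'n' is a len-like parameter (may also be set at runtime).
  type :: Profile(k, n)
    integer, kind :: k = kind(1.0)
    integer, len  :: n
    real(kind=k)  :: mValues(n)
  end type Profile

  ! Double-precision profile with 5 levels.
  type(Profile(k=kind(1.0d0), n=5)) :: p

  p%mValues = 1.0d0
  ! Type parameters can be inquired like components.
  write(*, '(a,i0,a,i0)') "levels = ", p%n, ", kind = ", p%k
end program pdt_sketch
```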
References
1. Akin, E.: Object-Oriented Programming via Fortran 90/95. Cambridge University Press,
Cambridge (2003)
2. Booch, G., Maksimchuk, R.A., Engle, M.W., Young, B.J., Connallen, J., Houston, K.A.: Object-
Oriented Analysis and Design with Applications. Addison-Wesley Professional, Boston (2007)
3. Chapman, S.J.: Fortran 95/2003 for Scientists and Engineers. McGraw-Hill
Science/Engineering/Math, New York (2007)
55 Although this is standard Fortran 2003, most compilers had yet to implement this feature at the
time of our writing unfortunately.
56 Remember that kind-parameters are also integer-values.
4. Chivers, I.D., Sleightholme, J.: Compiler support for the Fortran 2003 and 2008 standards
(Revision 11). ACM SIGPLAN Fortran Forum 31(3), 17–28 (2012)
5. Clerman, N.S., Spector, W.: Modern Fortran: Style and Usage. Cambridge University Press,
Cambridge (2011)
6. Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms. MIT Press,
Cambridge (2009)
7. Hager, G., Wellein, G.: Introduction to High Performance Computing for Scientists and Engi-
neers. CRC Press, Boca Raton (2010)
8. Metcalf, M., Reid, J., Cohen, M.: Modern Fortran Explained. Oxford University Press, Oxford
(2011)
9. Richardson, L.F., Lynch, P.: Weather Prediction by Numerical Process. Cambridge University
Press, Cambridge (2007)
10. Rouson, D., Xia, J., Xu, X.: Scientific Software Design: The Object-Oriented Way. Cambridge
University Press, Cambridge (2011)
11. Stepanov, A.A., McJones, P.: Elements of Programming. Addison-Wesley Professional, Boston
(2009)
Chapter 4
Applications
In the previous chapters, we kept the difficulty of the examples at a minimum, since
the goal was to clearly illustrate basic Fortran features. Computer models in ESS
are, of course, orders of magnitude more complex (∼10^5–10^6 lines of code are not
uncommon). While it would not be practical (nor immediately instructive) to confront
the readers directly with such a model, we attempt to make a “transition” towards
more complex applications in this chapter, by presenting three case studies based on
problems relevant to ESS. We start with a finite differences (FD) solver for the
time-dependent heat diffusion problem in 2D. The second case study (which is also more
specific to ESS) discusses an energy balance model (EBM) for simulating some feedbacks
occurring in the climate system. Finally, we discuss a classic flow problem
(Rayleigh-Bénard (RB) convection), along with a numerical solution in 2D based on the
lattice Boltzmann method (LBM).
As a first application, we consider heat diffusion in 2D. The governing equation for
this phenomenon (in isotropic media) is:
$$\partial_t \theta = \kappa\, \partial_{\beta\beta} \theta \equiv \kappa \nabla^2 \theta \tag{4.1}$$
where θ is temperature (in K) and κ is the thermal diffusivity coefficient (in m²/s).
Index notation
For partial derivatives, we used the subscript notation, i.e.
$\partial_{\beta\gamma} F(x, y) \equiv \frac{\partial^2 F(x,y)}{\partial x_\gamma\, \partial x_\beta}$
(for a function F depending on x and y). Also, x, y and t represent
the usual space and time coordinates. To keep the equations compact, we
use the Einstein convention whenever possible, whereby a repeated subscript
implies summation over the range of values of that subscript.
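As a short worked illustration of the summation convention (assuming the identification x_1 = x and x_2 = y, which the text implies but does not spell out), the repeated subscript β in Eq. (4.1) expands as follows in 2D:

```latex
\begin{align*}
\partial_{\beta\beta}\theta
  &\equiv \sum_{\beta=1}^{2} \frac{\partial^2 \theta}{\partial x_\beta\, \partial x_\beta}
   = \frac{\partial^2 \theta}{\partial x^2} + \frac{\partial^2 \theta}{\partial y^2}, \\
\text{so that Eq. (4.1) reads} \quad
\frac{\partial \theta}{\partial t}
  &= \kappa \left( \frac{\partial^2 \theta}{\partial x^2}
                 + \frac{\partial^2 \theta}{\partial y^2} \right).
\end{align*}
```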
$$\theta(x, L, t) = \frac{\theta_A - \theta_B}{L}\,x + \theta_B, \qquad \theta(0, y, t) = \frac{\theta_B - \theta_C}{L}\,y + \theta_C, \tag{4.2}$$
$$\theta(x, 0, t) = \frac{\theta_D - \theta_C}{L}\,x + \theta_C, \qquad \theta(L, y, t) = \frac{\theta_A - \theta_D}{L}\,y + \theta_D. \tag{4.3}$$
The previous expressions were derived by setting the temperature values at the
four corners (θ_A, ..., θ_D; see also Fig. 4.1), and assuming the temperature along
the edges to vary linearly between the values at the corresponding corners.
As IC for the interior (i.e. excluding the domain boundaries), we take:
$$\theta(x, y, 0) = \theta_A \tag{4.4}$$
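The boundary and initial conditions of Eqs. (4.2)–(4.4) translate into a few lines of code on a grid. The following is only a sketch under stated assumptions: the module name, the subroutine applyBcIc and the array layout theta(0:nx, 0:nx) (first index along x, second along y) are hypothetical choices made for illustration, not the book's implementation.

```fortran
module heat_bc_sketch
  implicit none
  integer, parameter :: RK = kind(1.0d0) ! assumed working precision
contains
  ! Set the interior to the IC (Eq. 4.4) and fill the four edges by linear
  ! interpolation between the corner temperatures (Eqs. 4.2-4.3).
  subroutine applyBcIc(theta, nx, tA, tB, tC, tD)
    integer, intent(in) :: nx
    real(RK), intent(out) :: theta(0:nx, 0:nx)
    real(RK), intent(in) :: tA, tB, tC, tD
    integer :: i
    real(RK) :: s

    theta = tA ! IC; the edge values are overwritten below
    do i = 0, nx
      s = real(i, RK) / real(nx, RK)   ! x/L (or y/L) at node i
      theta(i, nx) = (tA - tB)*s + tB  ! top edge,    y = L
      theta(0, i)  = (tB - tC)*s + tC  ! left edge,   x = 0
      theta(i, 0)  = (tD - tC)*s + tC  ! bottom edge, y = 0
      theta(nx, i) = (tA - tD)*s + tD  ! right edge,  x = L
    end do
  end subroutine applyBcIc
end module heat_bc_sketch

program demo_bc_ic
  use heat_bc_sketch
  implicit none
  integer, parameter :: nx = 4
  real(RK) :: theta(0:nx, 0:nx)

  call applyBcIc(theta, nx, 100._RK, 75._RK, 50._RK, 25._RK)
  print '(5f8.2)', theta(:, nx) ! top edge, from corner B to corner A
end program demo_bc_ic
```

Each corner is written twice (once per adjacent edge), but both expressions give the same corner value, so the order of the four assignments does not matter.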
[Figure: (a) unit square in the (x^(d), y^(d)) plane, governed by ∂_t θ^(d) = ∂_ββ θ^(d), with corner values θ_A^(d) = 1, θ_B^(d) = 2/3, θ_C^(d) = 1/3 and θ_D^(d) = 0; (b) grid with node indices 0, 1, ..., N_x along each axis, where θ_{i,j}^{(d),k} = (u_{i,j}^{(d),k} + v_{i,j}^{(d),k})/2]
Fig. 4.1 Geometry for the 2D heat diffusion problem: (a) dimensionless system and (b) discretized
(FD) system
4.1 Heat Diffusion 119
Before presenting the procedure for solving our problem numerically, it is useful to
re-write the equations in dimensionless form. This leverages the concept of dynami-
cal similarity, as introduced by Buckingham [3]. The goal of this transformation is to
minimize the number of parameters necessary to describe our physical problem. This
makes our subsequent numerical solution more generally applicable, and also facil-
itates comparisons with analytical solutions, experimental data, or other numerical
results.
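$$x^{(d)} \equiv \frac{1}{L}\,x \iff x = L\,x^{(d)} \tag{4.5}$$
$$t^{(d)} \equiv \frac{\kappa}{L^2}\,t \iff t = \frac{L^2}{\kappa}\,t^{(d)} \tag{4.6}$$
$$\theta^{(d)} \equiv \frac{\theta - \theta_D}{\theta_A - \theta_D} \iff \theta = \theta_D + (\theta_A - \theta_D)\,\theta^{(d)}, \tag{4.7}$$
$$\partial_t^{(d)} = \frac{L^2}{\kappa}\,\partial_t \iff \partial_t = \frac{\kappa}{L^2}\,\partial_t^{(d)} \tag{4.8}$$
$$\partial_{\alpha\beta}^{(d)} = L^2\,\partial_{\alpha\beta} \iff \partial_{\alpha\beta} = \frac{1}{L^2}\,\partial_{\alpha\beta}^{(d)} \tag{4.9}$$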
Plugging equations (4.5)–(4.9) into Eq. (4.1), we obtain the heat diffusion equation
in the nondimensional system of units:
$$\partial_t^{(d)} \theta^{(d)} = \partial_{\beta\beta}^{(d)} \theta^{(d)} \tag{4.10}$$
The method of Barakat and Clark [1], which we use here, is unconditionally stable,
2nd-order accurate¹ in space and time, and also easy
to implement (but with some disadvantages in terms of parallel execution, as we
will discuss in the next chapter). The idea is to average the solutions obtained with two
specially designed discretizations of Eq. (4.10), so that the leading-order errors of
each discretization are of similar magnitude and opposite sign to those of the other
discretization, leading to a higher accuracy of the combined solution.
With the notation explained above, the two discretizations for the method of
Barakat and Clark [1] read:
$$\frac{u_{i,j}^{(d),k+1} - u_{i,j}^{(d),k}}{\delta_t^{(d)}} = \frac{1}{\left(\delta_x^{(d)}\right)^2} \left[ \left( u_{i-1,j}^{(d),k+1} - u_{i,j}^{(d),k+1} - u_{i,j}^{(d),k} + u_{i+1,j}^{(d),k} \right) + \left( u_{i,j-1}^{(d),k+1} - u_{i,j}^{(d),k+1} - u_{i,j}^{(d),k} + u_{i,j+1}^{(d),k} \right) \right] \tag{4.16}$$
$$\frac{v_{i,j}^{(d),k+1} - v_{i,j}^{(d),k}}{\delta_t^{(d)}} = \frac{1}{\left(\delta_x^{(d)}\right)^2} \left[ \left( v_{i-1,j}^{(d),k} - v_{i,j}^{(d),k} - v_{i,j}^{(d),k+1} + v_{i+1,j}^{(d),k+1} \right) + \left( v_{i,j-1}^{(d),k} - v_{i,j}^{(d),k} - v_{i,j}^{(d),k+1} + v_{i,j+1}^{(d),k+1} \right) \right] \tag{4.17}$$
where $\delta_t^{(d)} \ll 1$ is the dimensionless time between any two successive iterations
and $\delta_x^{(d)} = \delta_y^{(d)} \ll 1$ (isotropic grid) is the dimensionless distance between
adjacent nodes. The final discrete solution for temperature is obtained by averaging the
sub-solutions:
$$\theta_{i,j}^{(d),k} = \frac{u_{i,j}^{(d),k} + v_{i,j}^{(d),k}}{2} \tag{4.18}$$
1 Here, accuracy refers to the difference between the solution of the continuous equation and that
of the discretized equation. This “discretization error” (also known as “truncation error”) occurs
due to the method itself. The magnitude of this error is expected to decrease for a well-constructed
discretized scheme as the number of grid points is increased (the order of the scheme quantifies how
fast this error decreases). An additional type of error (“roundoff error”) is introduced by the fact
that digital computers can only store most real numbers with a limited accuracy, as we discussed
in Sect. 2.3.4.
With some additional notations, i.e.:
$$\lambda \equiv \frac{2\,\delta_t^{(d)}}{\left(\delta_x^{(d)}\right)^2} \tag{4.19}$$
$$A \equiv \frac{1-\lambda}{1+\lambda} \tag{4.20}$$
$$B \equiv \frac{\lambda}{2(1+\lambda)}, \tag{4.21}$$
we can cast the algebraic equations into a more convenient form for implementation:
$$u_{i,j}^{(d),k+1} = A\,u_{i,j}^{(d),k} + B \left( u_{i-1,j}^{(d),k+1} + u_{i+1,j}^{(d),k} + u_{i,j-1}^{(d),k+1} + u_{i,j+1}^{(d),k} \right) \tag{4.22}$$
$$v_{i,j}^{(d),k+1} = A\,v_{i,j}^{(d),k} + B \left( v_{i-1,j}^{(d),k} + v_{i+1,j}^{(d),k+1} + v_{i,j-1}^{(d),k} + v_{i,j+1}^{(d),k+1} \right) \tag{4.23}$$
The equations above are, in principle, implicit (since we have some remaining terms
on the right-hand side (RHS), which refer to variables at time step k + 1). The “trick”
is to notice that, if we adopt a particular order of node updates for u (d) (progressively
advancing grid nodes from time k to k + 1), the missing terms will be available “for
free”, since they correspond to locations which were already brought to the new time
step. The same observation holds for v (d) too, only that the reverse node ordering
has to be used. These aspects will be important for our later Fortran implementation.
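The node-ordering argument can be made concrete with a small stand-alone sketch. It is not the book's Solver type: the module name, the free subroutines advanceU and advanceV, the explicit arguments and the test problem in the demo program are assumptions made for illustration. Sweeping with ascending indices means that the (i-1, j) and (i, j-1) neighbours already hold values at step k+1, as Eq. (4.22) requires; the descending sweep does the same for Eq. (4.23).

```fortran
module barakat_clark_sketch
  implicit none
  integer, parameter :: RK = kind(1.0d0) ! assumed working precision
contains
  ! One time step for sub-solution u (Eq. 4.22): ascending sweep, in-place.
  subroutine advanceU(u, nx, a, b)
    integer, intent(in) :: nx
    real(RK), intent(inout) :: u(0:nx, 0:nx)
    real(RK), intent(in) :: a, b
    integer :: i, j
    do j = 1, nx-1     ! boundary nodes are never updated
      do i = 1, nx-1
        u(i,j) = a*u(i,j) + b*(u(i-1,j) + u(i+1,j) + u(i,j-1) + u(i,j+1))
      end do
    end do
  end subroutine advanceU

  ! One time step for sub-solution v (Eq. 4.23): descending sweep, in-place.
  subroutine advanceV(v, nx, a, b)
    integer, intent(in) :: nx
    real(RK), intent(inout) :: v(0:nx, 0:nx)
    real(RK), intent(in) :: a, b
    integer :: i, j
    do j = nx-1, 1, -1
      do i = nx-1, 1, -1
        v(i,j) = a*v(i,j) + b*(v(i-1,j) + v(i+1,j) + v(i,j-1) + v(i,j+1))
      end do
    end do
  end subroutine advanceV
end module barakat_clark_sketch

program demo_barakat_clark
  use barakat_clark_sketch
  implicit none
  integer, parameter :: nx = 8
  real(RK) :: u(0:nx, 0:nx), v(0:nx, 0:nx), lambda, a, b
  integer :: k

  lambda = 2.0_RK                  ! corresponds to dt = dx**2
  a = (1.0_RK - lambda)/(1.0_RK + lambda)
  b = lambda/(2.0_RK*(1.0_RK + lambda))
  u = 0.0_RK                       ! hypothetical test problem:
  u(:, nx) = 1.0_RK                ! top edge at 1, other edges at 0
  v = u
  do k = 1, 200
    call advanceU(u, nx, a, b)
    call advanceV(v, nx, a, b)
  end do
  ! average of the two sub-solutions (Eq. 4.18) at the centre node
  print *, "theta at centre:", 0.5_RK*(u(nx/2, nx/2) + v(nx/2, nx/2))
end program demo_barakat_clark
```

For this test problem the centre value is expected to approach 0.25 at steady state, by symmetry (superposing the four rotated versions of the problem gives a uniform field equal to 1).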
The computational costs (in terms of memory and computing time), as well as the
accuracy of the numerical solution, are dictated by the choice of the discretization
parameters ($\delta_x^{(d)}$ and $\delta_t^{(d)}$). In programming practice, it is convenient to refer to the
integer-valued parameters $N_x = 1/\delta_x^{(d)}$ and $N_t = 1/\delta_t^{(d)}$, representing the number
of discrete space intervals necessary for representing a characteristic length, and
similarly for time.² Unlike most explicit methods, implicit methods such as the
one we use here have the advantage of remaining stable for any combination of
positive $N_x$ and $N_t$. However, the (transient) numerical solution may not be physically
meaningful if the discrete time step $\delta_t^{(d)}$ is too large. A safe choice (see Barakat and
Clark [1]) is:
$$\delta_t^{(d)} = \left(\delta_x^{(d)}\right)^2 \iff N_t = N_x^2 \tag{4.24}$$
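As a quick numerical illustration, assuming N_x = 200 (the default value that appears in the Config type of this section), Eqs. (4.19)–(4.21) and (4.24) give:

```latex
\begin{align*}
\delta_x^{(d)} &= \frac{1}{N_x} = \frac{1}{200} = 5 \times 10^{-3}, \\
\delta_t^{(d)} &= \left(\delta_x^{(d)}\right)^2 = 2.5 \times 10^{-5},
  \qquad N_t = N_x^2 = 40\,000, \\
\lambda &= \frac{2\,\delta_t^{(d)}}{\left(\delta_x^{(d)}\right)^2} = 2, \\
A &= \frac{1-\lambda}{1+\lambda} = -\frac{1}{3},
  \qquad B = \frac{\lambda}{2(1+\lambda)} = \frac{1}{3}.
\end{align*}
```

With the choice of Eq. (4.24), λ, A and B do not depend on N_x, and A + 4B = 1, so a spatially uniform field is left unchanged by the update.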
2 Note that the number of discrete points used for representing the characteristic length is actually
N x + 1; similarly, there are Nt + 1 time steps (including the initial state at t = 0) for simulating
the evolution of the system during a characteristic time duration.
With this piece of “infrastructure” out of the way, we can develop a strategy for
structuring our program implementation. Here, we propose a simple decomposition,
consisting of a Config and a Solver type, each in its own module. Motivating
this decomposition is the principle of decoupling parts of the program which are
expected to undergo future modifications. In our case, it is sensible to decouple
the code which reads the simulation parameters from the actual solver, because
we will extend each of these components in Chap. 5, for demonstrating additional
techniques (the Config type will be improved with namelist-support, while the
Solver type will be extended with rudimentary support for parallel processing).
In a large application, such a separation of the physical problem formulation from
the numerical method would also open the possibility of switching solvers if this
becomes necessary at a later stage in the project (in this case, it would probably also
prove useful to further partition the Solver type itself).
Config type: From the FD-method discussed above, we can identify
several parameters that are relevant to our solution. Because we will demonstrate
two methods for reading these parameters into the program, it is useful to group
this data in one place, as a distinct type. To avoid implementing too many methods
of the SET/GET variety, we will leave the variables in the DT public.³ There is
no need for type-bound procedures in this DT, but it is useful to provide
a custom constructor, to initialize a Config instance from a file on disk (see file
Chapter4/config_file_formatted.in in the source code repository).
The declarations part of the corresponding module and the procedure interfaces are
shown below:
module Config_class
  use NumericKinds
  implicit none
  private

  type, public :: Config
    real(RK) :: mDiffusivity = 1.15E-6_RK, & ! sandstone
         ! NOTE: "physical" units here (Celsius)
         mTempA = 100._RK, &
         mTempB = 75._RK, &
         mTempC = 50._RK, &
         mTempD = 25._RK, &
         mSideLength = 30._RK
    integer(IK) :: mNx = 200 ! # of points for square side-length
  end type Config

contains
  type(Config) function createConfig(cfgFilePath)
    character(len=*), intent(in) :: cfgFilePath
    integer :: cfgFileID
    open(newunit=cfgFileID, file=trim(cfgFilePath), status='old', action='read')
    ! .....................................................
    close(cfgFileID)
  end function createConfig
end module Config_class
Solver type: This type forms the core of our solution. As data members, it encap-
sulates an object of type Config, from which some other variables are evaluated:
• mNt (representing N_t) is assigned a value (i.e. N_t = N_x²) in the Solver type,
since such constraints are usually specific to the numerical algorithm used (a
different method would most probably have different limitations)
• mDx and mDt are the discretization parameters, inversely proportional to N_x
and N_t
• mA and mB are pre-factors (defined in Eqs. (4.19)–(4.21)) for expressing the
algorithm more concisely
• mNumItersMax represents the total number of algorithm iterations to be per-
formed
Also as data members, we have two dynamic arrays (mU and mV), which will
hold the state of the two FD sub-solutions. To simplify things, we will not implement
a custom constructor for this type.⁴ Finally, mCurrIter keeps track of the current
iteration number, to serve as documentation for the simulation output.
3 This does breach the OOP idea of encapsulation, but not in a significant way here, since this DT
is essentially part of the implementation of the Solver DT (which does not expose it further).
The methods for Solver can be grouped into public and private ones:
• public: As far as the user code is concerned, it would be reasonable to add
methods for: (1) initializing a Solver based on a user-specified file containing
parameters (delegating the actual reading of the file to the data member of type
Config), (2) performing the time marching and (3) writing an output file (whose
name is to be specified by the user). To facilitate debugging, we also add a method
to inquire the temperature at a certain position.
Note that in this interface part of the abstract data type (ADT) we did not mention
too many details specific to the actual numerical method used in the Solver; the
only time when the user of this module interacts with the details of the method is
while creating the configuration file (specifically, through the choice of N_x). This
practice of keeping the interface as generic as possible is very natural with OOP,
and can lead to more maintainable programs. Additionally, it allows a potential
user of our data types to easily switch to a new Solver, if one becomes available.⁵
• private: To implement the method for time marching (run-subroutine), we
can define two private-methods in the Solver DT, each one responsible for
updating one of our two sub-solution fields. Also, we add a final (destructor)
method, to demonstrate how these can be bound to the type.⁶
The declarations part of the corresponding module, procedure interfaces, and
some parts of the implementations are shown below:
module Solver_class
  use NumericKinds
  use Config_class
  implicit none
  private

  type, public :: Solver
    private ! Hide internal-data from users.
    type(Config) :: mConfig
    real(RK) :: mNt, & ! # of iterations to simulate a characteristic time
         mDx, mDt, mA, mB ! Configuration-dependent factors.
    real(RK), allocatable, dimension(:,:) :: mU, mV ! main work-arrays
    integer(IK) :: mNumItersMax, mCurrIter = 0
  contains
    private ! By default, hide methods (and expose as needed).
    procedure, public :: init
    procedure, public :: run
    procedure, public :: writeAscii
    procedure, public :: getTemp
    ! Internal methods (users don't need to know about these).
    procedure :: advanceU
    procedure :: advanceV
    ! final :: cleanup ! NOTE: may need to comment-out for gfortran!
  end type Solver

contains
  subroutine init(this, cfgFilePath, simTime) ! initialization subroutine
    class(Solver), intent(inout) :: this
    character(len=*), intent(in) :: cfgFilePath
    real(RK), intent(in) :: simTime
    ! .....................................................
    this%mV = this%mU
  end subroutine init

  subroutine advanceU(this)
    class(Solver), intent(inout) :: this
    integer(IK) :: i, j ! local variables
    ! actual update for 'mU'-field (NE-ward)
    do j=1, this%mConfig%mNx-1 ! do NOT update
      do i=1, this%mConfig%mNx-1 ! boundaries
        this%mU(i,j) = this%mA*this%mU(i,j) + this%mB*( &
             this%mU(i-1,j) + this%mU(i+1,j) + this%mU(i,j-1) + this%mU(i,j+1) )
      end do
    end do
  end subroutine advanceU

  subroutine advanceV(this) ! similar to 'advanceU'
    class(Solver), intent(inout) :: this
    ! .....................................................
    end do
  end subroutine advanceV

  ! destructor method
  subroutine cleanup(this)
    ! 'class' -> 'type' (dummy-arg cannot be polymorphic for final procedures)
    type(Solver), intent(inout) :: this
    ! in this version, we only deallocate memory
    deallocate(this%mU, this%mV)
  end subroutine cleanup
end module Solver_class
4 A custom constructor could lead to unnecessary copying of the allocatable arrays (which
can often be large) when initializing the object with assignments between Solver-instances. This
problem could be circumvented in principle by making mU and mV pointers, but it would also
obscure the present discussion.
5 Equivalently, it allows the implementer of the Solver type to improve the internal implementation [...]
6 [...] (notably gfortran-4.8).
Note that subroutines advanceU and advanceV scan the domain in opposite
directions, updating their corresponding arrays “in-place” (i.e. while the update is
in progress, these arrays will contain values at both time step k and k+1).
With the types defined above, we can write a very compact main-program:
program solve_heat_diffusion_v1
  use NumericKinds
  use Solver_class
  implicit none
  type(Solver) :: square
  real(RK) :: simTime = 0.1 ! no. of characteristic time-intervals to simulate
  ! .....................................................
  call square%writeAscii(outputFile)
end program solve_heat_diffusion_v1
[Figure: contour plot of the temperature field θ [°C] over the square domain, with axes x [m] and y [m]]
Fig. 4.2 Plot of the numerical solution for the transient heat diffusion equation at t^(d) = 0.1, using
the method of Barakat and Clark [1]
A sample solution, obtained with the parameter values listed in the declaration
of the Config type, the main-program shown above, and the time step
dictated by Eq. (4.24), is given in Fig. 4.2 (the plot was produced with the script
src/Chapter4/plotHeatDiffSoln.R, also in the source code repository).
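To relate the dimensionless results back to physical units, Eqs. (4.6) and (4.7) can be evaluated with the default values of the Config type. This is a worked example under the assumption that mSideLength is given in m and mDiffusivity in m²/s (the units implied by the axes of Fig. 4.2 and by the definition of κ), and using 1 yr ≈ 3.156 × 10^7 s:

```latex
\begin{align*}
t &= \frac{L^2}{\kappa}\, t^{(d)}
   = \frac{(30\,\mathrm{m})^2}{1.15 \times 10^{-6}\,\mathrm{m^2/s}} \times 0.1
   \approx 7.8 \times 10^{7}\,\mathrm{s} \approx 2.5\,\mathrm{yr}, \\
\theta &= \theta_D + (\theta_A - \theta_D)\,\theta^{(d)}
   = 25\,^{\circ}\mathrm{C} + 75\,^{\circ}\mathrm{C} \times \theta^{(d)}.
\end{align*}
```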
Exercise 14 (Robust code with error checking) In the interest of clarity, the
example presented in this section did not include error checking for the cases
when the input and output files cannot be opened, or when the configuration
data cannot be read, due to mistakes in the input file. Extend our example (or
your modified version produced for Exercise 13), to increase the robustness
We will revisit this example in Sect. 5.2.1 (to improve the methodology for spec-
ifying input parameters), and in Sect. 5.3.5 (to show how to improve performance
slightly with parallelization).
Here we provide a brief overview of an inter-hemispheric box model of the deep ocean
circulation to study the feedbacks in the climate system. The model is based on an
ocean box model of the Atlantic Ocean [25] coupled to an energy balance model of
the atmosphere [17, 21]. The inter-hemispheric box model consists of four oceanic
and three atmospheric boxes, as indicated in Fig. 4.3. The ocean boxes represent
the Atlantic Ocean from 80° N to 60° S, with a width of 80° (assumed constant).
The indices of the temperatures T, the salinities S, the surface heat fluxes H, the
atmospheric heat fluxes F, the radiation terms R and, later on, the volumes V refer
to the different boxes (N for the northern, M for the tropical, D for the deep
and S for the southern box).
For simplicity, the discrete boxes are assumed to be homogeneous, i.e. temperatures
and salinities are uniform within each box. The climate model is
based on mass and energy considerations. Emphasis is placed on the overturning
flow Φ of the ocean circulation.
4.2 Climate Box Model 129
[Figure: schematic of the inter-hemispheric box model (ocean part, with labels dz_2, Φ, S_D and T_D^oc)]
The prognostic equations for the temperatures of the ocean boxes consist of two
parts. The first part is proportional to the overturning flow Φ and represents the
advective (transport) coupling between the boxes. The second part, which is depen-
dent on the surface heat flux H , stands for the coupling between the ocean and the
atmosphere. This latter part is missing for the deep box, which is not connected to
the atmosphere. The four differential equations for the ocean temperatures read:
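$$\frac{d}{dt} T_S^{oc} = -\left(T_S^{oc} - T_D^{oc}\right) \frac{\Phi}{V_S} + \frac{H_S}{\rho_0 c_p\, dz_2}, \tag{4.25}$$
$$\frac{d}{dt} T_M^{oc} = -\left(T_M^{oc} - T_S^{oc}\right) \frac{\Phi}{V_M} + \frac{H_M}{\rho_0 c_p\, dz_1}, \tag{4.26}$$
$$\frac{d}{dt} T_N^{oc} = -\left(T_N^{oc} - T_M^{oc}\right) \frac{\Phi}{V_N} + \frac{H_N}{\rho_0 c_p\, dz_2} \quad \text{and} \tag{4.27}$$
$$\frac{d}{dt} T_D^{oc} = -\left(T_D^{oc} - T_N^{oc}\right) \frac{\Phi}{V_D} \tag{4.28}$$
where ρ_0 denotes a reference density for saltwater and c_p the specific heat capacity
of water. The depths of the discrete ocean boxes were chosen as dz_M = dz_1 =
600 m and dz_N = dz_S = dz_2 = 4,000 m; volumes of the boxes are denoted by
V_i, i ∈ {N, M, S, D}.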
The overturning flow is assumed to be proportional to the density gradients of
the oceanic boxes after Stommel [26]. As in Rahmstorf [23], the northern and the
southern box are taken into account for this, which leads to the equation for the
calculation of the overturning flow:
$$\Phi = c \left[ -\alpha \left(T_N^{oc} - T_S^{oc}\right) + \beta \left(S_N^{oc} - S_S^{oc}\right) \right] \tag{4.29}$$
where the constants α and β represent the thermal and the haline expansion
coefficients in the equation of state; c is an adjustable parameter which is set to
produce present-day overturning rates.
The surface heat fluxes follow the equations from Haney [15]:
$$H_i = Q_{1,i} - Q_{2,i} \left(T_i^{oc} - T_i^{at}\right), \quad i \in \{S, M, N\} \tag{4.30}$$
where Q_{1,i} and Q_{2,i} are tuning parameters for the surface heat fluxes (a pair for each
atmosphere box).
Analogously to Eqs. (4.27), (4.28), the prognostic differential equations for the
salinities consist of two components. One of those is again the advective part, caused
by the interconnection between the boxes and the other one quantifies the effects of
the freshwater fluxes between the ocean and the atmosphere. The latter is again only
for the boxes near the surface, thus the equations are:
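$$\frac{d}{dt} S_S^{oc} = -\left(S_S^{oc} - S_D^{oc}\right) \frac{\Phi}{V_S} - S_{ref} \frac{(P - E)_S}{dz_2}, \tag{4.31}$$
$$\frac{d}{dt} S_M^{oc} = -\left(S_M^{oc} - S_S^{oc}\right) \frac{\Phi}{V_M} + S_{ref} \frac{(P - E)_M}{dz_1}, \tag{4.32}$$
$$\frac{d}{dt} S_N^{oc} = -\left(S_N^{oc} - S_M^{oc}\right) \frac{\Phi}{V_N} - S_{ref} \frac{(P - E)_N}{dz_2}, \tag{4.33}$$
$$\frac{d}{dt} S_D^{oc} = -\left(S_D^{oc} - S_N^{oc}\right) \frac{\Phi}{V_D}. \tag{4.34}$$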
The reference salinity Sref is a characteristic average value for the entire Atlantic
Ocean, and the freshwater fluxes are denoted as precipitation minus evaporation
(P − E). These freshwater fluxes are calculated by the divergence of the latent heat
transport in the atmosphere and are assumed to be proportional to the meridional
moisture gradient explained below.
The atmospheric EBM calculates the heat fluxes between the ocean and
atmosphere, as well as horizontal latent and sensible heat transports as diffusion
following Chen et al. [5]. The EBM contains sensible and latent heat transports,
radiation Ri , as well as the surface heat fluxes Hi between the atmosphere and the
ocean. The atmospheric temperatures Tiat follow the prognostic equations:
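$$c_2 \frac{d}{dt} T_S^{at} = \frac{\partial \left(F_S^s + F_S^l\right)}{\partial y} + R_S - H_S, \tag{4.35}$$
$$c_2 \frac{d}{dt} T_M^{at} = -\frac{\partial \left(F_S^s + F_N^s + F_S^l + F_N^l\right)}{\partial y} + R_M - H_M, \tag{4.36}$$
$$c_2 \frac{d}{dt} T_N^{at} = \frac{\partial \left(F_N^s + F_N^l\right)}{\partial y} + R_N - H_N, \tag{4.37}$$
where c_2 is related to the specific heat of air. The sensible (F_i^s) and latent (F_i^l) heat
transports depend on the meridional gradients of the surface temperature
and of the moisture q_i^s, respectively:
$$F_i^s = K_s \frac{\partial T_i^{at}}{\partial y} \tag{4.38}$$
$$F_i^l = K_l \frac{\partial q_i^s}{\partial y}. \tag{4.39}$$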
The model calculates the freshwater fluxes from the divergence of the latent heat
transport, assuming a proportionality of the form:
The theoretical model presented above ultimately leads to a system of coupled
ordinary differential equations (ODEs). Since the spatial dependence was incorporated
into the choice and properties of the discrete model boxes, we only need to
consider dependence on time. Specifically, we need to choose a discretization for
the time derivatives in the left-hand side (LHS) of the governing Eqs. (4.25)–(4.28),
(4.31)–(4.37). To keep the example simple, we use the Euler forward scheme⁷, whereby
an evolution equation of the form
$$\frac{d}{dt} X_i(t) = f_i(\mathbf{X}(t)) \tag{4.42}$$
is discretized as:
$$\frac{X_i^{k+1} - X_i^k}{\delta t} \approx f_i(\mathbf{X}^k), \tag{4.43}$$
where X_i denotes any of the model variables, and f_i the accompanying expression on the
RHS of the evolution equations.
In the model, we use a time step δt = 1/100 yr, to ensure the stability of the
system according to the Courant-Friedrichs-Lewy (CFL) criterion [8].⁸
7 For Exercise 17, we briefly discuss how to extend the code with a more accurate integration
scheme.
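The Euler forward update of Eq. (4.43) can be tried out on a scalar problem before applying it to the full model state. The sketch below uses a hypothetical test equation, dX/dt = -X with X(0) = 1 (exact solution exp(-t)), chosen only because the error is easy to check; the module and function names are likewise assumptions and not part of the box model code.

```fortran
module euler_forward_sketch
  implicit none
  integer, parameter :: RK = kind(1.0d0) ! assumed working precision
contains
  ! RHS f(X) of the hypothetical test problem dX/dt = -X
  pure real(RK) function rhs(x)
    real(RK), intent(in) :: x
    rhs = -x
  end function rhs

  ! Integrate from t = 0 to t = tEnd using n Euler forward steps.
  pure real(RK) function eulerForward(x0, tEnd, n) result(x)
    real(RK), intent(in) :: x0, tEnd
    integer, intent(in) :: n
    real(RK) :: dt
    integer :: k
    dt = tEnd / real(n, RK)
    x = x0
    do k = 1, n
      x = x + dt*rhs(x) ! X^{k+1} = X^k + dt * f(X^k)
    end do
  end function eulerForward
end module euler_forward_sketch

program demo_euler_forward
  use euler_forward_sketch
  implicit none
  integer :: n

  n = 10
  do while (n <= 1000)
    print '(a, i5, a, es10.3)', "n =", n, "  error =", &
         abs(eulerForward(1.0_RK, 1.0_RK, n) - exp(-1.0_RK))
    n = n*10
  end do
end program demo_euler_forward
```

Since the scheme is first-order accurate, the printed error should shrink roughly in proportion to the time step, i.e. by about a factor of ten each time n is multiplied by ten.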
  real(RK), parameter :: &
       RHO_SEA_WATER = 1025., & ! [kg/m^3]
       ! .............................
       WIDTH_ATLANTIC = 80. ! lateral span of the Atlantic [degrees of longitude]
end module PhysicsConstants
Since we do not expect these to change, we do not need to create a separate type
to encapsulate these constants, so a plain module should suffice.
module ModelConstants: In addition to the physics constants, several model-dependent
parameters will also appear, to control aspects of numerics (e.g. time
step), or for tuning parameterizations used by the model. Normally, these should
be encapsulated into a separate type, to allow reading them from a file. However, to
simplify things, we do not show this here, and create another plain module instead⁹:
module ModelConstants
  use NumericKinds
  use PhysicsConstants
  implicit none
  public

  real(RK), parameter :: &
       NO_YEARS = 10000., & ! total simulation time [yr]
       DT_IN_YEARS = 1./100., & ! time-step [yr]
       DTS = DT_IN_YEARS*SECONDS_IN_YEAR, & ! time-step [s]
       ! tuning-parameters for surface heat-fluxes
       Q1_S = 10., Q2_S = 50., &
       ! .............................
       ! volumes of the ocean boxes
       V_S = AREA_S*DZ2, V_M = AREA_M*DZ1, V_N = AREA_N*DZ2, V_D = AREA_D*(DZ2-DZ1)

  integer, parameter :: &
       NO_T_STEP = int(NO_YEARS/DT_IN_YEARS), & ! number of model-iterations
       OUTPUT_FREQUENCY = 100
end module ModelConstants
contains
  ! Convert radians to degrees
  real(RK) function rad2Deg(radians)
    real(RK), intent(in) :: radians
    rad2Deg = radians / ONE_DEG_IN_RADS
  end function rad2Deg

  ! Convert degrees to radians
  real(RK) function deg2Rad(degrees)
    real(RK), intent(in) :: degrees
    deg2Rad = degrees * ONE_DEG_IN_RADS
  end function deg2Rad
end module GeomUtils
9 See Sect. 4.1.3 for an example of encapsulating configuration data, and Sect. 5.2.1 for improving
that further by using a namelist.
ModelState type: From the description in the previous section, note that the
model is conceptually just a system of coupled ODEs. We define the abstract data
type (ADT) ModelState as a container for the state vector. We only add a few
“methods” for this new type:
• getCurrModelState returns the model state for writing output.
• preventOceanFreezing prevents the model from entering physical regimes
beyond its scope. In particular, our model does not account for potential sea ice
formation. Therefore, if ocean temperatures decrease below the freezing point
temperature, we issue a warning and bring them back to this value.¹⁰
• computePhi calculates the intensity of the overturning circulation. As for the
previous procedure, we issue a warning if the overturning circulation seems to be
reversed, since the model is not designed for that situation (the same comment applies
here as well).
For a more expressive formulation of the numerical scheme (later, in the
main program), we make extensive use of operator overloading (procedures
scalarTimesModelState, modelStateTimesScalar, addModelStates
and subtractModelStates¹¹).
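The effect of operator overloading on the readability of a time-stepping loop can be seen in a reduced setting. The following sketch uses a hypothetical two-component state (type BoxState, with two temperatures relaxing towards each other) instead of the full ModelState; all names are assumptions made for illustration, and only the operators needed for an Euler forward step are defined.

```fortran
module two_box_state
  implicit none
  integer, parameter :: RK = kind(1.0d0) ! assumed working precision

  type :: BoxState ! hypothetical, much smaller than ModelState
    real(RK) :: T1, T2
  end type BoxState

  interface operator(+)
    module procedure addStates
  end interface operator(+)
  interface operator(*)
    module procedure scalarTimesState
  end interface operator(*)
contains
  pure function addStates(a, b) result(c)
    type(BoxState), intent(in) :: a, b
    type(BoxState) :: c
    c = BoxState(a%T1 + b%T1, a%T2 + b%T2)
  end function addStates

  pure function scalarTimesState(s, a) result(c)
    real(RK), intent(in) :: s
    type(BoxState), intent(in) :: a
    type(BoxState) :: c
    c = BoxState(s*a%T1, s*a%T2)
  end function scalarTimesState

  ! Rate of change: each box relaxes towards the other one.
  pure function dState(old) result(rate)
    type(BoxState), intent(in) :: old
    type(BoxState) :: rate
    rate = BoxState(-(old%T1 - old%T2), -(old%T2 - old%T1))
  end function dState

  pure function integrateEuler(s0, dt, n) result(s)
    type(BoxState), intent(in) :: s0
    real(RK), intent(in) :: dt
    integer, intent(in) :: n
    type(BoxState) :: s
    integer :: k
    s = s0
    do k = 1, n
      s = s + dt*dState(s) ! reads like the mathematical formula
    end do
  end function integrateEuler
end module two_box_state

program demo_two_box
  use two_box_state
  implicit none
  type(BoxState) :: s
  s = integrateEuler(BoxState(20.0_RK, 10.0_RK), 0.001_RK, 1000)
  print *, "T1, T2 at t = 1:", s%T1, s%T2
end program demo_two_box
```

In this toy problem the sum T1 + T2 is conserved and the difference T1 - T2 decays approximately as exp(-2t), which gives two simple checks on the overloaded operators.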
As “free” subroutines for the new ADT, we have newModelState , which
constructs a new instance of the type (based on ICs of the model), dQSdT , which
computes the slope of the saturation vapor pressure and, finally, dModelState .
This last procedure is particularly important, since it encodes the actual physics
of our model (the RHS of the evolution equations). The procedure also returns
a ModelState-instance, representing the rate of change of the model state.
Here, the fact that the procedure is not type-bound (and requires a ModelState
input argument), since this facilitates evaluation of the rate of change at fractional
10 Note that, for production code, it would probably be a better idea to stop the program altogether
if such an exceptional situation occurs, to rule-out any misinterpretations (for various reasons, the
warnings may not reach the user).
11 The numerical schemes we use do not actually need this last subroutine, but we include it since
  type :: ModelState
    real(RK) :: TocS, TocM, TocN, TocD, SocS, SocM, SocN, SocD, TatS, TatM, TatN
  contains
    procedure, public :: getCurrModelState
    procedure, public :: preventOceanFreezing
    procedure, public :: computePhi
  end type ModelState

  interface ModelState
    module procedure newModelState
  end interface ModelState

  interface operator(*)
    module procedure scalarTimesModelState
    module procedure modelStateTimesScalar
  end interface operator(*)

  interface operator(+)
    module procedure addModelStates
  end interface operator(+)

  interface operator(-)
    module procedure subtractModelStates
  end interface operator(-)

contains
  ! Calculate the slope of saturation vapor pressure w.r.t temperature
  ! (see Rogers and Yau, Cloud Physics, 1976, p.16).
  ! NOTE: Unlike the other procedures in the module, this one is not type-bound.
  real(RK) function dQSdT(Tc)
    real(RK), intent(in) :: Tc
    real(RK) :: p, ex, sat
    p = 1000.
    ex = 17.67*Tc/(Tc + 243.5)
    sat = 6.112*exp(ex)
    dQSdT = 243.5*17.67*sat/(Tc + 243.5)**2*0.622/p
  end function dQSdT
    phi = C*(-ALPHA*(this%TocN - this%TocS) + BETA*(this%SocN - this%SocS))
    if( phi < 0. ) then
      phi = 0. ! prevent reversal of circulation
      write(error_unit, '(a)') "Warning: reversal of circulation detected!"
    end if
  end function computePhi
  function getCurrModelState(this) result(res)
    class(ModelState), intent(in) :: this
    real(RK) :: res(13)
    ! local vars
    real(RK) :: tempGlobal
    ! .....................................................
    res = [ tempGlobal, this%TocS, this%TocM, this%TocN, this%TocD, &
         this%SocS, this%SocM, this%SocN, this%SocD, &
         this%TatS, this%TatM, this%TatN, &
         this%computePhi()*1.E-6 ] ! units are transformed to [Sv]
  end function getCurrModelState

  ! Physics is encoded here (i.e. RHS of evolution equations)
  type(ModelState) function dModelState(old)
    type(ModelState), intent(in) :: old
    real(RK) :: F30S, F45N, phi, Tat30S, Tat45N, FsS, FsN, FlS, FlN, hS, hM, hN, &
         fwFaS, fwFaN, rS, rM, rN, midLatS, midLatM, midLatN
    phi = old%computePhi()
    ! .....................................................
    dModelState%TatM = ( &
         -(cosD(30._RK)*F30S + cosD(45._RK)*F45N) / (R_E*(sinD(30._RK)+sinD(45._RK))) &
         + rM - FRF_M*hM &
         )/(CP_DRY_AIR*BETA_M)
    dModelState%TatN = ( &
         (cosD(45._RK)*F45N) / (R_E*(sinD(90._RK) - sinD(45._RK))) &
         + rN - FRF_N*hN &
         )/(CP_DRY_AIR*BETA_N)
  end function dModelState
  integer :: i, outFileID
  type(ModelState) :: stateSim1E, statePerturbation

  ! Perturbation to superimpose over equilibrium state.
  statePerturbation = ModelState( TocS=0., TocM=0., &
       TocN=0., TocD=0., &
       ! .....................................................
  stateSim1E = ModelState( TocS=4.77740431, TocM=24.42876625, &
       TocN=2.66810894, TocD=2.67598915, &
       SocS=34.40753555, SocM=35.62585068, &
       SocN=34.92513657, SocD=34.91130066, &
       TatS=4.67439556, TatM=23.30437851, TatN=0.94061828 ) + statePerturbation
  ! prepare for output
  open(newunit=outFileID, file="box_model_euler.out", &
       form="formatted", status="replace")
  do i=1, NO_T_STEP
    ! Euler-forward step
    stateSim1E = stateSim1E + DTS*dModelState(stateSim1E)
    call stateSim1E%preventOceanFreezing()
    ! Conditional OUTPUT-writing
    if( mod(i-1, OUTPUT_FREQUENCY) == 0 ) then
      write(outFileID, '(14('//RK_FMT//', 1x))') i*DT_IN_YEARS, &
           stateSim1E%getCurrModelState()
    end if
  end do
  close(outFileID) ! Clean-up for output
end program box_model_euler
Listing 4.9 Main-program for box model application (see file src/Chapter4/box_model_euler.f90)
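$$ X_i^{k+1} \approx X_i^{k} + \frac{\delta t}{6}\left[ k_1 + 2(k_2 + k_3) + k_4 \right] \tag{4.46} $$
where:
$$ k_1 = f_i\left( X^{k} \right) \tag{4.47} $$
$$ k_2 = f_i\left( X^{k} + \frac{\delta t}{2}\, k_1 \right) \tag{4.48} $$
$$ k_3 = f_i\left( X^{k} + \frac{\delta t}{2}\, k_2 \right) \tag{4.49} $$
$$ k_4 = f_i\left( X^{k} + \delta t\, k_3 \right) \tag{4.50} $$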
Extend the program presented above with this discretization scheme, and
compare the results with those obtained using the previous discretization.
We will revisit this example in Sect. 5.1.2.3 to illustrate how to spread the components of the project across distinct files, to make it more modular.
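As a possible starting point for the exercise, the sketch below contrasts one Euler-forward update with the four-stage update of Eqs. (4.46) to (4.50). It is not the box-model code: the scalar test problem dX/dt = -X stands in for dModelState, and the module name rk4_demo, the procedure names and the kind RK = real64 are assumptions made only for this illustration.

```fortran
module rk4_demo
  use iso_fortran_env, only: real64
  implicit none
  integer, parameter :: RK = real64  ! assumed kind (stand-in for the book's RK)
contains
  ! RHS of the test ODE dX/dt = -X (plays the role of dModelState)
  pure function rhs(x) result(dxdt)
    real(RK), intent(in) :: x
    real(RK) :: dxdt
    dxdt = -x
  end function rhs

  ! one Euler-forward step
  pure function stepEuler(x, dt) result(xNew)
    real(RK), intent(in) :: x, dt
    real(RK) :: xNew
    xNew = x + dt*rhs(x)
  end function stepEuler

  ! one RK4 step, following Eqs. (4.46)-(4.50)
  pure function stepRK4(x, dt) result(xNew)
    real(RK), intent(in) :: x, dt
    real(RK) :: xNew, k1, k2, k3, k4
    k1 = rhs(x)
    k2 = rhs(x + 0.5_RK*dt*k1)
    k3 = rhs(x + 0.5_RK*dt*k2)
    k4 = rhs(x + dt*k3)
    xNew = x + dt/6._RK*(k1 + 2._RK*(k2 + k3) + k4)
  end function stepRK4
end module rk4_demo

program rk4_vs_euler
  use rk4_demo
  implicit none
  integer, parameter :: N_STEPS = 10
  real(RK), parameter :: DT = 0.1_RK
  real(RK) :: xEuler, xRK4, xExact
  integer :: i

  xEuler = 1._RK; xRK4 = 1._RK
  do i = 1, N_STEPS
     xEuler = stepEuler(xEuler, DT)
     xRK4 = stepRK4(xRK4, DT)
  end do
  xExact = exp(-N_STEPS*DT)  ! analytical solution at t = N_STEPS*DT

  print '(a, es10.3)', "Euler error: ", abs(xEuler - xExact)
  print '(a, es10.3)', "RK4 error:   ", abs(xRK4 - xExact)
  ! for this step size, RK4 should be clearly more accurate than Euler
  if( abs(xRK4 - xExact) > 1.e-6_RK ) error stop "RK4 error larger than expected"
  if( abs(xRK4 - xExact) >= abs(xEuler - xExact) ) error stop "RK4 not better than Euler"
end program rk4_vs_euler
```

For the box model, the same four-stage pattern would apply with X replaced by a ModelState instance, since the Euler-forward step in Listing 4.9 already relies on addition and scaling of that type.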
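Fig. 4.4 Geometry for the 2D RB problem (see text for description). [Schematic: channel of height L and width W ≈ 2×L; no-slip walls at the bottom (temperature θb) and at the top (temperature θt); periodic domain wrapping along x; gravity g.]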
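The evolution equations for the incompressible fluid and for the temperature component read:
$$ \partial_\beta u_\beta = 0 \tag{4.51} $$
$$ \rho \left( \partial_t u_\gamma + u_\beta \partial_\beta u_\gamma \right) = -\partial_\gamma p + \nu\rho\, \partial_{\beta\beta} u_\gamma + \rho\, r_\gamma, \quad \forall \gamma \in \{x, y\} \tag{4.52} $$
$$ \partial_t \theta + u_\beta \partial_\beta \theta = \kappa\, \partial_{\beta\beta} \theta \tag{4.53} $$
$$ \rho = \rho_0 \left[ 1 - \alpha(\theta - \theta_0) \right], \tag{4.54} $$
$$ \theta_0 = \frac{\theta_b + \theta_t}{2}. \tag{4.55} $$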
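With these assumptions, and also absorbing the constant part of the gravity force into the pressure term, Eq. (4.52) becomes:
$$ \partial_t u_\gamma + u_\beta \partial_\beta u_\gamma = -\frac{1}{\rho_0} \partial_\gamma p + \nu\, \partial_{\beta\beta} u_\gamma + \alpha g (\theta - \theta_0)\, \delta_{\gamma,y}, \quad \forall \gamma \in \{x, y\} \tag{4.56} $$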
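– Also, driving the flow is a vertical temperature gradient, imposed through the temperature BCs (θb > θt):
$$ \theta(x, 0, t) = \theta_b \equiv \theta_0 + \frac{\Delta\theta_0}{2} \tag{4.58} $$
$$ \theta(x, L, t) = \theta_t \equiv \theta_0 - \frac{\Delta\theta_0}{2} \tag{4.59} $$
where we introduced:
$$ \Delta\theta_0 \equiv \theta_b - \theta_t \tag{4.60} $$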
ICs
• The velocity field is set identically to zero everywhere initially:
• The initial temperature is given by a linear profile, which matches the values at
the horizontal boundaries:
$$ \theta(x, y, 0) = \theta_0 + \Delta\theta_0 \left( \frac{1}{2} - \frac{y}{L} \right) \tag{4.64} $$
Just as for the problem discussed in Sect. 4.1, we prefer to re-write the governing equations in a dimensionless form, instead of solving them directly in the physical system of units (in terms of meters, seconds, Kelvin, etc.). For the following calculations, we adopt the same notation conventions as explained in that section. For the RB problem, it is natural to choose the height of the channel (L) as the characteristic distance, which suggests the following scaling relations:
$$ x^{(d)} \equiv \frac{x}{L} \iff x = L\, x^{(d)} \tag{4.65} $$
Because the flow is initially at rest, we construct a characteristic time scale based on diffusivity¹² and the characteristic length. The resulting scaling relations for time read:
$$ t^{(d)} \equiv \frac{\kappa}{L^2}\, t \iff t = \frac{L^2}{\kappa}\, t^{(d)} \tag{4.66} $$
Similarly, $\kappa / L$ can be chosen as a characteristic velocity, which leads to:
$$ u_\gamma^{(d)} = \frac{L}{\kappa}\, u_\gamma \iff u_\gamma = \frac{\kappa}{L}\, u_\gamma^{(d)}, \quad \forall \gamma \in \{x, y\} \tag{4.67} $$
The characteristic (modified) pressure difference can be defined based on the characteristic velocity, i.e. $\rho_0 (\kappa / L)^2$, so that:
$$ p^{(d)} = \frac{L^2}{\rho_0 \kappa^2}\, p \iff p = \frac{\rho_0 \kappa^2}{L^2}\, p^{(d)} \tag{4.68} $$
12 $[\kappa]_{SI} = \mathrm{m^2/s}$; note that some authors use α instead of κ for denoting the thermal diffusivity.
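Using the chain rule, the scaling relations for the derivatives can also be obtained:
$$ \partial_\beta^{(d)} = L\, \partial_\beta \iff \partial_\beta = \frac{1}{L}\, \partial_\beta^{(d)} \tag{4.70} $$
$$ \partial_{\beta\gamma}^{(d)} = L^2\, \partial_{\beta\gamma} \iff \partial_{\beta\gamma} = \frac{1}{L^2}\, \partial_{\beta\gamma}^{(d)} \tag{4.71} $$
$$ \partial_t^{(d)} = \frac{L^2}{\kappa}\, \partial_t \iff \partial_t = \frac{\kappa}{L^2}\, \partial_t^{(d)} \tag{4.72} $$
Using these characteristic scales, the equations of conservation for fluid mass Eq. (4.51), momentum Eq. (4.56) and heat Eq. (4.53) become in the dimensionless system:
$$ \partial_\beta^{(d)} u_\beta^{(d)} = 0 \tag{4.73} $$
$$ \partial_t^{(d)} u_\gamma^{(d)} + u_\beta^{(d)} \partial_\beta^{(d)} u_\gamma^{(d)} = -\partial_\gamma^{(d)} p^{(d)} + Pr\, \partial_{\beta\beta}^{(d)} u_\gamma^{(d)} + Ra\, Pr\, \theta^{(d)} \delta_{\gamma,y}, \quad \forall \gamma \in \{x, y\} \tag{4.74} $$
$$ \partial_t^{(d)} \theta^{(d)} + u_\beta^{(d)} \partial_\beta^{(d)} \theta^{(d)} = \partial_{\beta\beta}^{(d)} \theta^{(d)} \tag{4.75} $$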
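To make the origin of the coefficients in Eq. (4.74) explicit, the substitution can be sketched for the momentum equation. The sketch assumes the temperature scaling $\theta^{(d)} \equiv (\theta - \theta_0)/\Delta\theta_0$ (consistent with dimensionless wall temperatures of ±1/2) and the standard definitions $Pr \equiv \nu/\kappa$ and $Ra \equiv \alpha g \Delta\theta_0 L^3/(\nu\kappa)$; neither is reproduced in this passage, so both are stated here as assumptions.

```latex
% Insert the scaling relations (4.66)-(4.68) and (4.70)-(4.72) into Eq. (4.56):
\begin{align*}
\frac{\kappa^2}{L^3}\left[ \partial_t^{(d)} u_\gamma^{(d)}
      + u_\beta^{(d)} \partial_\beta^{(d)} u_\gamma^{(d)} \right]
  &= -\frac{\kappa^2}{L^3}\, \partial_\gamma^{(d)} p^{(d)}
     + \frac{\nu\kappa}{L^3}\, \partial_{\beta\beta}^{(d)} u_\gamma^{(d)}
     + \alpha g\, \Delta\theta_0\, \theta^{(d)} \delta_{\gamma,y} \\
% Multiply both sides by L^3 / \kappa^2:
\partial_t^{(d)} u_\gamma^{(d)} + u_\beta^{(d)} \partial_\beta^{(d)} u_\gamma^{(d)}
  &= -\partial_\gamma^{(d)} p^{(d)}
     + \frac{\nu}{\kappa}\, \partial_{\beta\beta}^{(d)} u_\gamma^{(d)}
     + \frac{\alpha g\, \Delta\theta_0 L^3}{\kappa^2}\, \theta^{(d)} \delta_{\gamma,y} \\
% Identify the dimensionless groups:
\frac{\nu}{\kappa} &= Pr, \qquad
\frac{\alpha g\, \Delta\theta_0 L^3}{\kappa^2}
  = \frac{\alpha g\, \Delta\theta_0 L^3}{\nu\kappa} \cdot \frac{\nu}{\kappa}
  = Ra\, Pr
\end{align*}
```

The same substitution applied to Eq. (4.53) produces a common factor $\kappa \Delta\theta_0 / L^2$ on both sides, which is why no coefficient remains in Eq. (4.75).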
– temperature:
$$ \theta^{(d)}(x^{(d)}, 0, t^{(d)}) = +\frac{1}{2} \tag{4.79} $$
$$ \theta^{(d)}(x^{(d)}, 1, t^{(d)}) = -\frac{1}{2} \tag{4.80} $$
• temperature:
$$ \theta^{(d)}(x^{(d)}, y^{(d)}, 0) = \frac{1}{2} - y^{(d)} \tag{4.84} $$
• pressure (chosen so that the pressure and buoyancy terms in Eq. (4.74) are in balance initially):
$$ p^{(d)}(x^{(d)}, y^{(d)}, 0) = \frac{Ra\, Pr}{2}\, y^{(d)} \left( 1 - y^{(d)} \right) \tag{4.85} $$
It is interesting to note that the dynamical behaviour of the system is determined by the coefficients Pr and Ra (see Eqs. (4.76) and (4.77)). For example, two geometrically similar setups A and B which use the same fluid and satisfy $\Delta\theta_{0,A} L_A^3 = \Delta\theta_{0,B} L_B^3$ have identical solutions in the dimensionless system (the flows are said to be dynamically similar).¹³
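The similarity condition can be illustrated with a short worked example. It relies on the standard definitions of the two dimensionless numbers, which are referenced above as Eqs. (4.76) and (4.77) but not reproduced in this passage, and on the assumption that both setups use the same fluid under the same gravitational acceleration.

```latex
% Standard definitions (assumed to match Eqs. (4.76) and (4.77)):
\[
  Pr \equiv \frac{\nu}{\kappa}, \qquad
  Ra \equiv \frac{\alpha g\, \Delta\theta_0 L^3}{\nu\kappa}
\]
% Same fluid (same \alpha, \nu, \kappa) and same g in setups A and B:
\[
  Pr_A = Pr_B, \qquad
  Ra_A = Ra_B \iff \Delta\theta_{0,A} L_A^3 = \Delta\theta_{0,B} L_B^3
\]
% Example: doubling the channel height
\[
  L_B = 2 L_A \;\Longrightarrow\; \Delta\theta_{0,B} = \frac{\Delta\theta_{0,A}}{8}
\]
```

In other words, a channel twice as high reaches the same Rayleigh number with one eighth of the temperature difference.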
As already mentioned, a possible state of the system corresponds to the fluid being at rest, with the temperature field undergoing pure diffusion (which results in a linear temperature profile, depending on the y-coordinate only). Linear stability theory (see Tritton [30] and references therein) predicts that this solution is realized as long as the Rayleigh number stays below a critical value $Ra_{crit.}$, which differs between no-slip and free-slip boundary conditions at the two horizontal walls. For higher values of the Rayleigh number, convection sets in. Remarkably, the value of $Ra_{crit.}$ for the initial transition is independent of the Prandtl number (which only plays a role after convective motion emerges).
13 In athermal flows, there is an additional degree of freedom, because we can actually use a different
fluid (i.e. change the viscosity) in each setup. This fact is often used in experimental fluid dynamics,
to replace a large-scale flow system with one which fits within the scales of the laboratory or wind
tunnel—as long as the dimensionless numbers are the same, the setups are equivalent in principle.
For the RB setup, however, it is more difficult to exploit this degree of freedom, because of the
requirement for having the same Pr value.
where ci is the discretized velocity associated with particle species i, and Ω is the
collision operator. Remarkably, solutions to various macroscopic equations can be
recovered numerically by choosing suitable sets of discretized velocities and the
collision operators. The solver we summarize below (following [31]) provides two
examples of such choices, as the fluid and temperature equations are solved on two
separate lattices.
14 Even more precisely, the two-relaxation-times (TRT) subset of the MRT family is used, as some parameters are fixed.
Notation Conventions
In the presentation of the model below we use the following conventions:
• Prescripts F or T indicate that we refer to the fluid or temperature solvers respectively, when confusion may occur.
• Compared to the heat diffusion example from Sect. 4.1, we introduce an additional system of units (the numerical system; here, LBM). We will discuss in Sect. 4.3.4 how this is related to the dimensionless system of units. The superscript (n) denotes quantities in the numerical system.
• Finally, superscripts † and −1 denote the matrix transpose and inverse operations, respectively.
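$$ \underbrace{f_i\left( \mathbf{x}^{(n)} + {}^{F}\mathbf{c}_i\, \delta_t^{(n)},\; t^{(n)} + \delta_t^{(n)} \right)}_{\text{stream}} = f_i\left( \mathbf{x}^{(n)}, t^{(n)} \right) - \underbrace{M^{-1}_{\alpha\beta}\, \underbrace{S_{\beta\gamma} \left[ m_\gamma - m_\gamma^{eq} \right]}_{\text{relax moments}}}_{\text{collide}} \tag{4.89} $$
where {i, α, β, γ} ∈ {0, . . . , 8} and repeated Greek subscripts again imply summation.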
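There are 9 discretized velocities¹⁵ for this model:
$$ \left( {}^{F}\mathbf{c}_0, \ldots, {}^{F}\mathbf{c}_8 \right) = c^{(n)} \begin{pmatrix} 0 & 1 & 0 & -1 & 0 & 1 & -1 & -1 & 1 \\ 0 & 0 & 1 & 0 & -1 & 1 & 1 & -1 & -1 \end{pmatrix} \tag{4.90} $$
where the basic lattice speed $c^{(n)}$ (the same for the fluid and temperature solvers) is defined in terms of the discrete lattice spacing $\delta_x^{(n)}$ and time step $\delta_t^{(n)}$:
$$ c^{(n)} \equiv \frac{\delta_x^{(n)}}{\delta_t^{(n)}} \tag{4.91} $$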
15 This particular topology is known as D2Q9 in the literature. The generic notation is Dd Qq,
where d stands for the dimensionality of the lattice, and q for the number of particle species.
16 The rows of M̃ represent orthogonal polynomials of the discretized velocities (see, e.g. Bouzidi
m = M̃ f (4.92)
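For the specific fluid solver chosen here, the transformation matrix reads:
$$ \tilde{M} = \begin{pmatrix}
 1 &  1 &  1 &  1 &  1 &  1 &  1 &  1 &  1 \\
 0 &  1 &  0 & -1 &  0 &  1 & -1 & -1 &  1 \\
 0 &  0 &  1 &  0 & -1 &  1 &  1 & -1 & -1 \\
-4 & -1 & -1 & -1 & -1 &  2 &  2 &  2 &  2 \\
 0 &  1 & -1 &  1 & -1 &  0 &  0 &  0 &  0 \\
 0 &  0 &  0 &  0 &  0 &  1 & -1 &  1 & -1 \\
 0 & -2 &  0 &  2 &  0 &  1 & -1 & -1 &  1 \\
 0 &  0 & -2 &  0 &  2 &  1 &  1 & -1 & -1 \\
 4 & -2 & -2 & -2 & -2 &  1 &  1 &  1 &  1
\end{pmatrix} \tag{4.93} $$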
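For the specific fluid model considered here, the components of $m^{eq}$ are:
$$ m_0^{eq} = \delta\rho^{(n)} \tag{4.96} $$
$$ m_1^{eq} = \rho_0^{(n)} u_x^{(n)} \tag{4.97} $$
$$ m_2^{eq} = \rho_0^{(n)} u_y^{(n)} \tag{4.98} $$
$$ m_3^{eq} = -2\,\delta\rho^{(n)} + 3 \rho_0^{(n)} \left[ (u_x^{(n)})^2 + (u_y^{(n)})^2 \right] \tag{4.99} $$
$$ m_4^{eq} = \rho_0^{(n)} \left[ (u_x^{(n)})^2 - (u_y^{(n)})^2 \right] \tag{4.100} $$
$$ m_6^{eq} = -\rho_0^{(n)} u_x^{(n)} \tag{4.102} $$
$$ m_7^{eq} = -\rho_0^{(n)} u_y^{(n)} \tag{4.103} $$
$$ m_8^{eq} = \delta\rho^{(n)} - 3 \rho_0^{(n)} \left[ (u_x^{(n)})^2 + (u_y^{(n)})^2 \right] \tag{4.104} $$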
where the macroscopic variables are evaluated from the local PDFs:
$$ \rho^{(n)} = \rho_0^{(n)} + \delta\rho^{(n)} \equiv \rho_0^{(n)} + \sum_{i=0}^{8} f_i \tag{4.105} $$
$$ u_\gamma^{(n)} = \frac{1}{\rho_0^{(n)}} \sum_{i=0}^{8} {}^{F}c_{i,\gamma}\, f_i, \quad \text{with } \gamma \in \{x, y\} \tag{4.106} $$
where for convenience we choose $\rho_0^{(n)} = 1$ (reference density in the numerical system of units).
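The relations above can be tried out on a single lattice node. The sketch below builds the matrix of Eq. (4.93), computes the moments of some arbitrary PDF values, and checks that the three conserved moments coincide with their equilibrium counterparts. The module and procedure names, the kind RK = real64 and the sample PDF values are assumptions for this illustration; the expression used for the moment with index 5, which is not shown in this passage, is the form commonly used for D2Q9 models and is likewise an assumption.

```fortran
module d2q9_moments_demo
  use iso_fortran_env, only: real64
  implicit none
  integer, parameter :: RK = real64    ! assumed kind (stand-in for the book's RK)
  real(RK), parameter :: RHO0 = 1._RK  ! reference density, chosen as 1 in the text
  ! transformation matrix of Eq. (4.93), entered row by row
  real(RK), parameter :: M(0:8,0:8) = reshape([ real(RK) :: &
        1,  1,  1,  1,  1,  1,  1,  1,  1, &
        0,  1,  0, -1,  0,  1, -1, -1,  1, &
        0,  0,  1,  0, -1,  1,  1, -1, -1, &
       -4, -1, -1, -1, -1,  2,  2,  2,  2, &
        0,  1, -1,  1, -1,  0,  0,  0,  0, &
        0,  0,  0,  0,  0,  1, -1,  1, -1, &
        0, -2,  0,  2,  0,  1, -1, -1,  1, &
        0,  0, -2,  0,  2,  1,  1, -1, -1, &
        4, -2, -2, -2, -2,  1,  1,  1,  1 ], shape=[9,9], order=[2,1])
contains
  ! macroscopic variables from the local PDFs, Eqs. (4.105)-(4.106);
  ! rows 1 and 2 of M hold the x- and y-components of the velocities
  pure subroutine calcMacros(f, dRho, uX, uY)
    real(RK), intent(in) :: f(0:8)
    real(RK), intent(out) :: dRho, uX, uY
    dRho = sum(f)
    uX = dot_product(M(1,:), f) / RHO0
    uY = dot_product(M(2,:), f) / RHO0
  end subroutine calcMacros

  ! equilibrium moments, Eqs. (4.96)-(4.104)
  pure function calcEqMoms(dRho, uX, uY) result(momsEq)
    real(RK), intent(in) :: dRho, uX, uY
    real(RK) :: momsEq(0:8)
    momsEq(0) = dRho
    momsEq(1) = RHO0*uX
    momsEq(2) = RHO0*uY
    momsEq(3) = -2._RK*dRho + 3._RK*RHO0*(uX**2 + uY**2)
    momsEq(4) = RHO0*(uX**2 - uY**2)
    momsEq(5) = RHO0*uX*uY  ! ASSUMPTION: usual D2Q9 form, not shown in the text
    momsEq(6) = -RHO0*uX
    momsEq(7) = -RHO0*uY
    momsEq(8) = dRho - 3._RK*RHO0*(uX**2 + uY**2)
  end function calcEqMoms
end module d2q9_moments_demo

program d2q9_moments
  use d2q9_moments_demo
  implicit none
  real(RK) :: f(0:8), moms(0:8), momsEq(0:8), dRho, uX, uY

  ! arbitrary (hypothetical) PDF values at one node
  f = [ 0.020_RK, 0.011_RK, 0.009_RK, 0.008_RK, 0.010_RK, &
        0.003_RK, 0.002_RK, 0.0025_RK, 0.0035_RK ]
  moms = matmul(M, f)                 ! Eq. (4.92)
  call calcMacros(f, dRho, uX, uY)
  momsEq = calcEqMoms(dRho, uX, uY)

  print '(a, 3(es12.4))', "dRho, uX, uY: ", dRho, uX, uY
  ! density and momentum are conserved: moments 0..2 equal their equilibria
  if( any(abs(moms(0:2) - momsEq(0:2)) > 1.e-12_RK) ) &
       error stop "conserved moments differ from equilibrium"
end program d2q9_moments
```

Because moments 0 to 2 always equal their equilibrium values, the collision step leaves density and momentum unchanged, whatever relaxation rates are assigned to these entries.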
Finally, the relaxation matrix S̃ is a diagonal matrix:
$$ \tilde{S} = \mathrm{diag}(0, 1, 1, s_e, s_\nu, s_\nu, s_q, s_q, s_\epsilon) \tag{4.107} $$
$$ \tilde{S} = \mathrm{diag}(0, 1, 1, s_\nu, s_\nu, s_\nu, s_q, s_q, s_\nu) \tag{4.108} $$
where the adjustable parameter $s_\nu$ determines the kinematic viscosity of the model:
$$ \nu^{(n)} = \frac{1}{3} \left( \frac{1}{s_\nu} - \frac{1}{2} \right) \tag{4.109} $$
For physical reasons, we need $s_\nu$ and $s_q$ ∈ [0, 2) (see Wang et al. [31] and Ginzburg and d’Humieres [12]).
Body forces: Strictly speaking, the evolution Eq. (4.89) only applies when there is no body force acting on the fluid. For simulating convective flows, as intended here, it would normally be necessary to extend this equation, by adding some correction terms to the RHS (e.g. as discussed by Guo et al. [14]). However, with the LBM-MRT class of models, the force can be added in a more natural way, directly to the corresponding moment (in our case $m_2$, because the gravitational acceleration is along the y-coordinate). To recover the Navier-Stokes equations with a body force with 2nd-order accuracy, the force contribution to this moment is added in two stages, before and after the collision. This procedure is known as “Strang splitting” (see Dellar [11] and references therein for details).
boundary conditions (BCs): Our setup demands two types of BCs for the fluid solver:
• periodic: these can be easily enforced for our simple geometry directly at the implementation level, by constraining the streaming of PDFs along the x-direction using a modulo operation.
• no-slip: these are traditionally implemented in LBM via a procedure known as “bounce-back”. In this approach, each post-collision PDF that would be moved to a solid node by normal streaming is instead copied to the local node (where it originated from a collision operation), but with the opposite orientation. This can be expressed mathematically as:
where the overline is used to denote the discrete vector with opposite orientation:
$$ -{}^{F}\mathbf{c}_i = {}^{F}\mathbf{c}_{\bar{i}} \tag{4.112} $$
and $\mathbf{x}_f^{(n)}$ is the position of the fluid node adjacent to the solid boundary.
This simple procedure effectively realizes a no-slip wall halfway between the fluid node and the neighbouring wall node.¹⁷ Also, since no PDFs are “lost” or “gained”, the approach conserves the total mass in the system.
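Both types of BCs can be illustrated with a small streaming step on a toy lattice. The sketch below is not the solver from the book: the module and procedure names, the table OPP of opposite directions (derived here from the velocity ordering of Eq. (4.90)) and the lattice size are assumptions for this illustration. Periodic wrapping is applied to the x-index with a modulo operation, and bounce-back is applied to PDFs that would leave the lattice through the bottom or top rows.

```fortran
module d2q9_streaming_demo
  implicit none
  ! discretized velocities of Eq. (4.90), in units of the lattice speed;
  ! EV(1,i) and EV(2,i) are the x- and y-components of velocity i
  integer, parameter :: EV(2,0:8) = reshape( &
       [ 0,0,  1,0,  0,1,  -1,0,  0,-1,  1,1,  -1,1,  -1,-1,  1,-1 ], [2,9])
  ! index of the velocity with opposite orientation, cf. Eq. (4.112)
  integer, parameter :: OPP(0:8) = [ 0, 3, 4, 1, 2, 7, 8, 5, 6 ]
contains
  ! periodic wrapping of a 1-based x-index
  pure integer function wrapX(x, shift, nX)
    integer, intent(in) :: x, shift, nX
    wrapX = mod(x + shift + nX - 1, nX) + 1
  end function wrapX

  ! stream post-collision PDFs; solid walls lie beyond rows y=1 and y=nY
  subroutine streamWithBCs(fPost, fNew)
    real, intent(in) :: fPost(0:, :, :)
    real, intent(out) :: fNew(0:, :, :)
    integer :: x, y, i, xDest, yDest, nX, nY
    nX = size(fPost, 2); nY = size(fPost, 3)
    do y = 1, nY
       do x = 1, nX
          do i = 0, 8
             xDest = wrapX(x, EV(1,i), nX)
             yDest = y + EV(2,i)
             if( (yDest == 0) .or. (yDest == nY+1) ) then
                fNew(OPP(i), x, y) = fPost(i, x, y)     ! bounce-back
             else
                fNew(i, xDest, yDest) = fPost(i, x, y)  ! normal streaming
             end if
          end do
       end do
    end do
  end subroutine streamWithBCs
end module d2q9_streaming_demo

program bounce_back_demo
  use d2q9_streaming_demo
  implicit none
  integer, parameter :: NX = 4, NY = 3
  real :: fPost(0:8, NX, NY), fNew(0:8, NX, NY)
  integer :: i, x, y

  ! the table of opposite directions must be consistent with the velocities
  do i = 0, 8
     if( any(EV(:, OPP(i)) /= -EV(:, i)) ) error stop "OPP table inconsistent"
  end do
  ! wrapping at both ends of the x-range
  if( wrapX(NX, 1, NX) /= 1 .or. wrapX(1, -1, NX) /= NX ) error stop "wrapX failed"

  ! integer-valued test data, so that the sums below are exact
  do y = 1, NY
     do x = 1, NX
        do i = 0, 8
           fPost(i, x, y) = real(i + 10*x + 100*y)
        end do
     end do
  end do
  call streamWithBCs(fPost, fNew)
  print *, "total mass before/after streaming:", sum(fPost), sum(fNew)
  if( sum(fNew) /= sum(fPost) ) error stop "mass not conserved"
end program bounce_back_demo
```

Every slot of fNew receives exactly one PDF, either from a neighbouring node or from the local node with reversed orientation, which is why the total of all PDFs is preserved.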
The model for solving the temperature advection-diffusion Eq. (4.53) is very similar to the one above corresponding to the fluid equations. However, a lattice with lower connectivity¹⁸ is sufficient, because the temperature equations do not involve higher-order quantities like the stress tensor (which appears in the fluid equations). The evolution equations for the temperature PDFs read:
$$ \underbrace{g_i\left( \mathbf{x}^{(n)} + {}^{T}\mathbf{c}_i\, \delta_t^{(n)},\; t^{(n)} + \delta_t^{(n)} \right)}_{\text{stream}} = g_i\left( \mathbf{x}^{(n)}, t^{(n)} \right) - \underbrace{N^{-1}_{\alpha\beta}\, \underbrace{Q_{\beta\gamma} \left[ n_\gamma - n_\gamma^{eq} \right]}_{\text{relax moments}}}_{\text{collide}} \tag{4.113} $$
17 This displacement of the boundary relative to the last fluid node needs to be taken into account during the initialization and postprocessing stages.
18 Specifically, we use a D2Q5 lattice for temperature, while at least D2Q9 was necessary for the fluid.
where {i, α, β, γ} ∈ {0, . . . , 4} and repeated Greek subscripts imply summation over the fictitious temperature particles.
The 5 discretized velocities for the model are:
$$ \left( {}^{T}\mathbf{c}_0, \ldots, {}^{T}\mathbf{c}_4 \right) = c^{(n)} \begin{pmatrix} 0 & 1 & 0 & -1 & 0 \\ 0 & 0 & 1 & 0 & -1 \end{pmatrix} \tag{4.114} $$
where we use the same conventions as for the fluid solver above (i.e. $c^{(n)} = \delta_x^{(n)} / \delta_t^{(n)}$, with $\delta_x^{(n)} = 1$ and $\delta_t^{(n)} = 1$ for simplicity).
The local vector of moments n is recovered from the temperature PDFs through the linear transformation Ñ:
$$ n = \tilde{N} g \tag{4.115} $$
where:
$$ \tilde{N} = \begin{pmatrix}
 1 &  1 &  1 &  1 &  1 \\
 0 &  1 &  0 & -1 &  0 \\
 0 &  0 &  1 &  0 & -1 \\
-4 &  1 &  1 &  1 &  1 \\
 0 &  1 & -1 &  1 & -1
\end{pmatrix} \tag{4.116} $$
As for the analogous fluid matrix M̃, the product Ñ · Ñ† is a diagonal matrix, which simplifies the calculation of its inverse.
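The remark about the inverse can be verified numerically. If Ñ · Ñ† = D is diagonal, then Ñ⁻¹ = Ñ† · D⁻¹, so each column of the inverse is a row of Ñ divided by the squared norm of that row. The sketch below checks this for the matrix of Eq. (4.116); the program name, the variable names and the kind RK = real64 are assumptions for this illustration.

```fortran
program temp_matrix_inverse
  use iso_fortran_env, only: real64
  implicit none
  integer, parameter :: RK = real64  ! assumed kind (stand-in for the book's RK)
  ! transformation matrix of Eq. (4.116), entered row by row
  real(RK), parameter :: N(0:4,0:4) = reshape([ real(RK) :: &
        1,  1,  1,  1,  1, &
        0,  1,  0, -1,  0, &
        0,  0,  1,  0, -1, &
       -4,  1,  1,  1,  1, &
        0,  1, -1,  1, -1 ], shape=[5,5], order=[2,1])
  real(RK) :: D(0:4,0:4), NInv(0:4,0:4), ident(0:4,0:4), expected
  integer :: i, j

  ! D = N * transpose(N) should have non-zero entries on the diagonal only
  D = matmul(N, transpose(N))
  do j = 0, 4
     do i = 0, 4
        if( (i /= j) .and. (D(i,j) /= 0._RK) ) error stop "N*N^T is not diagonal"
     end do
  end do

  ! inverse from orthogonality: column j of N^{-1} is row j of N, scaled by 1/D(j,j)
  do j = 0, 4
     NInv(:, j) = N(j, :) / D(j, j)
  end do

  ! verify that N * N^{-1} is the identity (up to round-off)
  ident = matmul(N, NInv)
  do j = 0, 4
     do i = 0, 4
        expected = merge(1._RK, 0._RK, i == j)
        if( abs(ident(i,j) - expected) > 1.e-14_RK ) error stop "inverse is wrong"
     end do
  end do
  print '(a, 5(f6.1))', "diagonal of N*N^T: ", (D(i,i), i = 0, 4)
end program temp_matrix_inverse
```

The same construction applies to the 9×9 fluid matrix, which avoids a general-purpose matrix inversion in both cases.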
The equilibrium moments for temperature are defined as:
where:
• the macroscopic temperature is evaluated from the PDFs:
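$$ \theta^{(n)} = \sum_{i=0}^{4} g_i \tag{4.118} $$
$$ \tilde{Q} = \mathrm{diag}(0, \sigma_\kappa, \sigma_\kappa, \sigma_e, \sigma_\nu) \tag{4.119} $$
The parameter $\sigma_\kappa$, together with the parameter a from Eq. (4.117), determines the thermal diffusivity of the model:
$$ \kappa^{(n)} = \frac{4 + a}{10} \left( \frac{1}{\sigma_\kappa} - \frac{1}{2} \right) \tag{4.120} $$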
As for the fluid model, stability and accuracy considerations restrict the possible values of the parameters σi and a. One particular choice, suitable for flows where
the
$$ \tilde{Q} = \mathrm{diag}(0, \sigma_\kappa, \sigma_\kappa, \sigma_\nu, \sigma_\nu) \tag{4.121} $$
where for reasons of accuracy and stability (see Wang et al. [31] for details) the relaxation matrix is fixed:
• $\sigma_\kappa = 3 - \sqrt{3}$
• $\sigma_\nu = 2(2\sqrt{3} - 3)$
and the thermal diffusivity of the model is instead determined by the parameter a:
$$ \kappa^{(n)} = \frac{\sqrt{3}}{60} (4 + a), \quad \text{with } -4 < a < 1 \tag{4.122} $$
boundary conditions (BCs): Our setup requires two types of BCs for the temperature solver:
• periodic: this BC, necessary for the vertical boundaries of the domain, is again achieved at the implementation level, by “folding” the x-direction for the streaming step.
• constant temperature: several methods exist for imposing a constant temperature at the walls. One difficulty [16] is that maintaining a constant temperature also requires no heat conduction along the boundaries. To satisfy this condition with 2nd-order accuracy, we use the same scheme as Wang et al. [31], consisting of a procedure known as “anti-bounce-back”. Mathematically, this reads:
While in Sect. 4.1.2 we discretized our heat diffusion problem directly in the dimensionless system of units, for the current problem we work with yet another system of units: the numerical system. This is beneficial here, since the LBM algorithm requires several constraints on the parameters to hold, as discussed earlier. However, these are not connected (at least not in an obvious manner) to the actual physics in the system, and it helps to draw a distinction between the system in which
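$$ x_\beta^{(d)} = \frac{x_\beta^{(n)} - \frac{1}{2}}{N_y} \iff x_\beta^{(n)} = \frac{1}{2} + N_y\, x_\beta^{(d)} \tag{4.124} $$
$$ \partial_\beta^{(d)} = N_y\, \partial_\beta^{(n)} \iff \partial_\beta^{(n)} = \frac{1}{N_y}\, \partial_\beta^{(d)} \tag{4.125} $$
$$ \partial_{\beta\gamma}^{(d)} = N_y^2\, \partial_{\beta\gamma}^{(n)} \iff \partial_{\beta\gamma}^{(n)} = \frac{1}{N_y^2}\, \partial_{\beta\gamma}^{(d)} \tag{4.126} $$
$$ t^{(d)} = \frac{1}{N_t}\, t^{(n)} \iff t^{(n)} = N_t\, t^{(d)} \tag{4.127} $$
$$ \partial_t^{(d)} = N_t\, \partial_t^{(n)} \iff \partial_t^{(n)} = \frac{1}{N_t}\, \partial_t^{(d)} \tag{4.128} $$
$$ u_\beta^{(d)} = \frac{N_t}{N_y}\, u_\beta^{(n)} \iff u_\beta^{(n)} = \frac{N_y}{N_t}\, u_\beta^{(d)} \tag{4.129} $$
$$ p^{(d)} = \frac{1}{3} \left( \frac{N_t}{N_y} \right)^2 \delta\rho^{(n)} \iff \delta\rho^{(n)} = 3 \left( \frac{N_y}{N_t} \right)^2 p^{(d)} \tag{4.130} $$
$$ \theta^{(d)} = \theta^{(n)} \tag{4.131} $$
$$ p^{(n)} = \frac{1}{3} (\delta\rho)^{(n)} \tag{4.132} $$
to translate directly between dimensionless pressure and the solver’s numerical density anomalies.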
Plugging the equations above into the dimensionless governing equations, we obtain the following expressions for the model parameters:
$$ \nu^{(n)} = \frac{Pr\, N_y^2}{N_t} \tag{4.133} $$
$$ (\alpha g)^{(n)} = \frac{Ra\, Pr\, N_y}{N_t^2} \tag{4.134} $$
$$ \kappa^{(n)} = \frac{N_y^2}{N_t} \tag{4.135} $$
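To see how these expressions combine with the earlier relations for the relaxation parameters, the sketch below converts a set of dimensionless inputs into numerical-system parameters and then solves Eq. (4.109) for $s_\nu$ and Eq. (4.122) for a. The module and procedure names, the kind RK = real64 and the sample values of $N_y$ and $N_t$ are assumptions for this illustration; in particular, the number of time steps per characteristic time is simply prescribed here, whereas the book's program derives its resolution from other inputs.

```fortran
module rb_lbm_params_demo
  use iso_fortran_env, only: real64
  implicit none
  integer, parameter :: RK = real64  ! assumed kind (stand-in for the book's RK)
contains
  pure subroutine calcModelParams(Ra, Pr, nY, nT, visc, diff, alphaG, sNu, aParam)
    real(RK), intent(in) :: Ra, Pr
    integer, intent(in) :: nY, nT  ! nodes across the channel, steps per char. time
    real(RK), intent(out) :: visc, diff, alphaG, sNu, aParam
    visc   = Pr*real(nY, RK)**2 / real(nT, RK)       ! Eq. (4.133)
    alphaG = Ra*Pr*real(nY, RK) / real(nT, RK)**2    ! Eq. (4.134)
    diff   = real(nY, RK)**2 / real(nT, RK)          ! Eq. (4.135)
    sNu    = 1._RK / (3._RK*visc + 0.5_RK)           ! Eq. (4.109), solved for s_nu
    aParam = 60._RK*diff / sqrt(3._RK) - 4._RK       ! Eq. (4.122), solved for a
  end subroutine calcModelParams
end module rb_lbm_params_demo

program rb_params
  use rb_lbm_params_demo
  implicit none
  real(RK) :: visc, diff, alphaG, sNu, aParam

  ! hypothetical inputs: Ra and Pr as in the sample run, resolution chosen freely
  call calcModelParams(Ra=1900._RK, Pr=7.1_RK, nY=64, nT=50000, &
       visc=visc, diff=diff, alphaG=alphaG, sNu=sNu, aParam=aParam)

  print '(a, es12.4)', "viscosity   = ", visc
  print '(a, es12.4)', "diffusivity = ", diff
  print '(a, es12.4)', "alphaG      = ", alphaG
  print '(a, f8.4)',   "s_nu        = ", sNu
  print '(a, f8.4)',   "a           = ", aParam

  ! constraints quoted in the text for the relaxation rate and for a
  if( (sNu <= 0._RK) .or. (sNu >= 2._RK) ) error stop "s_nu outside (0, 2)"
  if( (aParam <= -4._RK) .or. (aParam >= 1._RK) ) error stop "a outside (-4, 1)"
  ! round trip through Eq. (4.109)
  if( abs((1._RK/sNu - 0.5_RK)/3._RK - visc) > 1.e-12_RK ) error stop "round trip failed"
end program rb_params
```

With these inputs the diffusivity is 64²/50000 ≈ 0.082, well below the upper limit of about 0.144 implied by a < 1; a much smaller $N_t$ at the same $N_y$ would violate that limit.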
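To complete the formulation of the problem in the numerical system, we have the following BCs for temperature:
$$ \theta^{(n)}\left( x^{(n)}, \frac{1}{2}, t^{(n)} \right) = +\frac{1}{2} \tag{4.136} $$
$$ \theta^{(n)}\left( x^{(n)}, N_y + \frac{1}{2}, t^{(n)} \right) = -\frac{1}{2}, \tag{4.137} $$
and the following initial profiles for temperature and density anomaly:
$$ \theta^{(n)}(x^{(n)}, y^{(n)}, 0) = \frac{1}{2} - \frac{1}{2 N_y} \left( 2 y^{(n)} - 1 \right) \tag{4.138} $$
$$ (\delta\rho)^{(n)}(x^{(n)}, y^{(n)}, 0) = \frac{3 (\alpha g)^{(n)}}{2 N_y} \left( y^{(n)} - \frac{1}{2} \right) \left( N_y + \frac{1}{2} - y^{(n)} \right) \tag{4.139} $$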
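As a consistency check, the initial density anomaly can be derived from the dimensionless pressure profile of Eq. (4.85). The steps below use only relations stated earlier in this section, namely Eqs. (4.124), (4.130) and (4.134).

```latex
\begin{align*}
% coordinate transformation, Eq. (4.124):
y^{(d)} &= \frac{y^{(n)} - \tfrac{1}{2}}{N_y}, \qquad
1 - y^{(d)} = \frac{N_y + \tfrac{1}{2} - y^{(n)}}{N_y} \\
% pressure to density anomaly, Eq. (4.130), with p^{(d)} from Eq. (4.85):
(\delta\rho)^{(n)} &= 3\, \frac{N_y^2}{N_t^2}\, p^{(d)}
  = 3\, \frac{N_y^2}{N_t^2} \cdot \frac{Ra\, Pr}{2} \cdot
    \frac{\left( y^{(n)} - \tfrac{1}{2} \right)
          \left( N_y + \tfrac{1}{2} - y^{(n)} \right)}{N_y^2} \\
% eliminate Ra Pr / N_t^2 = (\alpha g)^{(n)} / N_y, Eq. (4.134):
&= \frac{3 (\alpha g)^{(n)}}{2 N_y}
   \left( y^{(n)} - \tfrac{1}{2} \right)
   \left( N_y + \tfrac{1}{2} - y^{(n)} \right)
\end{align*}
```

The temperature profile can be checked in the same way: setting $y^{(n)} = 1/2$ and $y^{(n)} = N_y + 1/2$ in the initial temperature profile gives +1/2 and −1/2, which are the two wall values.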
As for the heat diffusion solver described in Sect. 4.1, we construct our implementation around the OOP methodology. However, we organize the solution differently here, because of the increased complexity of the numerical algorithm, and to illustrate some additional techniques. As for the other case studies, we describe the main entities below (see file lbm2d_mrt_rb_v1.f90 for the complete code).
module NumericKinds: Even more than in our previous example application (Sect. 4.2), the range and accuracy of the numeric types used in our fluid solver is crucial. Therefore, we use the same mechanisms as before, to allow convenient and reliable selection of the precision of the variables. As a small enhancement of this module for this application, we provide appropriate 'swap' subroutines, grouped under a generic interface:
module NumericKinds
  implicit none
  interface swap ! generic IFACE
     module procedure swapRealRK, swapIntIK
  end interface swap
contains
end module NumericKinds
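Since the excerpt above omits the bodies of the specific procedures, the following sketch shows how such a generic interface can be completed and used. It is a stand-alone illustration, not the book's module: the module name SwapDemo, the procedure bodies and the kinds RK = real64 and IK = int32 are assumptions.

```fortran
module SwapDemo
  use iso_fortran_env, only: real64, int32
  implicit none
  private
  integer, parameter, public :: RK = real64, IK = int32  ! assumed kinds
  public :: swap

  interface swap  ! generic name, resolved from the type of the arguments
     module procedure swapRealRK, swapIntIK
  end interface swap
contains
  subroutine swapRealRK(a, b)
    real(RK), intent(inout) :: a, b
    real(RK) :: tmp
    tmp = a; a = b; b = tmp
  end subroutine swapRealRK

  subroutine swapIntIK(a, b)
    integer(IK), intent(inout) :: a, b
    integer(IK) :: tmp
    tmp = a; a = b; b = tmp
  end subroutine swapIntIK
end module SwapDemo

program swap_demo
  use SwapDemo
  implicit none
  real(RK) :: x, y
  integer(IK) :: old, new

  x = 1.5_RK; y = -2.5_RK
  old = 0_IK; new = 1_IK
  call swap(x, y)      ! resolves to swapRealRK
  call swap(old, new)  ! resolves to swapIntIK
  print '(2(f6.2), 2(i3))', x, y, old, new
  if( (x /= -2.5_RK) .or. (y /= 1.5_RK) ) error stop "real swap failed"
  if( (old /= 1_IK) .or. (new /= 0_IK) ) error stop "integer swap failed"
end program swap_demo
```

A plausible use in a solver with two alternating lattices is to exchange the indices of the "old" and "new" lattice after each time step with a single call to swap.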
19 Note, however, that the boundary conditions are hard-coded into the solver, for simplicity. To simulate a problem with different BCs, it is necessary to modify the procedure advanceTimeMrtSolverBoussinesq2D (which implements the actual LBM-dynamics). The interested reader may remove this hard-coding by adding “mask”-arrays, which classify the different types of nodes (e.g. bulk, no-slip, etc. for the fluid component and bulk, constant temperature, and adiabatic for the temperature component). Also, the code for enforcing these different types of BCs can be further isolated into distinct procedures, or even into different classes (useful for BC-algorithms which also need to hold some own data, e.g. the temperature at the boundary).
  type :: MrtSolverBoussinesq2D
     private
     ! parameters for the algorithm (not bound to the RB-setup)
     real(RK) :: mAlphaG, mAParam, mViscosity, mDiffusivity, &
          mTempColdWall, mTempHotWall, &
          ! for relaxation-matrices, we store only the non-zero part (=diagonals)
          mRelaxVecFluid(0:8), mRelaxVecTemp(0:4)
     ! internal model arrays
     ! NOTES: - last dimension is for 2-lattice alternation
     !        - 1st dimension: 0-8=fluid, 9-13=temp DFs
     real(RK), dimension(:,:,:,:), allocatable :: mDFs
     ! raw moments from which we can compute macroscopic fields; this is used
     ! mainly for simulation-output
     ! 0~pressure | 1~uX | 2~uY | 3~temp
     real(RK), dimension(:,:,:), allocatable :: mRawMacros
   contains
     private
     procedure, public :: init => initMrtSolverBoussinesq2D
     procedure, public :: advanceTime => advanceTimeMrtSolverBoussinesq2D
     procedure, public :: cleanup => cleanupMrtSolverBoussinesq2D
     procedure, public :: getRawMacros => getRawMacrosMrtSolverBoussinesq2D
     ! internal methods
     procedure :: calcLocalMomsMrtSolverBoussinesq2D
     procedure :: calcLocalEqMomsMrtSolverBoussinesq2D
  end type MrtSolverBoussinesq2D
contains
  function getRawMacrosMrtSolverBoussinesq2D(this) result(macros)
    class(MrtSolverBoussinesq2D), intent(in) :: this
    real(RK), dimension(this%mNx, this%mNy, 0:3) :: macros
    macros = this%mRawMacros
  end function getRawMacrosMrtSolverBoussinesq2D

  subroutine initMrtSolverBoussinesq2D(this, nX, nY, tempColdWall, tempHotWall, &
       viscosity, diffusivity, alphaG, aParam, relaxVecFluid, relaxVecTemp)
    class(MrtSolverBoussinesq2D), intent(inout) :: this
    real(RK), intent(in) :: tempColdWall, tempHotWall, &
         viscosity, diffusivity, alphaG, aParam, &
         relaxVecFluid(0:8), relaxVecTemp(0:4)
    integer(IK), intent(in) :: nX, nY
    integer(IK) :: x, y, i ! dummy vars
    integer(IK), dimension(0:1) :: dest
    ! temporary moments-vars
    real(RK) :: fluidMoms(0:8), tempMoms(0:4), tempPerturbation
    ! copy argument-values internally
    this%mNx = nX; this%mNy = nY
    this%mTempColdWall = tempColdWall; this%mTempHotWall = tempHotWall
    this%mViscosity = viscosity; this%mDiffusivity = diffusivity
    this%mAlphaG = alphaG; this%mAParam = aParam
    this%mRelaxVecFluid = relaxVecFluid; this%mRelaxVecTemp = relaxVecTemp
    ! initialize
    this%mDFs = 0._RK
    this%mRawMacros = 0._RK
          ! ... temp
          do i=0, 4
             dest(0) = mod(x+EV_TEMP(1, i)+this%mNx-1, this%mNx)+1
             dest(1) = y+EV_TEMP(2, i)
             if( (dest(1) == 0) .or. (dest(1) == this%mNy+1) ) then
                this%mDFs(i+9, dest(0), dest(1), this%mOld) = &
                     this%mDFs(i+9, x, y, this%mOld)
             end if
          end do
          ! save ICs
          this%mRawMacros(x, y, :) = [ fluidMoms(0:2), tempMoms(0) ]
       end do
    end do
  end subroutine initMrtSolverBoussinesq2D
  subroutine calcLocalMomsMrtSolverBoussinesq2D(this, x, y, fluidMoms, tempMoms)
    class(MrtSolverBoussinesq2D), intent(in) :: this
    integer(IK), intent(in) :: x, y
  subroutine calcLocalEqMomsMrtSolverBoussinesq2D(this, &
       dRho, uX, uY, temp, fluidEqMoms, tempEqMoms)
    class(MrtSolverBoussinesq2D), intent(inout) :: this
    real(RK), intent(in) :: dRho, uX, uY, temp
    real(RK), intent(out) :: fluidEqMoms(0:8), tempEqMoms(0:4)
    ! .............................
  end subroutine calcLocalEqMomsMrtSolverBoussinesq2D
    ! initializations
    dest = 0; fluidMoms = 0._RK; tempMoms = 0._RK
    fluidEqMoms = 0._RK; tempEqMoms = 0._RK
    old = this%mOld; new = this%mNew
    do y=1, this%mNy
       do x=1, this%mNx
          call this%calcLocalMomsMrtSolverBoussinesq2D(x, y, fluidMoms, tempMoms)
          ! save moments related to output
          this%mRawMacros(x, y, :) = &
               [ fluidMoms(0), fluidMoms(1), fluidMoms(2), tempMoms(0) ]
          ! collision (in moment-space)
          fluidMoms = fluidMoms - this%mRelaxVecFluid * (fluidMoms - fluidEqMoms)
          tempMoms = tempMoms - this%mRelaxVecTemp * (tempMoms - tempEqMoms)
  subroutine cleanupMrtSolverBoussinesq2D(this)
    class(MrtSolverBoussinesq2D), intent(inout) :: this
    deallocate(this%mDFs, this%mRawMacros) ! release memory
  end subroutine cleanupMrtSolverBoussinesq2D
end module MrtSolverBoussinesq2D_class
  ! Fixed simulation-parameters
  real(RK), parameter :: &
       ! To allow the 1st instability to develop, the aspect-ratio needs to be a
       ! multiple of 2π/k_C, where k_C = 3.117 (see [Shan1997]).
       ASPECT_RATIO = 2*2.0158, &
       ! See [Wang2013] for justification of these parameters.
       SIGMA_K = 3._RK - sqrt(3._RK), &
       SIGMA_NU_E = 2._RK * (2._RK * sqrt(3._RK) - 3._RK), &
       TEMP_COLD_WALL = -0.5, TEMP_HOT_WALL = +0.5
  type :: RBenardSimulation
     private
     integer(IK) :: mNx, mNy, & ! lattice size
          mNumIters1CharTime, mNumItersMax, &
          mNumOutSlices ! user-setting
     type(MrtSolverBoussinesq2D) :: mSolver ! associated solver...
     type(OutputAscii) :: mOutSink ! ...and output-writer
   contains
     private
     procedure, public :: init => initRBenardSimulation
     procedure, public :: run => runRBenardSimulation
     procedure, public :: cleanup => cleanupRBenardSimulation
  end type RBenardSimulation
contains
  subroutine initRBenardSimulation(this, Ra, Pr, nY, simTime, maxMach, &
       numOutSlices, outFilePrefix)
    class(RBenardSimulation), intent(out) :: this
    real(RK), intent(in) :: Ra, Pr, simTime, maxMach
    integer(IK), intent(in) :: nY, numOutSlices
    character(len=*), intent(in) :: outFilePrefix
    ! .............................
  end subroutine initRBenardSimulation

  subroutine runRBenardSimulation(this)
    class(RBenardSimulation), intent(inout) :: this
    integer(IK) :: currIterNum ! dummy index
    real(RK) :: tic, toc ! for performance-reporting
    call this%mSolver%advanceTime()
  subroutine cleanupRBenardSimulation(this)
    class(RBenardSimulation), intent(inout) :: this
    call this%mSolver%cleanup()
    call this%mOutSink%cleanup()
  end subroutine cleanupRBenardSimulation
end module RBenardSimulation_class
  ! string-constants for output metadata
  character(len=*), parameter :: UNITS_STR = "units", & ! for global-attribute
       SPACE_UNITS_STR = "char. length", TIME_UNITS_STR = "char. time", &
       PRESS_UNITS_STR = "char. pressure-difference", &
       VEL_UNITS_STR = "char. velocity", TEMP_UNITS_STR = "char. temperature-difference"
  type :: OutputBase
     real(RK) :: mUyMax
     character(len=256) :: mOutFilePrefix
     ! information about the simulation
     integer(IK) :: mNx, mNy, mNumOutSlices, mNumItersMax, mOutDelay, mOutInterv
     real(RK) :: mDxD, mDtD, mRa, mPr, mMaxMach
     integer(IK) :: mCurrOutSlice ! for tracking output time-slices
    if( numOutSlices < 0 ) then
       this%mNumOutSlices = nItersMax + 1 ! write everything
    else
       this%mNumOutSlices = numOutSlices
    end if
       ! conversion-factors for output
       this%mVelSolver2VelDimless = dxD / dtD
       this%mDRhoSolver2PressDimless = this%mVelSolver2VelDimless**2 / 3._RK
       this%mCurrOutSlice = 0
    else
       write(*, '(a)') "INFO: no file-output, due to chosen 'numOutSlices'"
    end if
  end subroutine initOutputBase

  subroutine cleanupOutputBase(this)
    class(OutputBase), intent(inout) :: this
    if( this%isActive() ) then
       deallocate(this%mXVals, this%mYVals, this%mTVals)
    end if
  end subroutine cleanupOutputBase

  logical function isActiveOutputBase(this)
    class(OutputBase), intent(in) :: this
    isActiveOutputBase = (this%mNumOutSlices > 0)
  end function isActiveOutputBase

    if( this%isActive() ) then
       isTimeToWriteOutputBase = (iterNum==0) .or. ( &
            (this%mNumOutSlices /= 1) .and. &
            (iterNum >= this%mOutDelay) .and. &
            (mod(this%mNumItersMax-iterNum, this%mOutInterv) == 0) )
    else
       isTimeToWriteOutputBase = .false.
    end if
  end function isTimeToWriteOutputBase
end module OutputBase_class
From this we derive the child types, which handle the peculiarities of each output
format. In this version of the program, the only child is OutputAscii:
module OutputAscii_class
  use NumericKinds, only : IK, RK, RK_FMT
  use OutputBase_class
  implicit none
  type, extends(OutputBase) :: OutputAscii
     private
     integer(IK) :: mSummaryFileUnit, mTempFileUnit
     character(len=256) :: mSummaryFileName, mTempFileName, &
          mFmtStrngFieldFileNames
   contains
     private
     ! public methods which differ from base-class analogues
     procedure, public :: init => initOutputAscii
     procedure, public :: writeOutput => writeOutputAscii
     procedure, public :: cleanup => cleanupOutputAscii
     ! internal method(s)
     procedure writeSummaryFileHeaderOutputAscii
  end type OutputAscii
contains
  subroutine initOutputAscii(this, nX, nY, numOutSlices, dxD, dtD, &
       nItersMax, outFilePrefix, Ra, Pr, maxMach)
    class(OutputAscii), intent(inout) :: this
    integer(IK), intent(in) :: nX, nY, nItersMax, numOutSlices
    real(RK), intent(in) :: dxD, dtD, Ra, Pr, maxMach
    character(len=*), intent(in) :: outFilePrefix
    ! .............................
    ! local
  subroutine writeSummaryFileHeaderOutputAscii(this, fileUnit)
    class(OutputAscii), intent(inout) :: this
    integer(IK), intent(in) :: fileUnit
    ! .............................
    ! local
  subroutine writeOutputAscii(this, rawMacros, iterNum)
    class(OutputAscii), intent(inout) :: this
    real(RK), dimension(:, :, 0:), intent(in) :: rawMacros
    integer(IK), intent(in) :: iterNum
    ! .............................
    ! local vars
  subroutine cleanupOutputAscii(this)
    class(OutputAscii), intent(inout) :: this
    if( this%isActive() ) then
       write(this%mSummaryFileUnit, '(a)') "# UyMax"
       close(this%mSummaryFileUnit)
       call this%OutputBase%cleanup()
    end if
  end subroutine cleanupOutputAscii
end module OutputAscii_class
Listing 4.14 src/Chapter4/lbm2d_mrt_rb_v1.f90 (excerpt)
An instance of the simulation type is declared and then initialized. The arguments
in the initialization call have the following meaning:
• Ra, Pr: dimensionless numbers which determine the dynamics of the system
160 4 Applications
• nY: number of nodes used to simulate the channel’s height; the width is then
computed automatically, based on a pre-defined aspect ratio (declared in module
RBenardSimulation_class)
• simTime: the total time to be simulated by the solver, in multiples of the
reference time
• maxMach: this can be interpreted here as just another model parameter, which
controls another source of model errors (compressibility error); it can be decreased
to improve the accuracy of the results (the corresponding error term is proportional
to the square of this value)
• numOutSlices: number of time steps to appear in the output files
1. numOutSlices < 0 causes all time steps to be written to disk
2. numOutSlices = 0 suppresses output
3. numOutSlices > 0 results in numOutSlices time slices being written (including one
time slice for the ICs of the simulation).
In Fig. 4.5 we present a sample temperature contour plot, for Ra = 1,900 and
Pr = 7.1 (when the flow is already unstable). The plot was produced with the R script
plotFieldFromAscii.R (also available in the source code repository).
Fig. 4.5 Sample numerical solution for the RB problem, for Ra = 1,900 and Pr = 7.1; the upper
plot is a visualization of the temperature profile at t (d) = 15, while the lower plot shows the evolution
of the maximum vertical velocity in the simulation over the period t (d) ∈ [0, 15]
4.3 Rayleigh-Bénard (RB) Convection in 2D 161
References
1. Barakat, H.Z., Clark, J.A.: On the solution of the diffusion equations by numerical methods.
J. Heat Transf. 88(4), 421–427 (1966)
2. Bouzidi, M., d’Humieres, D., Lallemand, P., Luo, L.S.: Lattice Boltzmann equation on a
two-dimensional rectangular grid. J. Comput. Phys. 172(2), 704–717 (2001)
3. Buckingham, E.: On physically similar systems; illustrations of the use of dimensional
equations. Phys. Rev. 4(4), 345–376 (1914)
4. Budyko, M.I.: The effect of solar radiation variations on the climate of the Earth. Tellus 21A(5),
611–619 (1969)
5. Chen, D., Gerdes, R., Lohmann, G.: A 1-D atmospheric energy balance model developed for
ocean modelling. Theor. Appl. Climatol. 51(1–2), 25–38 (1995)
6. Chen, S., Doolen, G.D.: Lattice Boltzmann method for fluid flows. Annu. Rev. Fluid Mech.
30(1), 329–364 (1998)
7. Chopard, B., Droz, M.: Cellular Automata Modelling of Physical Systems. Cambridge Uni-
versity Press, Cambridge (1998)
8. Courant, R., Friedrichs, K., Lewy, H.: Über die partiellen Differenzengleichungen der mathe-
matischen Physik. Math. Ann. 100(1), 32–74 (1928)
9. Courant, R., Friedrichs, K., Lewy, H.: On the partial difference equations of mathematical
physics. IBM J. Res. Dev. 11(2), 215–234 (1967)
10. Dellar, P.J.: Incompressible limits of lattice Boltzmann equations using multiple relaxation
times. J. Comput. Phys. 190(2), 351–370 (2003)
11. Dellar, P.J.: An interpretation and derivation of the lattice Boltzmann method using Strang
splitting. Comput. Math. Appl. 65(2), 129–141 (2013)
12. Ginzburg, I., d’Humieres, D.: Multireflection boundary conditions for lattice Boltzmann mod-
els. Phys. Rev. E 68(6), 066614 (2003)
13. Guo, Z., Shu, C.: Lattice Boltzmann Method and Its Applications in Engineering. World Sci-
entific Publishing Co., Singapore (2013)
162 4 Applications
14. Guo, Z., Zheng, C., Shi, B.: Discrete lattice effects on the forcing term in the lattice Boltzmann
method. Phys. Rev. E 65(4), 046308 (2002)
15. Haney, R.L.: Surface thermal boundary condition for ocean circulation models. J. Phys.
Oceanogr. 1(4), 241–248 (1971)
16. Kuo, L., Chen, P.: Numerical implementation of thermal boundary conditions in the lattice
Boltzmann method. Int. J. Heat Mass Transf. 52(1–2), 529–532 (2009)
17. Lohmann, G., Gerdes, R., Chen, D.: Stability of the thermohaline circulation in a simple coupled
model. Tellus 48A(3), 465–476 (1996)
18. Mohamad, A.A.: Lattice Boltzmann Method: Fundamentals and Engineering Applications with
Computer Codes. Springer, London (2011)
19. Nakamura, M., Stone, P.H., Marotzke, J.: Destabilization of the thermohaline circulation by
atmospheric eddy transports. J. Clim. 7(12), 1870–1882 (1994)
20. Pletcher, R.H., Tannehill, J.C., Anderson, D.: Computational Fluid Mechanics and Heat
Transfer. CRC Press, Boca Raton (2012)
21. Prange, M., Lohmann, G., Gerdes, R.: Sensitivity of the thermohaline circulation for different
climates—investigations with a simple atmosphere-ocean model. Paleoclimates 2(1), 71–99
(1997)
22. Press, W.H., Teukolsky, S.A., Vetterlin, W.T., Flannery, B.P.: Numerical Recipes in Fortran 77,
2nd Edn. Volume 1: The Art of Scientific Computing. Cambridge University Press (1992). also
available as http://apps.nrbook.com/fortran/index.html
23. Rahmstorf, S.: On the freshwater forcing and transport of the Atlantic thermohaline circulation.
Clim. Dyn. 12(12), 799–811 (1996)
24. Robinson, J.A.: Software Design for Engineers and Scientists. Elsevier, United Kingdom (2004)
25. Rooth, C.: Hydrology and ocean circulation. Prog. Oceanogr. 2(11), 131–149 (1982)
26. Stommel, H.: Thermohaline convection with two stable regimes of flow. Tellus 13A(2), 224–
230 (1961)
27. Strang, G.: Computational Science and Engineering. Wellesley-Cambridge Press, Wellesley
(2007)
28. Succi, S.: The Lattice Boltzmann Equation: for Fluid Dynamics and Beyond. Oxford University
Press, Oxford [u.a.] (2001)
29. Sukop, M.C., Thorne, D.T.: Lattice Boltzmann Modeling: An Introduction for Geoscientists
and Engineers. Springer, Berlin [u.a.] (2006)
30. Tritton, D.J.: Physical Fluid Dynamics. Clarendon Press; Oxford University Press, Oxford
[England]; New York (1988)
31. Wang, J., Wang, D., Lallemand, P., Luo, L.S.: Lattice Boltzmann simulations of thermal con-
vective flows in two dimensions. Comput. Math. Appl. 65(2), 262–286 (2013)
32. Wolf-Gladrow, D.A.: Lattice-gas Cellular Automata and Lattice Boltzmann Models: An Intro-
duction. Springer, New York (2000)
33. Yu, D., Mei, R., Luo, L.S., Shyy, W.: Viscous flow computations with the method of lattice
Boltzmann equation. Prog. Aerosp. Sci. 39(5), 329–367 (2003)
Chapter 5
More Advanced Techniques
In this chapter, we introduce several techniques and tools (build systems, more
efficient I/O, parallelization, etc.), which are commonly used in ESS applications.
Most of these concepts are also relevant for other programming languages. With
such a large list of topics, it is clearly impractical to be comprehensive. Nonetheless,
through the examples, we hope to provide the reader with a reasonable overview of
how these facilities can be used, and some intuition about how they can be combined.
Each of the examples provided so far consisted of a single source file, which contained
the code for the main-program and for any accompanying modules and procedures.
To obtain the final executable, we simply compiled the file manually (see Sect. 1.3).
While this approach is often acceptable for small test programs, it becomes incon-
venient for large applications. A separation of the code into several files (potentially
arranged into a multi-level directory hierarchy) is preferred instead, for a variety of
reasons:
• a single file would become too large to comprehend—multiple files can improve
readability when they are used to demarcate sub-components of the application
(especially when using OOP)
• code reuse (within the application and across multiple applications), as well as
collaboration in teams, are greatly simplified
• in combination with a software build system (and with some planning), this
approach can prevent compilation times from increasing too much
A price to be paid for these benefits, however, is a more complex compilation
process: whereas in the previous examples we could let the compiler handle the
compilation and linking stages transparently, with the multiple source file approach
the programmer needs to be aware of the intermediate object files, libraries, etc. We
briefly review these topics in the next section.
© Springer-Verlag Berlin Heidelberg 2015 163
D.B. Chirila and G. Lohmann, Introduction to Modern Fortran
for the Earth System Sciences, DOI 10.1007/978-3-642-37009-0_5
DISCLAIMER
Unfortunately, the procedures for creating/using object files and libraries are
not standardized; this is what makes portable software development with
compiled languages difficult. We recommend checking the documentation of your
OS and of your compiler, and adjusting the steps in this section accordingly. For
brevity, we focus mostly on the Linux system, with the gfortran compiler.
We already mentioned object files in Sect. 1.3. Each of these files (with extension .o
in Linux and OSX and .obj in Windows) contains the machine code generated
from the corresponding source code file (after compilation and assembly), but without
any code from other libraries. With the GNU compilers, these files are created by
passing the -c option at the compilation command line. For example, assuming we
have three source code files named util1.f90 , util2.f90 and main.f90
(where the first two contain modules and/or procedures which are used in the third
one, where the main-program resides), we can produce the corresponding object files
with:
$ gfortran -c util1.f90 # produces object file util1.o
$ gfortran -c util2.f90 # util2.o
$ gfortran -c main.f90 # main.o
Assuming that these are the only object files in our application, we can link them
into an executable. This step is also initiated by the compiler (when invoked with the
-o <executable_name> option¹):
$ gfortran -o main main.o util1.o util2.o
1 This flag can actually be omitted, in which case the executable name would default to a.out
(not too informative).
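The manual compile-and-link steps can also be collected in a small shell script. The following is only a sketch under stated assumptions: the variable names FC and DRY_RUN and the helper functions run and build are illustrative and are not part of the book's source code. By default the script only prints the commands, so it can be tried without a compiler installed; setting DRY_RUN=0 executes them.

```shell
#!/bin/sh
# Sketch: compile each source file to an object file, then link.
# FC and DRY_RUN are hypothetical names chosen for this illustration.
FC=${FC:-gfortran}
DRY_RUN=${DRY_RUN:-1}   # 1: only print the commands; 0: also execute them

# Print a command and, unless in dry-run mode, execute it.
run() {
    echo "$*"
    if [ "$DRY_RUN" -eq 0 ]; then "$@"; fi
}

build() {
    objs=""
    # Files that define modules come first, so that their .mod files
    # already exist when main.f90 is compiled.
    for src in util1.f90 util2.f90 main.f90; do
        run "$FC" -c "$src" || return 1
        objs="$objs ${src%.f90}.o"
    done
    # $objs is intentionally unquoted: it holds a space-separated list.
    run "$FC" -o main $objs
}

build
```

In dry-run mode this prints the three compilation commands followed by the link command, in the same order as they would be typed by hand.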
5.1 Multiple Source Files and Software Build Systems 165
and C++ (i.e. helping the compiler to check interfaces). However, a significant
difference is that .mod files are generated automatically, and usually not
portable between different compilers (or even between different releases of
the same compiler). Therefore, it is best to avoid mixing code obtained with
different compilers. This implies that, when switching compilers, we have to
re-compile not only our program, but also the libraries which our code uses. It is
often necessary in such cases to tell each compiler where it can find the .mod
files for its corresponding version of a compiled library. Many compilers allow
this with the -I<path_to_mod_files> option; this needs to be used
in addition to the -L compiler option, which will be discussed below.
Check the man-page of ar for more information about the command line options,
and about the other operations which are possible.
There are two equivalent methods for making the newly created static library
available to the linker when our final executable is created. The first method (useful
mostly when the library is used only internally within a project²) consists of simply
adding the library name to the list of files to be passed to the linker:
$ gfortran -o main main.o libutil.a
² The term convenience library is also commonly used to denote such a scenario.
The second method (handy for libraries needed by many applications) consists of
two sub-steps:
1. If necessary, the directory where the library resides is added (using the
-L<path_to_dir> option) to the list of directories where the linker will
search for libraries. This step may not be necessary if the static library was
installed in a standard path. For our current example, we would add the current
directory, so the option would become -L$PWD.
2. Also as an option to the linker, the name of the library is added, by using
the -l<abbrev_lib_name> option. Here, <abbrev_lib_name> stands for the
name of the library file, from which the lib in the front and the extension
(.a for static libraries) are dropped. In our example, based on the library name
libutil.a, the option to be passed to the linker becomes -lutil.
Combined, the second method for presenting libraries to the linker becomes, for our
three-file project:
$ gfortran -o main main.o -L$PWD -lutil
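The naming convention behind the -l option can be sketched with plain shell parameter expansion. The function name link_flag_for is hypothetical (it is not a tool provided by the compiler or linker), and the sketch only handles the plain .a and .so extensions, not versioned names such as libfoo.so.1.

```shell
#!/bin/sh
# link_flag_for FILE: print the -l option that corresponds to a library file.
# (Illustrative helper; the name is an assumption made for this example.)
link_flag_for() {
    name=${1##*/}      # drop any leading directories
    name=${name#lib}   # drop the "lib" prefix
    name=${name%.a}    # drop a static-library extension ...
    name=${name%.so}   # ... or a shared-library one
    echo "-l$name"
}

link_flag_for libutil.a    # prints: -lutil
```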
This second approach is recommended when using libraries installed in system
folders, where the linker would search by default. Also, if a shared library with that name
is found by the linker, it will use that,³ for efficiency reasons.
When an object file or a static library (using either of the methods above) is made
available to the linker, it will select the entities (procedures, data, etc.) which the
application refers to and just copy them inside the executable. The executable and
the object/library files can then go their separate ways—for example, we could delete
the libraries and we would still be able to execute the program. This decreases the
dependencies on external packages, which is particularly useful when the application
is distributed/deployed to users in binary form.
However, there are also scenarios where this type of library is not well
suited. Let us assume that you have written a
very useful library, and that many developers want to use it in their own programs.
Making the library static then brings along some disadvantages:
• From an efficiency point of view, static libraries are plagued by duplication of code,
which shows up in various places. Depending on the size of the library and on the
number of programs using it, any of the following issues can become significant:
– If users commonly install several programs which use your library, the code
from your library will be duplicated several times on disk.
– In addition, if several programs which use your library are running at the same
time, the same duplication will appear in memory.
Besides wasting resources again, this can cause various kinds of performance
problems (e.g. applications taking a long time to start, or performing poorly, due
to instruction cache misses). However, to be fair, there are also some situations
where use of static libraries can outperform shared ones (especially when entities
from the library are accessed in time-consuming loops).
³ Many compilers still offer the option to override this behavior, if the developer insists on static
linking; in gfortran, the -static flag can be used for this (or -Wl,-Bstatic and
-Wl,-Bdynamic to toggle static linking on and off for specific libraries).
• If you develop an updated version of the library (perhaps to fix a bug or to improve
performance), all the programs using your library have to be re-compiled and re-
distributed to users. This usually makes updates very slow to propagate throughout
the userbase.
To overcome most of the disadvantages discussed above, shared libraries were devel-
oped. Depending on your OS, you may also encounter these named as shared objects,
dynamically linked libraries, frameworks, dynamically linkable libraries, etc. The
details of how these are created and used are (unfortunately) highly OS-specific.
However, the basic idea is the same: instead of copying the library code directly
in the executables, only the names of the libraries that will be needed by the exe-
cutable are recorded inside. When a user eventually tries to run the executable, a
system component known as the dynamic linker will match the libraries needed by
the executable against what is available on the system. This happens before any of
the program’s code is actually executed. If the dynamic linker cannot satisfy all the
requirements of the executable, it will usually cause the entire program to abort, with
an error message.
Shared libraries can solve most of the problems that plagued static libraries, pre-
cisely because they use this extra level of indirection at runtime. Only one copy of
the library is required on disk, no matter how many executables need this code. Also,
assuming we want to run several programs that all use a certain library, the library
code will need to be loaded in memory only for the first program—the OS will then
make this code available⁴ to the other programs which need it, saving both space in
memory and time (since no re-loading is necessary).⁵
⁴ Only the code is shared; data entities declared by the library are private to each of the programs.
⁵ We constructed here a positive picture of shared libraries. In practice, things can be “spoiled” by the
potential existence of different versions of the same library (see e.g. Hook [10]). For badly-designed
libraries, these problems can outweigh the possible benefits.
Creating shared libraries To re-use our three-file example from above, we could
create a shared library (using the GNU compiler) from the files util1.f90 and
util2.f90 in two steps:
1. When creating the initial object files, the -fPIC flag is required by gfortran,
as in:
$ gfortran -fPIC -c util1.f90 # produces object file util1.o
$ gfortran -fPIC -c util2.f90 # util2.o
The additional compiler flag enables generation of object code which is said to be
position-independent, which is necessary for enabling true sharing of the library
code (see, e.g., Calcote [3] for details).
2. The second step is to create the shared library itself. In contrast to the static
libraries, this step is usually performed through the compiler. For our example,
we can use:
$ gfortran -shared -o libutil.so util1.o util2.o
There are many other subtleties related to designing, creating and maintaining
shared libraries, which exceed the scope of our basic introduction (for the interested
reader, we suggest Hook [10] as a general discussion, or Kerrisk [11] for Unix-
specific information). Instead, we focus below on how to use shared libraries written
by someone else, since this topic is a common source of frustration (especially for
beginners). This discussion is also relevant for Sect. 5.2.2, which demonstrates how
to work with the netCDF-library (which implements operations on a very popular
data storage format in ESS).
Using shared libraries Two contexts need to be considered when working with a
shared library—the link-time and runtime.
The first phase (at link-time) causes the names of the shared libraries needed by
our application to be recorded within the executable. The syntax for performing this
step is almost the same as for static libraries. The only difference is that, if
we choose to specify the full path, we generally need to use different file extensions
(usually .so⁶ in Linux, .dylib in OSX or .dll in Windows). If the library
is not in a system-wide path, we can add the appropriate directory to the list of paths
inspected by the linker. For our example, this would lead to the following command
to produce the final executable:
$ gfortran -o main main.o -L$PWD -lutil
The second phase (at runtime) starts when a user issues the command to execute
our application. Even before any code from the application is run, the dynamic linker
will locate all shared libraries needed by our application and let our application know
how it can access them. This step can often fail, especially if the program depends on
shared libraries which are not installed system-wide (in the places where the dynamic
linker usually searches). For example, the executable we produced above may fail to
execute with the error:
$ ./main
./main: error while loading shared libraries: libutil.so: cannot open shared
object file: No such file or directory
The dynamic linker does not know where to find the “util” library, and causes the
whole program to abort.
Our readers using Linux may have encountered similar errors with other appli-
cations, although the error can occur on any OS.
Some very useful tools exist for checking which shared libraries are needed by
an executable, and if these would be found by the dynamic linker. On Linux, the
tool to use is ldd . In our case, this would report something like:
$ ldd ./main
...
libutil.so => not found
...
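When an executable depends on many libraries, the report can be filtered so that only the unresolved ones remain. The sketch below assumes the "name => not found" layout shown in the report above; the function name missing_libs is hypothetical, and the sample input is made up for illustration (it is not output recorded from a real system).

```shell
#!/bin/sh
# missing_libs: read ldd-style output on standard input and print the
# names of the libraries that the dynamic linker could not locate.
# (Illustrative helper; the name is an assumption made for this example.)
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Demonstration on made-up sample text; on a Linux system a real
# invocation would be:  ldd ./main | missing_libs
missing_libs <<'EOF'
    libutil.so => not found
    libc.so.6 => /lib/libc.so.6 (0x00007f0000000000)
EOF
# prints: libutil.so
```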
Here, the -Wl,-rpath,$PWD part will cause the linker to add the working
directory⁹ to the list of library search paths hard-coded within the executable. As long
as the required libraries are not removed from these paths, our program will run without any
further interventions.
The -Wl,--enable-new-dtags part relates to issues of priority of library
paths. Without this option, the default (but deprecated) effect on Linux will be
to give higher priority to the paths within the executable. However, the
recommended approach nowadays (with this option specified) gives higher priority
to paths in the LD_LIBRARY_PATH environment variable (discussed next).
For example, this allows developers to test a program with a new version of a
library (without re-compiling the program).
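As a minimal sketch of how such a test might be set up from the shell, the helper below puts a directory at the front of LD_LIBRARY_PATH. The function name prepend_lib_path is hypothetical. The ${VAR:+...} expansion avoids leaving an empty entry when the variable was previously unset, which matters because the Linux dynamic linker treats an empty entry as the current working directory.

```shell
#!/bin/sh
# prepend_lib_path DIR: put DIR at the front of LD_LIBRARY_PATH.
# (Illustrative helper; the name is an assumption made for this example.)
prepend_lib_path() {
    # Append ":<old value>" only if the variable is already non-empty,
    # so that no empty entry is created.
    LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export LD_LIBRARY_PATH
}

# Demonstration in a subshell, so that the calling shell is not modified.
(
    unset LD_LIBRARY_PATH
    prepend_lib_path "$PWD"
    echo "$LD_LIBRARY_PATH"    # prints the current directory only
)
```

For a one-off test, the variable can also be set for a single command only, as in LD_LIBRARY_PATH=$PWD ./main, which leaves the rest of the shell session untouched.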
⁷ http://www.dependencywalker.com.
⁸ Note that in general you will have to replace $PWD to reflect the path of your library.
⁹ Again, other paths can be specified instead; for example, assuming we have some custom
libraries in /home/my_username/libs, we could use
-Wl,-rpath,/home/my_username/libs.
From the previous section, it may be clear to the reader that the build process (includ-
ing creation and use of libraries) can become quite complex for nontrivial projects.
Although there is sometimes educational value in walking through the steps for
building a project (especially when debugging build problems), it would certainly
be a bad use of human resources to type all commands every time a source file is
modified—computers are much better suited for these tasks. Therefore, many tech-
niques and tools were developed to automate this process, as well as other repetitive
tasks which occur in the software development workflow (running of automatic tests,
preparation of final user-installable packages, etc.). In terms of complexity and built-
in functionality, these tools range from simple scripts (see Sect. 5.6.1) to advanced
build systems such as autoconf+automake+libtool,12 CMake, or SCons.
In this section we focus on GNU Make (gmake),¹³ which is an intermediate-complexity
build system that is sufficient in many cases. Although readers may
eventually use a different build system, some basic familiarity with make is instructive,
since this system encourages thinking explicitly through the basic actions necessary
for creating the final products (whereas other systems may hide some of these
details).
¹⁰ In Windows, the linker will usually also search in the directory where the executable program
resides.
¹¹ Adding paths to this variable in the user’s shell configuration files can cause performance and
To a first approximation, we can think of make as a program which automatically
constructs some output files from a set of input files, based on some recipes. The
input files are, in most programming projects, the actual source code created by
the programmer, while the output files are often the compiled executable programs.
Using the jargon of make, we call the former (input) dependencies or prerequisites,
and the latter (output) targets. However, to extend this picture, both targets and
dependencies can also be tasks in a more abstract sense, because not all work that is
automated with make fits the file-transformation model.
Since each project is generally unique, it is the task of the programmer to describe
to make how the various entities (source code files, object files, libraries, data files,
executable, etc.) depend on each other. make calls these dependency descriptions rules.
Since any entity can take the role of target in one rule and of prerequisite
in another rule,¹⁴ it is useful to think of rules as links in one or more directed
acyclic graphs (DAGs) of dependencies. It is the task of make to construct internal
representations of such graphs and, afterwards, to traverse the links as appropriate
for correctly updating the current target. Of course, to actually perform tangible
actions, there are usually (but not always) some commands which are associated with
specific rules. Here, we should point out that make also has an internal database of
rules, many of which are expressed as generic patterns, so a command may become
associated with a specific rule, even if no command was explicitly specified by the
developer.
To assimilate rules and target inter-dependencies specific to the project, make
searches in the current working directory for a file named GNUmakefile ,
makefile , or Makefile (in that order). For brevity, we will refer to these
collectively as just makefiles. Many projects have a single such file, and this is
also the scenario we will assume in our examples here. However, the situation can
quickly become very complex, especially if the code is spread across several direc-
tories (see Mecklenburg [20] for a make-specific discussion, or Smith [26] for some
useful perspective on these matters).
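The lookup order described above can be mimicked in a few lines of shell. This is only an illustration of the search order stated in the text, not how make is implemented; the function name find_makefile is hypothetical.

```shell
#!/bin/sh
# find_makefile [DIR]: print the name of the first makefile found in DIR
# (default: the current directory), trying the candidates in the order
# given in the text. Returns a non-zero status if none exists.
# (Illustrative helper; the name is an assumption made for this example.)
find_makefile() {
    dir=${1:-.}
    for name in GNUmakefile makefile Makefile; do
        if [ -f "$dir/$name" ]; then
            echo "$name"
            return 0
        fi
    done
    return 1
}

find_makefile . || echo "no makefile found"
```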
Assuming there is a makefile in the present working directory, the build process
can be started by simply typing at the command line:
$ make
or, if more control over the targets to build is required, by passing as an argument a
space-separated list of targets, as in:
$ make target1 target2
¹⁴ Intermediate object files are a fitting example, since they are created when source code is compiled,
There are many command line options for customizing the behaviour of make in
useful ways. For example:
• If, for some reason, the project uses a non-standard name for the makefile,
which make does not recognize by default, we can point it to the custom file,
using the -f flag. This can happen, for example, when the project needs to be
compiled with multiple toolchains (compilers, linkers, etc.). A possible make-based
solution to this problem would be to provide a different makefile for
each platform, and let the user select a specific one, as in:
$ make -f Makefile_MachineX_ToolchainY
in the previous phase. After the top-level targets have been selected, make processes
each one, by taking it as the root node of a dependency graph traversal, descending
(recursively) to the leaf nodes, and then making its way back to that root node and
executing the appropriate commands whenever some target is found to be older than
any of its dependencies (as ascertained based on the modification times tracked by
the underlying file system).
For example, if we have an executable my_program which depends on the
file my_program.f90 , make compares the modification times of the two files,
and re-creates the former whenever it finds that it either (a) is older than the source
file or (b) is missing. This criterion causes make to perform the minimal amount of
work necessary to update a target, avoiding a lot of extraneous re-compilation when
only a few files have changed (which is the typical scenario).
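The update criterion can be sketched as a shell function. This is a simplified illustration, not make's actual implementation: the function name needs_rebuild is hypothetical, and the -nt ("newer than") file test, although supported by common shells such as bash and dash, is an extension to the POSIX test utility.

```shell
#!/bin/sh
# needs_rebuild TARGET PREREQ...
# Succeeds (exit status 0) if TARGET is missing, or if it is older than
# at least one of the PREREQ files; fails otherwise.
# (Illustrative helper; the name is an assumption made for this example.)
needs_rebuild() {
    target=$1
    shift
    [ -e "$target" ] || return 0            # case (b): target is missing
    for prereq in "$@"; do
        # case (a): a prerequisite is newer than the target
        [ "$prereq" -nt "$target" ] && return 0
    done
    return 1                                # target is up to date
}

if needs_rebuild my_program my_program.f90; then
    echo "my_program is out of date"
fi
```

A build script could use the result to decide whether the compilation command for my_program has to be issued again.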
5.1.2.3 Example: Using Make with the Climate Box Model Application
15 This is simply a convention, similar to the general recommendation of having one class for each
.h−.cpp pair in C++.
18
19 ModelConstants.o: ModelConstants.f90
20     gfortran -c ModelConstants.f90
21
22 PhysicsConstants.o: PhysicsConstants.f90
23     gfortran -c PhysicsConstants.f90
24
25 GeomUtils.o: GeomUtils.f90
26     gfortran -c GeomUtils.f90
27
28 NumericKinds.o: NumericKinds.f90
29     gfortran -c NumericKinds.f90
30
31 # additional dependencies
32 box_model_euler_main.o ModelState_class.o: NumericKinds.o
33 ModelConstants.o PhysicsConstants.o GeomUtils.o: NumericKinds.o
34
35 box_model_euler_main.o: PhysicsConstants.o ModelConstants.o
36 box_model_euler_main.o: ModelState_class.o
37
38 ModelConstants.o GeomUtils.o: PhysicsConstants.o
39 ModelState_class.o: GeomUtils.o
Listing 5.1 src/Chapter5/BoxModelMultipleFiles/Makefile.v1
(excerpt)
Let us discuss the various elements in this file. To get the less interesting syntax
features out of our way first, note that all text after a hash mark (#, as in line 31)
is treated as a comment, akin to most Unix shells.¹⁶ Another feature in common
with Unix shells is the way text can “spill” over multiple lines, by appending a
backslash (\) at the end of each line to be continued (within lines 5–11 above). This is
roughly equivalent to the &-character used for extending a line of code in Fortran.
Although it looks dense, the structure of the file is quite simple: it consists entirely
of what are known as explicit rules (at lines 5–11, 13–14, 16–17, 19–20, 22–23, 25–
26, 28–29, 32, 33, 35–36, 38 and 39). make does not require every rule to have
commands associated with it and, indeed, the last five rules above (lines 32–39) are
like this.
For a rule which does have associated commands (like the second rule, at lines 13–14),
note that the command lines need to be indented with an explicit TAB character
(NOT spaces), to demarcate the commands.¹⁷ If this requirement is not met, make will
probably fail to build, often with cryptic errors.
The rules themselves are not surprising: to compile the final executable (first rule,
at lines 5–11), we list all the object files as prerequisites, followed by the command
which invokes the linker (lines 8–11). The next six rules, which specify how to
create each object file, are very similar to each other—only the stem (i.e. filename
without extension) of the filename is changing. Finally, lines 32–39 specify some
additional dependencies, which are mostly dictated in our case by the way Fortran
modules include each other in the various files of our project. In particular, lines
32–33 specify that all other object files depend on NumericKinds.o , since most
of the code depends (directly or indirectly) on this module.
16 An exception to this rule is when the hash is embedded within a command—commands are
passed as they are to the shell (including hashes).
17 This may require additional configuration for some text editors.
Finally, note that lines 35–36 have the same target (box_model_euler_main.o).
When make scans our makefile, it will actually combine these two
lines into a single rule. It is often useful (and clearer) to specify a rule with many
prerequisites in several pieces, like this. However, only one of these “sub-rules” can
have commands attached to it, since there should not be more than one way to make
the same target.
Pattern rules, wildcards, and automatic variables As already mentioned, there is
much room for improvement in our previous makefile. Lines 13–29 are a good
“offender” to tackle first. There, the reader may recognize that the same pattern is
repeated six times (only the filename changes). The next version of our makefile
generalizes these rules:
5 box_model_euler: box_model_euler_main.o ModelState_class.o \
6     ModelConstants.o PhysicsConstants.o \
7     GeomUtils.o NumericKinds.o
8     gfortran -o box_model_euler \
9     box_model_euler_main.o ModelState_class.o \
10     ModelConstants.o PhysicsConstants.o \
11     GeomUtils.o NumericKinds.o
12
13 %.o: %.f90
14     gfortran -c $<
15
16 # additional dependencies
17 box_model_euler_main.o ModelState_class.o: NumericKinds.o
18 ModelConstants.o PhysicsConstants.o GeomUtils.o: NumericKinds.o
19
20 box_model_euler_main.o: PhysicsConstants.o ModelConstants.o
21 box_model_euler_main.o: ModelState_class.o
22
23 ModelConstants.o GeomUtils.o: PhysicsConstants.o
24 ModelState_class.o: GeomUtils.o
Listing 5.2 src/Chapter5/BoxModelMultipleFiles/Makefile.v2
(excerpt)
The new code (lines 13–14 in Listing 5.2) replaces the explicit rules in lines 13–29
of Listing 5.1. In fact, if we later add Fortran files to our project, the new
rule would be able to build the corresponding object file automatically18 (with the
previous approach, we would need to remember to add yet another explicit rule).
To understand the new code, note that a percent character ( % ) acts as a place-
holder, which matches any number of any characters. When make encounters such a
pattern rule, it will remember it, and try to use it whenever it encounters a target that
it would not know how to build otherwise. Here, our pattern rule “teaches” make how
to produce any object file (with the .o extension) from the corresponding source
code file (with the same stem, but with the .f90 extension), if the latter exists.
Let us analyze now the actual command (line 14), which is executed whenever the
pattern rule matches. From our discussion in Sect. 5.1.1, the gfortran -c part
should look familiar: the compiler is invoked, with the flag for only compiling code,
without linking in anything from external libraries. But which file are we compiling?
18 In case there are exceptions that should not be built like this, make also supports static pat-
tern rules (see Mecklenburg [20])—these are basically pattern rules, with scope restricted to a
certain (user-controllable) list of files.
176 5 More Advanced Techniques
This is specified by the $< part, which is an example of what make calls automatic
variables. These variables are automatically assigned internally by make, whenever
a match of a rule is found, and their scope is restricted to the commands associated
with the rule (if any). make stores into these automatic variables information about
the specific target and prerequisite(s) that the rule matched. This is crucial for writing
generic commands, to be associated with pattern rules.
The specific automatic variable that we used above ( $< ) is expanded in the
command to the filename of the first prerequisite which is, in our case, the Fortran
source code file we wanted to compile. Other interesting automatic variables (see
Mecklenburg [20] or make’s documentation for a comprehensive list) are:
• $@ —name of the current target
• $^ —space-separated list of all prerequisites, with duplicates removed
• $+ —same as above, but keeping the duplicates
The dollar sign is actually not part of the name of automatic variables—it is an
operator which expands (“dereferences”) the value of the variable. This syntax also
holds for normal variables, which we will demonstrate soon.
Normal variables Looking at the previous code listing, we notice that there is still
some duplication for the first rule (the names of the object files are written twice in
lines 5–11 of Listing 5.2). We already advocated for reducing code duplication (as
one of the ways to make software more robust), so let us do that here too. As before,
we provide the code first, and explain it later:
 6 srcs := box_model_euler_main.f90 ModelState_class.f90 \
 7         ModelConstants.f90 PhysicsConstants.f90 \
 8         GeomUtils.f90 NumericKinds.f90
 9 objs := $(srcs:.f90=.o)
10 prog := box_model_euler
11
12 $(prog): $(objs)
13         gfortran -o $@ $^
14
15 %.o: %.f90
16         gfortran -c $<
17
18 # additional dependencies
19 box_model_euler_main.o ModelState_class.o: NumericKinds.o
20 ModelConstants.o PhysicsConstants.o GeomUtils.o: NumericKinds.o
21
22 box_model_euler_main.o: PhysicsConstants.o ModelConstants.o
23 box_model_euler_main.o: ModelState_class.o
24
25 ModelConstants.o GeomUtils.o: PhysicsConstants.o
26 ModelState_class.o: GeomUtils.o
Listing 5.3 src/Chapter5/BoxModelMultipleFiles/Makefile.v3 (excerpt)
First, in lines 6–8, we instruct make to store a list of all source code files of our
project, into the variable srcs . Note that, unlike Fortran or other languages, make
does not require us to specify the type of variables; indeed, that would be pointless,
since there really is only one type in make, namely character strings (usually containing
filenames separated by spaces19 ). When a variable appears in the LHS of an
assignment operator, the name of the variable is written normally. However, when
we want to use (expand) the value of the variable in another place, we usually need
to surround the variable’s name by parentheses (braces work as well), and precede the
resulting construct by a dollar sign (see, e.g., line 12 above).20
19 Unfortunately, this can drastically complicate things on Windows, if the file paths/names contain
As a slight twist, we did not remove the duplication in the previous makefile
(Listing 5.2) by saving the list of object files into a variable. Instead, we saved in
srcs the list of .f90 files which, being the leaves of the entire dependency graph,
provide a more natural starting point. Then, in line 9 of Listing 5.3, we use a handy
make feature, which “maps” the list of source files onto the list of corresponding
object files ( objs ). Finally, in line 10 we introduce prog , which holds the name
of our final executable.
With these new variables (and using more of the automatic variables), the rule for
linking the executable becomes much more readable (compare lines 12–13 above
with lines 5–11 in Listing 5.2).
Before proceeding with other topics, a few words are in order, regarding the
assignment operators in make. The type of assignment we used above (with the :=
operator) leads to an immediate evaluation of the expression on the RHS. make also
supports recursively expanded variables (created with the = operator), which are
evaluated only when make actually needs the value for proceeding with its work.
We leave this as a topic of further exploration for the interested reader (see e.g.
Mecklenburg [20]).
Improving portability, overriding values at the command line, and phony targets
As a final iteration on our example, we can change a few things in the makefile, to
make it more easily portable to other systems. For example, in Listing 5.3, we hard-
coded (at lines 13 and 16) the commands for compiling and linking the components
of our program. If we needed to use another compiler or different compiler options,
we would need to change the makefile accordingly. However, make provides a
set of intrinsic variables and rules, which we can leverage to make our makefile
more user-friendly, as demonstrated below:
22 srcs := box_model_euler_main.f90 ModelState_class.f90 \
23         ModelConstants.f90 PhysicsConstants.f90 \
24         GeomUtils.f90 NumericKinds.f90
25 objs := $(srcs:.f90=.o)
26 prog := box_model_euler
27
28 $(prog): $(objs)
29         $(LINK.f) $^ $(LOADLIBES) $(LDLIBS) $(OUTPUT_OPTION)
30
31 %.o: %.f90
32         $(COMPILE.f) $< $(OUTPUT_OPTION)
33
34 clean:
35         -$(RM) *.mod *.o $(prog)
36
37 .PHONY: clean
38
39 # additional dependencies
40 $(filter-out NumericKinds.o, $(objs)): NumericKinds.o
41
42 box_model_euler_main.o: PhysicsConstants.o ModelConstants.o
43 box_model_euler_main.o: ModelState_class.o
44
45 ModelConstants.o GeomUtils.o: PhysicsConstants.o
46 ModelState_class.o: GeomUtils.o
47
48 # WARNING: next two values are specific to the GNU compiler -- readers should
49 # adjust this if they are using another compiler/compiler-version.
50 FC := gfortran-4.8
51 FFLAGS := -O2 -std=f2008ts -pedantic -Wall
Listing 5.4 Makefile.v4 (excerpt)
(Footnote 19 continued)
paths. If such a compromise is not acceptable, switching to another build system such as Cross
Platform Make (CMake) or the Software Construction tool (SCons) may be a more fruitful
strategy.
20 Exceptions are the automatic variables (discussed previously), where the brackets can be (and
The most important changes compared to the previous makefile appear in lines
29 and 32 of the new Listing 5.4. Here, we use the intrinsic variables LINK.f ,
COMPILE.f , LOADLIBES , LDLIBS , and OUTPUT_OPTION instead of
directly hard-coding the program names. These variables are examples of the recur-
sively expanded variables we already mentioned. Here, the nature of the variables
matters, because they allow us to provide a specific compiler at any point in the
makefile. Specifically, at lines 50 and 51, we define variables FC and FFLAGS ,
which usually stand for Fortran compiler and Fortran compiler flags, respectively.
make will use these to construct COMPILE.f and LINK.f when the time
comes to evaluate those variables.
A very convenient feature of make variables is that we can override them directly
from the command line,21 when we invoke make. This is achieved by providing
a list of variable assignments; for example, the following invocation would cause
gfortran to use more aggressive optimizations (and to disable warnings), in contrast
to what we specified in the makefile:
$ make -f Makefile.v4 FFLAGS='-O3'
This feature is also frequently used for switching on additional diagnostics, which
are useful only during debugging sessions.
Another kind of hard-coding in Listing 5.3 was in lines 19–20, where we told
make that all object files in our project depend on NumericKinds.o (because
the NumericKinds module selects the precision of most variables used in our
application). However, we need to make NumericKinds.o an exception to this
rule, to avoid infinite recursion. In Listing 5.3 we reconciled these requirements by
simply listing manually all the other object files. However, in our new version
(Listing 5.4), we use an intrinsic function of make ( filter-out ), to construct this list
programmatically; all that is required is to construct another list, taking objs as an
input, and excluding NumericKinds.o , which is exactly what filter-out
does. make has many such intrinsic functions, especially for manipulating strings,
21 make can also use environment variables. However, contrary to options specified on the command
line, those have lower precedence than variables defined within the makefiles.
working with filenames, etc. Moreover, developers are also allowed to define their
own functions.22
As a showcase for the last feature of make which we discuss here, we added
(lines 34–35 in Listing 5.4) the clean target. This target, which can be used as
an alternative goal at the command line, removes all of the files that were generated
automatically for the project (i.e. intermediate files with extension .o or .mod , and the
final program executable). In make parlance, clean is called a phony target, since
we do not have an actual file with this name—it should be thought of as an “abstract”
task. Of course, nothing stops someone from creating a file with this name, which
would probably confuse make. To prevent this problem, all phony targets should
be declared as prerequisites of the special target .PHONY , as we demonstrate on
line 37; with that syntax, we are clarifying to make that “clean” is not to be treated
as a real filename. Phony targets are commonly used by many software packages
and, like clean , some have become quasi-standardized—for example, all (to
build all elements in our project), install , or check (to run any tests that the
package may come with, to check proper functioning of the final executables).
Finally, note that in the actual command of the rule for clean , we precede the
command by a minus sign ( - ). This syntax tells make that it should not abort if
this command is not successful. This is a useful safeguard in our present case, since it can
well happen that at least one of the auto-generated files does not exist in the first
place, in which case a plain removal command would fail (the default value of RM in
GNU make, rm -f , already tolerates missing files).
In the pages above, we presented some basic notions about build systems in gen-
eral (and about make in particular). Here, we provide a short (and very subjective)
overview of build system technologies in general (focusing on those which also
support Fortran).
22 Indeed, make is a Turing-complete language, which means that any imaginable program could
be (in theory) written in the make language itself—just that it may take a lot more effort than using
other languages (which is why make did not make many inroads outside its intended “infrastructure”
role).
For small and even medium sized projects, make is a perfectly usable solution,
especially when a single development system with some flavor of Unix is used.23
Gaining some familiarity with make is an excellent way to understand some basic
concepts related to build systems. In addition, since many software projects still rely
on this tool, it is also a time investment that will pay off throughout a developer’s
career.
However, the complexity of the make-based solution can quickly increase, as
soon as:
• we need some more advanced features (such as separate source and build trees in
projects with a nontrivial directory layout), or
• the software needs to compile on multiple machines (with variations in hardware
and/or software configurations)
The problem is not even that make-based systems cannot handle the situations
above—as we hinted in the preceding sections, GNU Make (gmake) in particular
is a very powerful tool, which can be (and has been) successfully used to construct
systems of arbitrary complexity. However, actually achieving this in practice is a
nontrivial task, which is better approached as a distinct software development project
on its own, to be handled in parallel to the actual code of the application. Needless
to say, this is not a task for novices in make’s ways.
As a first approach to some of these problems, many projects began to include
a shell script (usually named configure ), which performed an analysis of the
machine where the software was about to be compiled (“build machine”), and created
the makefile, based on the outcomes of this analysis and on a template makefile
provided by the package’s authors. Unix users may have used this command already,
which is part of the standard sequence of commands when compiling a package from
source24 :
$ ./configure && make && make install
23 Working on Windows is also feasible, especially if some basic GNU tools are installed (for
example, as provided by the Cygwin or MinGW projects). However, developers should be prepared
to handle some additional complexity (introduced by using Unix tools into what is essentially a
non-Unix environment).
24 This is different from installing pre-compiled packages through a package manager (such as
apt, rpm or yum, employed by many Linux distributions). In general, installing from source
is only recommended for software which was not adapted to work with such package managers;
unfortunately, many climate models in ESS are in this situation.
• checking for optional features: Authors of the software package may want to take
advantage of additional technologies when possible, to enable optional features
(e.g. advanced visualizations) or to improve performance. The second scenario
is common in ESS and high performance computing (HPC) in general, since
hardware vendors often supply versions of commonly used libraries which are
optimized for their systems.
• modifying the makefile, to reflect system characteristics: Once the
configure -script has finalized the analysis of the system, it combines this information
with the makefile-template, to create a final makefile, which is what
the make program actually uses in the next stage.
Autotools suite As the reader may have already guessed, there is nothing easy
(or pleasant) about writing the configure -script manually. In particular, writ-
ing it such that it works correctly across all environments is really challeng-
ing. Fortunately, developers are nowadays spared this effort, thanks to advanced
build systems like the autotools25 suite. Without going into details (see, e.g.,
Calcote [3] for more on autotools), this software suite consists of several pro-
grams and libraries (of which autoconf , automake and libtool are most
prominent). autoconf takes an abstract description (usually from a file named
configure.ac ) of the project’s requirements and optional features, and creates
impressive configure -scripts, which will effortlessly run on most systems. In
a somewhat similar fashion, automake takes a high-level description (usually
from a file named Makefile.am ) of the makefile we want to obtain in the
end, and creates a makefile-template (named Makefile.in ). The maintainer
of the software package usually provides to users the resulting configure -script
and the Makefile.in file. On their side, users run the configure -script,
which performs the already mentioned analysis of the build machine and, based on
the results and the Makefile.in , creates the final makefile . The beauty of
this system is that users are able to configure and compile the software package,
even if they did not install the autotools suite—that is only needed on the package
maintainer’s machine.
While the workflow outlined above is often sufficient, there is an additional component
that readers interested in autotools should know about, namely the creation
of the config.h file (the conventional name of this header). This C/C++ header file is
helpful for backfeeding information
about the build machine into the project’s source code; it contains definitions
of symbols destined for the preprocessor,26 which can be used to selectively enable
features in the source code.
25 Note that there is no actual program with this name—this is more of an “umbrella” term. Alter-
natively, this collection of tools is also named the GNU build system, because it has become the
de-facto build system in the GNU/Linux world.
26 Most Fortran compilers also allow enabling a C/C++ preprocessor.
The third major component of autotools is libtool , which hides from the
developers the idiosyncrasies of the different platforms with respect to how shared
libraries are used.
The “new wave”: SCons and CMake In closing our discussion of build systems,
we should also mention the “competition” to autotools. Noteworthy candidates in
this category are CMake and SCons. While we refrain from giving specific recommendations
on which system to use, these alternatives may be worth considering
for some of our readers. For Fortran developers, a feature which both SCons and
CMake provide (but was notably missing from autotools at the time of this writing) is
automatic dependency analysis for Fortran code, especially when using modules.27
CMake is actually a meta-build-system, since it supports multiple generators.
To understand the difference, autotools always create in the end a versatile make-
based framework (in a fraction of the time that would be needed if writing the
framework from scratch). This works less well on non-Unix platforms (especially
on Windows at the time this book was written). CMake is more versatile in this
sense, because it also supports creating, for example, native build-system specifications
(e.g. Microsoft Visual Studio and OSX XCode projects). In terms
of features, there is significant overlap with autotools. Also, CMake defines its own
programming language, which again implies a learning curve (although the syntax is
allegedly friendlier than for makefiles or the shell-scripts-with-macros used by
autotools).
SCons is another build system that is roughly equivalent to autotools in terms
of features. Similar to CMake, SCons also has built-in support for non-Unix
platforms. A primary focus of the system is build correctness, which is implemented
by also tracking aspects that many other systems miss by default (e.g. changes in
include or library paths, or in compiler flags will trigger a re-compilation of the
affected object files in SCons—see Smith [26]). However, perhaps the most popular
“selling point” of SCons is that it is written as a domain-specific language (DSL)
embedded within the Python programming language, which makes it very easy to
extend, especially for developers who already employ this language for other tasks.
We recommend Smith [26] to readers interested in build systems, for a good
overview and comparisons of these technologies. Also, see Martin and Hoffman
[16] for CMake, and Knight [12] for SCons (as well as the corresponding websites
dedicated to these tools).
5.2 Input/Output
Earlier in Chap. 2, we presented some forms of file-based I/O. Those are, however,
inconvenient for nontrivial applications (and even more so for large scale modelling in
ESS). Notable weaknesses of those simple I/O-techniques are that they are both not
27 In principle, this facility is often provided by the compiler and, indeed, it works quite well with
C(++) code. However, gfortran had, at the time of this writing, only primitive support for this,
which shifts the burden more on the build systems.
While reading data from a simple ASCII file (as discussed in Chap. 2), one has to
ensure that the values are read into the right variables, and in the right order, to
match the contents of the input file. Since there is no easy way to document the data
within the file itself, working with such data can become frustrating and error-prone.
The concept of namelist-I/O in Fortran was designed to help in these scenarios,
especially when small amounts of data are involved (e.g. when loading/saving the
model parameters in ESS).
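To make the contrast concrete, the following minimal sketch reads the same two values once by position and once by name. The program name, the group name temps, and the input records are our own illustrative choices (they are not part of the book's source tree), and we read from a character variable (an internal file) only to keep the sketch self-contained. The namelist syntax itself is explained in the remainder of this section.

```fortran
program positional_vs_namelist_sketch
  implicit none
  real :: tempA, tempB
  character(len=64) :: record   ! plays the role of one line of an input file
  ! Hypothetical group name, chosen for this sketch only.
  namelist /temps/ tempA, tempB

  ! List-directed read: the meaning of each number is carried only by its
  ! position, so this record assigns 75 to tempA and 100 to tempB.
  record = "75. 100."
  read(record, *) tempA, tempB
  write(*, '(a, 2f8.1)') "positional: ", tempA, tempB

  ! Namelist read: every value is tied to a variable name, so the order in
  ! which the entries appear in the input does not matter.
  record = "&temps tempB = 75., tempA = 100. /"
  read(record, nml=temps)
  write(*, '(a, 2f8.1)') "namelist:   ", tempA, tempB

  if (abs(tempA - 100.0) > 1.0e-5) error stop "tempA should come from its named entry"
end program positional_vs_namelist_sketch
```

With the positional read, swapping the two numbers in the input silently swaps the two temperatures. With the namelist read, the names travel with the values, which also documents the input to some extent.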
There are two components to consider when working with a namelist: namelist
groups (in the Fortran code), and the .nml files themselves (where data is stored).
We will address both issues below, and afterwards provide a more realistic usage
example (by extending the heat diffusion solver from Sect. 4.1).
Namelist groups are defined via statements in the Fortran application. The statements
can only appear in the declarations part of program units. The general syntax for
declaring such a group is28 :
! Declarations for var1, ..., varn
namelist /namelist_group_name/ var1 [, var2, ..., varn]
! Other declarations...
! Executable statements of the (sub)program
In essence, this tells the compiler that var1 … varn should be treated as a unit in
I/O-statements that use this namelist. To illustrate, here is how we would define
a group which links together two scalar variables (of types logical and real),
an array, and a user-defined type:
! user-defined DT
type GeoLocation
   real :: mLon, mLat
end type GeoLocation

! Variable-declarations
logical :: flag = .false.
integer :: inFileID=0, outFileID=0
real :: threshold = 0.8
real, dimension(10) :: array = 4.8
type(GeoLocation) :: myPos = GeoLocation(8.81, 53.08)
28 Note that we use the same convention as in earlier chapters, denoting by square brackets any
optional elements (i.e. the brackets themselves should not appear in actual code).
Once the namelist has been defined, it can be used in read- and write-
statements. For example, we could write the current program state in a file:
! Write current data-values to a namelist-file
open(newunit=outFileID, file="demo_namelist_write.nml")
write(outFileID, nml=my_namelist)
close(outFileID)
Listing 5.6 src/Chapter5/demo_namelist.f90 (excerpt)
where a possible input file (created by us with a regular text editor) would be:
&my_namelist
  ! Comments can be added on distinct lines...
  myPos%mLon = 9.72, ! ... or at the end of a line.
  myPos%mLat = 52.37,
  array = 6*9.1,     ! shorthand-notation for constant
                     ! sections in an array.
  array(1) = 2.9     ! overrides previous specification for
                     ! first array element
/
Listing 5.8 src/Chapter5/demo_namelist_read.nml (excerpt)—a simple
namelist file
Note that we can specify components of the namelist in any order, and even omit
some of these components—these features are summarized below.
Structure of namelist files When creating (or interpreting) a new namelist file
like the one shown in Listing 5.8, there are several simple syntax rules to consider.
First, the ampersand character ( & ) should appear, followed (without any intervening
space) by the name of the namelist (in our case—my_namelist). After this, the
actual information is specified, as key-value pairs (such as var_name = var_value ).
Each pair can appear on a distinct line, or several of them can be aggregated
in a line, separated with commas ( , ). Finally, a slash ( / ) marks the end of the
namelist-specification.
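The rules above can be tried out with a small round-trip program. This is only a sketch under stated assumptions: the variable list in the namelist statement, the file name sketch_input.nml, and the program name are chosen by us for illustration, in the spirit of the listings shown earlier. The program first creates a small namelist file, then reads it back and echoes the resulting state.

```fortran
program namelist_roundtrip_sketch
  implicit none

  type GeoLocation
     real :: mLon, mLat
  end type GeoLocation

  logical :: flag = .false.
  real :: threshold = 0.8
  real, dimension(10) :: array = 4.8
  type(GeoLocation) :: myPos = GeoLocation(8.81, 53.08)
  integer :: fileID

  ! Assumed composition of the group (two scalars, an array, a derived type).
  namelist /my_namelist/ flag, threshold, array, myPos

  ! Create a small input file (hypothetical name), following the syntax rules:
  ! '&' + group name, key-value pairs, and a closing slash.
  open(newunit=fileID, file="sketch_input.nml", status="replace", action="write")
  write(fileID, '(a)') "&my_namelist"
  write(fileID, '(a)') "  myPos%mLat = 52.37,   ! entries may come in any order"
  write(fileID, '(a)') "  myPos%mLon = 9.72,"
  write(fileID, '(a)') "  array = 6*9.1,        ! sets the first six elements"
  write(fileID, '(a)') "  array(1) = 2.9        ! a later entry overrides an earlier one"
  write(fileID, '(a)') "/"
  close(fileID)

  ! Read the group back; 'flag' and 'threshold' are not mentioned in the file,
  ! so they keep the values they had before the read.
  open(newunit=fileID, file="sketch_input.nml", status="old", action="read")
  read(fileID, nml=my_namelist)
  close(fileID, status="delete")

  ! Simple self-checks of the behaviour described in the text.
  if (abs(array(1) - 2.9) > 1.0e-5) error stop "array(1) should be overridden"
  if (abs(array(6) - 9.1) > 1.0e-5) error stop "array(6) should come from 6*9.1"
  if (abs(array(7) - 4.8) > 1.0e-5) error stop "array(7) should keep its default"
  if (abs(threshold - 0.8) > 1.0e-5) error stop "threshold should be untouched"

  ! Echo the complete group; the exact layout of this output is compiler-dependent.
  write(*, nml=my_namelist)
end program namelist_roundtrip_sketch
```

Note that the output produced by the final write-statement is itself a valid namelist, so it could be saved and used as an input file later.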
As a more complex use case for namelists, let us consider how we can improve
the procedure of reading model parameters for the application discussed in Sect. 4.1.
In that version of the code, the parameters were specified in a non-descriptive ASCII
file, which reads:
100.
75.
50.
25.
200
1.15E-6
30.
Listing 5.9 src/Chapter4/config_file_formatted.in —previous version
of input file, for the heat diffusion solver (Chap. 4)
This is not a robust approach, since there is no information (in the file itself) about
what each line of input represents. We can easily improve this, by modifying the con-
structor (= initializer) of the Config-type. The changes we need to make (relative
to the program src/Chapter4/solve_heat_diffusion.f90 ) are actually
minimal, and concentrated in the initializer function ( createConfig ):
 48 module Config_class
 49   use NumericKinds
 50   implicit none
 51   private
 52
 53   type, public :: Config
 54      real(RK) :: mDiffusivity = 1.15E-6_RK, & ! sandstone
 55           ! NOTE: "physical" units here (Celsius)
 56           mTempA = 100._RK, &
 57           mTempB = 75._RK, &
 58           mTempC = 50._RK, &
 59           mTempD = 25._RK, &
 60           mSideLength = 30._RK
 61      integer(IK) :: mNx = 200 ! # of points for square side-length
 62   end type Config
 63
 64   ! Generic IFACE for user-defined CTOR
 65   interface Config
 66      module procedure createConfig
 67   end interface Config
 68
 69 contains
 70   type(Config) function createConfig( cfgFilePath )
 71     character(len=*), intent(in) :: cfgFilePath
 72     integer :: cfgFileID
 73
 74     ! Constant to act as safeguard-marker, allowing us to check if values were
 75     ! actually obtained from the NAMELIST.
 76     ! NOTE: '-9999' is an integer which can be *exactly* represented in the
 77     ! mantissa of single-/double-precision IEEE reals. This means that the
 78     ! expression:
 79     !    int(aReal, IK) == MISS
 80     ! will be TRUE as long as
 81     !    (a) 'aReal' was initialized with MISS and
 82     !    (b) other instructions (e.g. NAMELIST-I/O here) did not modify the
 83     !        value of 'aReal'.
 84     integer(IK), parameter :: MISS = -9999
 85
 86     ! We need local-variables, to mirror the ones in the NAMELIST
 87     real :: sideLength = MISS, diffusivity = MISS, &
 88          tempA = MISS, tempB = MISS, tempC = MISS, tempD = MISS
 89     integer :: nX = MISS
 90     ! NAMELIST definition
 91     namelist/heat_diffusion_ade_params/ sideLength, diffusivity, nX, &
 92          tempA, tempB, tempC, tempD
 93
 94     open(newunit=cfgFileID, file=trim(cfgFilePath), status='old', action='read')
 95     read(cfgFileID, nml=heat_diffusion_ade_params)
 96     close(cfgFileID)
 97
 98     ! For diagnostics: echo information back to terminal.
 99     write(*,'(">> START: Namelist we read <<")')
100     write(*, nml=heat_diffusion_ade_params)
101     write(*,'(">> END: Namelist we read <<")')
102
103     ! Assimilate data read from NAMELIST into new object's internal state.
104     ! NOTE: Here, we make use of the safeguard-constant, so that default values
105     ! (from the type-definition) are overwritten only if the user provided
106     ! replacement values (in the NAMELIST).
107     if( int(sideLength, IK) /= MISS ) createConfig%mSideLength = sideLength
108     if( int(diffusivity, IK) /= MISS ) createConfig%mDiffusivity = diffusivity
109     if( nX /= MISS ) createConfig%mNx = nX
110     if( int(tempA, IK) /= MISS ) createConfig%mTempA = tempA
111     if( int(tempB, IK) /= MISS ) createConfig%mTempB = tempB
112     if( int(tempC, IK) /= MISS ) createConfig%mTempC = tempC
113     if( int(tempD, IK) /= MISS ) createConfig%mTempD = tempD
114   end function createConfig
115 end module Config_class
As necessary infrastructure for namelist I/O, we add several local variables (lines
86–89), which are packaged into the namelist definition (lines 90–92). In lines 94–96
the namelist is used and, as a debugging facility, the final status of variables in the
namelist group is printed on-screen.
The rest of the new code (lines 74–84 and 103–113) is necessary to account for
the possibility of incomplete namelist files. As already mentioned, this feature is
very useful for simplifying interaction with the code. For example, if the user only
needs to change the diffusivity of the material (while keeping all other parameters
at default values), the input file should contain just the entry for the new diffusivity.
To support such partial updates of the configuration, however, we need a mechanism
for checking if a parameter’s value was actually obtained from the namelist file. Our
simple approach here is to initialize all numeric members of the namelist group with
a special value (MISS = -9999), which is known to reside well outside the valid range
for the simulation parameters. Note that MISS is an integer, but it can also be used
to mark floating-point variables as “dirty” (un-initialized).29 All local variables will
start in this state, and will be transferred as simulation parameters only if updated
during the namelist-read command (at line 95).
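The mechanism can be isolated in a few lines. The sketch below is a reduced, stand-alone variant of the idea (not the code of the heat diffusion solver): the group name params, the plain variables holding the defaults, and the input record are assumptions made for illustration, and default numeric kinds are used instead of the kinds from NumericKinds. As before, the input is read from a character variable to keep the example self-contained.

```fortran
program sentinel_sketch
  implicit none
  integer, parameter :: MISS = -9999   ! marker for "not provided by the user"

  ! Defaults (stand-ins for the components of the Config-type).
  real    :: mSideLength = 30.0, mDiffusivity = 1.15e-6
  integer :: mNx = 200

  ! Local mirrors of the namelist entries, all starting in the "dirty" state.
  real    :: sideLength = MISS, diffusivity = MISS
  integer :: nX = MISS
  character(len=80) :: input
  namelist /params/ sideLength, diffusivity, nX

  ! The input mentions only the diffusivity (value chosen for illustration).
  input = "&params diffusivity = 2.5e-6 /"
  read(input, nml=params)

  ! Overwrite a default only if the corresponding mirror left the MISS state.
  if (int(sideLength)  /= MISS) mSideLength  = sideLength
  if (int(diffusivity) /= MISS) mDiffusivity = diffusivity
  if (nX /= MISS)               mNx          = nX

  ! Self-checks: one parameter updated, the other two left at their defaults.
  if (abs(mDiffusivity - 2.5e-6) > 1.0e-9) error stop "diffusivity not updated"
  if (abs(mSideLength - 30.0) > 1.0e-5)    error stop "sideLength default lost"
  if (mNx /= 200)                          error stop "nX default lost"

  write(*, '(a, es10.3)') "diffusivity: ", mDiffusivity
  write(*, '(a, f6.1)')   "sideLength:  ", mSideLength
  write(*, '(a, i0)')     "nX:          ", mNx
end program sentinel_sketch
```

One limitation of this simple test is worth keeping in mind: since int truncates towards zero, any real input strictly between -10000 and -9998 would also be mistaken for the marker. This is harmless only as long as such values lie outside the valid range of the parameters, as assumed in the text.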
As a sample namelist-based input file, we have:
&heat_diffusion_ade_params
  ! Physical parameters.
  diffusivity = 1.15e-6, ! thermal-diffusivity coeff (m^2/s)
  ! NOTE: commenting-out line below will cause default-value to be picked
  sideLength = 10. ! length of square-side (m)

  ! Constant-temperature boundary conditions.
  tempA = 100.,
  tempB = 75.,
  tempC = 50.,
  tempD = 25.,

  ! Numerical parameters.
  nX = 300
/
Although namelists are really useful in many cases (e.g. for providing model
parameters), they are unsuited for handling larger volumes of data, due to the same
types of storage and computing-time inefficiencies which affect ASCII files30 (as dis-
cussed in Chap. 2). Since large volumes of data are very common in ESS, developers
were historically forced to use various forms of binary I/O. However, while such
approaches reduce the efficiency problems, they spawned considerable difficulties
for scientific collaboration, as most research groups developed their own practices
for storing such data, making datasets from different scientists more challenging to
compare (on technical grounds) than necessary. Standardization efforts were clearly
29 This works because the absolute value of MISS is still small enough to fit into the mantissa of
the common floating-point formats. Given our choice of numeric kinds, this ensures that we can
compare the integer part of the real variables against MISS (lines 107–108 and 110–113), without
having to worry about floating-point approximation of numbers. In general, however, note that
direct comparisons of real variables should be avoided whenever possible, since this can easily
introduce bugs (see also discussion in Sect. 2.3.2).
30 Indeed, namelist files are ASCII files; they merely require a special format.
188 5 More Advanced Techniques
necessary (for the benefit of all stakeholders), and the World Meteorological
Organization (WMO) pioneered such work. While those early solutions improved the
situation, they still had some technical problems (see, e.g., Caron [4]). In response
to these concerns, the Network Common Data Form (netCDF) data formats were
created, supported by the University Corporation for Atmospheric Research (UCAR).
In this section, we focus on these latter technologies, which have become the de-facto
standard, especially for modelling work in ESS.
In addition to being platform- and language-independent, the netCDF-formats
also permit efficient I/O31 and creating self-describing datasets.32 Another notewor-
thy aspect is that UCAR aims to keep the software backwards-compatible so that,
once created, a netCDF-file can still be accessed by future versions of the software.
As high-level components in the netCDF “ecosystem”, we can identify:
1. First, we have the data formats themselves, which are public specifications of
how data is to be stored. Two formats (named classic and 64-bit offset) are also
open standards of the Open Geospatial Consortium (OGC).
2. In the second layer, we have software libraries (similar to what we described in
Sect. 5.1.1), which can read and write data in the netCDF-formats. These are also
provided and maintained by UCAR, as a courtesy for application developers.33 In
fact, UCAR provides several such libraries, in two “strands”: one for the JAVA-
language, and a second strand for compiled native languages. In the second strand
we have a C core library, with Application Programming Interfaces (APIs) for
several languages (C, C++, Fortran 90 and the older Fortran 77). These “wrapper”-
libraries depend on the C core library,34 and so do the many third-party packages
which are available for using netCDF within scripting languages (Python, R,
IDL, Perl, MATLAB, etc.).
The use of the common core in the second strand also ensures that programs
written in different languages can exchange data via netCDF-files.
3. Finally, in the third layer, we have the applications which use netCDF. Most
models in ESS can be included in this broad category, as well as utility packages
which facilitate post-processing and plotting of results:
• manipulation at the command line: Readers familiar with Unix will find the
cdo and nco packages useful, since they enable powerful manipulations
from the command line, and can also be used for automated post-processing
(with shell scripts).
31 Depending on the data access patterns of the application, some knowledge about the representation
developers, who know what the data actually stands for—the advantage of netCDF is that it enables
embedding such information (“metadata”) within the same file which holds the binary data.
33 This is an excellent example of how libraries are useful—in this case, they relieve most scientists
from having to worry about how their data is mapped to bits in the computer and the other way
around.
34 The dependency is important, for example, if the libraries need to be compiled from source for
some reason.
To understand how the various functions in the API fit together, it is useful to have
an overview of the high-level concepts in the netCDF data model:
1. dataset: In netCDF terminology, a dataset represents the top-level entity, to
which variables, dimensions, or attributes belong. In this model, for each dataset
we have a corresponding file on the user’s computer (which contains, for example,
some measurements or model output).
35 For unstructured meshes, it is possible to define an additional dimension based on the index of
the element/vertex.
36 In fact, the HDF5 library is a prerequisite when using the netCDF-4 format, since netCDF uses
the idea, groups provide a mechanism for organizing the data hierarchically. These
are similar in spirit to the Unix directory tree: each dataset has a root group, which
can have as “children” the usual variables, dimensions, and attributes, as well as
other groups (which enables a multi-level hierarchy tree). Each sub-group can be
viewed as a separate dataset (with its private variables and attributes), except that
dimensions are also visible from children sub-groups.
6. user-defined types: Another netCDF-4 feature (which we also do not cover in
detail) is the possibility of defining custom types in addition to the ones permitted
in the classic format (NF90_INT, NF90_FLOAT, etc.). In principle, such custom
types may be useful for storing data which does not fit the usual netCDF model
(although they also increase complexity, and may severely restrict the selection
of software that can read the data).
(discussed in Chap. 2). Most entities (the dataset itself, variables, dimensions and
attributes39 ) have such IDs, which are tracked internally by the library. The user’s
interaction with these variables commonly follows one of the patterns below:
• The library returns an ID when a new entity is created, or when the user calls
some function from the inquire-family (to search for an entity by name, etc.). For
example, we get a dataset ID after creating a new dataset with nf90_create .
Similarly, if we read an existing file and we know that there is a dimension named
“lat” in the dataset, we can use the function nf90_inq_dimid to retrieve the
ID of that dimension.
• Once we acquired an ID-value, we can use this in other library calls, to operate
(usually—read, write, or further inquire) on entities. For example, when writing
data to a file (with nf90_put_var ), we need to pass in the previously acquired
IDs of the parent dataset and of our variable. The same IDs are needed for the
opposite operation of reading data from a pre-existing file (except that we need to
call function nf90_get_var ).
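The two patterns above can be sketched with a small round trip. This is a minimal example, assuming that the netCDF Fortran 90 library is installed; the file name and the helper subroutine check are hypothetical, chosen only for this illustration.

```fortran
program id_demo
  use netcdf
  implicit none
  character(len=*), parameter :: FNAME = "id_demo.nc"  ! hypothetical file name
  integer :: ncID, latDimID, latVarID, nLat
  real :: latOut(3) = [-45., 0., 45.]
  real, allocatable :: latIn(:)

  ! Pattern 1a: the library returns a new ID whenever an entity is created
  call check( nf90_create(FNAME, NF90_CLOBBER, ncID) )
  call check( nf90_def_dim(ncID, "lat", size(latOut), latDimID) )
  call check( nf90_def_var(ncID, "lat", NF90_REAL, latDimID, latVarID) )
  call check( nf90_enddef(ncID) )
  ! Pattern 2: previously acquired IDs are passed to other library calls
  call check( nf90_put_var(ncID, latVarID, latOut) )
  call check( nf90_close(ncID) )

  ! Pattern 1b: for an existing file, IDs are recovered by name
  call check( nf90_open(FNAME, NF90_NOWRITE, ncID) )
  call check( nf90_inq_dimid(ncID, "lat", latDimID) )
  call check( nf90_inquire_dimension(ncID, latDimID, len=nLat) )
  call check( nf90_inq_varid(ncID, "lat", latVarID) )
  allocate(latIn(nLat))
  call check( nf90_get_var(ncID, latVarID, latIn) )
  call check( nf90_close(ncID) )

  write(*, '(a, 3f8.2)') "lat values read back: ", latIn
contains
  ! Hypothetical helper: abort with the library's message on any error
  subroutine check(status)
    integer, intent(in) :: status
    if (status /= nf90_noerr) then
      write(*, '(a)') trim(nf90_strerror(status))
      stop 1
    end if
  end subroutine check
end program id_demo
```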
Finally, note that some of the library functions require both input and output IDs
as arguments. While the actual values of all IDs are maintained internally by the
netCDF-library, users still need to declare and keep track of these variables, which
can lead to some complexity. Therefore, it is a good idea to separate the code which
interacts with the library from the “core” of the applications. This can be achieved
by either grouping the library calls into separate functions (our approach below), or
using the new block / end block construct (not available in all compilers at
the time of writing), which allows grouping ID-declarations closer to the library calls
that need them.
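As a sketch of the second option, the program below confines all netCDF IDs to a block construct, so that the rest of the program never sees them. It assumes a compiler with Fortran 2008 block support and an installed netCDF Fortran 90 library; the file name, the variable names and the helper check are hypothetical.

```fortran
program block_ids_demo
  use netcdf
  implicit none
  real :: temperature(4) = [10., 12.5, 15., 17.5]   ! "core" application data

  ! ... the core computations of the application would go here ...

  write_results: block
    ! The IDs exist only inside this block
    integer :: ncID, xDimID, tempVarID
    call check( nf90_create("block_demo.nc", NF90_CLOBBER, ncID) )
    call check( nf90_def_dim(ncID, "x", size(temperature), xDimID) )
    call check( nf90_def_var(ncID, "temperature", NF90_REAL, xDimID, tempVarID) )
    call check( nf90_enddef(ncID) )
    call check( nf90_put_var(ncID, tempVarID, temperature) )
    call check( nf90_close(ncID) )
  end block write_results
contains
  ! Hypothetical helper: abort with the library's message on any error
  subroutine check(status)
    integer, intent(in) :: status
    if (status /= nf90_noerr) then
      write(*, '(a)') trim(nf90_strerror(status))
      stop 1
    end if
  end subroutine check
end program block_ids_demo
```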
Error handling When using I/O libraries such as netCDF, there are many points
where problems can appear: file system or quota limitations may be reached, network-
attached storage (NAS) systems may go offline (for cluster users), and sometimes
even hard disks may fail. To report such situations to the developer, all functions in
the netCDF-library return an error code. This mechanism is somewhat similar to
exceptions in other programming languages (e.g. C++), except that here the program
would continue (by default) for as long as possible. It has become common practice
to define a wrapper subroutine, through which all library calls are made. The version
we use here is:
! error-checking wrapper for netCDF operations
subroutine ncCheck(status)
  ! ...............
  integer(I3B), intent(in) :: status
  if( status /= nf90_noerr ) then
    write(*, '(a)') trim(nf90_strerror(status))
    stop "ERROR (netCDF-related) See message above for details! Aborting..."
  end if
end subroutine ncCheck
Listing 5.12 Wrapper subroutine for calling netCDF functions used in this book
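To see why such a wrapper is useful, it helps to look at what happens without it. The sketch below (assuming an installed netCDF Fortran 90 library, and a hypothetical file name which is expected not to exist) shows that a failed library call merely returns a non-zero status; unless we inspect that value ourselves, the program simply carries on.

```fortran
program nc_status_demo
  use netcdf
  implicit none
  integer :: status, ncID

  ! Try to open a file which (we assume) does not exist
  status = nf90_open("file_which_should_not_exist.nc", NF90_NOWRITE, ncID)

  ! The call itself does not stop the program; the status must be checked
  if (status /= nf90_noerr) then
    write(*, '(a)') "netCDF reported: " // trim(nf90_strerror(status))
  else
    status = nf90_close(ncID)
  end if

  write(*, '(a)') "program continues after the library call"
end program nc_status_demo
```

Routing every call through a wrapper such as ncCheck turns this silent failure into an immediate, documented abort.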
39 Attributes are a special case, since their values are retrieved by name. However, the IDs still exist
(denoted as attribute numbers in the documentation) and can be useful for writing generic software,
which can handle arbitrary netCDF-files.
In this section, we outline the steps for creating a netCDF-dataset. After some
general considerations, we apply this technique to the RB-LBM code developed in
Chap. 4, to significantly improve the I/O efficiency of that program.
To write a new dataset, we first need to create it by calling nf90_create . The
netCDF-library will then continue to track this dataset internally and allow us to
interact with it, until we call the function nf90_close . Between these two calls,
the dataset is said to be in one of two possible modes, as follows40 :
1. define-mode: Immediately after creation with nf90_create , the dataset will
be in this mode. At this point, the general structure of the dataset (as well as any
metadata) needs to be defined. Depending on the specifics of the dataset, this is
achieved with a combination of calls to nf90_put_att (to define attributes),
to nf90_def_dim (define dimensions), and to nf90_def_var (define
variables).
As mentioned earlier, a convention that is used frequently in this stage is to define,
for each dimension, a 1D variable with the same name as the dimension. These
are also known as dimension variables, and provide the one-to-one mappings
between discrete indices (i, j, k, etc.) in the variable arrays and actual coordinate
values (for example, longitude, latitude, and depth/height in many ESS models,
assuming a structured mesh).
To start writing actual data values (including for the dimension variables), we
have to specifically instruct the netCDF-library to leave define-mode and enter
data-mode, by calling the function nf90_enddef .
2. data-mode: The second phase consists of actually writing variable values to our
dataset (including dimension variables, if any were declared). The most important
function at this time is nf90_put_var . This can be used to write either all
values in the variable at once or a subset of the variable (lower-dimensional
“slice”, or even individual scalar value).
Finally, when there is no more data to be added to our dataset, we signal the end
of this stage by closing the file (with the nf90_close function).
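The two modes can be sketched with a minimal program, which writes one dimension variable and one data variable. This is an illustration only, assuming an installed netCDF Fortran 90 library; the file name, the variable names, the synthetic data and the helper check are hypothetical.

```fortran
program define_data_modes
  use netcdf
  implicit none
  integer, parameter :: NX = 5
  integer :: ncID, xDimID, xVarID, tempVarID, i
  real :: x(NX), temp(NX)

  x = [(0.25*(i-1), i = 1, NX)]   ! coordinate values
  temp = 20. + 5.*x               ! synthetic data, for illustration only

  ! define-mode: nf90_create leaves the new dataset in this mode
  call check( nf90_create("modes_demo.nc", NF90_CLOBBER, ncID) )
  call check( nf90_put_att(ncID, NF90_GLOBAL, "title", "define/data-mode demo") )
  call check( nf90_def_dim(ncID, "x", NX, xDimID) )
  call check( nf90_def_var(ncID, "x", NF90_REAL, xDimID, xVarID) )  ! dimension variable
  call check( nf90_def_var(ncID, "temp", NF90_REAL, xDimID, tempVarID) )
  call check( nf90_put_att(ncID, tempVarID, "units", "degC") )

  ! switch to data-mode, then write the actual values
  call check( nf90_enddef(ncID) )
  call check( nf90_put_var(ncID, xVarID, x) )
  call check( nf90_put_var(ncID, tempVarID, temp) )

  ! no more data to add: close the dataset
  call check( nf90_close(ncID) )
contains
  ! Hypothetical helper: abort with the library's message on any error
  subroutine check(status)
    integer, intent(in) :: status
    if (status /= nf90_noerr) then
      write(*, '(a)') trim(nf90_strerror(status))
      stop 1
    end if
  end subroutine check
end program define_data_modes
```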
Example: adding netCDF-output support for the LBM-MRT solver: Earlier
in Chap. 4 we presented an application which solved the 2D Rayleigh-Bénard (RB)
problem, using the lattice Boltzmann method (LBM). That initial version of the appli-
cation could only write results in ASCII files (with an ad-hoc structure, which required
a customized parser – see the R script Chapter4/plotFieldFromAscii.R ).
However, we prepared the ground for improving the I/O of the application, by
separating the control logic for writing the output into a base type (OutputBase),
40 For brevity, we only describe the most common use case, when these modes are used in a simple
linear sequence. However, the netCDF-library also allows switching back and forth between these
two modes.
from which the type41 OutputAscii was derived. In this latter type we isolated
the portions of the I/O code that were specific to the ASCII format. We now return
to this example, to add support for netCDF-output. The natural approach is to
define a similar type (we will call it OutputNetcdf), which is also derived from
OutputBase. The code is provided below; note that we also split the application
into several files, as demonstrated with the box model earlier in this chapter, to make
the components of the application easier to understand.
Also, note that all variables with “ID” at the end of their names are initialized by
the netCDF-library, during the procedure call where they are first used.
Most of the new code is in the file Chapter5/lbm2d_mrt_rb_v2/OutputNetcdf_class.f90,
which contains the module OutputNetcdf_class. As usual, in the first part of the
module we have the definition of the new type (derived from OutputBase):
 8 module OutputNetcdf_class
 9   use NumericKinds, only : IK, I3B, RK
10   use OutputBase_class
11   use netcdf
12   implicit none
13
14   type, extends(OutputBase) :: OutputNetcdf
15      private
16      ! internal handlers for netCDF objects
17      integer(I3B) :: mNcID, mPressVarID, mUxVarID, mUyVarID, mTempVarID, &
18           mUyMaxVarID, mTimeVarID
19    contains
20      private
21      ! public methods which differ from base-class analogues
22      procedure, public :: init => initOutputNetcdf
23      procedure, public :: writeOutput => writeOutputNetcdf
24      procedure, public :: cleanup => cleanupOutputNetcdf
25      ! internal method
26      procedure prepareFileOutputNetcdf
27   end type OutputNetcdf
28 ! ...............(continues below)......
Note that we need to use netcdf (line 11 above), so that the compiler will recognize
the netCDF functions which we will invoke later. As internal variables for each
instance of our new DT, we have some integers, which keep track of the netCDF-
IDs (mNcID is the ID of the file/dataset, and the rest are variable IDs). We also bind
the procedures which are specific to this type to the generic names (init,
writeOutput, and cleanup); these procedures are discussed below.
First, we have initOutputNetcdf:
30 ! ...............(continued from above)......
31 contains
32   subroutine initOutputNetcdf(this, nX, nY, numOutSlices, dxD, dtD, &
33        nItersMax, outFilePrefix, Ra, Pr, maxMach)
34     class(OutputNetcdf), intent(inout) :: this
35     integer(IK), intent(in) :: nX, nY, numOutSlices, nItersMax
36     real(RK), intent(in) :: dxD, dtD, Ra, Pr, maxMach
37     character(len=*), intent(in) :: outFilePrefix
38
39     ! initialize parent-type
40     call this%OutputBase%init(nX, nY, numOutSlices, dxD, dtD, &
41          nItersMax, outFilePrefix, Ra, Pr, maxMach)
42
43     if( this%isActive() ) then
44        call this%prepareFileOutputNetcdf()
45     end if
46   end subroutine initOutputNetcdf
47 ! ...............(continues below)......
41 Here, “type” is the equivalent of what we would name “class” in C++ or Java.
This is the analogue of initOutputAscii from the previous chapter. We also call
the “init” subroutine of the underlying base type. However, the actual netCDF-file
initialization is delegated to the subroutine prepareFileOutputNetcdf:
 49 ! ...............(continued from above)......
 50   subroutine prepareFileOutputNetcdf(this)
 51     class(OutputNetcdf), intent(inout) :: this
 52
 53     ! Variables to store temporary IDs returned by the netCDF library; no need
 54     ! to save these, since they are only needed when writing the file-header.
 55     ! NOTES: - we have 3 dimension IDs (2D=space + 1D=time)
 56     !        - HOWEVER, there is no 'tVarID', since this ID is needed later (to
 57     !          append values to this UNLIMITED-axis), so it is stored in the
 58     !          internal state of the type (in 'mTimeVarID')
 59     integer(I3B) :: dimIDs(3), xDimID, yDimID, tDimID, &
 60          xVarID, yVarID
 61
 62     ! create the netCDF-file (NF90_CLOBBER overwrites file if it already exists,
 63     ! while NF90_64BIT_OFFSET enables 64bit-offset mode)
 64     call ncCheck( nf90_create( &
 65          path=trim(adjustl(this%mOutFilePrefix)) // ".nc", &
 66          cmode=ior(NF90_CLOBBER, NF90_64BIT_OFFSET), ncid=this%mNcID) )
 67
 68     ! global attributes
 69     call ncCheck( nf90_put_att(this%mNcID, NF90_GLOBAL, "Conventions", "CF-1.6") )
 70     call ncCheck( nf90_put_att(this%mNcID, NF90_GLOBAL, SPACE_UNITS_STR, &
 71          "channel height $L$") )
 72     call ncCheck( nf90_put_att(this%mNcID, NF90_GLOBAL, TIME_UNITS_STR, &
 73          "diffusive time-scale $\frac{L^2}{\kappa}$") )
 74     call ncCheck( nf90_put_att(this%mNcID, NF90_GLOBAL, PRESS_UNITS_STR, &
 75          "$\frac{\rho_0 \kappa^2}{L^2}$") )
 76     call ncCheck( nf90_put_att(this%mNcID, NF90_GLOBAL, VEL_UNITS_STR, &
 77          "$\frac{\kappa}{L}$") )
 78     call ncCheck( nf90_put_att(this%mNcID, NF90_GLOBAL, TEMP_UNITS_STR, &
 79          "temperature-difference between horizontal walls $\theta_b-\theta_t$") )
 80     call ncCheck( nf90_put_att(this%mNcID, NF90_GLOBAL, "Ra", this%mRa) )
 81     call ncCheck( nf90_put_att(this%mNcID, NF90_GLOBAL, "Pr", this%mPr) )
 82     call ncCheck( nf90_put_att(this%mNcID, NF90_GLOBAL, "maxMach", this%mMaxMach) )
 83
 84     ! define dimensions (netCDF will return ID for each)
 85     call ncCheck( nf90_def_dim(this%mNcID, "x", this%mNx, xDimID) )
 86     call ncCheck( nf90_def_dim(this%mNcID, "y", this%mNy, yDimID) )
 87     call ncCheck( nf90_def_dim(this%mNcID, "t", NF90_UNLIMITED, tDimID) )
 88     ! define coordinates
 89     call ncCheck( nf90_def_var(this%mNcID, "x", NF90_REAL, xDimID, xVarID) )
 90     call ncCheck( nf90_def_var(this%mNcID, "y", NF90_REAL, yDimID, yVarID) )
 91     call ncCheck( nf90_def_var(this%mNcID, "t", NF90_REAL, tDimID, &
 92          this%mTimeVarID) )
 93     ! assign units-attributes to coordinate vars
 94     call ncCheck( nf90_put_att(this%mNcID, xVarID, "units", "1") )
 95     call ncCheck( nf90_put_att(this%mNcID, xVarID, "long_name", SPACE_UNITS_STR) )
 96     call ncCheck( nf90_put_att(this%mNcID, yVarID, "units", "1") )
 97     call ncCheck( nf90_put_att(this%mNcID, yVarID, "long_name", SPACE_UNITS_STR) )
 98     call ncCheck( nf90_put_att(this%mNcID, this%mTimeVarID, "units", "1") )
 99     call ncCheck( nf90_put_att(this%mNcID, this%mTimeVarID, "long_name", &
100          TIME_UNITS_STR) )
101
102     ! dimIDs-array is used for passing the IDs corresponding to the dimensions
103     ! of the variables
104     dimIDs = [ xDimID, yDimID, tDimID ]
105
106     ! define the variables: to save space, we store most results as NF90_REAL;
107     ! however, for the 'mUyMax'-field, we need NF90_DOUBLE, to distinguish the
108     ! 1st bifurcation in the Rayleigh-Benard system
109     call ncCheck( &
110          nf90_def_var(this%mNcID, "press_diff", NF90_REAL, dimIDs, this%mPressVarID) )
111     call ncCheck( &
112          nf90_def_var(this%mNcID, "temp_diff", NF90_REAL, dimIDs, this%mTempVarID) )
113     call ncCheck( &
114          nf90_def_var(this%mNcID, "u_x", NF90_REAL, dimIDs, this%mUxVarID) )
115     call ncCheck( &
116          nf90_def_var(this%mNcID, "u_y", NF90_REAL, dimIDs, this%mUyVarID) )
117     call ncCheck( &
118          nf90_def_var(this%mNcID, "max_u_y", NF90_DOUBLE, tDimID, this%mUyMaxVarID) )
119
120     ! assign units-attributes to output-variables
121     call ncCheck( nf90_put_att(this%mNcID, this%mPressVarID, "units", "1") )
122     call ncCheck( nf90_put_att(this%mNcID, this%mPressVarID, "long_name", &
123          PRESS_UNITS_STR) )
124     call ncCheck( nf90_put_att(this%mNcID, this%mTempVarID, "units", "1") )
144 ! ...............(continues below)......
This is where we encounter the first calls to the netCDF-library. To report any errors,
we wrap all library calls with the ncCheck-subroutine we already presented in List-
ing 5.12. After creating the file with nf90_create (lines 64–66 in Listing 5.15),
the dataset enters define-mode. Note that in several parts of the netCDF-library
it is possible to combine several options with a bitwise OR (the ior intrinsic
function); we used this technique while creating the dataset, to combine the options
NF90_CLOBBER and NF90_64BIT_OFFSET.
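The idea of combining options with ior can be sketched independently of the library. In the example below, the three flags are hypothetical single-bit constants (they are not the actual values of the netCDF options); ior merges them into one integer, and iand can be used to test which options were requested.

```fortran
program ior_flags_demo
  implicit none
  ! Hypothetical single-bit option flags (NOT the actual netCDF constants)
  integer, parameter :: OPT_A = 1, OPT_B = 4, OPT_C = 512
  integer :: mode

  ! Request options A and C together, in a single integer
  mode = ior(OPT_A, OPT_C)

  ! Test each option individually with a bitwise AND
  print *, "A set? ", iand(mode, OPT_A) /= 0
  print *, "B set? ", iand(mode, OPT_B) /= 0
  print *, "C set? ", iand(mode, OPT_C) /= 0
end program ior_flags_demo
```

Here options A and C are reported as set, and option B as not set.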
In lines 69–82 we write a few global attributes (by passing the flag NF90_GLOBAL
to function nf90_put_att), to document what the dataset contains. Afterwards,
in lines 85–87, we call nf90_def_dim, to define the dimensions for the variables
in our dataset. The first two (“x” and “y”) are “normal” dimensions, in the sense
that their lengths are fixed when the dataset is created (depending on the mesh size
calculated from the simulation parameters and the stability/accuracy criteria). On the
other hand, the third dimension (“t”) is declared as unlimited,42 by specifying the
special value NF90_UNLIMITED instead of a length. This allows us, in principle,
to re-open the dataset later and append more data to the variables which include this
dimension.
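A minimal sketch of this append-later pattern is given below, assuming an installed netCDF Fortran 90 library. The file name, the number of records and the helper check are hypothetical; the file is first created with three records along the unlimited dimension, then closed, re-opened for writing, and extended by one more record.

```fortran
program unlimited_demo
  use netcdf
  implicit none
  character(len=*), parameter :: FNAME = "unlimited_demo.nc"  ! hypothetical name
  integer :: ncID, tDimID, tVarID, i, nRecs

  ! First session: create the file and write 3 records
  call check( nf90_create(FNAME, NF90_CLOBBER, ncID) )
  call check( nf90_def_dim(ncID, "t", NF90_UNLIMITED, tDimID) )
  call check( nf90_def_var(ncID, "t", NF90_REAL, tDimID, tVarID) )
  call check( nf90_enddef(ncID) )
  do i = 1, 3
    call check( nf90_put_var(ncID, tVarID, 0.5*i, start=[i]) )
  end do
  call check( nf90_close(ncID) )

  ! Second session: re-open for writing and append after the last record
  call check( nf90_open(FNAME, NF90_WRITE, ncID) )
  call check( nf90_inq_dimid(ncID, "t", tDimID) )
  call check( nf90_inq_varid(ncID, "t", tVarID) )
  call check( nf90_inquire_dimension(ncID, tDimID, len=nRecs) )
  call check( nf90_put_var(ncID, tVarID, 0.5*(nRecs+1), start=[nRecs+1]) )

  ! Query the current length of the unlimited dimension
  call check( nf90_inquire_dimension(ncID, tDimID, len=nRecs) )
  write(*, '(a, i0)') "records in file: ", nRecs
  call check( nf90_close(ncID) )
contains
  ! Hypothetical helper: abort with the library's message on any error
  subroutine check(status)
    integer, intent(in) :: status
    if (status /= nf90_noerr) then
      write(*, '(a)') trim(nf90_strerror(status))
      stop 1
    end if
  end subroutine check
end program unlimited_demo
```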
In lines 89–92 we define (by calling nf90_def_var) the three coordinate variables,
one for each dimension. Note that the ID returned for the time variable
(this%mTimeVarID) is the only one which will not be lost when the subroutine
terminates—the IDs of the space variables are not necessary at later stages, since
their values are known already when prepareFileOutputNetcdf is executed
(indeed, those variables are written in this procedure, as we shall soon see).
In lines 94–100, we write some more attributes (this time—attached to the dimen-
sion variables).
Then, in line 104, we assemble a 1D array of dimension IDs, which we use in lines
109–116, when we define the variables for the core output field of our simulation (the
last variable, however, represents a simple time series, so it only needs tDimID—see
lines 117–118). As the reader may expect already, we document these variables also,
with calls to nf90_put_att (lines 121–135).
42 Sometimes, the term record dimension is used with the same meaning.
Since there is no more metadata to be written, we end define-mode (and enter
data-mode) by calling nf90_enddef on line 138. Immediately after that (lines 141–142),
we use the function nf90_put_var to write the variables for the spatial axes,
since they are not dependent on time. Here, the procedure nf90_put_var is used
to write all values of the variable arrays at once. However, as we will discuss later,
the same procedure also allows writing of single values, or of subsections of
an array. Indeed, while the previous library calls dutifully prepared the “context”,
nf90_put_var takes all the credit, because it is the procedure which actually
writes our simulation data on disk.
As the prepareFileOutputNetcdf-subroutine terminates, our dataset will
remain in data-mode, so that we can later write the time-dependent data (i.e. actual
model output and the corresponding time values).
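Writing time-dependent data amounts to writing one slice of a larger variable at a time, using the optional start and count arguments of nf90_put_var. The sketch below illustrates this for a small 2D field with a time axis, assuming an installed netCDF Fortran 90 library; the sizes, the names and the helper check are hypothetical.

```fortran
program slice_demo
  use netcdf
  implicit none
  integer, parameter :: NX = 4, NY = 3, NT = 2
  integer :: ncID, dimIDs(3), varID, it
  real :: field(NX, NY)

  call check( nf90_create("slice_demo.nc", NF90_CLOBBER, ncID) )
  call check( nf90_def_dim(ncID, "x", NX, dimIDs(1)) )
  call check( nf90_def_dim(ncID, "y", NY, dimIDs(2)) )
  ! In Fortran ordering, the unlimited dimension comes last
  call check( nf90_def_dim(ncID, "t", NF90_UNLIMITED, dimIDs(3)) )
  call check( nf90_def_var(ncID, "field", NF90_REAL, dimIDs, varID) )
  call check( nf90_enddef(ncID) )

  do it = 1, NT
    field = real(it)   ! synthetic data for this time slice
    ! Write one (x, y) slice at time index 'it'
    call check( nf90_put_var(ncID, varID, field, &
         start=[1, 1, it], count=[NX, NY, 1]) )
  end do

  ! A single scalar value can be written in the same way
  call check( nf90_put_var(ncID, varID, -1.0, start=[2, 2, 1]) )

  call check( nf90_close(ncID) )
contains
  ! Hypothetical helper: abort with the library's message on any error
  subroutine check(status)
    integer, intent(in) :: status
    if (status /= nf90_noerr) then
      write(*, '(a)') trim(nf90_strerror(status))
      stop 1
    end if
  end subroutine check
end program slice_demo
```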
With the dataset prepared by the subroutine prepareFileOutputNetcdf
discussed above, it is time to show the subroutine which actually writes the simu-
lation output, as this becomes available during our time sweep. This is the role of
writeOutputNetcdf:
146 ! ...............(continued from above)......
147   subroutine writeOutputNetcdf(this, rawMacros, iterNum)
148     class(OutputNetcdf), intent(inout) :: this
149     real(RK), dimension(:, :, 0:), intent(in) :: rawMacros
150     integer(IK), intent(in) :: iterNum
151     ! local variables
152     real(RK) :: currTime
153
154     if( this%isTimeToWrite( iterNum ) ) then
155        ! increment output time-slice if it is time to generate output
156        this%mCurrOutSlice = this%mCurrOutSlice + 1
157
158        ! Evaluate current dimensionless-time (0.5 due to Strang-splitting)
159        if( iterNum == 0 ) then
160           currTime = 0._RK
161        else
162           currTime = (iterNum-0.5_RK)*this%mDtD
163        end if
164        ! append value to UNLIMITED time-dimension
165        call ncCheck( nf90_put_var(this%mNcID, this%mTimeVarID, &
166             values=currTime, start=[this%mCurrOutSlice]) )
167
168        this%mUyMax = maxval( abs(rawMacros(:, :, 1)) )
169
170        ! write data (scaled to dimensionless units) to file
171        ! - dimensionless pressure-difference
172        call ncCheck( nf90_put_var(this%mNcID, this%mPressVarID, &
173             values=rawMacros(:,:,0)*this%mDRhoSolver2PressDimless, &
174             start=[1, 1, this%mCurrOutSlice], count=[this%mNx, this%mNy, 1]) )
175        ! - dimensionless temperature-difference
176        call ncCheck( nf90_put_var(this%mNcID, this%mTempVarID, &
177             values=rawMacros(:,:,3), &
178             start=[1, 1, this%mCurrOutSlice], count=[this%mNx, this%mNy, 1]) )
179        ! - dimensionless Ux
180        call ncCheck( nf90_put_var(this%mNcID, this%mUxVarID, &
181             values=rawMacros(:,:,1)*this%mVelSolver2VelDimless, &
182             start=[1, 1, this%mCurrOutSlice], count=[this%mNx, this%mNy, 1]) )
183        ! - dimensionless Uy
184        call ncCheck( nf90_put_var(this%mNcID, this%mUyVarID, &
185             values=rawMacros(:,:,2)*this%mVelSolver2VelDimless, &
186             start=[1, 1, this%mCurrOutSlice], count=[this%mNx, this%mNy, 1]) )
187        ! - max<Uy> (for bifurcation test criterion)
188        call ncCheck( nf90_put_var(this%mNcID, this%mUyMaxVarID, &
189             values=this%mUyMax*this%mVelSolver2VelDimless, start=[this%mCurrOutSlice]) )
190     end if
191   end subroutine writeOutputNetcdf
192 ! ...............(continues below)......
The subroutine operates primarily with the data array rawMacros (passed from
the RBenardSimulation-instance which owns this-instance of OutputNetcdf).
This is a 3D array, where the first two dimensions are for space, and the
43 Note that this also causes the array mTVals (which pre-computed the time coordinates for the
output slices in OutputBase) to become obsolete.
44 The need for this explicit “tear-down” process may be eliminated if the compiler supports
final procedures (a Fortran 2003 feature). Using that feature, it would be enough to declare the
cleanupOutputNetcdf procedure as final in line 24 of Listing 5.13, and the compiler
would arrange for it to be called when the OutputNetcdf instance goes out of scope.
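The finalization mechanism mentioned in the footnote above can be sketched with a small stand-alone type. The type Resource and all procedure names below are hypothetical (they are not part of the book's code), and compiler support for finalization is assumed. Note that the dummy argument of a final subroutine has to be declared with type(...) rather than class(...), and that variables of the main program itself are not finalized when the program terminates, which is why the object lives inside a subroutine here.

```fortran
module Resource_class
  implicit none
  private
  public :: Resource

  type :: Resource
    integer :: id = 0
  contains
    final :: releaseResource   ! invoked automatically by the compiler
  end type Resource
contains
  subroutine releaseResource(this)
    ! final subroutines need a non-polymorphic dummy argument
    type(Resource), intent(inout) :: this
    write(*, '(a, i0)') "finalizing resource ", this%id
  end subroutine releaseResource
end module Resource_class

program final_demo
  use Resource_class
  implicit none
  call useResource()
  write(*, '(a)') "back in the main program"
contains
  subroutine useResource()
    type(Resource) :: r   ! local object, finalized when the subroutine returns
    r%id = 42
    write(*, '(a)') "working with the resource"
  end subroutine useResource
end program final_demo
```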
194 ! ...............(continued from above)......
195   subroutine cleanupOutputNetcdf(this)
196     class(OutputNetcdf), intent(inout) :: this
197     if( this%isActive() ) then
198        call ncCheck( nf90_close(this%mNcID) )
199        call this%OutputBase%cleanup()
200     end if
201   end subroutine cleanupOutputNetcdf
202   ! ...............
203
204   ! error-checking wrapper for netCDF operations
205   subroutine ncCheck(status)
206     ! ...............
207   end subroutine ncCheck
208 end module OutputNetcdf_class
Listing 5.17 src/Chapter5/lbm2d_mrt_rb_v2/OutputNetcdf_class.f90 (excerpt): subroutines
cleanupOutputNetcdf, ncCheck (and end of the OutputNetcdf_class module)
Finally, the last procedure in the module (but with implementation omitted in the
listing above) is ncCheck, which is our wrapper subroutine for error checking.
Next, we need to actually use, in the RBenardSimulation_class-module,
the new type of “output sink” presented above. A straightforward approach,45 which
we also use below, is to simply replace previous occurrences of OutputAscii with
OutputNetcdf. Specifically, this implies that we now have to use the new module
(line 4 below), and to declare the mOutSink member of the RBenardSimulation
type to be of type(OutputNetcdf) (see line 25 below):
 1 module RBenardSimulation_class
 2   use NumericKinds, only : IK, RK
 3   use MrtSolverBoussinesq2D_class
 4   use OutputNetcdf_class
 5   implicit none
 6
 7   ! Fixed simulation-parameters
 8   real(RK), parameter :: &
 9        ! To allow the 1st instability to develop, the aspect-ratio needs to be a
10        ! multiple of $\frac{2 \pi}{k_C}$, where $k_C = 3.117$ (see [Shan1997]).
11        ASPECT_RATIO = 2*2.0158, &
12        ! See [Wang2013] for justification of these parameters.
13        SIGMA_K = 3._RK - sqrt(3._RK), &
14        SIGMA_NU_E = 2._RK * (2._RK * sqrt(3._RK) - 3._RK), &
15        TEMP_COLD_WALL = -0.5, TEMP_HOT_WALL = +0.5
16
17   type :: RBenardSimulation
18      private
19      integer(IK) :: mNx, mNy, & ! lattice size
20           mNumIters1CharTime, mNumItersMax, &
21           mNumOutSlices ! user-setting
22
23      type(MrtSolverBoussinesq2D) :: mSolver ! associated solver...
24      ! NEW (Version 2): Use 'OutputNetcdf' sink instead of 'OutputAscii'
25      type(OutputNetcdf) :: mOutSink ! ...and output-writer
26
27    contains
28      private
29      procedure, public :: init => initRBenardSimulation
30      procedure, public :: run => runRBenardSimulation
31      procedure, public :: cleanup => cleanupRBenardSimulation
32   end type RBenardSimulation
33
34   ! ......................
35 end module RBenardSimulation_class
45 A more elegant approach would be to allow users to seamlessly switch between the two
types of sinks, for example by adding an optional flag to the function which initializes an
RBenardSimulation-instance.
200 5 More Advanced Techniques
To place some numbers behind our claim for higher performance of the netCDF-
format relative to ASCII, on our test machine46 we found that writing all timesteps
for 5 characteristic time intervals (with the rest of the parameters the same as in
Listing 4.15), resulted in:
• ASCII-output: 7.2 GB of data, written in 488 s (producing over 100,000 files, and
writing only the temperature field)
• netCDF-output: 6.1 GB of data, written in 110 s (while producing a single file,
and writing all output fields, i.e., four times more simulation data than the ASCII
version)
After normalizing by the amount of simulation data written, this means that the
ASCII version required roughly 5 times more storage space and more than 17 times
more computer time. While the performance numbers will depend in general on the
hardware and on how often output is written, this is a good example of what can be
encountered in practice.
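The normalization behind these two factors can be written out explicitly. As a sketch, we take the temperature field as one unit of simulation data, so that the netCDF run (four fields) wrote four such units; data volumes and times are the measured values quoted above:

```latex
\begin{align*}
\text{storage ratio:}\quad
  & \frac{7.2 / 1}{6.1 / 4} = \frac{7.2}{1.525} \approx 4.7 \\
\text{time ratio:}\quad
  & \frac{488\,\mathrm{s} / 1}{110\,\mathrm{s} / 4} = \frac{488\,\mathrm{s}}{27.5\,\mathrm{s}} \approx 17.7
\end{align*}
```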
In this section, we discuss the steps for reading a netCDF-dataset of known structure.
This assumption does sacrifice some generality in the interest of keeping the code
simple.47 However, for many programs developed in ESS such a compromise is
reasonable. The application which we will later discuss in more depth uses this
approach for reading the World Ocean Atlas 2009 temperature dataset ([14]).
To read data from a pre-existing dataset, we first need to open it by calling the
nf90_open-procedure. This is similar to nf90_create discussed earlier, except
that the dataset will be set to data-mode directly (the define-mode is skipped by
default). With the dataset opened, we can start reading information stored inside.
When the names of the dimensions, variables and attributes inside the dataset are
known (as we assume here) this information retrieval process typically consists of
two phases: inquiring for IDs (based on a known dimension/variable name), and
then retrieving the (meta)data (based on the previously acquired ID). The specific
procedure calls for each of the entities that can appear in a classic netCDF-dataset
are:
• dimensions: Based on the dimension name, the ID of the dimension can be found
with a call to nf90_inq_dimid. Then, based on that ID, we can use the proce-
dure nf90_inquire_dimension to determine the length of the dimension.
46 Intel i7 (“Sandy Bridge” generation) CPU, 16 GB RAM, 7200 RPM spinning HDD.
47 Some of the programs which support the netCDF-formats need to be able to work with any
netCDF-files that users may provide (visualization software such as ncview or even the trusty
ncdump are good examples here). In such cases, the developers of the software can assume
very little about the structure of the input datasets; instead, this information needs to be gathered
dynamically at runtime. Note that the netCDF-library also has facilities for this latter task, although
we do not cover them here.
5.2 Input/Output 201
48 Available at http://data.nodc.noaa.gov/thredds/fileServer/woa/WOA09/NetCDFdata/temperature_annual_1deg.nc (02/21/2014).
49 Users of Unix-variants and of the (Cygwin) environment can use ncdump -h
temperature_annual_1deg.nc | less.
where $\{\lambda_i^W, \lambda_i^E\}$ and $\{\phi_i^S, \phi_i^N\}$ represent the longitude and latitude extents respectively.
It is not difficult to show that the contribution of $d_i$ to the area of a cell is very
small ($\sim 10^{-6}$ %). This allows us to further simplify the expression for $S_i$:
$$S_i = R_E^2 \left(\lambda_i^E - \lambda_i^W\right) \left(\sin\phi_i^N - \sin\phi_i^S\right). \tag{5.3}$$
Based on this, we define the mean seawater temperature at level k as the weighted
mean:
$$\theta_k = \frac{\sum_i \theta_i S_i}{\sum_i S_i}, \tag{5.4}$$
where the index i runs over the set of all ocean grid cells at level k (the ocean grid
cells are those where temperature is not equal to _FillValue).
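To make Eqs. (5.3) and (5.4) concrete, the following stand-alone sketch computes the area-weighted mean for three hypothetical 1-degree cells. It is not the implementation from OceanData_class.f90: the cell coordinates, the temperatures, the Earth radius and the missing-data marker FILL_VALUE are all assumed values chosen for illustration, and the names cellArea and meanTheta are our own.

```fortran
program weighted_mean_sketch
  implicit none
  integer, parameter :: DP = kind(1.0d0)
  real(DP), parameter :: PI = 3.141592653589793_DP
  real(DP), parameter :: DEG2RAD = PI / 180.0_DP
  real(DP), parameter :: R_EARTH = 6.371e6_DP    ! assumed mean Earth radius [m]
  real(DP), parameter :: FILL_VALUE = -999.0_DP  ! hypothetical missing-data marker
  integer, parameter :: NUM_CELLS = 3

  ! Hypothetical 1-degree cells: tropical ocean, sub-polar ocean, land
  real(DP), parameter :: latS(NUM_CELLS)  = [ 0.0_DP, 59.0_DP, 30.0_DP]
  real(DP), parameter :: latN(NUM_CELLS)  = [ 1.0_DP, 60.0_DP, 31.0_DP]
  real(DP), parameter :: lonW(NUM_CELLS)  = [ 0.0_DP,  0.0_DP,  0.0_DP]
  real(DP), parameter :: lonE(NUM_CELLS)  = [ 1.0_DP,  1.0_DP,  1.0_DP]
  real(DP), parameter :: theta(NUM_CELLS) = [20.0_DP, 10.0_DP, FILL_VALUE]

  real(DP) :: area, sumThetaS, sumS, meanTheta
  integer :: i

  sumThetaS = 0.0_DP
  sumS = 0.0_DP
  do i = 1, NUM_CELLS
     if (theta(i) == FILL_VALUE) cycle   ! skip cells without valid data
     area = cellArea(lonW(i), lonE(i), latS(i), latN(i))
     sumThetaS = sumThetaS + theta(i) * area   ! numerator of Eq. (5.4)
     sumS = sumS + area                        ! denominator of Eq. (5.4)
  end do
  meanTheta = sumThetaS / sumS

  write(*, '(a,f8.3)') "area-weighted mean temperature: ", meanTheta

contains

  ! Area of a longitude/latitude cell, following Eq. (5.3); arguments in degrees
  pure real(DP) function cellArea(lonWest, lonEast, latSouth, latNorth) result(s)
    real(DP), intent(in) :: lonWest, lonEast, latSouth, latNorth
    s = R_EARTH**2 * (lonEast - lonWest) * DEG2RAD * &
         (sin(latNorth * DEG2RAD) - sin(latSouth * DEG2RAD))
  end function cellArea
end program weighted_mean_sketch
```

Because the tropical cell covers a larger area than the sub-polar one, the weighted mean lies above the plain arithmetic mean (15) of the two valid temperatures; the land cell does not contribute at all.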
Even this simple application has several phases (reading data, computing the
area of each cell, computing the weighted average, etc.). The World Ocean Atlas
2009 also contains information for other variables. Therefore, it is worthwhile
to make our implementation generic enough to cope with similar datasets (for
salinity, dissolved oxygen, etc.; see also Exercise 5.20). We achieve this here
by using the object-oriented programming (OOP) approach. Most of the code
(see file Chapter5/read_noaa_data_netCDF/OceanData_class.f90)
is for implementing the data type OceanData, its “init”-function, and the type-bound
procedures (“methods”). The basic structure of this module (omitting procedure
implementations) is:
module OceanData_class
  use netcdf
  use NumericKinds
  use GeomUtils
  implicit none

  type, public :: OceanData
     private
     ! dimension-lengths
     integer :: mNumLon, mNumLat, mNumDepth
     ! arrays to hold data
     real(R_SP), dimension(:), allocatable :: mLonVals, mLatVals, mDepthVals
     real(R_SP), dimension(:,:), allocatable :: mLonBndsVals, mLatBndsVals
     real(R_SP), dimension(:,:,:), allocatable :: mDataVals
     ! additional metadata
     real(R_SP) :: mDataFillValue
   contains
     private
     procedure, public :: getDepths
     procedure, public :: getMeanDepthProfile
     ! internal
     procedure :: cellHasValidData
     procedure :: getCellArea
  end type OceanData

  interface OceanData
The readers should hopefully feel comfortable with the structure of the application,
which is similar to that used in several previous examples (e.g. in Chap. 4). Therefore,
here we only discuss in detail the part where the data is read from the netCDF-file.
In particular, notice that each instance of our new type OceanData encapsulates
several data arrays, which need to be filled from the input dataset. This task is per-
formed by our “init”-function newOceanData, which creates a new OceanData-
instance, based on the name of the netCDF-file and on the name of the variable to
be read from that file (in our case, those will be “temperature_annual_1deg.nc” and
“t_an” respectively). This function is:
41 type(OceanData) function newOceanData( fileName, dataFieldName ) result(res)
42   character(len=*), intent(in) :: fileName, dataFieldName
43   ! local vars
44   integer :: ncID, lonDimID, latDimID, depthDimID, lonVarID, latVarID, &
45        depthVarID, dataVarID, lonBndsVarID, latBndsVarID
46
47   call ncCheck( nf90_open(path=fileName, mode=NF90_NOWRITE, ncid=ncID) )
48
49   ! Read-in Dimensions:
50   ! (A) retrieve dimension-IDs
51   call ncCheck( nf90_inq_dimid(ncID, name="lon", dimid=lonDimID) )
52   call ncCheck( nf90_inq_dimid(ncID, name="lat", dimid=latDimID) )
53   call ncCheck( nf90_inq_dimid(ncID, name="depth", dimid=depthDimID) )
54   ! (B) read dimension-lengths
55   call ncCheck( nf90_inquire_dimension(ncID, lonDimID, len=res%mNumLon) )
56   call ncCheck( nf90_inquire_dimension(ncID, latDimID, len=res%mNumLat) )
57   call ncCheck( nf90_inquire_dimension(ncID, depthDimID, len=res%mNumDepth) )
58
59   ! Can allocate memory, now that dimension-lengths are known
60   allocate( res%mLonVals(res%mNumLon), res%mLatVals(res%mNumLat) )
61   allocate( res%mDepthVals(res%mNumDepth), res%mLonBndsVals(2, res%mNumLon) )
62   allocate( res%mLatBndsVals(2, res%mNumLat) )
63   allocate( res%mDataVals(res%mNumLon, res%mNumLat, res%mNumDepth) )
64
65   ! Read-in Dimension-Variables:
66   ! (A) retrieve variable-IDs
67   call ncCheck( nf90_inq_varid(ncID, "lon", lonVarID) )
68   call ncCheck( nf90_inq_varid(ncID, "lat", latVarID) )
69   call ncCheck( nf90_inq_varid(ncID, "depth", depthVarID) )
70   ! (B) read variable-arrays
71   call ncCheck( nf90_get_var(ncID, lonVarID, res%mLonVals) )
72   call ncCheck( nf90_get_var(ncID, latVarID, res%mLatVals) )
73   call ncCheck( nf90_get_var(ncID, depthVarID, res%mDepthVals) )
74
75   ! Read-in Bounds-Variables (for lon/lat)
76   ! (A) retrieve variable-IDs
77   call ncCheck( nf90_inq_varid(ncID, "lon_bnds", lonBndsVarID) )
78   call ncCheck( nf90_inq_varid(ncID, "lat_bnds", latBndsVarID) )
79   ! (B) read variable-arrays (here, 2D arrays)
80   call ncCheck( nf90_get_var(ncID, lonBndsVarID, res%mLonBndsVals) )
81   call ncCheck( nf90_get_var(ncID, latBndsVarID, res%mLatBndsVals) )
82
83   ! Read-in data-field-Variable (and associated attribute "_FillValue")
84   call ncCheck( nf90_inq_varid(ncID, trim(adjustl(dataFieldName)), dataVarID) )
85   call ncCheck( nf90_get_att(ncID, dataVarID, "_FillValue", &
86        res%mDataFillValue) )
87   call ncCheck( nf90_get_var(ncID, dataVarID, res%mDataVals) )
88
89   call ncCheck( nf90_close(ncID) )
90 end function newOceanData
Fig. 5.1 Depth profile of seawater temperature, obtained by averaging in time (annual) and in space
(at each depth level) of the [14] dataset (vertical axis: depth [m])
For our purposes here (demonstrating how to read a netCDF-file), the interesting
code begins at line 47, where the dataset is opened. By choosing NF90_NOWRITE as
the mode-argument, we protect the file from accidental overwriting of data (possible
alternative modes are NF90_WRITE, for appending data to existing datasets or
NF90_SHARE for allowing data to be read by a process while another process is
writing50).
In lines 51–53, we obtain the IDs of the dimensions relevant to our task. These
IDs are then used in lines 55–57, where the sizes of the dimensions are read. After
preparing the arrays which will hold our data (lines 60–63), we proceed with reading
the variables containing the dimension information. This again involves a two-step
approach, whereby the functions nf90_inq_varid and nf90_get_var are
called for each variable (lines 66–73). The exact same approach is used (lines 76–
81) to read the bounds of the cells and, in lines 84–87, to finally read the temperature
field. The only peculiarity for this last operation is that for the temperature field
(t_an) we also need to read the special attribute which documents missing values
(lines 85–86).
Since no additional information needs to be read, we can close the file (line 89).
The curious reader will find a plot of the extracted temperature profile in Fig. 5.1.
50 Modes can also be combined (when this makes sense) using the ior-function, as discussed in
Listing 5.15.
Exercise 20 (Extracting the salinity profile) Modify the code for reading the
World Ocean Atlas 2009 dataset (see directory Chapter5/read_noaa_data_netCDF),
to extract the salinity profile instead.
Hint:
for this task, it should be sufficient to modify the file
Chapter5/read_noaa_data_netCDF/read_noaa_woa_2009_data.f90, which contains the
main-program of the application. The required dataset (salinity_annual_1deg.nc)
can also be found at the same location as the temperature data
(http://data.nodc.noaa.gov/thredds/fileServer/woa/WOA09/NetCDFdata/temperature_annual_1deg.nc,
as of 28.02.2014).
51 This estimate assumes uniform refinement. Some models also use adaptive mesh refinement,
which is often more economical.
and also relatively easy to learn for beginners (Sect. 5.3.4). Finally, in Sect. 5.3.5, we
apply OpenMP to some of the example applications from Chap. 4, to demonstrate
this technology “in vivo” for the readers.
An “obvious” solution (at least in theory) for improving performance is to use mul-
tiple processors, which share the work of the application. Indeed, especially for
large supercomputers, this approach has become standard practice since the early
days. However, until approximately 2002, parallelism was less common outside
these large facilities, and relatively few developers used such technologies regularly.
One reason for this limited adoption was that parallel programming certainly has
a learning curve; for example, many types of bugs52 which are not possible in the
“serial world” can appear. In addition, parallel programs are often more difficult to
develop and understand, relative to their serial counterparts. Another major reason
was that hardware manufacturers managed, with each new generation of machines,
to significantly improve performance even for serial programs. This “free” speedup
was good enough for many developers, who could simply rely on the next hardware
upgrade cycle to improve the performance of their applications. Without delving
into details of computer architecture, we can distinguish several broad classes of
hardware innovations, which supported these performance increases:
1. increasing CPU clock rates: Hardware vendors have continued to find new ways
to increase the operating frequency of the CPUs. Assuming, for simplicity, that
each instruction (e.g. integer addition) takes a constant number of CPU cycles,
decreasing the duration of each cycle would probably increase the number of
instructions that can be completed in a given time. However, this “frequency
race” had to stop [27], as power limitations were reached.
2. CPU architecture advances: Unbeknownst to many programmers, parallelism
has long been present at the hardware level, even for “single core” CPUs. The
underlying idea is to re-arrange and group the instructions of the serial application,
such that at least some of the work is parallelized, while still preserving the
comfortable illusion of serial execution for the developers. These techniques are
known as instruction-level parallelism (ILP); examples in this class are instruction
pipelining, out-of-order execution, small-scale vectorization,53 etc.
52 Here, we can distinguish between correctness bugs (the program produces false results) and
performance bugs (the program is not using the resources of the underlying hardware efficiently).
53 The main idea here is to have instructions which operate on arrays instead of on scalar values.
Here, by vectorization we mean the single instruction, multiple data (SIMD) units of modern CPUs.
The term “vector computer” is also used, to refer to systems which implement the SIMD idea on a
much larger scale (see, e.g., Hager and Wellein [8] for details). While these machines have many
features which make them attractive for scientific computing, they became a niche product by the
time of our writing. Instead, hardware with vector-like capabilities, such as the SIMD units and
general-purpose graphics processing units (GPGPUs), are becoming increasingly popular.
5.3 A Taste of Parallelization 207
With the “bad news” out of the way, a positive aspect is that vendors still manage
to increase the number of transistors that can be placed on a chip; this is also known
as Moore’s “law” [22]. These additional transistors are nowadays used to support
explicit parallelism at all levels, from consumer hardware to supercomputers—serial
computers have become the exception. Given these tendencies and considering, in
addition, that computational demands in ESS (and other fields) are likely to continue
increasing in the future, most scientific programmers need to add parallelization to
their skills.
There are several plausible reasons for considering parallelism. For example, we
might be interested in minimizing the time to solution, being able to solve larger
problems, increasing throughput, or decreasing the power expended for achieving a
result.
For simplicity, we focus here mostly on the first goal (minimizing time to solution),
where multiple execution units are made to work in parallel, to solve a problem of
constant size faster. Note that the second goal (solving a larger problem)
is also quite common (for example, when switching to a higher-resolution grid in an
ESS model).
When the size of the problem is kept fixed, we can define speedup (also known
as “scalability”) as a simple metric for the effectiveness of a parallelized program:
$$S(N) = \frac{T_1}{T_N} \tag{5.5}$$
where T1 represents the necessary computing time when using a single execution
unit (e.g. single core), and TN is the time when using N execution units. Ideally, we
would have linear speedup (also known as “perfect speedup” or “perfect scaling”):
$$S(N)_{\text{ideal}} = N \tag{5.6}$$
However, real-world speedup is often less.54 To quantify how much less, parallel
efficiency is commonly calculated, as:
$$\varepsilon(N) = \frac{S(N)}{S(N)_{\text{ideal}}} = \frac{S(N)}{N} \tag{5.7}$$
There are multiple reasons why good speedup may not be achievable. A first such
reason is that not all work in a program may be parallelizable. For example, in an ESS
model, some model parameters may need to be read at the beginning of a simulation,
prior to any calculation. Consider Fig. 5.2a, where we sketch these different types
of workloads for an application running on a single processor/core. Let us denote
54 Interestingly, it is also possible (in rare cases) to get superlinear speedup, where
$S(N) > S(N)_{\text{ideal}}$. This can happen, for example, if a problem is too large to fit inside the cache of one
processor, but small enough to fit into the aggregated caches of the N processors.
Fig. 5.2 Simplified scenarios for division of work in parallelization: a initial serial application,
with some “parallelizable” work, b parallel execution with good load balancing, and c parallel
execution with unbalanced workloads (some processors spend significant amounts of time waiting
for latecomers). N represents the number of processing units (cores)
by $T_1^s$ the time spent on the non-parallelizable tasks55 of the program (labeled as
“serial” in the diagram), and by $T_1^p$ the remaining time, spent on tasks which could
be parallelized (labeled as “par” in the diagram); by definition, we have:
$$T_1 = T_1^s + T_1^p \tag{5.8}$$
$$f_s \equiv \frac{T_1^s}{T_1} \tag{5.10}$$
55 For simplicity, in our sketch we placed the serial fraction at the beginning of the program’s
runtime. However, periods of serial execution (“serialization”) are often distributed more widely
throughout the runtime of the program.
Fig. 5.3 Parallel speedup, as predicted by Amdahl’s law, when no load imbalance occurs. For
illustration, we present three values of the serial work fraction: a $f_s = 5$ % (green), b $f_s = 10$ %
(red), and c $f_s = 20$ % (cyan). For each curve, we show the range of processor counts where
efficiency drops below 50 % (hatched area), and the maximum achievable speedup $S_{max}$ (continuous
line)
$$S^o(N) = \frac{T_1}{T_N^o} = \frac{1}{f_s + \frac{1 - f_s}{N}} \tag{5.11}$$
which is also known as Amdahl’s law [1]. We illustrate the predicted speedup in
Fig. 5.3. Note that, even with our optimistic assumption of ideal load balance, there
is an upper limit on the achievable speedup ($S_{max} = 1/f_s$). Also, for $f_s \ll 1$, parallel
efficiency drops below 50 % as soon as we achieve half of the maximum speedup.
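The behaviour predicted by Amdahl's law is easy to tabulate. The sketch below evaluates Eq. (5.11) and the corresponding parallel efficiency (speedup divided by the number of processors) for the three serial fractions shown in Fig. 5.3; the list of processor counts and the name amdahlSpeedup are our own choices for illustration.

```fortran
program amdahl_sketch
  implicit none
  real, parameter :: SERIAL_FRACTIONS(3) = [0.05, 0.10, 0.20]
  integer, parameter :: NUM_PROCS(6) = [1, 2, 4, 16, 64, 1024]  ! arbitrary sample
  integer :: i, j
  real :: s

  do i = 1, size(SERIAL_FRACTIONS)
     write(*, '(a,f5.2,a,f6.1)') "f_s =", SERIAL_FRACTIONS(i), &
          ",  S_max =", 1.0 / SERIAL_FRACTIONS(i)
     do j = 1, size(NUM_PROCS)
        s = amdahlSpeedup(SERIAL_FRACTIONS(i), NUM_PROCS(j))
        ! efficiency = speedup / number of processors
        write(*, '(2x,a,i5,a,f7.2,a,f6.3)') "N =", NUM_PROCS(j), &
             "  speedup =", s, "  efficiency =", s / NUM_PROCS(j)
     end do
  end do

contains

  ! Speedup for serial fraction fs on n processors, assuming ideal load balance
  pure real function amdahlSpeedup(fs, n) result(s)
    real, intent(in) :: fs
    integer, intent(in) :: n
    s = 1.0 / (fs + (1.0 - fs) / real(n))
  end function amdahlSpeedup
end program amdahl_sketch
```

In each block of output, the speedup saturates below the printed S_max while the efficiency keeps falling as N grows, which is the trade-off discussed above.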
The expected speedup is even worse if the work inside the parallel region cannot
be distributed equally among the processors (see Fig. 5.2c). Fortunately, for many
applications it is more useful to increase the problem size (while keeping the amount
of work per processor roughly constant). This scenario, also known as weak scaling,56
leads to much more encouraging speedup numbers. We recommend the books of
McCool et al. [19] and of Hager and Wellein [8] for more advanced performance
models, which consider the weak scaling scenario, as well as other important factors
56 The situation for Amdahl’s law, where the total problem size is kept constant, is known as strong
scaling.
57 This is the source of most complexity in OpenMP. In general, communication (implicit or explicit)
is the point where all parallelization technologies claim the attention of the programmer.
58 Note that MPI can also be used for shared memory machines.
duced by the Fortran 2008 standard), which provide native support for paralleliza-
tion in Fortran. This belongs to the class of languages known as Partitioned Global
Address Space (PGAS), which combine aspects of both MPI and OpenMP, with
very concise semantics. Despite being a very interesting new language feature in
Fortran, it is beyond the scope of our text (interested readers can consult, e.g.,
Metcalf et al. [21] for more information).
• OpenCL and OpenACC are newer standards, catering for the increasing popular-
ity of GPGPU and other compute-accelerators such as the Intel Xeon Phi. OpenCL
is implemented as a C/C++-language dialect, while OpenACC can be viewed as
a set of pragmas (compatible with C, C++, as well as Fortran), similar in spirit to
OpenMP.
Interestingly, many HPC applications today use a hybrid approach to parallelization,
combining two or more of the parallel programming models above. The boundaries
between these models are also becoming less distinct as the standards are evolving;
for example, the more recent versions of OpenMP (4.0) also introduced support
for SIMD vectorization and for compute-accelerators such as GPGPUs. We do not
cover these features in this text, but interested readers may want to keep an eye on this
technology as compiler support matures, since it could provide a unified platform
for all types of parallelism within a node.
Our approach for illustrating OpenMP-concepts will depend on code examples. Here,
we summarize some of the “infrastructure” provided by OpenMP, to make it easier
for the reader to follow these examples.
64 OpenMP only aims to make this process more palatable than working directly with low-level OS
threading libraries.
65 This is not necessary for C and C++, which use curly brackets to surround blocks of code.
Assuming for simplicity that there is no load imbalance within the parallel regions, the
basic execution model for a program using OpenMP can be viewed as a generaliza-
tion of the scenario we used for deriving Amdahl’s law (in Sect. 5.3.2.1). The main
Fig. 5.4 Schematic of the OpenMP execution model. The gray line represents the master thread.
During the serial sections in the program, only the master thread is working. However, when a
parallel section is encountered, a team of threads is forked, which work together with the master
until the next serial section is encountered. Threads are given integer IDs starting at zero, with
ID 0 assigned to the master thread
print-statement, on line 6). When run, the program above will print our message
several times.
How many threads are there? Exactly how many times the message will be printed
on your system (i.e. how many threads will be in the team) depends on several factors.
Usually, the OpenMP runtime library will set this number equal to the number of
(logical) cores in the system. However, programmers may request a different number
of threads, by assigning a value to the OMP_NUM_THREADS environment variable.
For example, on Unix systems using the bash shell, we may use:
$ OMP_NUM_THREADS=2 ./hello_world_par1
to request two threads just for one run, or export the value, to set the requested number
of threads globally (will apply to all programs started from that shell instance):
$ export OMP_NUM_THREADS=2
The advantage of the environment variable is that users have the freedom to decide
how many threads to use. However, it is also possible to specify the number of threads
in the code itself, by adding a num_threads-clause to the parallel-directive.
For example, the following program asks the user to specify a number of threads at
runtime66:
program hello_world_par2
  implicit none
  integer :: nThreads

  write(*, '(a)', advance='no') "nThreads="
  read*, nThreads

  !$omp parallel num_threads(nThreads)
  print*, "Hello, world of Modern Fortran! Parallel too!"
  !$omp end parallel
end program hello_world_par2
Listing 5.22 src/Chapter5/hello_world_par2.f90
Note that we used the term requested above. As it turns out, for security reasons, the
runtime may allocate a lower number of threads than requested. Therefore, if the
number of threads appears within the code, one should always check the actual number
of threads, using the omp_get_num_threads-function, which we demonstrate
later.
run any faster! For example, in Listings 5.21 and 5.22, we were not sharing the
work but rather doing the same work, in parallel, multiple times: the threads exe-
cuted exactly the same instructions. This is, of course, not very useful—in practice,
we want each thread to execute a subtask. In other words, we want each thread
to execute a different code path or, for array-oriented problems, to apply the same
instructions but to different sub-partitions of the array. In this section, we will discuss
several methods for achieving this in OpenMP.
Differentiation by IDs and the SPMD pattern One method to assign different tasks
to different threads with OpenMP is to manually divide the work, based on the thread
ID. To illustrate, here is how we could extend the program from Listing 5.22, so that
we get different messages from the master and worker threads:
 1 program hello_world_par3
 2   use omp_lib
 3   implicit none
 4   integer :: nThreads
 5
 6   write(*, '(a)', advance='no') "nThreads="
 7   read*, nThreads
 8
 9   !$omp parallel num_threads(nThreads)
10   if( omp_get_thread_num() == 0 ) then
11      write(*, '(a,1x,i0,1x,a)') "Hello from MASTER (team has", &
12           omp_get_num_threads(), "threads)"
13   else
14      write(*, '(a,1x,i0)') "Hello from WORKER number", omp_get_thread_num()
15   end if
16   !$omp end parallel
17 end program hello_world_par3
First, in line 2 we use the omp_lib module, which allows us to access the
runtime library. For simplicity, we do not worry about sequential equivalence in
this example. Due to the omp parallel directive, all threads will start by evaluating
line 10, where the function omp_get_thread_num is used for the
logical expression of the if-statement. The master thread will then execute the
first write-statement (lines 11–12), where we use another OpenMP-function,
omp_get_num_threads, to get the total number of threads in the team, including
the master. As already noted, this number may well be different from nThreads,
so we always have to check the actual number. Unlike the master thread, the workers
will execute the other write-statement (line 14). This pattern for distributing work
is also known as single program, multiple data (SPMD), and it may look familiar to
readers with some MPI67 experience.
Since it assumes that the distribution of tasks is done by the programmer (based on
the thread ID), the SPMD pattern is a quite general approach for parallel computing.
The disadvantage, on the other hand, is that the code can become quite verbose,
especially when the number of tasks is not exactly divisible by the number of threads.
OpenMP includes many worksharing constructs, which greatly simplify this task.
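To illustrate the bookkeeping that manual SPMD distribution requires, here is a minimal sketch which splits 10 tasks among (at most) 3 threads, a case where the division leaves a remainder. The procedure names chunkBounds and reportMyShare, as well as the task and thread counts, are hypothetical choices of ours. The lines starting with `!$ ` use OpenMP's conditional compilation, so the program also builds (and then runs serially) when OpenMP is not enabled.

```fortran
program spmd_manual_split
  !$ use omp_lib
  implicit none
  integer, parameter :: NUM_TASKS = 10

  !$omp parallel num_threads(3)
  call reportMyShare()
  !$omp end parallel

contains

  ! Called by every thread; the local variables are separate for each thread
  subroutine reportMyShare()
    integer :: tid, nThreads, first, last
    tid = 0        ! serial fallback values, used when compiled without OpenMP
    nThreads = 1
    !$ tid = omp_get_thread_num()
    !$ nThreads = omp_get_num_threads()
    call chunkBounds(NUM_TASKS, nThreads, tid, first, last)
    write(*, '(a,1x,i0,1x,a,1x,i0,a,i0)') "Thread", tid, "handles tasks", &
         first, "..", last
  end subroutine reportMyShare

  ! Contiguous block of task indices for thread 'id' (IDs start at 0)
  pure subroutine chunkBounds(numTasks, numThreads, id, first, last)
    integer, intent(in) :: numTasks, numThreads, id
    integer, intent(out) :: first, last
    integer :: base, extra
    base = numTasks / numThreads        ! tasks which every thread receives
    extra = mod(numTasks, numThreads)   ! leftovers, one each for the first threads
    first = id * base + min(id, extra) + 1
    last = first + base - 1
    if (id < extra) last = last + 1
  end subroutine chunkBounds
end program spmd_manual_split
```

If the runtime grants all three threads, the blocks are 1..4, 5..7 and 8..10 (printed in an arbitrary order). The remainder handling in chunkBounds is exactly the kind of detail which the worksharing constructs take care of automatically.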
Parallel sections A simple worksharing construct is sections, which is
useful when we can identify a small (and fixed) number of subtasks in our algorithm.
67 There, a similar idea is used for distributing the work, only that we refer to MPI ranks instead of
thread IDs.
For example, assume that we have two tasks (A and B), and we want to run them in
parallel (without being concerned with which thread executes which of the tasks).
Our implementation based on sections would look like:
subroutine doTaskA()
  implicit none
  write(*, '(a)') "Working hard on task A!"
end subroutine doTaskA

subroutine doTaskB()
  implicit none
  write(*, '(a)') "Working hard on task B!"
end subroutine doTaskB

program demo_par_sections
  implicit none

  !$omp parallel num_threads(2)
  !$omp sections
  !$omp section
  call doTaskA()
  !$omp section
  call doTaskB()
  !$omp end sections
  !$omp end parallel
end program demo_par_sections
Listing 5.24 src/Chapter5/demo_par_sections.f90
Applicability of omp do
For a loop to be correctly parallelizable using the omp do construct, most
of the limitations we discussed previously for the do concurrent construct
(Sect. 2.6.7.2) should also be satisfied. To summarize, there should be
no inter-dependencies between the iterations of the loop, so that the compiler
is free to execute those iterations in any order. Although (unlike for
do concurrent) the compiler may not complain even if there is a violation
of this principle, programmers need to be aware of this possible pitfall.
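The difference between the two kinds of loops can be seen in the following sketch (the arrays and their size are arbitrary choices of ours). The first loop updates each element from its own old value only, so it is safe to place under omp do. The second loop is a running sum: iteration i reads the value written by iteration i-1, so it is deliberately left serial here.

```fortran
program omp_do_dependencies
  implicit none
  integer, parameter :: N = 8
  real :: a(N), b(N)
  integer :: i

  a = 1.0
  b = 1.0

  ! Independent iterations: safe to distribute among the threads
  !$omp parallel
  !$omp do
  do i = 1, N
     a(i) = 2.0 * a(i)
  end do
  !$omp end do
  !$omp end parallel

  ! Loop-carried dependency: b(i) needs b(i-1) from the previous iteration.
  ! Adding 'omp do' here could silently produce wrong results.
  do i = 2, N
     b(i) = b(i-1) + b(i)
  end do

  print '(a,8f5.1)', "a =", a
  print '(a,8f5.1)', "b =", b
end program omp_do_dependencies
```

After the program runs, every element of a equals 2, and b holds the running sum 1, 2, ..., 8. The latter result is only guaranteed because the second loop executes its iterations in order.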
In Sect. 5.3.5.3, we will demonstrate how to use the omp do construct to paral-
lelize the LBM solver we developed in the previous sections.
single Inside a parallel-region, it is possible to isolate some code so that
it is executed only by a single thread. Although this may seem counter-intuitive,68
it sometimes makes sense. A common usage is for initializing a global variable in
which we store the actual number of threads in the current team,69 as in:
program demo_omp_single
  use omp_lib
  implicit none
  integer :: nThreads

  !$omp parallel
  !$omp single
  nThreads = omp_get_num_threads()
  write(*, '(2(a,1x,i0,1x),a)') "Thread", omp_get_thread_num(), &
       "says: team has", nThreads, "threads"
  !$omp end single

  ! remaining code within parallel-region executed by all
  write(*, '(a,1x,i0,1x,a)') "Thread", omp_get_thread_num(), &
Whichever thread (master or worker) “arrives” at the single-region first will execute
the code inside the construct, while the others skip the code inside the construct
and wait for that thread to finish. Afterwards, the entire team will execute the remaining
code inside the parallel-region.
Compact forms for worksharing constructs A very common pattern when work-
ing with OpenMP is to have a parallel-region which simply wraps around a
worksharing construct (such as sections or do ). For such situations, OpenMP
supports abbreviated notations ( parallel sections and parallel do ),
to make the code more readable.
For example, here is how we could use this feature for the example in Listing 5.24:
!$omp parallel num_threads(2)
  !$omp sections
    !$omp section
    call doTaskA()
    !$omp section
    call doTaskB()
  !$omp end sections
!$omp end parallel
Listing 5.27 Verbose form of omp sections .
!$omp parallel sections num_threads(2)
  !$omp section
  call doTaskA()
  !$omp section
  call doTaskB()
!$omp end parallel sections
Listing 5.28 Equivalent, compact form of omp sections .
Similarly, the example in Listing 5.25 can also be made more concise:
!$omp parallel
  !$omp do
  do i=1, numElems
    arr(i) = sin(arr(i))
  end do
  !$omp end do
!$omp end parallel
Listing 5.29 Verbose form of omp do .
!$omp parallel do
do i=1, numElems
  arr(i) = sin(arr(i))
end do
!$omp end parallel do
Listing 5.30 Equivalent, compact form of omp do .
There is no compact version for omp single, since it makes no sense to start a
team of threads and then to assign work only to a single thread from the team.
As a final note, while the compact versions are clearly easier to read, the reader
should also remember that they are less general (when we need some code which is
still inside the parallel-regions but outside the worksharing constructs, we have
to use the verbose form).
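As an illustration of a case where the verbose form is required, the sketch below combines a single construct and a do construct inside one parallel-region. The module and procedure names (demo_verbose_region, fill_squares) are hypothetical. The lines starting with the `!$` sentinel are compiled only when OpenMP is enabled, so the code should also build as a serial program.

```fortran
module demo_verbose_region
  !$ use omp_lib
  implicit none
contains
  subroutine fill_squares(arr, nThreads)
    integer, intent(out) :: arr(:)
    integer, intent(out) :: nThreads
    integer :: i

    nThreads = 1   ! value that is kept when compiled without OpenMP

    !$omp parallel
      ! code inside the parallel-region, but outside the worksharing loop
      !$omp single
      !$ nThreads = omp_get_num_threads()
      !$omp end single

      ! worksharing construct: iterations are divided among the team
      !$omp do
      do i = 1, size(arr)
        arr(i) = i*i
      end do
      !$omp end do
    !$omp end parallel
  end subroutine fill_squares
end module demo_verbose_region
```

With the compact parallel do form there would be no place for the single block, because the directive applies only to the loop that immediately follows it.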
The reader may have noticed that, for the OpenMP-examples so far, we largely
avoided working too much with variables inside parallel-regions.70 Obviously,
in real applications we need more flexibility, and OpenMP would not be very useful
if it were so restrictive. However, before we present more realistic examples, we need
to discuss some rules governing the scope of data. This knowledge will allow us to
avoid conflict situations, such as having multiple threads trying to update the same
variable at the same time (this scenario belongs to a class of problems unique to
concurrent/parallel software, known as race conditions71 ). This is a crucial aspect,
and it is also where OpenMP differs significantly from MPI.
Automatically shared variables Consider Fig. 5.5, where we illustrate some of
the aspects related to data access in the OpenMP-model, along with sample code
snippets. When a user launches a program, all code and data used by the program is
grouped by the OS into a process. The threads of the process also reside within this
context. From the point of view of the threads, unless specified otherwise, variables
or constants are shared if they were declared:
1. prior to the parallel-region (but in the same program unit)
2. in the data section of an imported module (as a public entity)
3. with the save-attribute in a procedure which is called within the parallel-
region.
For example, in Fig. 5.5, variable x would be shared by the threads, because
it was declared in the same program unit as the parallel-region, and there is no
clause to “privatize” it.
70 In particular, we had assignments to variables only in Listing 5.25 (when each thread was
guaranteed to write to different portions of the arr-array, at line 13), and in Listing 5.26 (where
the assignment was performed by a single thread, since it occurred within an omp single region,
at line 8).
71 In general, a race condition can occur whenever we have parallel tasks involving a variable, and
local variables of a procedure would also reside (multiple threads are often supported by splitting
the stack).
[Figure 5.5: schematic of the memory regions of a running OpenMP program, with a sample code snippet.
Context of Process (shared by all threads):
- instructions (of program and shared-libraries)
- global variables (e.g. in data-section of MODULEs); x is shown in this shared region
- heap memory (dynamically-allocated data)
Context of Thread #0 and Context of Thread #1 (private, one per thread):
- thread ID
- stack-variables & stack-pointer; each thread holds its own y and z
- program counter
- register-values
Sample code (.f90) shown next to the schematic:
  integer :: x
  real :: y
  ...
  !$omp parallel num_threads(2) private(y)
  ...
  call someWork()
  ...
  !$omp end parallel

  subroutine someWork()
    integer :: z
    ...
  end subroutine someWork
]
Fig. 5.5 Schematic of the OpenMP execution context (memory regions) for a program launched
by the user
Automatically private variables In addition to shared-data, the threads also
have private-regions of memory,72 which cannot be accessed by other threads
normally. In Fortran,73 there are two common situations when variables automatically
become part of this private-memory:
1. if they are local to procedures called from the parallel-regions, or
2. if the variable represents the index of a loop (do, implied-do, or forall) that
is preceded by an omp do directive.
For example, the first rule applies to variable z in Fig. 5.5: when threads
start executing the subroutine someWork , they each get their own copy of
z . On the other hand, if z had been declared with the save -attribute,
it would be shared by all threads. The second rule applies to Listing 5.25, where
the loop index needs to be private for the parallel function evaluation to work
properly.
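The difference between an ordinary local variable and a local variable with the save -attribute can already be observed in serial code, as in the following sketch (the names demo_implicit_scope, bump_local and bump_saved are hypothetical). The saved variable exists only once for the whole process, which is why all threads would see the same copy if the function were called from a parallel-region; unsynchronized calls to bump_saved from several threads would therefore be a data race.

```fortran
module demo_implicit_scope
  implicit none
contains
  ! z is an ordinary local variable: every call (and, within a
  ! parallel-region, every thread) works on its own copy.
  function bump_local() result(res)
    integer :: res
    integer :: z
    z = 0
    z = z + 1
    res = z
  end function bump_local

  ! z has the save-attribute: a single copy persists between calls
  ! and would be shared by all threads calling this function.
  function bump_saved() result(res)
    integer :: res
    integer, save :: z = 0
    z = z + 1
    res = z
  end function bump_saved
end module demo_implicit_scope
```

Calling bump_local repeatedly always yields 1, while successive serial calls to bump_saved yield 1, 2, 3, and so on.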
Also in the private memory areas, the OpenMP-runtime stores for each thread
some internal bookkeeping information, such as the thread ID (which we already
encountered) and an individual program counter (since different threads will gener-
ally execute different instructions in each cycle74 ).
Explicitly controlling scope of variables In addition to the implicit scoping rules
mentioned above, OpenMP also allows programmers to further refine the data scope
(i.e. to select what data is shared by the threads, and what is private to each
thread). When neither of the two cases from the previous paragraph applies, the
implicit assumption in OpenMP is that variables are shared. This is often not the
intended behavior, so programmers can also “privatize” variables, by adding them
73 In C and C++, it is also allowed to have variable declarations inside the parallel-regions, and
those are also made private. However, this mechanism is currently not supported in Fortran.
74 Even if there are no divergent program flow paths (such as ifs) inside the parallel-
section, there is always some “system noise”, so that threads are not guaranteed to work perfectly
synchronized.
When a variable is made private to each thread for the duration of a parallel-
region, the OpenMP-standard leaves the initial values of the new thread-local
variables undefined (they may hold arbitrary values); the assumption is that the
programmer will take care of initializations (e.g. somewhere at the beginning of
the region), before the values are used for any computations. Similarly, when the
parallel-region ends, the values of any variables which are private to a thread
are effectively lost.
Of course, for most real-world algorithms we need the threads to communicate
with the larger context of the process. OpenMP accommodates this need with several
mechanisms, of which we only demonstrate a few.
firstprivate and lastprivate We begin with two simple patterns which
occur so frequently that OpenMP provides special support:
• initializing variables with firstprivate : It is often necessary to initialize
a private-variable with the value of the variable before “privatization” (in the
single-threaded region). This is the role of the firstprivate -clause, which
is a superset of private.
• propagating the “last” value with lastprivate : A second common requirement
is to propagate the “final” value of a private-variable outside a parallel-region,
so that the next single-threaded region can use this value. This can be
achieved with the lastprivate -clause. Note that this only works when there
is a natural sequential ordering of the tasks (e.g. omp do and omp sections).
To illustrate these clauses (and how they differ from private), consider the
following example:
1 program demo_first_last_private
2   use omp_lib
3   implicit none
4   integer :: x=1, y=2, z=3, i=4
5
6   write(*, '(4(a,i0))') "A (serial) :: x = ", x, &
7        ", y = ", y, ", z = ", z, ", i = ", i
8
9   write(*,*) ! output-separator
10
11   !$omp parallel private(x) shared(y) &
12   !$omp firstprivate(z)
13   write(*, '(5(a,i0))') &
14        "B (parallel) :: Thread ", omp_get_thread_num(), &
15        " says: x = ", x, ", y = ", y, ", z = ", z, &
16        ", i = ", i
17   ! assign to private variable
18   x = 2*omp_get_thread_num()
19
20   write(*, '(5(a,i0))') &
21        "C (parallel) :: Thread ", omp_get_thread_num(), &
22        " says: x = ", x, ", y = ", y, ", z = ", z, &
23        ", i = ", i
24   !$omp end parallel
25
26   write(*,*) ! output-separator
27
28   !$omp parallel do shared(y)
29   do i=1, 42
30      y = y + i ! *** BUG! *** (data-race)
31   end do
32   !$omp end parallel do
33
34   write(*, '(4(a,i0))') "D (serial) :: x = ", x, &
35        ", y = ", y, ", z = ", z, ", i = ", i
36
37   !$omp parallel sections lastprivate(i)
38   !$omp section
39   i = 11
40   !$omp section
41   i = 22
42   !$omp section
43   i = 33
44   !$omp end parallel sections
45
46   write(*, '(4(a,i0))') "E (serial) :: x = ", x, &
47        ", y = ", y, ", z = ", z, ", i = ", i
48 end program demo_first_last_private
Listing 5.31 src/Chapter5/demo_first_last_private.f90
parallel-region ends (line 24), the original value75 of the variable is shown
(at checkpoints D and E ). Also, because x is not explicitly mentioned in data
clauses for the second and third parallel-regions, it becomes shared.
• y : For the first two parallel-regions (lines 11 and 28), we declare this variable
as shared, which causes all threads to access the same memory location. Since
the value is not updated in the first parallel-region, we get the initial value
(y=2) at checkpoints B and C . However, in the second parallel-region, at
line 30, we update this shared value. This is a classic example of a data race,
caused by the fact that the loop iterations are inter-dependent. While this do-loop
would be perfectly valid when executed serially, the result becomes undetermined
when running in parallel, because nothing here stops two threads from updating
y at the same time. Therefore, the result at checkpoint D will in general be
non-deterministic, and dependent on the number of threads (which we encourage
the reader to try). The last parallel-region does not change y , so the same
value is reported at checkpoint E . However, its status is also shared (but now
due to implicit rules).
• z : The variable z is declared as firstprivate on line 12, which causes the
initial value to be copied into the private-versions of this variable, which the
compiler creates for each thread. Note that this was not the case for x . For the
rest of the program (lines 26–47), z becomes shared due to the implicit rules.
• i : Finally, i is silently shared for the first region. However, for the second
region (lines 28–32), it becomes a private-variable, because it is the index of
the loop. Since the loop has a pre-determined range of values to iterate through,
there is no need for initialization. In the last region (lines 37–44), i is declared
as lastprivate, which will cause the value from the sequentially last task (33)
to be copied outside, as reported at checkpoint E .
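For completeness, here is one possible way to repair the data race on y from line 30 of the listing, using the reduction clause of OpenMP. This clause is not explained in this text, so the sketch should be read as a pointer for further study rather than as part of the original example; the names demo_reduction_fix and add_range are hypothetical.

```fortran
module demo_reduction_fix
  implicit none
contains
  ! Adds 1 + 2 + ... + n to the starting value y0.
  function add_range(y0, n) result(total)
    integer, intent(in) :: y0, n
    integer :: total
    integer :: acc, i

    acc = y0
    ! Each thread accumulates into its own private copy of acc; at the
    ! end of the loop the partial sums are combined with the original
    ! value, so no two threads update the same memory location.
    !$omp parallel do reduction(+:acc)
    do i = 1, n
      acc = acc + i
    end do
    !$omp end parallel do
    total = acc
  end function add_range
end module demo_reduction_fix
```

With the values used in the listing (y=2 and a loop from 1 to 42), add_range(2, 42) evaluates to 905 regardless of the number of threads, since the integer sum does not depend on the order of the additions.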
there are many in OpenMP (critical, atomic, barrier, etc.). However, these
techniques are beyond the scope of our text—in the following case studies, we will
use shared-arrays, but we restrict the update operations, so that there are no con-
flicts or dependencies between the individual node updates for each timestep.
In this section, we provide more realistic use cases for OpenMP, by adding parallelism
to the applications we presented in Sect. 4.1 (heat diffusion solver) and Sect. 4.3
(LBM-MRT solver for the Rayleigh-Bénard (RB) problem). Since both applications
received further improvements earlier in this chapter, we choose those versions as
starting points for parallelization.
One of the important advantages of using OpenMP is that parallelism can often be
added incrementally, by profiling the application after each significant change, to
check where most of the computing time is spent.
A profiler is an application which can analyze our program, to characterize various
aspects of its behaviour (performance “hotspots”, call graph, etc.). One such tool
is the GNU Profiler gprof (open source, available on most Unix-systems, but
without dedicated support for OpenMP at the moment). Profiling a serial program
with gprof involves repetitions of three basic steps:
1. compile/link with gprof support: On most Unix platforms, gprof requires
us to add the −pg flag to both the compilation and linking stages. This will
cause the final executable to contain additional code for tracking function call
times.
2. running the program: For the second step, we need to run our program as usual.
The main difference is that the program will also create gmon.out , which is a
binary file (not human-readable) where the profiling result is stored.
3. inspecting the result: Last, we invoke the gprof program itself, which parses
gmon.out and produces a human-readable summary. Several options are per-
mitted at this stage, to display several aspects of the analysis. For our purposes
here,77 we will use the following syntax for this stage:
$ gprof -p -b ./program_name ./gmon.out
In addition to gprof, readers will probably find more advanced profiling tools,
especially on HPC systems. Many such tools are supported by the hardware vendors
77 We encourage the reader to check the official website for more information.
As a first practical example, let us consider again the simple solver for the 2D heat
diffusion equation, which we developed in Sect. 4.1 and extended in Sect. 5.2.1. In
this section we describe how we can apply OpenMP to obtain a “low hanging fruit”
improvement in performance.
Profiling of the serial program Before we invest any effort into parallelization,
we need to determine unequivocally the hotspots in our program. Here, we do this
using gprof. The three steps mentioned in Sect. 5.3.5.1 are applied here too, to the
serial version of the application as in Sect. 5.2.1. First, we compile the program with
profiling support:
$ gfortran -O2 -march=native -pg -o solve_heat_diffusion_v2{,.f90}
This creates the file gmon.out , which we analyze with the command:
$ gprof -p -b ./solve_heat_diffusion_v2 ./gmon.out
78 However, note that in typical ESS applications there will often be small fluctuations in the results,
due to the non-associativity of floating-point operations [23]; therefore, some tolerances may need
to be allowed when comparing results.
0.00 10.64 0.00 1 0.00 0.00 __config_class_MOD_createconfig
In the output, the first column displays the percentage of total time that was spent
in each procedure (we can recognize some of the type-bound procedures of the
Solver type in the last column). We notice that most of the effort is spent (in
almost equal proportions) executing the subroutines advanceU and advanceV.
Since these update the two sub-solution fields, which form the core of our algorithm,
the profiling result will probably not surprise the reader. However, it is generally
a good idea to profile often, since intuition often fails, especially in more complex
applications.
Parallelization with OpenMP The reader may have already noticed that fields U
and V can be updated at the same time, without any data conflicts. Therefore, the par-
allelization “effort” involves, in this case, nothing more than adding some directives
for parallel sections in the subroutine run shown below (with indentation used to
mark the nesting of OpenMP-constructs). Because the two tasks are already pack-
aged as subroutines, the data scope for parallelization is quite simple—the “class”
instance (this-variable) is shared by the threads, due to the default scoping rules:
subroutine run(this) ! method for time-marching
  class(Solver), intent(inout) :: this
  integer(IK) :: k ! dummy index (time-marching)

  do k=1, this%mNumItersMax ! MAIN loop
     ! simple progress-monitor
     if( mod(k-1, (this%mNumItersMax-1)/10) == 0 ) then
        write(*, '(i5,a)') nint((k*100.0)/this%mNumItersMax), "%"
     end if

     ! NEW: OpenMP pragmas below
     !$omp parallel num_threads(2)
       !$omp sections
         !$omp section
         call this%advanceU() ! task for 1st thread

         !$omp section
         call this%advanceV() ! task for 2nd thread
       !$omp end sections
     !$omp end parallel
     this%mCurrIter = this%mCurrIter + 1 ! tracking time step
  end do
end subroutine run
Listing 5.32 src/Chapter5/solve_heat_diffusion_v3/solve_heat_diffusion_v3.f90 (excerpt)
This modification alone brought a speedup of ∼1.9 when using two threads79 on our
test machine, which is encouraging. However, it turns out to be more difficult to scale
our chosen numerical algorithm beyond two threads, because the “semi-implicit”
algorithm of Barakat and Clark [2] severely restricts the number of node-update
sequences which lead to a correct result. An interesting class of such sequences forms
the basis of the wavefront parallelization technique [8], which could be used in this
case. However, this is beyond the scope of this text.
The heat diffusion solver is a good case in point, showing that an algorithm
which may perform well in serial can lead to difficulties during parallelization. For
example, in this particular case, a parallel iterative algorithm (e.g. [9]) may lead to
better utilization of the hardware.
For our last showcase for OpenMP, we will parallelize the LBM solver, which we
introduced in Sect. 4.3 and extended in Sect. 5.2.2.4 (by adding netCDF-support).
Unlike the previous example, this application can attain good scalability without
having to restructure the entire algorithm (although, as we will show, there are still
some potential traps along the way to good performance).
Profiling of the serial program Similar to the previous case study, the first step
is to profile the serial version (lbm2d_mrt_rb_v2). As already noted, we only
write output for the initial and final timesteps, since we focus here just on acceler-
ating the raw computations; this is achieved by setting numOutSlices=2 in file
src/Chapter5/lbm2d_mrt_rb_v2/lbm2d_mrt_rb_v2.f90 . To enable
profiling, we need to append the −pg flag to variables FFLAGS and LDFLAGS
(see file src/Chapter5/lbm2d_mrt_rb_v2/Makefile.profiling ). We
generate the human-readable version of the profile, using steps similar to the previous
test case. A sample result on our system is80 :
Flat profile:
... (more functions here, but which do not take much time) ...
80 To make the output fit in the page, we removed the column indicating the number of calls, and
we also made the names more compact.
Parallelization with OpenMP Our simple approach for parallelizing this application
will consist of parallelizing the spatial sweep in the
advanceTimeMrtSolverBoussinesq2D type-bound procedure. The new version is shown below:
function advanceTimeMrtSolverBoussinesq2D(this) result(res)
  use omp_lib
  class(MrtSolverBoussinesq2D), intent(inout) :: this
  ! local vars
  integer(IK) :: x, y, i, old, new, res
  integer(IK), dimension(0:1) :: dest
  real(RK) :: fluidMoms(0:8), tempMoms(0:4), &
       fluidEqMoms(0:8), tempEqMoms(0:4)
  integer, save :: numThreads = -9999

  ! initializations
  dest = 0; fluidMoms = 0._RK; tempMoms = 0._RK
  fluidEqMoms = 0._RK; tempEqMoms = 0._RK
  old = this%mOld; new = this%mNew

  !$omp parallel &
  !$omp shared(this, old, new, numThreads) private(x, y, i) &
  !$omp firstprivate(fluidMoms, tempMoms, fluidEqMoms, tempEqMoms, dest)

  !$omp single
  if( numThreads == -9999 ) then
     numThreads = omp_get_num_threads()
  end if
  !$omp end single

  !$omp do
  do y=1, this%mNy
     do x=1, this%mNx
        call this%calcLocalMomsMrtSolverBoussinesq2D(x, y, fluidMoms, tempMoms)

        ! add 1st-half of force term (Strang splitting)
        fluidMoms(2) = fluidMoms(2) + this%mAlphaG*0.5_RK*tempMoms(0)

        ! save moments related to output
        this%mRawMacros(x, y, :) = &
             [ fluidMoms(0), fluidMoms(1), fluidMoms(2), tempMoms(0) ]

        call this%calcLocalEqMomsMrtSolverBoussinesq2D( dRho=fluidMoms(0), &
             uX=fluidMoms(1), uY=fluidMoms(2), temp=tempMoms(0), &
             fluidEqMoms=fluidEqMoms, tempEqMoms=tempEqMoms )

        ! collision (in moment-space)
        fluidMoms = fluidMoms - this%mRelaxVecFluid * (fluidMoms - fluidEqMoms)
        tempMoms = tempMoms - this%mRelaxVecTemp * (tempMoms - tempEqMoms)

        ! add 2nd-half of force term (Strang splitting)
        fluidMoms(2) = fluidMoms(2) + this%mAlphaG*0.5_RK*tempMoms(0)

        ! map moments back onto DFs...
        ! ...fluid
        do i=0, 8
           this%mDFs(i, x, y, old) = dot_product(M_INV_FLUID(:, i), fluidMoms)
        end do
        ! ...temp
        do i=0, 4
           this%mDFs(i+9, x, y, old) = dot_product(N_INV_TEMP(:, i), tempMoms)
        end do

        ! stream to new array...
        ! ...fluid
        do i=0, 8
           dest(0) = mod(x+EV_FLUID(1, i)+this%mNx-1, this%mNx)+1
           dest(1) = y+EV_FLUID(2, i)
           ! STREAM (also storing runaway DFs in Y-buffer space)
           this%mDFs(i, dest(0), dest(1), new) = this%mDFs(i, x, y, old)
           if( dest(1) == 0 ) then
              if( EV_FLUID(2, i) /= 0 ) then
                 ! apply bounce-back @bottom
                 this%mDFs(OPPOSITE_FLUID(i), x, y, new) = &
                      this%mDFs(i, dest(0), dest(1), old)
              end if
           elseif( dest(1) == this%mNy+1 ) then
              if( EV_FLUID(2, i) /= 0 ) then
                 ! apply bounce-back @top
                 this%mDFs(OPPOSITE_FLUID(i), x, y, new) = &
                      this%mDFs(i, dest(0), dest(1), old)
              end if
           end if
        end do
        ! ...temp
        do i=0, 4
           dest(0) = mod(x+EV_TEMP(1, i)+this%mNx-1, this%mNx)+1
           dest(1) = y+EV_TEMP(2, i)
           ! STREAM (also storing runaway DFs in Y-buffer space)
           this%mDFs(i+9, dest(0), dest(1), new) = this%mDFs(i+9, x, y, old)
           if( dest(1) == 0 ) then
              ! apply anti-bounce-back @bottom
              this%mDFs(OPPOSITE_TEMP(i)+9, x, y, new) = &
                   -this%mDFs(i+9, dest(0), dest(1), old) + &
                   2._RK*sqrt(3._RK)*this%mDiffusivity*this%mTempHotWall
           elseif( dest(1) == this%mNy+1 ) then
              ! apply anti-bounce-back @top
              this%mDFs(OPPOSITE_TEMP(i)+9, x, y, new) = &
                   -this%mDFs(i+9, dest(0), dest(1), old) + &
                   2._RK*sqrt(3._RK)*this%mDiffusivity*this%mTempColdWall
           end if
        end do
     end do
  end do
  !$omp end do
  !$omp end parallel

  ! swap 'pointers' (for lattice-alternation)
  call swap(this%mOld, this%mNew)

  res = numThreads
end function advanceTimeMrtSolverBoussinesq2D
     call this%mOutSink%writeOutput(this%mSolver%getRawMacros(), currIterNum)
  end do

  toc = omp_get_wtime() ! parallel

  numMLUPS = this%mNumItersMax * real(this%mNx*this%mNy, RK) / (1.0e6*(toc-tic))
  write(*, '(/,a,f0.2,a)') "Performance Information: achieved ", &
       numMLUPS, " MLUPS (mega-lattice-updates-per-second)"
  write(*, '(a,i0,a,f0.4,a)') "[<nThreads>", realNumThreads, &
       "</nThreads><perf>", numMLUPS, "</perf>]"
end subroutine runRBenardSimulation
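The performance figure printed in the excerpt above is the number of lattice-node updates per second, expressed in millions (MLUPS). As a small worked example, the same formula can be isolated in a pure function; the names demo_mlups and calc_mlups are hypothetical, and the use of real64 from iso_fortran_env is an assumption made here in place of the RK kind of the solver.

```fortran
module demo_mlups
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
contains
  ! MLUPS = (time steps * lattice nodes) / (10^6 * elapsed seconds)
  pure function calc_mlups(numIters, nx, ny, elapsedSeconds) result(mlups)
    integer, intent(in) :: numIters, nx, ny
    real(real64), intent(in) :: elapsedSeconds
    real(real64) :: mlups
    ! convert to real before multiplying, to avoid integer overflow
    mlups = real(numIters, real64) * real(nx, real64) * real(ny, real64) &
         / (1.0e6_real64 * elapsedSeconds)
  end function calc_mlups
end module demo_mlups
```

For instance, 1000 time steps on a 100 x 100 lattice completed in 2 s correspond to 1000 * 10000 / (10^6 * 2) = 5 MLUPS. In the solver, the elapsed time is the difference toc - tic of two omp_get_wtime() calls.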
Exercise 22 (Querying for the number of threads) For our second case study
(see Listings 5.33 and 5.34) we made some effort to query the runtime system,
and then propagate the number of threads to higher levels of our application.
81 Note that we only wrote output for the initial and final timesteps; output writing will generally
degrade scalability, due to Amdahl’s law.
This concludes our brief coverage of OpenMP. Naturally, we only presented a small
subset of the features available to the user (even if we only considered Version 3.1).
From the (long) list of features which we did not explain but which may be crucial
for many applications, we can mention:
• support for dynamic parallelism (task-construct),
• reductions,
• explicit synchronization (barrier, critical, etc.),
• “privatization” of global data (threadprivate), or
• techniques for performance optimization (load balancing, memory model, affinity,
etc.).
The interested reader can consult, for example, Hager and Wellein [8] (an introduction
to parallelization in general), Chapman et al. [6] and Chandra et al. [5] (for more
on OpenMP), or Mattson et al. [18] and McCool et al. [19] (for related software
engineering issues).
Also, at the time of writing, Version 4.0 was already published, which offers
many more features worth considering, especially as compiler support matures.
Although many applications are written in a single language, the fact remains that
various programs and libraries were written by different programmers, with different
preferences for specific languages. Besides subjective reasons such as individual
expertise and preferences of the programmer, this variety also reflects the fact that
different programming languages have their own strengths and weaknesses. For example,
compiled languages like Fortran and C/C++ are suitable for performance-critical code,
while an interpreted language (like R, Python, or MATLAB) would be preferred,
e.g., for interactive data analysis.82 For the application developer, there are often
good reasons to combine different languages. Since the C programming language
82 Due to the need to interpret code at runtime, scripting languages often introduce some performance
penalty. However, in many situations (data analysis, algorithm prototyping, etc.) it is perfectly
acceptable to trade some performance for higher programmer productivity. Also, many scripting
languages allow some form of code compilation, so the distinction is not so clear-cut.
83 It is also possible (and common) to combine complete programs written in different languages,
by exchanging information via files on disk or interprocess communication mechanisms (such as
Unix pipes), and steering the execution of the programs through some scripts (e.g. shell scripts).
However, here we refer to the case when we want to link together object files obtained from different
compiled languages.
84 For many compiler suites (including gcc), the Fortran and C compilers actually have common
components, with different programming languages being supported by different frontends, which
translate the code into a language-neutral intermediate representation.
5.4 Interoperability with C 237
First, consider the situation when the main-program is written in Fortran, and the
function we want to call is implemented in C, as follows:
#include <stdio.h>

void test_proc_c_v1() {
  printf("Hello from \"test_proc_c_v1 (C)\", "
         "invoked from \"demo_fort_v1 (Fort)\"!\n");
}
Listing 5.36 src/Chapter5/interop/f_calls_c_v1/test_proc_c_v1.c
This is a little more interesting, since we have some new elements. In order to allow
the Fortran compiler to perform error checking, we explicitly define the interface
of the C function, with an interface-block (lines 8–12). This is similar to what
we discussed in Sect. 3.2.3, with the addition of the bind-attribute (line 10), which
causes the Fortran compiler to produce object code which is compatible with the con-
ventions of the companion C compiler. Inside the parentheses of the bind-clause,
the first element should be C . In principle, the second element (corresponding to
name ) is optional. However, we prefer to always specify it, even if the string is iden-
tical to the name of the procedure within the interface-block, as in Listing 5.37.
This avoids potential problems due to mixed letter-case.85
85 Remember that case variations are generally discarded by Fortran compilers, but not so by C
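To show the roles of the interface-block and of the name specifier in isolation, the following sketch binds a Fortran-side name to a function from the C standard library ( int abs(int) ). It is not the listing discussed above: the names demo_bind_name and c_abs are hypothetical, and the sketch additionally relies on the c_int kind from the intrinsic module iso_c_binding and on the value -attribute (needed here because the C function receives its argument by value).

```fortran
module demo_bind_name
  use, intrinsic :: iso_c_binding, only: c_int
  implicit none

  interface
    ! The Fortran name (c_abs) may differ from the C symbol; the string
    ! given in name= is case-sensitive and must match the C side exactly.
    function c_abs(j) bind(C, name='abs') result(res)
      import :: c_int
      integer(c_int), value :: j   ! passed by value, as C expects
      integer(c_int) :: res
    end function c_abs
  end interface
end module demo_bind_name
```

A program that uses this module can then write c_abs(-7_c_int) like any other function reference, and the linker resolves the call against the C library.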
Linking the final executable The two compilation commands above would have
produced two object files. As discussed in Sect. 5.1, we can invoke the linker, to
combine the object files (and the external libraries they need) into an executable. On
our platform, the command is:
$ gfortran -o demo_fort_v1 demo_fort_v1.o test_proc_c_v1.o -lc
Note that we also link against the C standard library ( libc on Linux), which is
necessary for printf.86
Next, consider the reverse situation, when the main-program is written in C, but we
need to invoke a subroutine written in Fortran—for example:
5 subroutine test_proc_fort_v1() &
6      bind(C, name='test_proc_fort_v1')
7   write(*, '(a)') 'Hello from "test_proc_fort_v1 (Fort)",&
8        & invoked from demo_c_v1 (C)"!'
9 end subroutine test_proc_fort_v1
Listing 5.38 src/Chapter5/interop/c_calls_f_v1/test_proc_fort_v1.f90
To make the procedure callable from C-code, we need to specify again the bind-
attribute (line 6). We generate the corresponding object file with:
$ gfortran -c test_proc_fort_v1.f90
86 This library is also automatically added by the compiler, so −lc may be skipped in this case.
However, we added it explicitly here, to facilitate comparison with the next example (Sect. 5.4.1.2).
5.4 Interoperability with C 239
Since the procedure is external to the translation unit, we need a forward declaration
for it (line 8), just like we needed to add an interface-block in Listing 5.37. As
usual, we generate the object file with:
$ gcc -c demo_c_v1.c
Linking the final executable Similar to Sect. 5.4.1.1, we invoke the linker (now
through the C compiler) to obtain the final executable:
$ gcc -o demo_c_v1 demo_c_v1.o test_proc_fort_v1.o -lgfortran
Note that we now need to link in the Fortran standard library (libgfortran),
for our invocation of write to work (Listing 5.38, line 7).
Any variables that are shared in some way with C obviously need to be of types which
are accepted by the C compiler. From the Fortran side, we ensure this is the case by
selecting special kinds for the intrinsic types. These special kind type parameters
are defined within the intrinsic module iso_c_binding. With the exception
of unsigned integer-types (which are not supported in Fortran), we have there
kind-values for translating most common types; for example, integer(c_int)
in Fortran is compatible with int in C, integer(c_long) with long,
real(c_float) with float, real(c_double) with double, etc. Note
that the compiler may not support interoperability for all of the types; in such
situations, the corresponding kind-value is negative, which will cause a compilation error
when used later for variable declarations (see, e.g., Metcalf et al. [21] for a discussion
of possible negative values and their meanings).
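As a minimal sketch of how the availability of these kinds might be inspected, the
following program prints a few kind-values from iso_c_binding and flags the negative
ones. The module c_kind_check, the function is_interoperable, and the program
show_c_kinds are hypothetical names chosen for illustration; they are not part of
the source code repository of the book.

```fortran
module c_kind_check
  implicit none
contains
  ! Hypothetical helper: a kind-value taken from iso_c_binding is usable
  ! only if it is non-negative (negative values signal that the compiler
  ! offers no interoperable kind for that C type).
  pure logical function is_interoperable(kind_value)
    integer, intent(in) :: kind_value
    is_interoperable = (kind_value >= 0)
  end function is_interoperable
end module c_kind_check

program show_c_kinds
  use iso_c_binding, only: c_int, c_long, c_float, c_double, c_long_double
  use c_kind_check, only: is_interoperable
  implicit none

  call report('c_int', c_int)
  call report('c_long', c_long)
  call report('c_float', c_float)
  call report('c_double', c_double)
  call report('c_long_double', c_long_double)

contains
  ! Prints the name of the kind parameter, its value, and its status.
  subroutine report(name, kind_value)
    character(len=*), intent(in) :: name
    integer, intent(in) :: kind_value
    write(*,'(a14,a,i0,a,l1)') name, ': kind = ', kind_value, &
         ', interoperable = ', is_interoperable(kind_value)
  end subroutine report
end program show_c_kinds
```

The actual values printed are compiler-dependent, which is why such a check is best
done once per platform, rather than assumed.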
240 5 More Advanced Techniques
We now extend the example from Sect. 5.4.1.1, adding a scalar and an array as
procedure arguments. The new version of the C function is:
 5 #include <stdio.h>
 6
 7 void test_proc_c_v2(int n, double arr[3][2]) {
 8   int i, j;
 9
10   printf("Hello from \"test_proc_c_v2 (C)\", "
11          "invoked from \"demo_fort_v2 (Fort)\"!\n");
12   printf("n = %d\n", n);
13   for(j=0; j<3; j++) {
14     for(i=0; i<2; i++) {
15       printf("arr[%d,%d] = %8.2f\n", j, i, arr[j][i]);
16     }
17   }
18 }
Listing 5.40 src/Chapter5/interop/f_calls_c_v2/test_proc_c_v2.c
At line 12 we print the received value for n, and the nested loops at lines 13–17 do
the same for the array arr.
The corresponding main-program (Fortran) is:
program demo_fort_v2
  use iso_c_binding, only: c_int, c_double
  implicit none

  integer :: i, j ! dummy indices
  integer(c_int) :: n_fort = 17
  real(c_double), dimension(2,3) :: arr_fort

  ! IFACE to C-function.
  interface
     subroutine test_proc_c_v2(n_c, arr_c) &
          bind(C, name='test_proc_c_v2')
       use iso_c_binding, only: c_int, c_double
       integer(c_int), intent(in), value :: n_c
       real(c_double), dimension(2,3), intent(in) :: arr_c
     end subroutine test_proc_c_v2
  end interface

  ! initialize 'arr_fort' with some data
  do j=1,3
     do i=1,2
        arr_fort(i,j) = real(i*j, c_double)
     end do
  end do

  call test_proc_c_v2(n_fort, arr_fort) ! Fort-call->C
end program demo_fort_v2
Listing 5.41 src/Chapter5/interop/f_calls_c_v2/demo_fort_v2.f90
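The array is declared as dimension(2,3) in Fortran, but as [3][2] in C, because
Fortran stores arrays in column-major order (first index varies fastest), while C uses
row-major order (last index varies fastest). The following sketch illustrates this
correspondence from the Fortran side only; the module layout_demo and the functions
fort_offset and c_offset are hypothetical helpers, introduced here just for illustration.

```fortran
module layout_demo
  implicit none
contains
  ! 0-based storage offset of element (i,j) in a Fortran array whose
  ! first extent is n1 (column-major order, 1-based indices).
  pure integer function fort_offset(i, j, n1)
    integer, intent(in) :: i, j, n1
    fort_offset = (i-1) + (j-1)*n1
  end function fort_offset

  ! 0-based storage offset of element [jc][ic] in a C array whose
  ! last extent is n1 (row-major order, 0-based indices).
  pure integer function c_offset(jc, ic, n1)
    integer, intent(in) :: jc, ic, n1
    c_offset = jc*n1 + ic
  end function c_offset
end module layout_demo

program show_layout
  use iso_c_binding, only: c_double
  use layout_demo, only: fort_offset, c_offset
  implicit none
  real(c_double) :: arr(2,3), flat(6)
  integer :: i, j

  do j=1,3
     do i=1,2
        arr(i,j) = real(i*j, c_double)
     end do
  end do

  ! reshape takes the elements in storage order, so 'flat' shows
  ! the sequence of values that the C function receives
  flat = reshape(arr, [6])
  write(*,'(a,6f6.1)') 'storage order:', flat

  do j=1,3
     do i=1,2
        write(*,'(5(a,i0))') 'arr(', i, ',', j, ') <-> arr[', j-1, '][', i-1, &
             '], offset ', fort_offset(i, j, 2)
     end do
  end do
end program show_layout
```

In other words, element arr(i,j) in Fortran and element arr[j-1][i-1] in C refer to the
same memory location, which is why the loops in Listing 5.40 run over j in the outer
loop and over i in the inner one.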
For the reverse scenario, when the main-program is written in C, we can invoke a
Fortran subroutine such as:
 5 subroutine test_proc_fort_v2(n, arr) &
 6      bind(C, name='test_proc_fort_v2')
 7   use iso_c_binding, only: c_int, c_double
 8   integer(c_int), intent(in), value :: n
 9   real(c_double), dimension(2,3), intent(in) :: arr
10   integer :: i, j ! dummy indices
11
12   write(*,'(a)') 'Hello from "test_proc_fort_v2 (Fort)",&
13        & invoked from "demo_c_v2 (C)"!'
14   write(*,'(a,i0)') "n = ", n
15   do j=1,3
16      do i=1,2
17         write(*,'(2(a,i0),a,f8.2)') "arr[", i, ",", j, "] = ", arr(i,j)
18      end do
19   end do
20 end subroutine test_proc_fort_v2
Listing 5.42 src/Chapter5/interop/c_calls_f_v2/test_proc_fort_v2.f90
Most observations from the previous section also apply here, including the value
type attribute, which we now have to specify inside the subroutine (line 8).
The corresponding C main-program, which includes the forward declaration of the
procedure, is:
#include <stdlib.h>

/* declaration of Fortran procedure */
void test_proc_fort_v2(int n_f, double arr_f[3][2]);

int main() {
  int i, j, n_c=17;
  double arr_c[3][2];

  /* initialize 'arr_c' with some data */
  for(j=0; j<3; j++) {
    for(i=0; i<2; i++) {
      arr_c[j][i] = (double) (i+1)*(j+1);
    }
  }

  test_proc_fort_v2(n_c, arr_c); /* C-calls->Fort */

  return EXIT_SUCCESS;
}
Listing 5.43 src/Chapter5/interop/c_calls_f_v2/demo_c_v2.c
87 As long as they are representable integers and the resulting array fits into memory.
This concludes our introduction to interoperability issues. In practice, the reader may
also encounter more advanced scenarios, which we do not cover here. For example,
it is also possible to pass between Fortran and C character-strings and dynamic
arrays, or to make global data interoperable. Some additional type definitions and
intrinsic procedures (also defined in the iso_c_binding-module) are relevant
for such tasks—for more information, we refer to other texts (such as Clerman and
Spector [7], Markus [15], or Metcalf et al. [21]).
For a long time, Fortran programs had no standard mechanism for interacting with the
OS, although vendor-specific mechanisms were available. To remove this potential
source of portability problems, recent versions of the standard added some intrinsic
procedures to streamline this process. In this section, we use some of these proce-
dures, to demonstrate passing command line arguments and launching (“forking”)
another program directly from a running Fortran application.
There are two possible approaches for obtaining the command line arguments
from the Fortran runtime system.
Option (A): read entire invocation command line As a first option, it is possible
to obtain from the runtime system the whole command line, including the name of
the program and the complete list of arguments. This information can be obtained
by calling the intrinsic subroutine get_command, which has the syntax:
call get_command([command=]string_val, [length=]string_len, &
     [status=]cmd_stat)
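where the arguments (all of them optional) represent:
• command=string_val : string (intent(out)), where the entire command line
will be placed (truncated or blank-padded if the command is longer or shorter
than the string’s length)
• length=string_len : integer (intent(out)), which receives the significant
length of the command line
• status=cmd_stat : integer (intent(out)), which is zero on success, -1 if
string_val was too short to hold the command, and positive if the retrieval failed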
Since this subroutine provides the “raw” command line, it has the drawback of
forcing programmers to write their own code for parsing the command string.
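A minimal sketch of Option (A) is given next. To avoid truncation, it first asks only
for the length of the command line, and then allocates a string of exactly that size.
The module full_command and the function get_full_command are hypothetical names,
used here only for illustration.

```fortran
module full_command
  implicit none
contains
  ! Returns the complete invocation command, in a string of exactly the
  ! required length (or an empty string, if the retrieval fails).
  function get_full_command() result(cmd)
    character(len=:), allocatable :: cmd
    integer :: cmd_len, cmd_stat

    ! first call: query only the length of the command line
    call get_command(length=cmd_len, status=cmd_stat)
    if (cmd_stat > 0) then
       cmd = ''
       return
    end if

    ! second call: retrieve the text into a string of matching size
    allocate(character(len=cmd_len) :: cmd)
    if (cmd_len > 0) call get_command(command=cmd, status=cmd_stat)
    if (cmd_stat /= 0) cmd = ''
  end function get_full_command
end module full_command

program show_command
  use full_command, only: get_full_command
  implicit none

  write(*,'(3a)') 'invoked as: "', get_full_command(), '"'
end program show_command
```

Splitting the returned string into individual arguments is still left to the
programmer, which is the drawback mentioned above.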
Option (B): read arguments one-by-one The second method for obtaining the
command line arguments consists of first asking the number of arguments, by calling the
intrinsic function command_argument_count. This returns an integer,
with the number of CLI-arguments, not including the program name. With this
information, the programmer can then retrieve individual arguments based on an index,
with the get_command_argument-subroutine, with the syntax:
call get_command_argument([number=]arg_idx, [value=]string_val, &
     [length=]string_len, [status=]cmd_stat)
where the arguments (all optional, except the first one) represent:
• number=arg_idx : integer (intent(in)), containing the index of the argument
to be retrieved (value of 0 can also be used, to get the name of the program
itself)
• value=string_val : string (intent(out)), where the value of the argument
will be placed (again, truncated or blank-padded if the argument is longer or shorter
than the string’s length)
• length=string_len and status=cmd_stat have similar roles as in the first
method (but now applied to individual CLI-arguments)
With this second approach, the task of splitting the list of arguments (also known as
tokenization in the programming jargon) is accomplished by the runtime system. However,
the programmer still has to write some code to validate and interpret the arguments
(which can be somewhat tedious for options which accept values, e.g., something
like -n=123).89 We demonstrate the two methods of reading arguments in the file
src/Chapter5/reading_cli_arguments.f90 (see the source code repository).
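To give an impression of the validation code that remains necessary with Option (B),
here is a minimal sketch which loops over the arguments and interprets an option of the
form -n=123. It is not the file from the repository; the module cli_utils and the
procedures split_option and to_integer are hypothetical names, chosen for illustration.

```fortran
module cli_utils
  implicit none
contains
  ! Splits an argument of the form "key=value" at the first "=".
  ! If no "=" is present, 'val' is returned as an empty string.
  subroutine split_option(arg, key, val)
    character(len=*), intent(in) :: arg
    character(len=:), allocatable, intent(out) :: key, val
    integer :: pos

    pos = index(arg, '=')
    if (pos == 0) then
       key = trim(arg)
       val = ''
    else
       key = arg(:pos-1)
       val = trim(arg(pos+1:))
    end if
  end subroutine split_option

  ! Converts a string to an integer; 'ok' is .false. if this fails.
  subroutine to_integer(str, num, ok)
    character(len=*), intent(in) :: str
    integer, intent(out) :: num
    logical, intent(out) :: ok
    integer :: io_stat

    num = 0
    read(str, *, iostat=io_stat) num
    ok = (io_stat == 0)
  end subroutine to_integer
end module cli_utils

program cli_demo
  use cli_utils, only: split_option, to_integer
  implicit none
  integer :: idx, arg_len, n_val
  logical :: ok
  character(len=:), allocatable :: arg, key, val

  do idx = 1, command_argument_count()
     ! query the length first, so that the argument is not truncated
     call get_command_argument(idx, length=arg_len)
     if (allocated(arg)) deallocate(arg)
     allocate(character(len=arg_len) :: arg)
     call get_command_argument(idx, value=arg)

     call split_option(arg, key, val)
     if (key == '-n') then
        call to_integer(val, n_val, ok)
        if (ok) then
           write(*,'(a,i0)') 'n = ', n_val
        else
           write(*,'(3a)') 'invalid value "', val, '" for option -n'
        end if
     else
        write(*,'(2a)') 'ignoring unknown argument: ', arg
     end if
  end do
end program cli_demo
```

Assuming the executable is named cli_demo, it could be invoked as ./cli_demo -n=123.
Keeping the parsing logic in separate procedures makes it possible to test it without
actually launching the program with different arguments.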
89 C++ programmers do not need to do this, since they can use libraries such as
Boost.Program_Options.
90 Although the ability to launch programs asynchronously looks very appealing (indeed, this can
even be viewed as a primitive form of parallelization), its usefulness is limited in practice, since
there is currently no standard mechanism to check if the program actually terminated and, if so,
what exit status was returned. See [21] for more details on this feature.
5.5 Interacting with the Operating System (OS) 245
In closing, we briefly mention several other tools that may become useful for your
projects. Note that, depending on the current (expected) size of the software, not all
technologies mentioned here may pay off.
In this book we focus on using Fortran as a tool for solving computational problems.
However, it is not feasible to write all types of programs in Fortran. The language
is not suitable, for example, for applications where code is changing rapidly, and
there is a need to quickly test the outcome. Likewise, it is more economical to choose
another language when extensive functionality from a specific problem domain (such
as graphics or process manipulation) is necessary, and this functionality is available
neither in Fortran nor in a third-party library callable from Fortran.
A common practice to resolve this tension is to develop multi-language appli-
cations, so that the strengths of each language can be exploited. We already dis-
cussed some form of this in Sect. 5.4. Given the supremacy of C and C++ in high-
performance systems programming, there is a large set of useful libraries which
become available to Fortran programmers through such an inter-language bridge.
However, this does not solve the problem of applications where requirements change
rapidly (such as exploratory visualization and data analysis), and can also increase
the complexity of the final applications.
Another common combination in relation to Fortran is to use a scripting language
for the tasks which are more cumbersome to implement directly in Fortran (such as
file system manipulation, or computational steering); the scripts then delegate the
numerically intensive tasks to Fortran programs. Most of the scripting languages do
not need to be compiled, which makes it possible to get immediate feedback from
individual commands. The traditional scripting languages in Unix are the shells (like
bash, zsh, ksh, or tcsh). For ESS applications, which need to run on
supercomputers, it is often necessary to invoke the executable indirectly, from what is
known as a “job script”. Such scripts typically use a (system-dependent) variation of
one of the languages above. For an introduction to such languages see, for example,
Robbins and Beebe [25]. Windows also offers several native technologies,
such as Windows Script Host, the CMD shell, and Windows PowerShell
(see, e.g., Knittel [13] for details). Although the shells may often seem to be
primitive as programming languages, their distinctive advantage is that they seamlessly
integrate with the rest of the system. On Unix in particular, a fundamental principle
is to write programs that do a particular task well, and to design these so that they
can easily communicate with one another, through streams of text91 (see Raymond
[24]). The shells were designed to fit into this picture as “glue”-languages, which
make invocation of programs and pipelining of output easier.
Another class of scripting languages that can be used with Fortran are the more
general-purpose R, Python, MATLAB, and Octave. These offer some valuable
tools, such as support for advanced statistics, visualization, and computational
steering.
Due to the long history of Fortran, it is natural that a large collection of programs
and libraries has been created. Many of these are available, under open-source or
commercial licenses, and are of high quality. Therefore, it makes sense to consider,
when evaluating the requirements of a new application, if any of these libraries
and programs could be used, to reduce the development costs. We provide a short
overview here, for some of these libraries that are relevant to ESS. Note that, given
the capabilities of modern Fortran to interoperate with C (as discussed in Sect. 5.4),
it is also possible to use software libraries that were written in C.92
First of all, it is recommended to search for the desired functionality in the set of
intrinsic procedures of Fortran. We could only cover a small subset of these here,
so we refer to more advanced texts like Metcalf et al. [21] (especially Chap. 8 and
Appendix A therein) for a complete list.
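As a small illustration of this advice, the sketch below evaluates a matrix-vector
product, a dot product, column sums, and the norm of a residual using only intrinsic
procedures (matmul, dot_product, sum, and norm2), without any hand-written loops.
The module residual_utils and the function residual_norm are hypothetical names,
introduced here for illustration; norm2 requires a compiler with Fortran 2008 support.

```fortran
module residual_utils
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Euclidean norm of the residual A*x - b, using only intrinsics.
  pure function residual_norm(a, x, b) result(res)
    real(dp), intent(in) :: a(:,:), x(:), b(:)
    real(dp) :: res
    res = norm2(matmul(a, x) - b)
  end function residual_norm
end module residual_utils

program intrinsics_demo
  use residual_utils, only: dp, residual_norm
  implicit none
  real(dp) :: a(2,2), x(2), b(2)

  ! reshape fills column by column: the columns are (1,3) and (2,4)
  a = reshape([1.0_dp, 3.0_dp, 2.0_dp, 4.0_dp], [2,2])
  x = [1.0_dp, 1.0_dp]
  b = [3.0_dp, 7.0_dp]

  write(*,'(a,2f6.1)')  'A*x       = ', matmul(a, x)
  write(*,'(a,f6.1)')   'x . b     = ', dot_product(x, b)
  write(*,'(a,2f6.1)')  'col sums  = ', sum(a, dim=1)
  write(*,'(a,es10.3)') '|A*x - b| = ', residual_norm(a, x, b)
end program intrinsics_demo
```

For larger problems, or for tasks like solving linear systems, the external libraries
discussed next are usually the better choice.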
Within the universe of software packages, an important role (especially for ESS)
is occupied by linear algebra routines. The de facto standard library in this domain,
especially for working with dense and banded matrices, is Linear Algebra PACKage
(LAPACK). To ensure good performance on many platforms, this relies heavily on
BLAS, which is a library for performing the lower-level computations. The latter is
actually a collection of libraries, with a conventional interface; this allows hardware
vendors to provide optimized versions of these libraries for their own platforms, such
as Accelerate from Apple, Core Math Library (ACML) from AMD, Engineering and
Scientific Subroutine Library (ESSL) from IBM, Intel® Math Kernel Library (MKL)
from Intel, etc. (consult the documentation of your system to see what is available).
An alternative is Automatically Tuned Linear Algebra Software (ATLAS) [28],
which can generate optimized versions of BLAS.
Categorized collections of Fortran libraries like netlib or Guide to Available
Mathematical Software (GAMS) are good places to consult for other libraries.
91 In particular, well-known tools for manipulating text are grep, awk, and sed.
92 However, it may be necessary to write a thin “wrapper layer” of interface-blocks, based
on the documentation of the C API of the library.
5.6 Useful Tools for Scaling Software Projects 247
5.6.3 Visualization
Data visualization is very important in ESS. However, this is a vast field in itself,
and we can only provide some pointers here. The issue of visualization relates to the
hierarchical approach to I/O that we highlighted throughout the text, since choosing
a suitable data format is a crucial prerequisite:
• ASCII files with minimal formatting are suitable for small and low-dimensional
datasets (up to two independent coordinates). Most tools can operate on such files.
• netCDF-files are recommended (especially in ESS), since they are also widely
supported, and tools like the Climate Data Operators (CDO) or the netCDF Oper-
ators (NCO) can be used for preprocessing the data files, when they would be too
large to be comfortably used directly in the visualization software.
The concrete software package to use for visualization depends more on other
factors, such as additional mathematical/graphics features that may be necessary:
• for simple visualizations, gnuplot is very suitable, especially for ASCII files
• interpreted languages, such as R, the Generic Mapping Tools (GMT) or MATLAB
can be useful for more complex analyses, as they were either designed with ESS
applications in mind (GMT), or accommodate a large set of packages for specialized
functions (R, MATLAB); all of them can also operate on netCDF-data
• for 3D volume datasets, tools like Parallel Visualization Application (ParaView)
(also supporting netCDF-files) can be used
Software projects are, inherently, very dynamic: new features are added, parts of
the code are restructured93 for clarity or performance optimization, old bugs are
fixed and, unfortunately, new bugs are introduced, etc. This process naturally leads
to several versions, which can add management costs (for example, when trying to
determine which change led to a certain bug).
To diminish these additional costs, version control systems were invented. They
provide the concept of a “repository”, which is where all revisions of the project are
stored. As the project evolves, developers can mark the completion of certain
milestones with “commits”, usually accompanied by related comments. Any “committed”
version can then be easily retrieved, and also compared against other versions (which
is particularly useful). There is a lot of variation in the exact mechanics of these operations,
and in the way special situations (like collaboration) are handled; these depend a
lot on the type of system used. A rough classification of these systems identifies two
classes,94 either of which may be preferred, depending on the needs of the project
and on the background of the developers:
• centralized systems (e.g. subversion): a central server is designated, where
all project contributors upload their changes, and from which they get the latest
revision of the code (“trunk”). With this hierarchy, it is always straightforward
to locate the latest revision of the code. One limitation due to this server-client
architecture is that network access becomes necessary for most operations.
• distributed systems (git, mercurial, monotone, etc.): with these systems,
every developer commands a fully functional clone of the repository. This
relaxes the constraint of constant network access, and also provides developers the
means to test ideas in local “branches”, before sharing them with others (often,
such ideas are complex enough to benefit from version control on their own, which
is easy to do in distributed systems). A possible disadvantage is that, since no two
developers will make the same changes, the repositories can easily diverge in time,
which may be a problem if a common version is desired (but these systems usually
also provide excellent tools for synchronization).
5.6.5 Testing
94 The separation is not so clear-cut, since when a “centralized” system is used by a single developer
there is no need for a dedicated server; also, the “distributed” systems can be used in a server-client
fashion.
95 See, for example, the FORTRAN Unit Test Framework (fruit).
References
1. Amdahl, G.M.: Validity of the single processor approach to achieving large scale computing
capabilities. In: Proceedings of the April 18–20, 1967, Spring Joint Computer Conference, pp.
483–485. AFIPS ’67 (Spring), ACM (1967)
2. Barakat, H.Z., Clark, J.A.: On the solution of the diffusion equations by numerical methods.
J. Heat Transf. 88(4), 421–427 (1966)
3. Calcote, J.: Autotools: A Practitioner’s Guide to GNU Autoconf, Automake, and Libtool. No
Starch Press, San Francisco (2010)
4. Caron, J.: On the suitability of BUFR and GRIB for archiving data. In: AGU Fall Meeting
Abstracts, vol. 1, p. 1619 (2011)
5. Chandra, R., Dagum, L., Maydan, D., McDonald, J., Menon, R.: Parallel Programming in
OpenMP. Morgan Kaufmann Publishers, San Francisco (2000)
6. Chapman, B., Jost, G.: Using OpenMP: Portable Shared Memory Parallel Programming. The
MIT Press, Cambridge (2007)
7. Clerman, N.S., Spector, W.: Modern Fortran: Style and Usage. Cambridge University Press,
Cambridge (2011)
8. Hager, G., Wellein, G.: Introduction to High Performance Computing for Scientists and Engi-
neers. CRC Press, Boca Raton (2010)
9. Hao, W., Zhu, S.: Parallel iterative methods for parabolic equations. Int. J. Comput. Math.
86(3), 431–440 (2009)
10. Hook, B.: Write Portable Code: An Introduction to Developing Software for Multiple Platforms.
No Starch Press, San Francisco (2005)
11. Kerrisk, M.: The Linux Programming Interface: A Linux and UNIX System Programming
Handbook. No Starch Press, San Francisco (2010)
12. Knight, S.: Building software with SCons. Comput. Sci. Eng. 7(1), 79–88 (2005)
13. Knittel, B.: Windows 7 and Vista Guide to Scripting, Automation, and Command Line Tools.
Que Publishing, Upper Saddle River (2010)
14. Locarnini, R.A., Mishonov, A.V., Antonov, J.I., Boyer, T.P., Garcia, H.E., Baranova, O.K.,
Zweng, M.M., Johnson, D.R.: World Ocean Atlas 2009 Volume 1: Temperature. In: Levitus,
S. (ed.) NOAA Atlas NESDIS 68. U.S. Government Printing Office, Washington, D.C., p. 184
(2010), also available as http://www.nodc.noaa.gov/OC5/indprod.html
15. Markus, A.: Modern Fortran in Practice. Cambridge University Press, Cambridge (2012)
16. Martin, K., Hoffman, B.: Mastering CMake, 6th edn. Kitware Inc, New York (2013)
17. Martorell, X., Tallada, M., Duran, A., Balart, J., Ferrer, R., Ayguade, E., Labarta, J.: Tech-
niques supporting threadprivate in OpenMP. In: Parallel and Distributed Processing Sympo-
sium, IPDPS 2006. 20th International, p. 7 (Apr 2006)
18. Mattson, T.G., Sanders, B.A., Massingill, B.: Patterns for Parallel Programming. Addison-
Wesley Professional, Boston (2004)
19. McCool, M., Reinders, J., Robison, A.: Structured Parallel Programming: Patterns for Efficient
Computation, 1st edn. Morgan Kaufmann, San Francisco (2012)
20. Mecklenburg, R.: Managing Projects with GNU Make (Nutshell Handbooks), 3rd edn. O’Reilly
Media, Sebastopol (2004)
21. Metcalf, M., Reid, J., Cohen, M.: Modern Fortran Explained. Oxford University Press, Oxford
(2011)
22. Moore, G.E.: Cramming more components onto integrated circuits. Electronics 38(8), 114–117
(1965)
23. Overton, M.L.: Numerical Computing with IEEE Floating Point Arithmetic. Society for Indus-
trial and Applied Mathematics, Philadelphia (2001)
24. Raymond, E.S.: The Art of UNIX Programming. Addison-Wesley Professional, Boston (2003)
25. Robbins, A., Beebe, N.: Classic Shell Scripting. O’Reilly Media, Sebastopol (2005)
26. Smith, P.: Software Build Systems: Principles and Experience. Addison-Wesley Professional,
Boston (2011)
27. Sutter, H.: The free lunch is over: a fundamental turn toward concurrency in software. Dr.
Dobb’s J. 30(3), 202–210 (2005)
28. Whaley, R.C., Petitet, A.: Minimizing development and maintenance costs in supporting per-
sistently optimized BLAS. J. Softw. Pract. Exp. 35(2), 101–121 (2005)