Eijkhout HPCtutorials
Victor Eijkhout
Public draft This book is open for comments. What is missing or incomplete or unclear? Is material
presented in the wrong sequence? Kindly mail me with any comments you may have.
You may have found this book in any of a number of places; the authoritative download location is https:
//theartofhpc.com/ That page also links to lulu.com where you can get a nicely printed copy.
Table 1: Proposed lesson outline.

lesson  Topic        Book  Slides       Exercises (in-class / homework)
1       Unix         1     unix         1.42
2       Git          5
3       Programming  2     programming  2.3 / 2.4
4       Libraries    2     programming
5       Debugging    11                 root code
6       LaTeX        15                 15.13
7       Make         3                  3.1, 3.2
A good part of being an effective practitioner of High Performance Scientific Computing is what can be
called ‘HPC Carpentry’: a number of skills that are not scientific in nature, but that are still indispensable
to getting your work done.
The vast majority of scientific programming is done on the Unix platform so we start out with a tutorial
on Unix in chapter 1, followed by an explanation of how your code is handled by compilers and linkers
and such in chapter 2.
Next you will learn about some tools that will increase your productivity and effectiveness:
• The Make utility is used for managing the building of projects; chapter 3.
• Source control systems store your code in such a way that you can undo changes, or maintain
multiple versions; in chapter 5 you will see the git software.
• Storing and exchanging scientific data becomes an important matter once your program starts to
produce results; in chapter 7 you will learn the use of HDF5.
• Visual output of program data is important, but too wide a topic to discuss here in great detail;
chapter 9 teaches you the basics of the gnuplot package, which is suitable for simple data plotting.
We also consider the activity of program development itself: chapter 10 considers how to code to prevent
errors, and chapter 11 teaches you to debug code with gdb. Chapter 13 contains some information on how
to write a program that uses more than one programming language.
Finally, chapter 15 teaches you about the LaTeX document system, so that you can report on your work in
beautifully typeset articles.
Many of the tutorials are very hands-on. Do them while sitting at a computer!
Table 1 gives a proposed lesson outline for the carpentry section of a course. The article by Wilson [24]
is a good read on the thinking behind this ‘HPC carpentry’.
Chapter 1
Unix intro
Unix is an Operating System (OS), that is, a layer of software between the user or a user program and the
hardware. It takes care of files and screen output, and it makes sure that many processes can exist side by
side on one system. However, it is not immediately visible to the user.
Most of this tutorial will work on any Unix-like platform, however, there is not just one Unix:
• Traditionally there are a few major flavors of Unix: AT&T or System V, and BSD. Apple has Darwin,
which is close to BSD; IBM and HP have their own versions of Unix, and Linux is yet another
variant. These days many Unix versions adhere to the POSIX standard. The differences between
these lie deep down, and if you are taking this tutorial you probably won't see them for quite a
while.
• Within Linux there are various Linux distributions such as Red Hat or Ubuntu. These mainly differ
in the organization of system files and again you probably need not worry about them.
• The issue of command shells will be discussed below. This actually forms the most visible differ-
ence between different computers ‘running Unix’.
1.1 Shells
Most of the time that you use Unix, you are typing commands which are executed by an interpreter called
the shell. The shell makes the actual OS calls. There are a few possible Unix shells available:
• Most of this tutorial is focused on the sh or bash shell.
• For a variety of reasons (see for instance section 3.5), bash-like shells are to be preferred over the
csh or tcsh shell. These latter ones will not be covered in this tutorial.
• Recent versions of the Apple Mac OS have zsh as the default shell. While this shell has much in
common with bash, we will point out differences explicitly.
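To find out which shell you are actually running, you can look at the SHELL environment variable or at your current process; a minimal check, not from the original text (the exact output depends on your system):

echo $SHELL     # the login shell configured for your account
ps -p $$        # the shell process currently interpreting your commands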
1.2.1.1 ls
Without any argument, the ls command gives you a listing of files that are in your present location.
Exercise 1.1. Type ls. Does anything show up?
Intended outcome. If there are files in your directory, they will be listed; if there are none,
no output will be given. This is standard Unix behavior: no output does not mean that
something went wrong, it only means that there is nothing to report.
Exercise 1.2. If the ls command shows that there are files, do ls name on one of those. By
using an option, for instance ls -s name you can get more information about name.
Things to watch out for. If you mistype a name, or specify a name of a non-existing file,
you’ll get an error message.
The ls command can give you all sorts of information. In addition to the above ls -s for the size, there
is ls -l for the 'long' listing. It shows ownership and permissions (things we will get to later),
as well as the size and the modification date.
Remark 1 There are several dates associated with a file, corresponding to changes in content, changes in
permissions, and access of any sort. The stat command gives all of them.
1.2.1.2 cat
The cat command (short for ‘concatenate’) is often used to display files, but it can also be used to create
some simple content.
Exercise 1.3. Type cat > newfilename (where you can pick any filename) and type some
text. Conclude with Control-d on a line by itself: press the Control key and hold it
while you press the d key. Now use cat to view the contents of that file: cat newfilename.
Intended outcome. In the first use of cat, text was appended from the terminal to a file;
in the second the file was cat’ed to the terminal output. You should see on your screen
precisely what you typed into the file.
Things to watch out for. Be sure to type Control-d as the first thing on the last line of
input. If you really get stuck, Control-c will usually get you out. Try this: start creating
a file with cat > filename and hit Control-c in the middle of a line. What are the
contents of your file?
Remark 2 Instead of Control-d you will often see the notation ^D. The capital letter is for historic reasons:
you use the control key and the lowercase letter.
1.2.1.3 man
The primary (though not always the most easily understood) source for unix commands is the man com-
mand, for ‘manual’. The descriptions available this way are referred to as the manual pages.
Exercise 1.4. Read the man page of the ls command: man ls. Find out the size and the time /
date of the last change to some files, for instance the file you just created.
Intended outcome. Did you find the ls -s and ls -l options? The first one lists the
size of each file, usually in kilobytes, the other gives all sorts of information about a file,
including things you will learn about later.
The man command puts you in a mode where you can view long text documents. This viewer is common
on Unix systems (it is available as the more or less system command), so memorize the following ways
of navigating: Use the space bar to go forward and the u key to go back up. Use g to go to the beginning
of the text, and G for the end. Use q to exit the viewer. If you really get stuck, Control-c will get you out.
Remark 3 If you already know what command you’re looking for, you can use man to get online information
about it. If you forget the name of a command, man -k keyword can help you find it.
1.2.1.4 touch
The touch command creates an empty file, or updates the timestamp of a file if it already exists. Use ls
-l to confirm this behavior.
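For instance, the following sketch (the file name is arbitrary) shows both behaviors:

touch somefile        # creates an empty file if it did not exist
ls -l somefile
touch somefile        # run again a little later: the contents are unchanged,
ls -l somefile        # but the timestamp has been updated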
Files are copied with the cp command, and renamed or moved with mv.
Exercise 1.6. Rename a file. What happens if the target name already exists?
Files are deleted with rm. This command is dangerous: there is no undo. For this reason you can do rm -i
(for ‘interactive’) which asks your confirmation for every file.
See section 1.2.4 for more aggressive removing.
1.2.2 Directories
Purpose. Here you will learn about the Unix directory tree, how to manipulate it and
how to move around in it.
A unix file system is a tree of directories, where a directory is a container for files or more directories. We
will display directories as follows:
/..............................................The root of the directory tree
bin ................................................... Binary programs
home ....................................... Location of users directories
The root of the Unix directory tree is indicated with a slash. Do ls / to see what the files and directories
there are in the root. Note that the root is not the location where you start when you reboot your personal
machine, or when you log in to a server.
Exercise 1.9. The command to find out your current working directory is pwd. Your home di-
rectory is your working directory immediately when you log in. Find out your home
directory.
Intended outcome. You will typically see something like /home/yourname or /Users/yourname.
This is system dependent.
Do ls to see the contents of the working directory. In the displays in this section, directory names will be
followed by a slash: dir/ but this character is not part of their name. You can get this output by using ls
-F, and you can tell your shell to use this output consistently by stating alias ls='ls -F' at the start of
your session. Example:
/home/you/
adirectory/
afile
Remark 4 If you need to create a directory several levels deep, you could
mkdir sub1
cd sub1
mkdir sub2
cd sub2
## et cetera
but it’s shorter to use the -p option (for ‘parent’) and write:
mkdir -p sub1/sub2/sub3
The command for going into another directory, that is, making it your working directory, is cd (‘change
directory’). It can be used in the following ways:
• cd Without any arguments, cd takes you to your home directory.
• cd <absolute path> An absolute path starts at the root of the directory tree, that is, starts
with /. The cd command takes you to that location.
• cd <relative path> A relative path is one that does not start at the root. This form of the cd
command takes you to <yourcurrentdir>/<relative path>.
Exercise 1.11. Do cd newdir and find out where you are in the directory tree with pwd. Con-
firm with ls that the directory is empty. How would you get to this location using an
absolute path?
Intended outcome. pwd should tell you /home/you/newdir, and ls then has no output,
meaning there is nothing to list. The absolute path is /home/you/newdir.
Exercise 1.12. Let’s quickly create a file in this directory: touch onefile, and another direc-
tory: mkdir otherdir. Do ls and confirm that there are a new file and directory.
Intended outcome. You should now have:
/home/you/
newdir/...................................................you are here
onefile
otherdir/
The ls command has a very useful option: with ls -a you see your regular files and hidden files, which
have a name that starts with a dot. Doing ls -a in your new directory should tell you that there are the
following files:
/home/you/
newdir/...................................................you are here
.
..
onefile
otherdir/
The single dot is the current directory, and the double dot is the directory one level back.
Exercise 1.13. Predict where you will be after cd ./otherdir/.. and check to see if you were
right.
Intended outcome. The single dot sends you to the current directory, so that does not
change anything. The otherdir part makes that subdirectory your current working di-
rectory. Finally, .. goes one level back. In other words, this command puts you right
back where you started.
Since your home directory is a special place, there are shortcuts for cd’ing to it: cd without arguments,
cd ~, and cd $HOME all get you back to your home.
Go to your home directory, and from there do ls newdir to check the contents of the first directory you
created, without having to go there.
Exercise 1.14. What does ls .. do?
Intended outcome. Recall that .. denotes the directory one level up in the tree: you should
see your own home directory, plus the directories of any other users.
Let’s practice the use of the single and double dot directory shortcuts.
Exercise 1.15. From your home directory:
mkdir -p sub1/sub2/sub3
cd sub1/sub2/sub3
touch a
What is the difference between cp -r newdir somedir where somedir is an exiting directory, and cp
-r newdir thirddir where thirddir is not an existing directory?
1.2.3 Permissions
Purpose. In this section you will learn about how to give various users on your system
permission to do (or not to do) various things with your files.
Unix files, including directories, have permissions, indicating ‘who can do what with this file’. Actions
that can be performed on a file fall into three categories:
• reading r: any access to a file (displaying, getting information on it) that does not change the file;
• writing w: access to a file that changes its content, or even its metadata such as ‘date modified’;
• executing x: if the file is executable, to run it; if it is a directory, to enter it.
The people who can potentially access a file are divided into three classes too:
• the user u: the person owning the file;
• the group g: a group of users to which the owner belongs;
• other o: everyone else.
(For more on groups and ownership, see section 1.13.1.)
The nine permissions are rendered in sequence
user group other
rwx rwx rwx
For instance rw-r--r-- means that the owner can read and write a file, the owner’s group and everyone
else can only read.
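As an illustration (this sample line is made up; your own files will show different owners, sizes and dates), a long listing of such a file looks like

-rw-r--r--  1 you  yourgroup  1453 Oct  1 12:34 report.txt

where the very first character indicates the file type (- for a regular file, d for a directory), followed by the three permission triplets, the owner, the group, the size, and the modification date.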
Permissions are also rendered numerically in groups of three bits, by letting r = 4, w = 2, x = 1:
rwx
421
Common codes are 7 = rwx and 6 = rw. You will find many files that have permissions 755 which stands
for an executable that everyone can run, but only the owner can change, or 644 which stands for a data file
that everyone can see but again only the owner can alter. You can set permissions by the chmod command:
chmod <permissions> file # just one file
chmod -R <permissions> directory # directory, recursively
Examples:
chmod 766 file # set to rwxrw-rw-
chmod g+w file # give group write permission
chmod g=rx file # set group permissions
chmod o-w file # take away write permission from others
chmod o= file # take away all permissions from others.
chmod g+r,o-x file # give group read permission
# remove other execute permission
This is a legitimate shell script. What happens when you type ./com? Can you get the script executed?
In the three permission categories it is clear who ‘you’ and ‘others’ refer to. How about ‘group’? We’ll go
into that in section 1.13.
Exercise 1.18. Suppose you’re an instructor and you want to make a ‘dropbox’ directory for
students to deposit homework assignments in. What would be an appropriate mode for
that directory? (Assume that you have co-teachers that are in your group, and who also
need to be able to see the contents. In other words, group permission should be identical
to the owner permission.)
Remark 5 There are more obscure permissions. For instance the setuid bit declares that the program should
run with the permissions of the creator, rather than the user executing it. This is useful for system utilities
such as passwd or mkdir, which alter the password file and the directory structure, for which root privileges
are needed. Thanks to the setuid bit, a user can run these programs, which are then so designed that a user
can only make changes to their own password entry, and their own directories, respectively. The setuid bit is
set with chmod, by prefixing the numeric permissions with a 4, for instance chmod 4755 file.
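You can recognize the setuid bit in a long listing as an s in the owner's execute position; for instance (the exact path and output differ per system, the details are elided here):

ls -l /usr/bin/passwd
-rwsr-xr-x 1 root root ... /usr/bin/passwd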
1.2.4 Wildcards
You already saw that ls filename gives you information about that one file, and ls gives you all files in
the current directory. To see files with certain conditions on their names, the wildcard mechanism exists.
The following wildcards exist:
* any number of characters
? any character.
Example:
%% ls
s sk ski skiing skill
%% ls ski*
ski skiing skill
The second command lists all files whose names start with ski, followed by any number of other characters;
below you will see that in different contexts ski* means 'sk followed by any number of i characters'.
Confusing, but that's the way it is.
You can use rm with wildcards, but this can be dangerous.
rm -f foo ## remove foo if it exists
rm -r foo ## remove directory foo with everything in it
rm -rf foo/* ## delete all contents of foo
Zsh note. No match. Removing with a wildcard, as in rm foo*, is an error if there are no such files. Set setopt
+o nomatch to allow no matches to occur.
1.3 Text searching and regular expressions
For this section you need at least one file that contains some amount of text. You can for instance get
random text from http://www.lipsum.com/feed/html.
The grep command can be used to search for a text expression in a file.
Exercise 1.19. Search for the letter q in your text file with grep q yourfile and search for it
in all files in your directory with grep q *. Try some other searches.
Intended outcome. In the first case, you get a listing of all lines that contain a q; in the
second case, grep also reports what file name the match was found in: qfile:this
line has q in it.
Things to watch out for. If the string you are looking for does not occur, grep will simply
not output anything. Remember that this is standard behavior for Unix commands if there
is nothing to report.
In addition to searching for literal strings, you can look for more general expressions.
^ the beginning of the line
$ the end of the line
. any character
* any number of repetitions
[xyz] any of the characters xyz
This looks like the wildcard mechanism you just saw (section 1.2.4) but it’s subtly different. Compare the
example above with:
%% cat s
sk
ski
skill
skiing
%% grep "ski*" s
sk
ski
skill
skiing
In the second case you search for a string consisting of sk and any number of i characters, including zero
of them.
Some more examples: you can find
• All lines that contain the letter ‘q’ with grep q yourfile;
• All lines that start with an ‘a’ with grep "^a" yourfile (if your search string contains special
characters, it is a good idea to use quote marks to enclose it);
• All lines that end with a digit with grep "[0-9]$" yourfile.
Exercise 1.20. Construct the search strings for finding
• lines that start with an uppercase character, and
• lines that contain exactly one character.
Intended outcome. For the first, use the range characters [], for the second use the period
to match any character.
Exercise 1.21. Add a few lines x = 1, x = 2, x = 3 (that is, have different numbers of
spaces between x and the equals sign) to your test file, and make grep commands to
search for all assignments to x.
The characters in the table above have special meanings. If you want to search that actual character, you
have to escape it.
Exercise 1.22. Make a test file that has both abc and a.c in it, on separate lines. Try the com-
mands grep "a.c" file, grep a\.c file, grep "a\.c" file.
Intended outcome. You will see that the period needs to be escaped, and the search string
needs to be quoted. In the absence of either, you will see that grep also finds the abc
string.
The cut command selects pieces from each line of its input. For instance, cut -c 2-5 myfile
will display the characters in position 2–5 of every line of myfile. Make a test file and verify this example.
Maybe more useful, you can give cut a delimiter character and have it split a line on occurrences of
that delimiter. For instance, your system will most likely have a file /etc/passwd that contains user
information1 , with every line consisting of fields separated by colons. For instance:
daemon:*:1:1:System Services:/var/root:/usr/bin/false
nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false
root:*:0:0:System Administrator:/var/root:/bin/sh
The seventh and last field is the login shell of the user; /bin/false indicates that the user is unable to
log in.
You can display users and their login shells with:
cut -d ":" -f 1,7 /etc/passwd
This tells cut to use the colon as delimiter, and to print fields 1 and 7.
1. This is traditionally the case; on Mac OS information about users is kept elsewhere and this file only contains system
services.
Text files can often be compressed to a large extent, so adding the z option for gzip compression is a good idea:
tar fcz package.tar.gz directory_with_stuff
tar fx package.tar.gz
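To see what is in an archive without unpacking it, the t flag lists the contents; a sketch:

tar ftz package.tar.gz     # list the contents, do not extract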
Remark 6 Like any good programming language, the shell language has comments. Any line starting with
a hash character # is ignored.
If you type a command such as ls, the shell does not just rely on a list of commands: it will actually go
searching for a program by the name ls. This means that you can have multiple different commands with
the same name, and which one gets executed depends on which one is found first.
Exercise 1.23. What you may think of as ‘Unix commands’ are often just executable files in a
system directory. Do which ls, and do an ls -l on the result.
Intended outcome. The location of ls is something like /bin/ls. If you ls that, you will
see that it is probably owned by root. Its executable bits are probably set for all users.
The locations where unix searches for commands form the search path, which is stored in the environment
variable (for more details see below) PATH.
Exercise 1.24. Do echo $PATH. Can you find the location of cd? Are there other commands in
the same location? Is the current directory ‘.’ in the path? If not, do export PATH=".:$PATH".
Now create an executable file cd in the current directory (see above for the basics), and do
cd.
This holds both for compiled programs and shell scripts; section 1.9.1.
Remark 7 Not all Unix commands correspond to executables. The type command gives more information
than which:
$ type echo
echo is a shell builtin
$ type \ls
ls is an alias for ls -F
$ unalias ls
$ type ls
ls is /bin/ls
$ type module
module is a shell function from /usr/local/Cellar/lmod/8.7.2/init/zsh
1.5.2 Aliases
It is possible to define your own commands as aliases of existing commands.
Exercise 1.25. Do alias chdir=cd and convince yourself that now chdir works just like cd.
Do alias rm='rm -i'; look up the meaning of this in the man pages. Some people find
this alias a good idea; can you see why?
Intended outcome. The -i ‘interactive’ option for rm makes the command ask for confir-
mation before each delete. Since unix does not have a trashcan that needs to be emptied
explicitly (as on Windows or the Mac OS), this can be a good idea.
You can put two commands on one line, separated by a semicolon: command1 ; command2.
This is convenient if you repeat the same two commands a number of times: you only need to up-arrow
once to repeat them both. However, if the first command is, say, a compilation, and the compilation fails,
the second command will still be executed, using an old version of the executable if that exists. This is
very confusing. For that case there is the double ampersand, command1 && command2,
which only executes the second command if the first one was successful.
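For instance, a sketch with a generic compiler invocation (not from the original text):

cc -o myprog myprog.c && ./myprog    # run the program only if it compiled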
1.5.3.2 Pipelining
Instead of taking input from a file, or sending output to a file, it is possible to connect two commands
together, so that the second takes the output of the first as input. The syntax for this is cmdone | cmdtwo;
this is called a pipeline. For instance, grep a yourfile | grep b finds all lines that contains both an
a and a b.
Exercise 1.26. Construct a pipeline that counts how many lines there are in your file that con-
tain the string th. Use the wc command (see above) to do the counting.
1.5.3.3 Backquoting
There are a few more ways to combine commands. Suppose you want to present the result of wc a bit
nicely. Type the following command
echo The line count is wc -l foo
where foo is the name of an existing file. The way to get the actual line count echoed is by the backquote:
echo The line count is `wc -l foo`
Anything in between backquotes is executed before the rest of the command line is evaluated.
Exercise 1.27. The way wc is used here, it prints the file name. Can you find a way to prevent
that from happening?
There is another mechanism for out-of-order evaluation:
echo "There are $( cat Makefile | wc -l ) lines"
This mechanism makes it possible to nest commands, but for compatibility and legacy purposes back-
quotes may still be preferable when nesting is not needed.
This only catches the last command. You could for instance group the three commands in a subshell and
catch the output of that:
( configure ; make ; make install ) > installation.log 2>&1
The script reports that the file was created even though it wasn’t.
Improved script, betterfile:
#!/bin/bash
touch $1
if [ $? -eq 0 ] ; then
  echo "Created file: $1"
else
  echo "Problem creating file: $1"
fi
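Assuming you make the script executable (chmod +x betterfile), a quick check of the successful branch could look like this sketch:

./betterfile somefile
Created file: somefile

Trying to create a file in a location where you have no write permission should take the other branch and report a problem (touch itself will also print its own error message).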
Exercise 1.29. Type Control-z. This suspends the foreground process. It will give you a num-
ber like [1] or [2] indicating that it is the first or second program that has been sus-
pended or put in the background. Now type bg to put this process in the background.
Confirm that there is no foreground process by hitting return, and doing an ls.
Intended outcome. After you put a process in the background, the terminal is available
again to accept foreground commands. If you hit return, you should see the command
prompt. However, the background process still keeps generating output.
Exercise 1.30. Type jobs to see the processes in the current session. If the process you just put
in the background was number 1, type fg %1. Confirm that it is a foreground process
again.
Intended outcome. If a shell is executing a program in the foreground, it will not accept
command input, so hitting return should only produce blank lines.
Exercise 1.31. When you have made the hello script a foreground process again, you can kill
it with Control-c. Try this. Start the script up again, this time as ./hello & which
immediately puts it in the background. You should also get output along the lines of [1]
12345 which tells you that it is the first job you put in the background, and that 12345
is its process ID. Kill the script with kill %1. Start it up again, and kill it by using the
process number.
Intended outcome. The command kill 12345 using the process number is usually enough
to kill a running program. Sometimes it is necessary to use kill -9 12345.
1.6 Input/output Redirection
So far, the unix commands you have used have taken their input from your keyboard, or from a file named
on the command line; their output went to your screen. There are other possibilities for providing input
from a file, or for storing the output in a file.
Unix has three standard files that handle input and output:
• stdin is the file that provides input for processes.
• stdout is the file where the output of a process is written.
• stderr is the file where error output is written.
In an interactive session, all three files are connected to the user terminal. Using input or output redirection
then means that the input is taken or the output sent to a different file than the terminal.
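Input redirection uses the less-than sign; a sketch (any text file will do):

grep q < yourfile    # grep reads its input from yourfile via stdin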
Just as with the input, you can redirect the output of your program. In the simplest case, grep string
yourfile > outfile will take what normally goes to the terminal, and redirect the output to outfile.
The output file is created if it didn’t already exist, otherwise it is overwritten. (To append, use grep text
yourfile >> outfile.)
Exercise 1.32. Take one of the grep commands from the previous section, and send its output
to a file. Check that the contents of the file are identical to what appeared on your screen
before. Search for a string that does not appear in the file and send the output to a file.
What does this mean for the output file?
Intended outcome. Searching for a string that does not occur in a file gives no terminal
output. If you redirect the output of this grep to a file, it gives a zero size file. Check this
with ls and wc.
Exercise 1.33. Generate a text file that contains your information:
My user name is:
eijkhout
My home directory is:
/users/eijkhout
I made this script on:
isp.tacc.utexas.edu
Some common idioms:
program 2>/dev/null       send only errors to the null device
program >/dev/null 2>&1   send output to the null device, and errors to wherever the output goes
                          (note the counterintuitive sequence of specifications!)
program 2>&1 | less       send output and errors to less
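For example (myprogram is a stand-in for any command):

./myprogram 2> errors.log        # keep the error messages in a file of their own
./myprogram > all.log 2>&1       # capture regular output and errors together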
1.7 Shell environment variables
Remark 8 This does not include variables you define yourself, unless you export them; see below.
Exercise 1.34. Check on the value of the PATH variable by typing echo $PATH. Also find the
value of PATH by piping env through grep.
We start by exploring the use of this dollar sign in relation to shell variables.
You see that the shell treats everything as a string, unless you explicitly tell it to take the value of a variable,
by putting a dollar in front of the name. A variable that has not been previously defined will print as a
blank string.
Shell variables can be set in a number of ways. The simplest is by an assignment as in other programming
languages.
When you do the next exercise, it is good to bear in mind that the shell is a text based language.
Exercise 1.35. Type a=5 on the commandline. Check on its value with the echo command.
Define the variable b to another integer. Check on its value.
Now explore the values of a+b and $a+$b, both by echo’ing them, or by first assigning
them.
Intended outcome. The shell does not perform integer addition here: instead you get
a string with a plus-sign in it. (You will see how to do arithmetic on variables in sec-
tion 1.10.1.)
Things to watch out for. Beware not to have space around the equals sign; also be sure to
use the dollar sign to print the value.
The export command makes a variable visible in subshells that you start from the current shell:
[] export a=21
[] /bin/bash
[] echo $a
21
[] exit
[]
The syntax where you set the value, as a prefix without using a separate command, sets the value
just for that one command.
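A minimal illustration, in the style of the transcripts above (not taken from the original text):

[] b=20 /bin/bash -c 'echo $b'
20
[] echo $b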
[]
That is, you defined the variable just for the execution of a single command.
In section 1.8 you will see that the for construct also defines a variable; section 1.9.1 shows some more
built-in variables that apply in shell scripts.
If you want to un-set an environment variable, there is the unset command.
1.8.1 Conditionals
The conditional of the bash shell is predictably called if, and it can be written over several lines:
if [ $PATH = "" ] ; then
echo "Error: path is empty"
fi
or on a single line:
if [ `wc -l file` -gt 100 ] ; then echo "file too long" ; fi
(The backquote is explained in section 1.5.3.3.) There are a number of tests defined, for instance -f
somefile tests for the existence of a file. Change your script so that it will report -1 if the file does
not exist.
The syntax of this is finicky:
• if and elif are followed by a conditional, followed by a semicolon.
• The brackets of the conditional need to have spaces surrounding them.
• There is no semicolon after then or else: they are immediately followed by some command.
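Putting these rules together, a complete conditional might look like this sketch:

if [ -f somefile ] ; then
  echo "somefile is a regular file"
elif [ -d somefile ] ; then
  echo "somefile is a directory"
else
  echo "somefile is something else, or does not exist"
fi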
Exercise 1.36. Bash conditionals have an elif keyword. Can you predict the error you get from
this:
if [ something ] ; then
foo
else if [ something_else ] ; then
bar
fi
Code it out and see if you were right.
Zsh note. The zsh shell has an extended conditional syntax with double square brackets. For
instance, pattern matching:
if [[ $myvar == *substring* ]] ; then ....
1.8.2 Looping
In addition to conditionals, the shell has loops. A for loop looks like
for var in listofitems ; do
something with $var
done
In a more meaningful example, here is how you would make backups of all your .c files:
for cfile in *.c ; do
cp $cfile $cfile.bak
done
Shell variables can be manipulated in a number of ways. Execute the following commands to see that you
can remove trailing characters from a variable:
[] a=b.c
[] echo ${a%.c}
b
(See the section 1.10 on expansion.) With this as a hint, write a loop that renames all your .c files to .x
files.
The above construct loops over words, such as the output of ls. To do a numeric loop, use the command
seq:
[] seq 1 5
1
2
3
4
5
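The loop example itself is not reproduced here; a sketch of how seq is typically used in a loop:

for i in `seq 1 5` ; do
  echo "iteration $i"
done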
Note the backtick, which is necessary to have the seq command executed before evaluating the loop.
You can break out of a loop with break; this can even have a numeric argument indicating how many
levels of loop to break out of.
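A small sketch (not from the original text) of break in action:

for f in * ; do
  if [ ! -r "$f" ] ; then
    echo "stopping at unreadable file $f"
    break
  fi
  echo "$f is readable"
done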
1.9 Scripting
The unix shells are also programming environments. You will learn more about this aspect of unix in this
section.
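The exercise here starts from a small script; a minimal sketch (assuming you call the file script1) would be:

#!/bin/bash
echo "hello world"

Save these two lines in a file script1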
and type ./script1 on the command line. Result? Make the file executable and try again.
Zsh note. Bash scripts If you use the zsh, but you have bash scripts that you wrote in the past, they will
keep working. The ‘hash-bang’ line determines which shell executes the script, and it is perfectly possible
to have bash in your script, while using zsh for interactive use.
In order to write scripts that you want to invoke from anywhere, people typically put them in a directory
bin in their home directory. You would then add this directory to your search path, contained in PATH;
see section 1.5.1.
You will now learn how to incorporate this functionality in your scripts.
First of all, all commandline arguments and options are available as variables $1,$2 et cetera in the script,
and the number of command line arguments is available as $#:
#!/bin/bash
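# (the rest of this example is not in the original text; a minimal sketch
#  that echoes the argument count and the first argument might be:)
echo "The number of arguments is $#"
echo "The first argument is $1"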
Formally:
variable meaning
$# number of arguments
$0 the name of the script
$1,$2,... the arguments
$*,$@ the list of all arguments
Exercise 1.37. Write a script that takes as input a file name argument, and reports how many
lines are in that file.
Edit your script to test whether the file has less than 10 lines (use the foo -lt bar test),
and if it does, cat the file. Hint: you need to use backquotes inside the test.
Add a test to your script so that it will give a helpful message if you call it without any
arguments.
The standard way to parse argument is using the shift command, which pops the first argument off the
list of arguments. Parsing the arguments in sequence then involves looking at $1, shifting, and looking at
the new $1.
Code [code/shell] arguments:
// arguments.sh
while [ $# -gt 0 ] ; do
  echo "argument: $1"
  shift
done

Output:
./arguments.sh the quick "brown fox" jumps
argument: the
argument: quick
argument: brown fox
argument: jumps
Exercise 1.38. Write a script say.sh that prints its text argument. However, if you invoke it
with
./say.sh -n 7 "Hello world"
it should print it as many times as you indicated. Using the option -u:
./say.sh -u -n 7 "Goodbye cruel world"
should print the message in uppercase. Make sure that the order of the arguments does
not matter, and give an error message for any unrecognized option.
The variables $@ and $* have a different behavior with respect to double quotes. Let's say we evaluate
myscript "1 2" 3, then
• Using $* gives the list of arguments with the quotes removed: 1 2 3.
• Using "$*" gives the list of arguments, with quotes removed, as one quoted string: "1 2 3".
• Using "$@" preserves the quotes: "1 2" 3.
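A quick way to see the difference is with a little helper script (this script is not from the original text, just a sketch):

#!/bin/bash
# quotes.sh -- hypothetical helper to show $* versus "$@"
echo 'loop over $*:'
for a in $* ; do echo "  <$a>" ; done
echo 'loop over "$@":'
for a in "$@" ; do echo "  <$a>" ; done

Calling ./quotes.sh "1 2" 3 should show three items for $* but only two for "$@".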
1.10 Expansion
The shell performs various kinds of expansion on a command line, that is, replacing part of the comman-
dline with different text.
Brace expansion:
[] echo a{b,cc,ddd}e
abe acce addde
This can for instance be used to delete all extensions of some base file name:
[] rm tmp.{c,s,o} # delete tmp.c tmp.s tmp.o
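Another sketch of brace expansion in action (the names are made up):

mkdir -p project/{src,include,doc}    # creates three subdirectories in one go
touch data{1,2,3}.txt                 # creates data1.txt data2.txt data3.txt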
There are many variations on parameter expansion. Above you already saw that you can strip trailing
characters:
[] a=b.c
[] echo ${a%.c}
b
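Some further variations, all standard in bash and zsh (a sketch, not from the original text):

[] a=b.c
[] echo ${a#*.}                 # strip from the front up to the first dot
c
[] echo ${a/.c/.o}              # replace .c by .o
b.o
[] echo ${undefined:-default}   # fall back to a default for an unset variable
default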
The backquote mechanism (section 1.5.3.3 above) is known as command substitution. It allows you to
evaluate part of a command and use it as input for another. For example, if you want to ask what type of
file the command ls is, do
[] file `which ls`
This first evaluates which ls, giving /bin/ls, and then evaluates file /bin/ls. As another example,
here we backquote a whole pipeline, and do a test on the result:
[] echo 123 > w
[] cat w
123
[] wc -c w
4 w
[] if [ `cat w | wc -c` -eq 4 ] ; then echo four ; fi
four
You can make the shell re-read a file with the source command (or the . shorthand), for instance source ~/.bashrc.
You would do this, for instance, if you have edited your startup file.
Unfortunately, there are several startup files, and which one gets read is a complicated function of
circumstances. Here is a good common sense guideline:
• Have a .profile that does nothing but read the .bashrc:
# ~/.profile
if [ -f ~/.bashrc ]; then
source ~/.bashrc
fi
# ~/.bashrc
# make sure your path is updated
if [ -z "$MYPATH" ]; then
export MYPATH=1
export PATH=$HOME/bin:$PATH
fi
finger otheruser get information about another user; you can specify a user’s login name here, or
their real name, or other identifying information the system knows about.
top which processes are running on the system; use top -u to get this sorted by the amount of CPU time
they are currently taking. (On Linux, try also the vmstat command.)
uptime how long has it been since your last reboot?
While you can change the group of a file, at least between groups that you belong to, changing the owning
user of a file with chown needs root privileges. See section 1.13.2.
1.14 Connecting to other machines: ssh and scp
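The ssh ('secure shell') command lets you log in on another machine; the basic form, sketched here since the original command line is not shown, is:

ssh yourname@othercomputer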
where yourname can be omitted if you have the same name on both machines.
To only copy a file from one machine to another you can use the ‘secure copy’ scp, a secure variant of
‘remote copy’ rcp. The scp command is much like cp in syntax, except that the source or destination can
have a machine prefix.
To copy a file from the current machine to another, type:
scp localfile yourname@othercomputer:otherdirectory
where yourname can again be omitted, and otherdirectory can be an absolute path, or a path relative
to your home directory:
# absolute path:
scp localfile yourname@othercomputer:/share/
# path relative to your home directory:
scp localfile yourname@othercomputer:mysubdirectory
Leaving the destination path empty puts the file in the remote home directory:
scp localfile yourname@othercomputer:
Note the colon at the end of this command: if you leave it out you get a local file with an ‘at’ in the name.
You can also copy a file from the remote machine. For instance, to copy a file, preserving the name:
scp yourname@othercomputer:otherdirectory/otherfile .
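To copy a whole directory rather than a single file, scp accepts the -r (recursive) flag; a sketch:

scp -r yourname@othercomputer:otherdirectory .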
1.15.1 sed
The sed ('stream editor') utility applies editing commands to each line of its input. For instance,
sed 's/foo/bar/' myfile
will apply the substitute command s/foo/bar/ to every line of myfile. The output is shown on your
screen so you should capture it in a new file; see section 1.6 for more on output redirection.
• If you have more than one edit, you can specify them with
sed -e 's/one/two/' -e 's/three/four/'
• If an edit needs to be done only on certain lines, you can specify that by prefixing the edit with
the match string. For instance
sed '/^a/s/b/c/'
only applies the edit on lines that start with an a. (See section 1.3 for regular expressions.)
You can also apply it on a numbered line:
sed '25s/foo/bar/'
• The a and i commands are for ‘append’ and ‘insert’ respectively. They are somewhat strange
in how they take their argument text: the command letter is followed by a backslash, with the
insert/append text on the next line(s), delimited by the closing quote of the command.
sed -e '/here/a\
appended text
' -e '/there/i\
inserted text
' -i file
• Traditionally, sed could only function in a stream, so the output file always had to be different
from the input. The GNU version, which is standard on Linux systems, has a flag -i which edits
‘in place’:
sed -e 's/ab/cd/' -e 's/ef/gh/' -i thefile
1.15.2 awk
The awk utility also operates on each line, but it can be described as having a memory. An awk program
consists of a sequence of pairs, where each pair consists of a match string and an action. The simplest
awk program is
cat somefile | awk '{ print }'
where the match string is omitted, meaning that all lines match, and the action is to print the line. Awk
breaks each line into fields separated by whitespace. A common application of awk is to print a certain
field:
awk '{print $2}' file
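By default awk splits on whitespace, but you can choose another field separator with the -F option; for instance, revisiting the /etc/passwd example from the cut discussion (a sketch):

awk -F: '{print $1, $7}' /etc/passwd    # print user name and login shell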
Exercise 1.39. Build a command pipeline that prints only the subroutine name of each sub-
routine header. For this you first use sed to replace the parentheses by spaces, then awk
to print the subroutine name field.
Awk has variables with which it can remember things. For instance, instead of just printing the second
field of every line, you can make a list of them and print that later:
cat myfile | awk 'BEGIN {v="Fields:"} {v=v " " $2} END {print v}'
As another example of the use of variables, here is how you would print all lines in between a BEGIN and
END line:
cat myfile | awk '/END/ {p=0} p==1 {print} /BEGIN/ {p=1} '
Exercise 1.40. The placement of the match with BEGIN and END may seem strange. Rearrange
the awk program, test it out, and explain the results you get.
For simplicity, we simulate this by making a directory submissions and two different
files student1.txt and student2.txt. After
submit_homework student1.txt
submit_homework student2.txt
there should be copies of both files in the submissions directory. Start by writing a
simple script; it should give a helpful message if you use it the wrong way.
Try to detect if a student is cheating. Explore the diff command to see if the submitted
file is identical to something already submitted: loop over all submitted files and
1. First print out all differences.
2. Count the differences.
3. Test if this count is zero.
Now refine your test by catching if the cheating student randomly inserted some spaces.
For a harder test: try to detect whether the cheating student inserted newlines. This can
not be done with diff, but you could try tr to remove the newlines.
Chapter 2
Compilers and libraries
The file command can also tell you about binary files. Here the output differs by operating system.
$$ which ls
/bin/ls
# on a Mac laptop:
$$ file /bin/ls
/bin/ls: Mach-O 64-bit x86_64 executable
# on a Linux box
$$ file /bin/ls
/bin/ls: ELF 64-bit LSB executable, x86-64
Exercise 2.1. Apply the file command to sources for different programming languages. Can
you find out how file figures things out?
In figure 2.1 you find a brief summary of file types. We will now discuss them in more detail.
Figure 2.1 (summary of file types):
Text files:
• Source: program text that you write.
• Header: also written by you, but not really program text.
Binary files:
• Object file: the compiled result of a single source file.
• Library: multiple object files bundled together.
• Executable: binary file that can be invoked as a command.
• Data files: written and read by a program.
Fortran works differently: each record, that is, the output of each Write statement, has the record length (in
bytes) before and after it.
// binary_write.F90
Open(Unit=13,File="binarydata.out",Form="unformatted")
do i=0,9
  write(13) i
end do
Close(Unit=13)
In this tutorial you will mostly be concerned with executable binary files. We then distinguish between:
• program files, which are executable by themselves;
• object files, which are like bits of programs; and
• library files, which combine object files, but are not executable.
Object files come from the fact that your source is often spread over multiple source files, and these can
be compiled separately. In this way, an object file is a piece of an executable: by itself it does nothing, but
it can be combined with other object files to form an executable.
If you have a collection of object files that you need for more than one program, it is usually a good idea to
make a library: a bundle of object files that can be used to form an executable. Often, libraries are written
by an expert and contain code for specialized purposes such as linear algebra manipulations. Libraries
are important enough that they can be commercial, to be bought if you need expert code for a certain
purpose.
You will now learn how these types of files are created and used.
2.2.1 Compilers
Your main tool for turning source into a program is the compiler. Compilers are specific to a language:
you use a different compiler for C than for Fortran. You can also have two compilers for the same lan-
guage, but from different ‘vendors’. For instance, while many people use the open source gcc or clang
compiler families, companies like Intel and IBM offer compilers that may give more efficient code on their
processors.
// hello.c
#include <stdio.h>
int main() {
  printf("hello world\n");
  return 0;
}
Compile this program with your favorite compiler; we will use gcc in this tutorial, but substitute your
own as desired.
TACC note. On TACC clusters, the Intel compiler icc is preferred.
As a result of the compilation, a file a.out is created, which is the executable.
%% gcc hello.c
%% ./a.out
hello world
You can get a more sensible program name with the -o option:
%% gcc -o helloprog hello.c
%% ./helloprog
hello world
et cetera.
• hello.s: the assembly listing of your program. This is a sort of 'readable machine language':
.arch armv8-a
.text
.cstring
.align 3
lC0:
.ascii "hello world\0"
.text
.align 2
.globl _main
_main:
LFB10:
stp x29, x30, [sp, -16]!
• hello.o: the object file, containing actual machine language. We will go into this more below.
The object file is not directly readable, but later you’ll see the nm tool that can give you some
information.
However, you can also do it in steps, compiling each file separately and then linking them together. This
is illustrated in figure 2.3.
Output
[code/compile] makeseparatecompile:
clang -g -O2 -o oneprogram fooprog.o foosub.o
./oneprogram
hello world
The -c option tells the compiler to compile the source file, giving an object file. The third command then
acts as the linker, tying together the object files into an executable. (With programs that are spread over
several files there is always the danger of editing a subroutine definition and then forgetting to update all
the places it is used. See the ‘make’ tutorial, section 3, for a way of dealing with this.)
Exercise 2.3.
Exercise for separate compilation. Structure:
• Compile in one:
icc -o program fooprog.c foosub.c
• Compile in steps:
icc -c fooprog.c
icc -c foosub.c
icc -o program fooprog.o foosub.o
Lines with T indicate routines that are defined; lines with U indicate routines that are used but not defined
in this file. In this case, printf is a system routine that will be supplied in the linker stage.
(With C++ the function names will look a little strange because of name mangling. However, you’ll still
be able to recognize them.)
Sometimes you will come across a stripped binary file, and nm will report No symbols. In that case nm -D
may help, which displays 'dynamic symbols'.
Exercise 2.4. From level zero to one we get (in the above example; in general this depends on
the compiler) an improvement of 2× to 3×. Can you find an obvious factor of two?
Use the optimization report facility of your compiler to see what other optimizations are
applied. One of them is a good lesson in benchmark design!
Many compilers can generate a report of what optimizations they perform.
Generally, optimizations leave the semantics of your code intact. (Makes kinda sense, not?) However, at
higher levels, usually level 3, the compiler is at liberty to make transformations that are not legal according
to the language standard, but that in the majority of cases will still give the right outcome. For instance,
the C language specifies that arithmetic operations are evaluated left-to-right. Rearranging arithmetic
expressions is usually safe, but not always. Be careful when applying higher optimization levels!
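As a sketch of what this looks like in practice (the flags shown are the common gcc/clang spellings; other compilers use different ones):

gcc -O0 -o prog0 myprog.c               # no optimization: easiest to debug
gcc -O2 -o prog2 myprog.c               # a good default for production runs
gcc -O3 -ffast-math -o prog3 myprog.c   # aggressive; -ffast-math relaxes strict
                                        # floating point semantics, so results may change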
2.3 Libraries
Purpose. In this section you will learn about creating libraries.
If you have written some subprograms, and you want to share them with other people (perhaps by selling
them), then handing over individual object files is inconvenient. Instead, the solution is to combine them
into a library. This section shows you the basic Unix mechanisms. You would typically use these in a
Makefile; if you use CMake it’s all done for you.
First we look at static libraries, for which the archive utility ar is used. A static library is linked into your
executable, becoming part of it. This has the advantage that you can give such an executable to someone
else, and it will immediately work. On the other hand, this may lead to large executables; you will learn
about shared libraries next, which do not suffer from this problem.
Create a directory to contain your library, and create the library file there. The library can be linked into
your executable by explicitly giving its name, or by specifying a library path:
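A sketch of the commands involved (the compiler name is an assumption; substitute your own):

mkdir -p libs
ar cr libs/libfoo.a foosub.o                  # create the archive and add an object file
gcc -o staticprogram fooprog.o libs/libfoo.a  # link by giving the library file explicitly
gcc -o staticprogram fooprog.o -Llibs -lfoo   # or by library path and -l name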
Output
[code/compilecxx] staticprogram:
for o in foosub.o ; do \
ar cr libs/libfoo.a ${o} ; \
done
clang++ -o staticprogram fooprog.o -Llibs -lfoo
-rwxr-xr-x 1 eijkhout staff 52536 Sep 27 04:57 staticprogram
.. running:
hello world
The nm command tells you what’s in the library, just like it did with object files, but now it also tells you
what object files are in the library:
We show this for C:
Code [code/compilec] staticlib:
for o in foosub.o ; do ar cr libs/libfoo.a ${o} ; done
nm libs/libfoo.a

Output:
==== Making static library ====
foosub.o:
0000000000000000 T bar
0000000000000000 N .debug_info_seg
                 U printf
For C++ we show the output in figure 2.5, where we note the -C flag for name demangling.
Code [code/compilecxx] staticlib:
for o in foosub.o ; do ar cr libs/libfoo.a ${o} ; done
nm -C libs/libfoo.a

Output:
==== Making static library ====
foosub.o:
                 U __cxa_atexit
0000000000000000 N .debug_info_seg
                 U __dso_handle
                 U __gxx_personality_v0
0000000000000010 t __sti__$E
0000000000000000 T bar(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
0000000000000000 b _INTERNALaee936d8::std::__ioinit
0000000000000000 W std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::data() const
0000000000000000 W std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::size() const
                 U std::ios_base::Init::Init()
                 U std::ios_base::Init::~Init()
                 U std::cout
                 U std::basic_ostream<char, std::char_traits<char> >& std::operator<< <char, std::char_traits<char>, std::allocator<char> >(std::basic_ostream<char, std::char_traits<char> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
For comparison, nm output on the shared version of the library (here on a Mac) looks quite different:
../lib/libfoo.so (single module):
00000fc4 t __dyld_func_lookup
00000000 t __mh_dylib_header
00000fd2 T _bar
         U _printf
00001000 d dyld__mach_header
00000fb0 t dyld_stub_binding_helper
Shared libraries are not actually linked into the executable; instead, the executable needs the information
where the library is to be found at execution time. One way to do this is with LD_LIBRARY_PATH:
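A sketch of the two mechanisms (paths and compiler name are assumptions):

# at run time: tell the loader where to look
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${PWD}/libs
./dynamicprogram

# alternatively, bake the location into the executable with an rpath at link time
gcc -o dynamicprogram fooprog.o -Llibs -lfoo -Wl,-rpath,${PWD}/libs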
Remark 10 On Apple OS Ventura the use of LD_LIBRARY_PATH is no longer supported for security reasons,
so using the rpath is the only option.
Use the command ldd to get information about what shared libraries your executable uses. (On Mac OS X,
use otool -L instead.)
Chapter 3
Managing projects with Make
The Make utility helps you manage the building of projects: its main task is to facilitate rebuilding only
those parts of a multi-file project that need to be recompiled or rebuilt. This can save lots of time, since it
can replace a minutes-long full installation by a single file compilation. Make can also help maintaining
multiple installations of a program on a single machine, for instance compiling a library with more than
one compiler, or compiling a program in debug and optimized mode.
Make is a Unix utility with a long history, and traditionally there are variants with slightly different
behavior, for instance on the various flavors of Unix such as HP-UX, AUX, IRIX. These days, it is advisable,
no matter the platform, to use the GNU version of Make which has some very powerful extensions; it is
available on all Unix platforms (on Linux it is the only available variant), and it is a de facto standard. The
manual is available at http://www.gnu.org/software/make/manual/make.html, or you can read the
book [14].
There are other build systems, most notably Scons and Bjam. We will not discuss those here. The examples
in this tutorial will be for the C and Fortran languages, but Make can work with any language, and in fact
with things like TEX that are not really a language at all; see section 3.7.
3.1 A simple example
3.1.1 C++
Make the following files:
foo.cxx
#include <iostream>
using std::cout;
#include "bar.h"
int main()
{
int a=2;
cout << bar(a) << '\n';
return 0;
}
bar.cxx
#include "bar.h"
int bar(int a)
{
int b=10;
return b*a;
}
bar.h
int bar(int);
and a makefile:
Makefile
fooprog : foo.o bar.o
icpc -o fooprog foo.o bar.o
foo.o : foo.cxx
icpc -c foo.cxx
bar.o : bar.cxx
icpc -c bar.cxx
clean :
rm -f *.o fooprog
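If you now call make in this directory, you should see something like the following (the exact lines follow from the makefile above):

icpc -c foo.cxx
icpc -c bar.cxx
icpc -o fooprog foo.o bar.o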
Exercise. Do make clean, followed by mv foo.cxx boo.cxx and make again. Explain the error mes-
sage. Restore the original file name.
Expected outcome. Make will complain that there is no rule to make foo.cxx. This error was
caused when foo.cxx was a prerequisite for making foo.o, and was found not to exist. Make
then went looking for a rule to make it and no rule for creating .cxx files exists.
Now add a second argument to the function bar. This would require you to edit all of bar.cxx, bar.h,
and foo.cxx, but let's say we forget to edit the last two, and only edit bar.cxx. We will see how Make
can help you find the resulting error.
Exercise. Update the header file, and call make again. What happens, and what had you been hoping
would happen?
Expected outcome. Only the linker stage is done, and it gives the same error about an unresolved
reference. Were you hoping that the main program would be recompiled?
The way out of this problem is to tie the header file to the source files in the makefile.
In the makefile, change the line
foo.o : foo.cxx
to
foo.o : foo.cxx bar.h
which adds bar.h as a prerequisite for foo.o. This means that, in this case where foo.o already exists,
Make will check that foo.o is not older than any of its prerequisites. Since bar.h has been edited, it is
younger than foo.o, so foo.o needs to be reconstructed.
Remark 11 As already noted above, in C++ fewer errors are caught by this mechanism than in C, because
of polymorphism. You might wonder if it would be possible to generate header files automatically. This is of
course possible with suitable shell scripts, but tools such as Make (or CMake) do not have this built in.
3.1.2 C
Make the following files:
foo.c
#include "bar.h"
int c=3;
int d=4;
int main()
{
int a=2;
return(bar(a*c*d));
}
bar.c
#include "bar.h"
int bar(int a)
{
int b=10;
return(b*a);
}
bar.h
extern int bar(int);
and a makefile:
Makefile
fooprog : foo.o bar.o
cc -o fooprog foo.o bar.o
foo.o : foo.c
cc -c foo.c
bar.o : bar.c
cc -c bar.c
clean :
rm -f *.o fooprog
Expected outcome. The above rules are applied: make without arguments tries to build the first
target, fooprog. In order to build this, it needs the prerequisites foo.o and bar.o, which do not
exist. However, there are rules for making them, which make recursively invokes. Hence you see
two compilations, for foo.o and bar.o, and a link command for fooprog.
Caveats. Typos in the makefile or in file names can cause various errors. In particular, make sure
you use tabs and not spaces for the rule lines. Unfortunately, debugging a makefile is not simple.
Make’s error message will usually give you the line number in the make file where the error was
detected.
Exercise. Do make clean, followed by mv foo.c boo.c and make again. Explain the error message.
Restore the original file name.
Expected outcome. Make will complain that there is no rule to make foo.c. This error was caused
when foo.c was a prerequisite for making foo.o, and was found not to exist. Make then went
looking for a rule to make it and no rule for creating .c files exists.
Now add a second argument to the function bar. This requires you to edit bar.c and bar.h: go ahead
and make these edits. However, it also requires you to edit foo.c, but let us for now ‘forget’ to do that.
We will see how Make can help you find the resulting error.
Expected outcome. Even though conceptually foo.c would need to be recompiled since it uses
the bar function, Make did not do so because the makefile had no rule that forced it.
As before, the way out of this problem is to tie the header file to the source file in the makefile: change the line
foo.o : foo.c
to
foo.o : foo.c bar.h
which adds bar.h as a prerequisite for foo.o. This means that, in this case where foo.o already exists,
Make will check that foo.o is not older than any of its prerequisites. Since bar.h has been edited, it is
younger than foo.o, so foo.o needs to be reconstructed.
Exercise. Confirm that the new makefile indeed causes foo.o to be recompiled if bar.h is changed. This
compilation will now give an error, since you ‘forgot’ to edit the use of the bar function.
3.1.3 Fortran
Make the following files:
foomain.F
program foomain
  use foomod
  call func(1,2)
end program
foomod.F
module foomod
contains
  subroutine func(a,b)
  integer a,b
  print *,a,b,c
  end subroutine func
end module
and a makefile:
Makefile
fooprog : foomain.o foomod.o
gfortran -o fooprog foomain.o foomod.o
foomain.o : foomain.F
gfortran -c foomain.F
foomod.o : foomod.F
gfortran -c foomod.F
clean :
rm -f *.o fooprog
If you call make, the first rule in the makefile is executed. Do this, and explain what happens.
Exercise. Do make clean, followed by mv foomod.F boomod.F and make again. Explain the error message.
Restore the original file name.
Expected outcome. Make will complain that there is no rule to make foomod.F. This error was
caused when foomod.F was a prerequisite for foomod.o, and was found not to exist. Make then
went looking for a rule to make it, and no rule for making .F files exists.
Expected outcome. Even though conceptually foomain.F would need to be recompiled, Make did
not do so because the makefile had no rule that forced it.
The remedy is again to add a prerequisite: in the makefile, change the line
foomain.o : foomain.F
to
foomain.o : foomain.F foomod.o
which adds foomod.o as a prerequisite for foomain.o. This means that, in this case where foomain.o
already exists, Make will check that foomain.o is not older than any of its prerequisites. Recursively,
Make will then check if foomod.o needs to be updated, which is indeed the case. After recompiling
foomod.F, foomod.o is younger than foomain.o, so foomain.o will be reconstructed.
Exercise. Confirm that the corrected makefile indeed causes foomain.F to be recompiled.
3.3 Variables and template rules
Exercise. Edit your makefile as indicated. First do make clean, then make foo (C) or make fooprog
(Fortran).
Expected outcome. You should see the exact same compile and link lines as before.
Caveats. Unlike in the shell, where braces are optional, variable names in a makefile have to be in
braces or parentheses. Experiment with what happens if you forget the braces around a variable
name.
One advantage of using variables is that you can now change the compiler from the commandline:
make CC="icc -O2"
make FC="gfortran -g"
Exercise. Invoke Make as suggested (after make clean). Do you see the difference in your screen output?
Expected outcome. The compile lines now show the added compiler option -O2 or -g.
You can also define a variable for the name of the program, for instance
THEPROGRAM = fooprog
and use this variable instead of the program name in your makefile. This makes it easier to change your
mind about the name of the executable later.
Exercise. Edit your makefile to add this variable definition, and use it instead of the literal program name.
Construct a commandline so that your makefile will build the executable fooprog_v2.
Expected outcome. You need to specify the THEPROGRAM variable on the commandline using the
syntax make VAR=value.
Caveats. Make sure that there are no spaces around the equals sign in your commandline.
In the makefiles above we had several rules of the same form, where the object file depends on the source
file and possibly another file. We can take the commonalities and summarize them in one template rule1 :
%.o : %.c
${CC} -c $<
%.o : %.F
${FC} -c $<
This states that any object file depends on the C or Fortran file with the same base name. To regenerate
the object file, invoke the C or Fortran compiler with the -c flag. These template rules can function as a
replacement for the multiple specific targets in the makefiles above, except for the rule for foo.o.
The dependence of foo.o on bar.h, or foomain.o on foomod.o, can be handled by adding a rule
# C
foo.o : bar.h
# Fortran
foomain.o : foomod.o
1. This mechanism is the first instance you’ll see that only exists in GNU make, though in this particular case there is a similar
mechanism in standard make. That will not be the case for the wildcard mechanism in the next section.
with no further instructions. This rule states, ‘if file bar.h or foomod.o changed, file foo.o or foomain.o
needs updating’ too. Make will then search the makefile for a different rule that states how this updating
is done, and it will find the template rule.
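As an illustration, a minimal makefile that combines a compiler variable, a template rule, and such an extra dependency line could look as follows. This is only a sketch for the C case, reusing the file names of the earlier example:
CC = gcc
fooprog : foo.o bar.o
	${CC} -o fooprog foo.o bar.o
%.o : %.c
	${CC} -c $<
# extra prerequisite: foo.o also depends on the header
foo.o : bar.h
clean :
	rm -f *.o fooprog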
Figure 3.1: File structure with main program and two library files.
Changing a source file only recompiles that file:
clang++ -c libf.cxx
clang++ -o main libmain.o libf.o libg.o
Changing the implementation header only recompiles the library:
clang++ -c libf.cxx
clang++ -c libg.cxx
clang++ -o main libmain.o libf.o libg.o
Changing the libapi.h recompiles everything:
clang++ -c libmain.cxx
clang++ -c libf.cxx
clang++ -c libg.cxx
clang++ -o main libmain.o libf.o libg.o
For Fortran we don’t have header files so we use modules everywhere; figure 3.3. If you know how to use
submodules, a Fortran 2008 feature, you can make the next exercise as efficient as the C version.
Exercise 3.2. Write a makefile for the following structure:
• There is one main file libmain.f90, that uses a module api.f90;
• There are two low level modules libf.f90 libg.f90 that are used in api.f90.
If you use modules, you’ll likely be doing more compilation than needed. For the optimal
solution, use submodules.
Figure 3.3: File structure with main program and two library files.
3.3.3 Wildcards
Your makefile now uses one general rule for compiling any source file. Often, your source files will be
all the .c or .F files in your directory, so is there a way to state ‘compile everything in this directory’?
Indeed there is.
Add the following lines to your makefile, and use the variable COBJECTS or FOBJECTS wherever appro-
priate. The command wildcard gives the result of ls, and you can manipulate file names with patsubst.
# wildcard: find all files that match a pattern
CSOURCES := ${wildcard *.c}
# pattern substitution: replace one pattern string by another
COBJECTS := ${patsubst %.c,%.o,${CSOURCES}}
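For Fortran the analogous lines would be (a sketch; adjust the suffix to your own naming convention):
# wildcard and pattern substitution for Fortran sources
FSOURCES := ${wildcard *.F}
FOBJECTS := ${patsubst %.F,%.o,${FSOURCES}}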
3.3.5 Conditionals
There are various ways of making the behavior of a makefile dynamic. You can for instance put a shell
conditional in an action line. However, this can make for a cluttered makefile; an easier way is to use
makefile conditionals. There are two types of conditionals: tests on string equality, and tests on environ-
ment variables.
The first type looks like
ifeq "${HOME}" "/home/thisisme"
# case where the executing user is me
else ifeq "${HOME}" "/home/buddyofmine"
# case for other user
else
# case where it's someone else
endif
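One way to test on an environment variable is GNU Make's ifdef; a minimal sketch with a hypothetical DEBUG variable:
ifdef DEBUG
  FLAGS = -g
else
  FLAGS = -O2
endif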
The text in the true and false branches can be almost any part of a makefile. For instance, it is possible to
let one of the action lines in a rule be conditionally included. However, most of the time you will use
conditionals to make the definition of variables dependent on some condition.
Exercise. Let’s say you want to use your makefile at home and at work. At work, your employer has a
paid license to the Intel compiler icc, but at home you use the open source Gnu compiler gcc. Write a
makefile that will work in both places, setting the appropriate value for CC.
3.4 Miscellania
3.4.1 Phony targets
The example makefile contained a target clean. This uses the Make mechanisms to accomplish some
actions that are not related to file creation: calling make clean causes Make to reason ‘there is no file
called clean, so the following instructions need to be performed’. However, this does not actually cause
a file clean to spring into being, so calling make clean again will cause the same instructions to be
executed.
To indicate that this rule does not actually make the target, you use the .PHONY keyword:
.PHONY : clean
Most of the time, the makefile will actually work fine without this declaration, but the main benefit of
declaring a target to be phony is that the Make rule will still work, even if you have a file (or folder) named
clean.
3.4.2 Directories
It’s a common strategy to have a directory for temporary material such as object files. So you would have
a rule
obj/%.o : %.c
${CC} -c $< -o $@
This raises the question how the obj directory is created. You could do:
obj/%.o : %.c
mkdir -p obj
${CC} -c $< -o $@
This works, but it executes the mkdir on every compilation. A cleaner solution is to make the directory
an ‘order-only’ prerequisite, written after a vertical bar:
obj :
mkdir -p obj
obj/%.o : %.c | obj
${CC} -c $< -o $@
This only tests for the existence of the object directory, but not its timestamp.
and likewise for make other. What goes wrong here is the use of $@.o as a prerequisite: Make does not
expand the $@ variable when it reads the prerequisite list. In Gnu Make, you can repair this as follows2 :
.SECONDEXPANSION:
${PROGS} : $$@.o
	${CC} -o $@ $@.o ${list of libraries goes here}
Exercise. Write a second main program foosecond.c or foosecond.F, and change your makefile so
that the calls make foo and make foosecond both use the same rule.
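A minimal sketch of what such a makefile could look like (GNU Make; the compiler and file names are only an illustration):
PROGS = foo foosecond
CC = gcc
%.o : %.c
	${CC} -c $<
.SECONDEXPANSION:
${PROGS} : $$@.o
	${CC} -o $@ $@.o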
2. Technical explanation: Make will now look at lines twice: the first time $$ gets converted to a single $, and in the second
pass $@ becomes the name of the target.
In the makefiles you have seen so far, the command part was a single line. You can actually have as many
lines there as you want. For example, let us make a rule for making backups of the program you are
building.
Add a backup rule to your makefile. The first thing it needs to do is make a backup directory:
.PHONY : backup
backup :
if [ ! -d backup ] ; then
mkdir backup
fi
Did you type this? Unfortunately it does not work: every line in the command part of a makefile rule gets
executed in its own shell, as a separate command. Therefore, you need to write the whole command on one line:
backup :
if [ ! -d backup ] ; then mkdir backup ; fi
(Writing a long command on a single line is only possible in the bash shell, not in the csh shell. This is one
reason for not using the latter.)
Next we do the actual copy:
backup :
if [ ! -d backup ] ; then mkdir backup ; fi
cp myprog backup/myprog
But this backup scheme only saves one version. Let us make a version that has the date in the name of
the saved program.
The Unix date command can customize its output by accepting a format string. Type the following:
date
This can be used in the makefile.
Exercise. Edit the cp command line so that the name of the backup file includes the current date.
Expected outcome. Hint: you need the backquote. Consult the Unix tutorial, section 1.5.3, if you
do not remember what backquotes do.
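One possible form of the copy line, as a sketch (the exact date format string is up to you):
cp myprog backup/myprog-`date +%Y%m%d`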
If you are defining shell variables in the command section of a makefile rule, you need to be aware of the
following. Extend your backup rule with a loop to copy the object files:
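A first attempt, written the way you would type it in the shell, might look like this (note the single dollar signs; this is the version that fails):
backup :
	if [ ! -d backup ] ; then mkdir backup ; fi
	cp myprog backup/myprog
	for f in ${OBJS} ; do \
	  cp $f backup ; \
	done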
(This is not the best way to copy, but we use it for the purpose of demonstration.) This leads to an error
message, caused by the fact that Make interprets $f as an environment variable of the outer process. What
works is:
backup :
if [ ! -d backup ] ; then mkdir backup ; fi
cp myprog backup/myprog
for f in ${OBJS} ; do \
cp $$f backup ; \
done
(In this case Make replaces the double dollar by a single one when it scans the commandline. During the
execution of the commandline, $f then expands to the proper filename.)
In a typical development cycle you type something like
make myprogram ; ./myprogram -options
and keep repeating this. There is a danger in this: if the make fails, for instance because of compilation
problems, your program will still be executed. Instead, write
make myprogram && ./myprogram -options
3. There is a convention among software developers that a package can be installed by the sequence ./configure ; make ;
make install, meaning: Configure the build process for this computer, Do the actual build, Copy files to some system directory
such as /usr/bin.
info :
@echo "The following are possible:"
@echo " make"
@echo " make clean"
If this is the first rule in your makefile, typing make without explicit targets informs you of the capabilities of the makefile.
If your makefile gets longer, you might want to document each section like this. This runs into a problem:
you cannot have two rules with the same target, info in this case. However, if you use a double colon it
is possible. Your makefile would have the following structure:
info ::
@echo "The following target are available:"
@echo " make install"
install :
# ..... instructions for installing
info ::
@echo " make clean"
clean :
# ..... instructions for cleaning
3.7 A Makefile for LATEX
The following rule makes a pdf file from a LATEX source file:
%.pdf : %.tex
pdflatex $<
The command make myfile.pdf will invoke pdflatex myfile.tex, if needed, once. Next we repeat
invoking pdflatex until the log file no longer reports that further runs are needed:
%.pdf : %.tex
pdflatex $<
while [ `cat ${basename $@}.log | grep "Rerun to get" \
| wc -l` -gt 0 ] ; do \
pdflatex $< ; \
done
We use the ${basename fn} macro to extract the base name without extension from the target name.
In case the document has a bibliography or index, we run bibtex and makeindex.
%.pdf : %.tex
pdflatex ${basename $@}
-bibtex ${basename $@}
-makeindex ${basename $@}
while [ `cat ${basename $@}.log | grep "Rerun to get" \
| wc -l` -gt 0 ] ; do \
pdflatex ${basename $@} ; \
done
The minus sign at the start of the line means that Make should not exit if these commands fail.
Finally, we would like to use Make’s facility for taking dependencies into account. We could write a
makefile that has the usual rules
mainfile.pdf : mainfile.tex includefile.tex
but we can also discover the include files explicitly. The following makefile is invoked with
make pdf TEXFILE=mainfile
The pdf rule then uses some shell scripting to discover the include files (but not recursively), and it calls
Make again, invoking another rule, and passing the dependencies explicitly.
pdf :
export includes=`grep "^.input " ${TEXFILE}.tex \
| awk '{v=v FS $$2".tex"} END {print v}'` ; \
${MAKE} ${TEXFILE}.pdf INCLUDES="$$includes"
This shell scripting can also be done outside the makefile, generating the makefile dynamically.
Chapter 4
The CMake build system
Some people create the build directory in the source tree, in which case the CMake command is
cmake ..
Others put the build directory next to the source, in which case:
cmake ../src_directory
2. The build stage. Here the installation-specific compilation in the build directory is performed.
With Make as the ‘generator’ this would be
cd build
make
Alternatively, you could use generators such as ninja, Visual Studio, or XCode:
cmake -G Ninja
## the usual arguments
3. The install stage. This can move binary files to a permanent location, such as putting library files
in /usr/lib:
make install
or, independently of the generator used:
cmake --install .
General directives:
  cmake_minimum_required : specify minimum cmake version
  project : name and version number of this project
  install : specify directory where to install targets
Project building directives:
  add_executable : specify executable name and source files for it
  add_library : specify library name and files to go into it
  add_subdirectory : specify subdirectory where cmake also needs to run
  target_link_libraries : specify executable and libraries to link into it
  target_include_directories : specify include directories, privately or publicly
  find_package : other package to use in this build
Utility stuff:
  target_compile_options : literal options to include
  target_compile_features : things that will be translated by cmake into options
  target_compile_definitions : macro definitions to be set privately or publicly
  file : define macro as file list
  message : diagnostic to print, subject to level specification
Control:
  if() else() endif() : conditional
However, the install location already has to be set in the configuration stage. We will see later in
detail how this is done.
Summarizing, the out-of-source workflow as advocated in this tutorial is
ls some_package_1.0.0 # we are outside the source
ls some_package_1.0.0/CMakeLists.txt # source contains cmake file
mkdir builddir && cd builddir # goto build location
cmake -D CMAKE_INSTALL_PREFIX=../installdir \
../some_package_1.0.0
make
make install
(Figure: two directory layouts, each with src, build, and install directories, either inside or next to the project directory.)
Usage requirements:
target_some_requirement( the_target PUBLIC the requirements )
4.1.2 Languages
CMake is largely aimed at C++, but it easily supports C as well. For Fortran support, first do
enable_language(Fortran)
Note the capitalization: this also holds for all variables such as CMAKE_Fortran_COMPILER.
CMake is driven by the CMakeLists.txt file. This needs to be in the root directory of your project. (You
can additionally have files by that name in subdirectories.)
Since CMake has changed quite a bit over the years, and is still evolving, it is a good idea to start each
script with a declaration of the (minimum) required version:
cmake_minimum_required( VERSION 3.12 )
You also need to declare a project name and version, which need not correspond to any file names:
project( myproject VERSION 1.0 )
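For a project with a single source file, a minimal CMakeLists.txt could then look like this; a sketch, in which the project, target, and file names are only an illustration:
cmake_minimum_required( VERSION 3.12 )
project( myproject VERSION 1.0 )
add_executable( program program.cxx )
install( TARGETS program DESTINATION . )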
If there is only one source file, the previous section is all you need. However, often you will build libraries.
You declare those with an add_library clause:
add_library( auxlib aux.cxx aux.h )
Next, you need to link that library into the program:
target_link_libraries( program PRIVATE auxlib )
The PRIVATE clause means that the library is only for purposes of building the executable. (Use PUBLIC
to have the library be included in the installation; we will explore that in section 4.2.2.2.)
(Figure: source directory with CMakeLists.txt, program.cxx, aux.cxx, aux.h, and an install directory.)
The full CMakeLists.txt:
cmake_minimum_required( VERSION 3.12 )
project( cmakeprogram VERSION 1.0 )
Note that private shared libraries make no sense, as they will give runtime unresolved references.
On the other hand, if we edit a header file, the main program needs to be recompiled too. Compare what
happens if we only touch a source file and make: only the library is rebuilt and the executable relinked,
but the main program is not recompiled:
Consolidate compiler generated dependencies of target auxlib
[ 25%] Building CXX object CMakeFiles/auxlib.dir/aux.cxx.o
[ 50%] Linking CXX static library libauxlib.a
[ 50%] Built target auxlib
Consolidate compiler generated dependencies of target program
[ 75%] Linking CXX executable program
[100%] Built target program
Instead of declaring the library as SHARED in the add_library clause, you can also build shared libraries
by adding a flag at configuration time:
cmake -D BUILD_SHARED_LIBS=TRUE
#include "aux.h"
CMakeLists.txt
int main() {
aux1(); program.cxx
aux2();
return 0;
} src
To make sure the header file gets found during the build, you
aux.h
specify that include path with target_include_directories:
target_include_directories( install
program PRIVATE
"${CMAKE_CURRENT_SOURCE_DIR}/inc" )
program
It is best to make such paths relative to CMAKE_CURRENT_SOURCE_DIR
, or the source root CMAKE_SOURCE_DIR, or equivalently PROJECT_SOURCE_DIR
Usually, when you start making such directory structure, you will also have sources in subdirectories. If
you only need to compile them into the main executable, you could list them into a variable
set( SOURCES program.cxx src/aux.cxx )
and use that variable. However, this is deprecated practice; it is recommended to use target_sources:
target_sources( program PRIVATE src/aux1.cxx src/aux2.cxx )
to build the library file from the sources indicated, and to install it in a lib subdirectory.
(Figure: install directory containing the program, a lib subdirectory with libauxlib.so, and an include subdirectory with aux.h.)
We also add a clause to install the header files in an include directory:
install( FILES aux.h DESTINATION include )
For installing multiple files, use
install( DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
  DESTINATION ${LIBRARY_OUTPUT_PATH}
  FILES_MATCHING PATTERN "*.h" )
One problem is to tell the executable where to find the library. For this we use the rpath mechanism.
By default, CMake sets it so that the executable in the build location can find the library. If you use a
non-trivial install prefix, the following lines work:
set( CMAKE_INSTALL_RPATH "${CMAKE_INSTALL_PREFIX}/lib" )
set( CMAKE_INSTALL_RPATH_USE_LINK_PATH TRUE )
4.3 Finding and using external packages
Some libraries come with a FOOConfig.cmake file, which is searched on the CMAKE_PREFIX_PATH through
find_library. If it is found, you can test the variable it is supposed to set:
find_library( FOOLIB foo )
if (FOOLIB)
target_link_libraries( myapp PRIVATE ${FOOLIB} )
else()
# throw an error
endif()
find_package( MPI )
Fortran version:
cmake_minimum_required( VERSION 3.12 )
project( ${PROJECT_NAME} VERSION 1.0 )
enable_language(Fortran)
find_package( MPI )
if( MPI_Fortran_HAVE_F08_MODULE )
else()
message( FATAL_ERROR "No f08 module for this MPI" )
endif()
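After find_package( MPI ) has succeeded, you link your program against the imported targets that CMake's FindMPI module defines; a sketch for a C++ program, in which the target and file names are only an illustration:
find_package( MPI REQUIRED )
add_executable( mpiprogram mpiprogram.cxx )
target_link_libraries( mpiprogram PUBLIC MPI::MPI_CXX )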
Intel compiler installations come with CMake support: there is a file MKLConfig.cmake.
#include "mkl_cblas.h"
int main() {
vector<double> values{1,2,3,2,1};
auto maxloc = cblas_idamax ( values.size(),values.data(),1);
cout << "Max abs at: " << maxloc << " (s/b 2)" << '\n';
return 0;
}
The following configuration file lists the various options and such:
cmake_minimum_required( VERSION 3.12 )
project( mklconfigfind VERSION 1.0 )
## https://www.intel.com/content/www/us/en/develop/documentation/onemkl-linux-developer-guide/top/getting-started/cmake-config-for-onemkl.html
which you can then use in the target_include_directories, target_link_directories, and
target_link_libraries commands.
In the following example, we use the fmtlib. The main CMake file:
cmake_minimum_required( VERSION 3.12 )
project( pkgconfiglib VERSION 1.0 )
add_subdirectory( prolib )
target_link_libraries( program PUBLIC prolib )
Library file:
project( prolib )
Name: @PROJECT_NAME@
Description: @CMAKE_PROJECT_DESCRIPTION@
Version: @PROJECT_VERSION@
Cflags: -I${includedir}
Libs: -L${libdir} -l@libtarget@
Alternatively, set the environment variables CC, CXX, FC to the explicit paths of the compilers. For example,
for the Intel compilers:
export CC=`which icc`
export CXX=`which icpc`
export FC=`which ifort`
## from https://youtu.be/eC9-iRN2b04?t=1548
if (MSVC)
add_compile_options(/W3 /WX)
else()
add_compile_options(-W -Wall -Werror)
endif()
The variable CMAKE_CXX_COMPILE_FEATURES contains the list of all features you can set.
Optimization flags can be set by specifying the CMAKE_BUILD_TYPE:
• Debug corresponds to the -g flag;
• Release corresponds to -O3 -DNDEBUG;
• MinSizeRel corresponds to -Os -DNDEBUG
• RelWithDebInfo corresponds to -O2 -g -DNDEBUG.
This variable will often be set from the commandline:
cmake .. -DCMAKE_BUILD_TYPE=Release
Unfortunately, this seems to be the only way to influence optimization flags, other than explicitly setting
compiler flags; see next point.
4.5 CMake scripting
The CMakeLists.txt file is a script, though it doesn’t much look like it.
• Instructions consist of a command, followed by a parenthesized list of arguments.
• (All arguments are strings: there are no numbers.)
• Each command needs to start on a new line, but otherwise whitespace and line breaks are ignored.
Comments start with a hash character.
Instead of STATUS you can specify other logging levels (this parameter is actually called ‘mode’ in the
documentation); running for instance
cmake --log-level=NOTICE
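For reference, a message command with a mode specification might look like this; the printed text is only an illustration:
message( STATUS "Using C++ compiler: ${CMAKE_CXX_COMPILER_ID}" )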
4.5.3 Variables
Variables are set with set, or can be given on the commandline:
cmake -D MYVAR=myvalue
Variables can also be queried by the CMake script using the option command:
option( SOME_FLAG "A flag that has some function" defaultvalue )
Some variables are set by other commands. For instance the project command sets PROJECT_NAME and
PROJECT_VERSION.
4.5.4.2 Looping
while( myvalue LESS 50 )
message( stuff )
endwhile()
Chapter 5
Source code control through Git
In this tutorial you will learn git, currently the most popular version control (also source code control or
revision control) system. Other similar systems are Mercurial and Microsoft Sharepoint. Earlier systems
were SCCS, CVS, Subversion, Bitkeeper.
Version control is a system that tracks the history of a software project, by recording the successive
versions of the files of the project. These versions are recorded in a repository, either on the machine you
are working on, or remotely.
This has many practical advantages:
• It becomes possible to undo changes;
• Sharing a repository with another developer makes collaboration possible, including multiple
edits on the same file.
• A repository records the history of the project.
• You can have multiple versions of the project, for instance for exploring new features, or for
customization for certain users.
The use of a version control system is industry standard practice, and git is by far the most popular system
these days.
Modern version control systems allow you to have multiple branches, even in the same local repository.
This way you can have a main branch for the release version of the project, and one or more development
branches for exploring new features.
5.2 Git
This lab should be done by two people, to simulate a group of programmers working on a joint project. You
can also do this on your own by using two clones of the repository, preferably opening two windows on
your computer.
Best practices for distributed version control: https://homes.cs.washington.edu/~mernst/advice/
version-control.html
This gives you a directory with the contents of the repository. If you leave out the local name, the directory
will have the name of the repository.
Cmd >> git clone https://github.com/TACC/empty.git empty
Out >>
Cloning into 'empty'...
warning: You appear to have cloned an empty repository.
Cmd >> cd empty
Cmd >> ls -a
Out >>
.
..
.git
(Clone an empty repository and check that it is indeed empty.)
Cmd >> git status
Out >>
On branch main
No commits yet
nothing to commit (create/copy files and use "git add" to track)
The disadvantage of this method, over cloning an empty repo, is that you now have to connect your
directory to a remote repository. See section 5.6.
You need to git add on your file to tell git that the file belongs to the repository. (You can add a single
file, or use a wildcard to add multiple.) However, this does not actually add the file: it moves it to the
staging area. The status now says that it is a change to be committed.
Cmd >> git add firstfile
Cmd >> git status
Out >>
On branch main
No commits yet
Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
  new file: firstfile
(Add the file to the local repository.)
The git add and git commit commands need to be repeated if you make any changes to a file in the
repository:
When you make changes to a file that has previously been added and committed, the git status command
will list it as ‘modified’.
Cmd >> echo bar >> firstfile
Cmd >> cat firstfile
Out >>
foo
bar
Cmd >> git status
Out >>
On branch main
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
  modified: firstfile
no changes added to commit (use "git add" and/or "git commit -a")
(Make changes to a file that is tracked.)
If you need to check what changes you have made, git diff on that file will tell you the differences between
the edited, but not yet added or committed, file and the previously committed version.
Cmd >> git diff firstfile
Out >>
diff --git a/firstfile b/firstfile
index 257cc56..3bd1f0e 100644
--- a/firstfile
+++ b/firstfile
@@ -1 +1,2 @@
foo
+bar
(See what the changes were with respect to the previously committed version.)
You now need to repeat git add and git commit on that file.
Cmd >> git add firstfile
Cmd >> git commit -m "changes to first file"
Out >>
[main b1edf77] changes to first file
1 file changed, 1 insertion(+)
Cmd >> git status
Out >>
On branch main
nothing to commit, working tree clean
(Commit the changes to the local repo.)
Doing git log will give you the history of the repository, listing the commit numbers, and the messages
that you entered on those commits.
Doing git checkout on that file gets the last committed version and puts it back in your working direc-
tory.
Cmd >> git checkout firstfile
Out >>
Updated 1 path from the index
Cmd >> cat firstfile
Out >>
foo
bar
Cmd >> git status
Out >>
On branch main
nothing to commit, working tree clean
(Restore the previously committed version.)
Now do:
git checkout sdf234987238947 -- myfile myotherfile
This will restore the file to its state before the last add and commit, and it will in general leave the
repository in the state it was in before that commit.
Cmd >> cat firstfile
Out >>
foo
Cmd >> git status
Out >>
On branch main
nothing to commit, working tree clean
(See that we have indeed undone the commit.)
However, the log will show that you have reverted a certain commit.
Cmd >> git log
Out >>
commit 3dca724a1902e8a5e3dba007c325542c6753a424
Author: Victor Eijkhout <[email protected]>
Date: Sat Jan 29 14:14:42 2022 -0600
Revert "changes to first file"
The git reset command can also be used for various types of undo.
We have some changes, added to the local repository with git add and git commit
Cmd >> git add newfile && git commit -m "adding first file"
Out >>
[main 8ce1de4] adding first file
1 file changed, 1 insertion(+)
create mode 100644 newfile
(Committed changes.)
Finally, you can git push committed changes to this remote. Git doesn’t just push everything here: since
you can have multiple branches locally, and multiple upstreams remotely, you initially specify both:
git push -u servername branchname
but when you git push for the first time you get some permission-related errors.
Do
git remote -v
# output: origin https://github.com/username/reponame.git
git remote set-url origin git@github.com:username/reponame.git
Create another clone in person2. Normally the cloned repositories would be two user accounts, or the
accounts of one user on two machines.
Cmd >> git clone git@github.com:TACC/tinker.git person2
Out >>
Cloning into 'person2'...
(Person 2 makes a clone.)
Now the first user creates a file, adds, commits, and pushes it. (This of course requires an upstream to be
set, but since we did a git clone, this is automatically done.)
Cmd >> ( cd person1 && echo 123 >> p1 && git add p1 && git commit -m "add p1" && git push )
Out >>
[main 6f6b126] add p1
1 file changed, 1 insertion(+)
create mode 100644 p1
To github.com:TACC/tinker.git
   8863559..6f6b126 main -> main
(Person 1 adds a file and pushes it.)
The other user can now do a git pull to get these changes. Again, because we created the local repository
by git clone, it is clear where the pull is coming from. The pull message will tell us what new files are
created, or how many other files were changed.
Cmd >> ( cd person2 && git pull )
Out >>
From github.com:TACC/tinker
   8863559..6f6b126 main -> origin/main
Updating 8863559..6f6b126
Fast-forward
 p1 | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 p1
(Person 2 pulls, getting the new file.)
The first user makes an edit on the first line; we confirm the state of the file:
Cmd >> ( cd person1 && sed -i -e '1s/1/one/' fourlines && cat fourlines )
Out >>
one
2
3
4
(Person 1 makes a change.)
The other user also makes a change, but on line 4, so that there is no conflict:
Cmd >> ( cd person2 && sed -i -e '4s/4/four/' fourlines && cat fourlines )
Out >>
1
2
3
four
(Person 2 makes a different change to the same file.)
This change is added with git add and git commit, but we proceed more cautiously in pushing: first we
pull any changes made by others, then push:
git pull --no-edit
git push
Cmd >> ( cd person2 && git add fourlines && git commit -m "edit line four" && git pull --no-edit && git push )
Out >>
[main 27fb2b2] edit line four
 1 file changed, 1 insertion(+), 1 deletion(-)
From github.com:TACC/tinker
   fdd70b7..6767e3f main -> origin/main
Auto-merging fourlines
Merge made by the 'recursive' strategy.
 fourlines | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
To github.com:TACC/tinker.git
   6767e3f..62bd424 main -> main
(This change does not conflict, so we can pull and push.)
Now if the first user does a pull, they see all the merged changes.
Cmd >> ( cd person1 && git pull && cat fourlines )
Out >>
From github.com:TACC/tinker
   6767e3f..62bd424 main -> origin/main
Updating 6767e3f..62bd424
Fast-forward
 fourlines | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
one
2
3
four
(Person 1 pulls to get all the changes.)
In the meantime, developer 2 makes another change, to the original file. This change can be added and
committed to the local repository without any problem.
Cmd >> ( cd person2 && sed -i -e '2s/2/two/' fourlines && cat fourlines && git add fourlines && git commit -m "edit line two" )
Out >>
1
two
3
4
[main c9b6ded] edit line two
 1 file changed, 1 insertion(+), 1 deletion(-)
(Change the 2 on line two to two. We add and commit this to the local repository.)
However, if we try to git push this change to the remote repository, we get an error that the remote is
ahead of the local repository. So we first pull the state of the remote repository. In the previous section
this led to an automatic merge; not so here.
5.7 Branching
With a branch you can keep a completely separate version of all the files in your project.
Initially we have a file on the main branch.
Cmd >> cat firstfile
Out >>
foo
Cmd >> git status
Out >>
On branch main
nothing to commit, working tree clean
(We have a file, committed and all.)
If we switch back to the main branch, everything is as before when we made the dev branch.
Cmd >> git checkout main && cat firstfile && git status
Out >>
Switched to branch 'main'
foo
On branch main
nothing to commit, working tree clean
(The other branch is still unchanged.)
The first time you try to push a new branch you need to establish it upstream:
git push --set-upstream origin mynewbranch
We switch to the dev branch and make another file. The change in the main branch is indeed not here.
Cmd >> git checkout dev
Out >>
Switched to branch 'dev'
Cmd >> sed -i -e '4s/4/four/' fourlines && cat fourlines
Out >>
1
2
3
four
Cmd >> git add fourlines && git commit -m "edit line 4"
Out >>
[dev dbb0c03] edit line 4
 1 file changed, 1 insertion(+), 1 deletion(-)
(On line 4, change 4 to four. This change is far enough away from the other change that there should be no conflict.)
If two developers make changes on the same line, or on adjacent lines, git will not be able to merge and
you have to edit the file as in section 5.6.4.
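When that happens, the file in your working directory contains conflict markers, roughly like this (a sketch):
<<<<<<< HEAD
your version of the line
=======
the other version of the line
>>>>>>> origin/main
You edit the file to keep the text you want, remove the markers, and then git add and git commit the result.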
Figure 5.1: Add local changes to the remote repository (left); get changes that were made to the remote
repository (right).
5.8 Releases
At a certain point in your development process you may want to mark the current state of the repository
as ‘finished’. You can do this by
1. Attaching a tag to the state of the repository, or
2. Creating an archive: a released version that has the repo information stripped.
5.8.1 Tags
A tag is a marked state of the repository. There are two types of tags:
1. light-weight tags are no more than a synonym for a commit:
git tag v0.09
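2. annotated tags carry extra information such as a message and the name of the tagger. In standard git usage (the tag name and message here are only an illustration):
git tag -a v1.0 -m "version 1.0"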
You list all tags with git tag, you get information on a tag with git show v0.1, and you push a tag to a
remote server with
git push origin v0.1
If you check out a tagged state with git checkout v0.1, beware that changes you now make cannot be
pushed to anything: this is a ‘detached HEAD’ state.
If you want to fix bugs in a tagged state, you can create a branch based on the tag:
git checkout -b version0.2 v0.1
In this section we will discuss libraries for dense linear algebra operations.
Dense linear algebra, that is linear algebra on matrices that are stored as two-dimensional arrays (as
opposed to sparse linear algebra; see HPC book, section 5.4, as well as the tutorial on PETSc Parallel
Programming book, part III) has been standardized for a considerable time. The basic operations are defined
by the three levels of Basic Linear Algebra Subprograms (BLAS):
• Level 1 defines vector operations that are characterized by a single loop [13].
• Level 2 defines matrix vector operations, both explicit such as the matrix-vector product, and
implicit such as the solution of triangular systems [7].
• Level 3 defines matrix-matrix operations, most notably the matrix-matrix product [6].
The name ‘BLAS’ suggests a certain amount of generality, but the original authors were clear [13] that
these subprograms only covered dense linear algebra. Attempts to standardize sparse operations have
never met with equal success.
Based on these building blocks, libraries have been built that tackle the more sophisticated problems such
as solving linear systems, or computing eigenvalues or singular values. Linpack 1 and Eispack were the first
to formalize the operations involved, using Blas Level 1 and Blas Level 2 respectively. A later develop-
ment, Lapack uses the blocked operations of Blas Level 3. As you saw in section HPC book, section 1.6.1,
this is needed to get high performance on cache-based CPUs.
With the advent of parallel computers, several projects arose that extended the Lapack functionality to
distributed computing, most notably Scalapack [4, 2], PLapack [23, 22], and most recently Elemental [19].
These packages are harder to use than Lapack because of the need for a two-dimensional cyclic distri-
bution; sections HPC book, section 7.2.3 and HPC book, section 7.3.2. We will not go into the details
here.
1. The linear system solver from this package later became the Linpack benchmark; see section HPC book, section 2.11.5.
• Computational routines. These are the routines that drivers are built up out of. A user may have
occasion to call them by themselves.
• Auxiliary routines.
Expert driver names end on ’X’.
• Linear system solving. Simple drivers: -SV (e.g., DGESV): solve AX = B, overwrite A with its LU
factorization (with pivoting), overwrite B with X.
Expert driver: -SVX: also transposed solve, condition estimation, refinement, equilibration.
• Least squares problems. Drivers:
xGELS using QR or LQ under full-rank assumption
xGELSY ”complete orthogonal factorization”
xGELSS using SVD
xGELSD using divide-conquer SVD (faster, but more workspace than xGELSS)
Also: LSE & GLM linear equality constraint & general linear model
• Eigenvalue routines.
Symmetric/Hermitian (xSY or xHE; also SP, SB, ST):
  simple driver -EV
  expert driver -EVX
  divide and conquer -EVD
  relative robust representation -EVR
General (only xGE):
  Schur decomposition -ES and -ESX
  eigenvalues -EV and -EVX
SVD (only xGE):
  simple driver -SVD
  divide and conquer SDD
Generalized symmetric (SY and HE; SP, SB):
  simple driver GV
  expert GVX
  divide-conquer GVD
Nonsymmetric:
  Schur: simple GGES, expert GGESX
  eigen: simple GGEV, expert GGEVX
  svd: GGSVD
On the other hand, many LAPACK routines can be based on the matrix-matrix product (BLAS routine
gemm), which you saw in section HPC book, section 7.4.1 has the potential for a substantial fraction of
peak performance. To achieve this, you should use an optimized version of the BLAS.
A simple example:
// example1.F90
do i=1,n
xarray(i) = 1.d0
end do
call dscal(n,scale,xarray,1)
do i=1,n
if (.not.assert_equal( xarray(i),scale )) print *,"Error in index",i
end do
The same in C:
// example1c.cxx
xarray = new double[n]; yarray = new double[n];
Many routines have an increment parameter. For xscal that is the final parameter:
// example2.F90
integer :: inc=2
call dscal(n/inc,scale,xarray,inc)
do i=1,n
if (mod(i,inc)==1) then
if (.not.assert_equal( xarray(i),scale )) print *,"Error in index",i
else
if (.not.assert_equal( xarray(i),1.d0 )) print *,"Error in index",i
end if
end do
The matrix-vector product xgemv computes 𝑦 ← 𝛼𝐴𝑥 + 𝛽𝑦, rather than 𝑦 ← 𝐴𝑥. The specification of
the matrix takes the M,N size parameters, and a character argument 'N' to indicate that the matrix is not
transposed. Both of the vectors have an increment argument.
subroutine dgemv(character TRANS,
integer M,integer N,
double precision ALPHA,
double precision, dimension(lda,*) A,integer LDA,
double precision, dimension(*) X,integer INCX,
double precision BETA,double precision, dimension(*) Y,integer INCY
)
The same example in C has an extra parameter to indicate whether the matrix is stored in row or column
major storage:
// example3c.cxx
for (int j=0; j<n; j++) {
xarray[j] = 1.;
for (int i=0; i<m; i++)
matrix[ i+j*m ] = 1.;
}
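The matrix-vector product call itself could then look as follows; a sketch using the standard CBLAS interface, assuming scalars alpha, beta and an output vector yarray have been declared:
// y <- alpha*A*x + beta*y, with A in column major storage
cblas_dgemv( CblasColMajor, CblasNoTrans,
             m, n, alpha, matrix, m,
             xarray, 1, beta, yarray, 1 );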
Chapter 7
Scientific Data Storage

There are many ways of storing data, in particular data that comes in arrays. A surprising number of
people store data in spreadsheets, then export them to ascii files with comma or tab delimiters, and
expect other people (or other programs written by themselves) to read that in again. Such a process is
wasteful in several respects:
wasteful in several respects:
• The ascii representation of a number takes up much more space than the internal binary repre-
sentation. Ideally, you would want a file to be as compact as the representation in memory.
• Conversion to and from ascii is slow; it may also lead to loss of precision.
For such reasons, it is desirable to have a file format that is based on binary storage. There are a few more
requirements on a useful file format:
• Since binary storage can differ between platforms, a good file format is platform-independent.
This will, for instance, prevent the confusion between big-endian and little-endian storage, as
well as conventions of 32 versus 64 bit floating point numbers.
• Application data can be heterogeneous, comprising integer, character, and floating point data.
Ideally, all this data should be stored together.
• Application data is also structured. This structure should be reflected in the stored form.
• It is desirable for a file format to be self-documenting. If you store a matrix and a right-hand side
vector in a file, wouldn’t it be nice if the file itself told you which of the stored numbers are the
matrix, which the vector, and what the sizes of the objects are?
This tutorial will introduce the HDF5 library, which fulfills these requirements. HDF5 is a large and com-
plicated library, so this tutorial will only touch on the basics. For further information, consult http:
//www.hdfgroup.org/HDF5/. While you do this tutorial, keep your browser open on http://www.
hdfgroup.org/HDF5/doc/ or http://www.hdfgroup.org/HDF5/RM/RM_H5Front.html for the ex-
act syntax of the routines.
In this section you will learn to write programs that write to and read from HDF5 files. In order to check that the
files are as you intend, you can use the h5dump utility on the command line.1
Just a word about compatibility. The HDF5 format is not compatible with the older version HDF4, which
is no longer under development. You can still come across people using hdf4 for historic reasons. This
tutorial is based on HDF5 version 1.6. Some interfaces changed in the current version 1.8; in order to use
1.6 APIs with 1.8 software, add a flag -DH5_USE_16_API to your compile line.
Many HDF5 routines are about creating objects: file handles, members in a dataset, et cetera. The general
syntax for that is
hid_t h_id;
h_id = H5Xsomething(...);
Failure to create the object is indicated by a negative return parameter, so it would be a good idea to create
a file myh5defs.h containing:
#include "hdf5.h"
#define H5REPORT(e) \
{if (e<0) {printf("\nHDF5 error on line %d\n\n",__LINE__); \
return e;}}
hid_t h_id;
h_id = H5Xsomething(...); H5REPORT(h_id);
This file will be the container for a number of data items, organized like a directory tree.
Exercise. Create an HDF5 file by compiling and running the create.c example below.
1. In order to do the examples, the h5dump utility needs to be in your path, and you need to know the location of the hdf5.h
and libhdf5.a and related library files.
main() {
Note that an empty file corresponds to just the root of the directory tree that will hold the data.
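The heart of such a program is only a pair of calls; a minimal sketch, reusing the H5REPORT macro defined above (the file name is only an illustration, and the actual create.c may differ):
#include "myh5defs.h"
#define FILE "file.h5"
main() {
  hid_t file_id;
  herr_t status;
  /* create the file, truncating it if it already exists */
  file_id = H5Fcreate(FILE, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
  H5REPORT(file_id);
  /* close the file again */
  status = H5Fclose(file_id); H5REPORT(status);
}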
7.3 Datasets
Next we create a dataset, in this example a 2D grid. To describe this, we first need to construct a dataspace:
dims[0] = 4; dims[1] = 6;
dataspace_id = H5Screate_simple(2, dims, NULL);
dataset_id = H5Dcreate(file_id, "/dset", dataspace_id, .... );
....
status = H5Dclose(dataset_id);
status = H5Sclose(dataspace_id);
Note that datasets and dataspaces need to be closed, just like files.
Exercise. Create a dataset by compiling and running the dataset.c code below
Expected outcome. This creates a file dset.h5 that can be displayed with h5dump.
/*
* File: dataset.c
* Author: Victor Eijkhout
*/
#include "myh5defs.h"
#define FILE "dset.h5"
main() {
DATASET "dset" {
DATATYPE H5T_STD_I32BE
DATASPACE SIMPLE { ( 4, 6 ) / ( 4, 6 ) }
DATA {
(0,0): 0, 0, 0, 0, 0, 0,
(1,0): 0, 0, 0, 0, 0, 0,
(2,0): 0, 0, 0, 0, 0, 0,
(3,0): 0, 0, 0, 0, 0, 0
}
}
}
}
The datafile contains such information as the size of the arrays you store. Still, you may want to add
related scalar information. For instance, if the array is the output of a program, you could record with what
input parameter it was generated.
parmspace = H5Screate(H5S_SCALAR);
parm_id = H5Dcreate
(file_id,"/parm",H5T_NATIVE_INT,parmspace,H5P_DEFAULT);
Exercise. Add a scalar dataspace to the HDF5 file, by compiling and running the parmwrite.c code
below.
Expected outcome. A new file wdset.h5 is created.
/*
* File: parmdataset.c
* Author: Victor Eijkhout
*/
#include "myh5defs.h"
#define FILE "pdset.h5"
main() {
/*
* File: parmwrite.c
* Author: Victor Eijkhout
*/
#include "myh5defs.h"
#define FILE "wdset.h5"
main() {
%% h5dump wdset.h5
HDF5 "wdset.h5" {
GROUP "/" {
DATASET "dset" {
DATATYPE H5T_IEEE_F64LE
DATASPACE SIMPLE { ( 4, 6 ) / ( 4, 6 ) }
DATA {
(0,0): 0.5, 1.5, 2.5, 3.5, 4.5, 5.5,
(1,0): 6.5, 7.5, 8.5, 9.5, 10.5, 11.5,
(2,0): 12.5, 13.5, 14.5, 15.5, 16.5, 17.5,
(3,0): 18.5, 19.5, 20.5, 21.5, 22.5, 23.5
}
}
DATASET "parm" {
DATATYPE H5T_STD_I32LE
DATASPACE SCALAR
DATA {
(0): 37
}
}
}
}
If you look closely at the source and the dump, you see that the data types are declared as ‘native’, but
rendered as LE. The ‘native’ declaration makes the datatypes behave like the built-in C or Fortran data
types. Alternatively, you can explicitly indicate whether data is little-endian or big-endian. These terms
describe how the bytes of a data item are ordered in memory. Most architectures use little endian, as you
can see in the dump output, but, notably, IBM uses big endian.
7.5 Reading
Now that we have a file with some data, we can do the mirror part of the story: reading from that file.
The essential commands are
h5file = H5Fopen( .... )
....
H5Dread( dataset, .... data .... )
where the H5Dread command has the same arguments as the corresponding H5Dwrite.
Exercise. Read data from the wdset.h5 file that you create in the previous exercise, by compiling and
running the allread.c example below.
Expected outcome. Running the allread executable will print the value 37 of the parameter, and
the value 8.5 of the (1,2) data point of the array.
Caveats. Make sure that you run parmwrite to create the input file.
/*
* File: allread.c
* Author: Victor Eijkhout
*/
#include "myh5defs.h"
#define FILE "wdset.h5"
main() {
status = H5Dread
(dataset,H5T_NATIVE_DOUBLE,H5S_ALL,H5S_ALL,H5P_DEFAULT,
data); H5REPORT(status);
printf("arbitrary data point [1,2]: %e\n",data[1*6+2]);
%% ./allread
parameter value: 37
arbitrary data point [1,2]: 8.500000e+00
Chapter 8
Parallel I/O
Parallel I/O is a tricky subject. You can try to let all processors jointly write one file, or to write a file per
process and combine them later. With the standard mechanisms of your programming language there are
the following considerations:
• On clusters where the processes have individual file systems, the only way to write a single file
is to let it be generated by a single processor.
• Writing one file per process is easy to do, but
– You need a post-processing script;
– if the files are not on a shared file system (such as Lustre), it takes additional effort to bring
them together;
– if the files are on a shared file system, writing many files may be a burden on the metadata
server.
• On a shared file system it is possible for all processes to open the same file and set the file pointer
individually. This can be difficult if the amount of data per process is not uniform.
Illustrating the last point:
// pseek.c
FILE *pfile;
pfile = fopen("pseek.dat","w");
fseek(pfile,procid*sizeof(int),SEEK_CUR);
// fseek(pfile,procid*sizeof(char),SEEK_CUR);
fprintf(pfile,"%d\n",procid);
fclose(pfile);
MPI also has its own portable I/O: MPI I/O, for which see chapter Parallel Programming book, section 10.
Alternatively, one could use a library such as HDF5; see chapter 7.
For a great discussion see [15], from which figures here are taken.
Chapter 9
Plotting with GNUplot

The gnuplot utility is a simple program for plotting sets of points or curves. This very short tutorial will
show you some of the basics. For more commands and options, see the manual
http://www.gnuplot.info/docs/gnuplot.html.
or fig, latex, pbm, et cetera. Note that this will only cause the pdf commands to be written to your
screen: you need to direct them to file with
set output "myplot.pdf"
9.2 Plotting
The basic plot commands are plot for 2D, and splot (‘surface plot’) for 3D plotting.
you get a plot of f(x) = x^2; gnuplot will decide on the range for x. With
set xrange [0:1]
plot 1-x title "down", x**2 title "up"
you get two graphs in one plot, with the x range limited to [0, 1], and the appropriate legends for the
graphs. The variable x is the default for plotting functions.
Plotting one function against another – or equivalently, plotting a parametric curve – goes like this:
set parametric
plot [t=0:1.57] cos(t),sin(t)
9.2.3 Customization
Plots can be customized in many ways. Some of these customizations use the set command. For instance,
9.3 Workflow
Imagine that your code produces a dataset that you want to plot, and you run your code for a number
of inputs. It would be nice if the plotting can be automated. Gnuplot itself does not have the facilities for
this, but with a little help from shell programming this is not hard to do.
Suppose you have data files
data1.dat data2.dat data3.dat
and you want to plot them with the same gnuplot commands. You could make a file plot.template:
set term pdf
set output "FILENAME.pdf"
plot "FILENAME.dat"
The string FILENAME can be replaced by the actual file names using, for instance sed:
for d in data1 data2 data3 ; do
cat plot.template | sed s/FILENAME/$d/ > plot.cmd
gnuplot plot.cmd
done
Sooner or later, and probably sooner than later, every programmer is confronted with code not behaving
as intended. In this section you will learn some techniques of dealing with this problem. At first we will see
a number of techniques for preventing errors; in the next chapter we will discuss debugging, the process
of finding the inevitable errors in a program, once they have occurred.
10.1.1 Assertions
In the things that can go wrong with a program we can distinguish between errors and bugs. Errors are
things that legitimately happen but that should not. File systems are common sources of errors: a program
wants to open a file but the file doesn’t exist because the user mistyped the name, or the program writes
to a file but the disk is full. Other errors can come from arithmetic, such as overflow errors.
On the other hand, a bug in a program is an occurrence that cannot legitimately occur. Of course, ‘le-
gitimately’ here means ‘according to the programmer’s intentions’. Bugs can often be described as ‘the
computer always does what you ask, not necessarily what you want’.
Assertions serve to detect bugs in your program: an assertion is a predicate that should be true at a certain
point in your program. Thus, an assertion failing means that you didn’t code what you intended to code.
An assertion is typically a statement in your programming language, or a preprocessor macro; upon failure
of the assertion, your program will stop.
Some examples of assertions:
• If a subprogram has an array argument, it is a good idea to test whether the actual argument is a
null pointer before indexing into the array.
• Similarly, you could test a dynamically allocated data structure for not having a null pointer.
• If you calculate a numerical result for which certain mathematical properties hold, for instance
you are writing a sine function, for which the result has to be in [−1, 1], you should test whether
this property indeed holds for the result.
Assertions are often disabled in a program once it’s sufficiently tested. The reason for this is that assertions
can be expensive to execute. For instance, if you have a complicated data structure, you could write a
complicated integrity test, and perform that test in an assertion, which you put after every access to the
data structure.
Because assertions are often disabled in the ‘production’ version of a code, they should not affect any
stored data. If they do, your code may behave differently when you’re testing it with assertions, versus
how you use it in practice without them. This is also formulated as ‘assertions should not have side-effects’.
If an assertion fails, an error message is printed which includes the literal text of the expression, the file
name, and the line number, and the program is subsequently stopped. Here is an example:
#include <assert.h>
void open_record(char *record_name)
{
  assert(record_name!=NULL);
  /* rest of the code */
}
int main(void)
{
  open_record(NULL);
}
which is used as
ASSERT(nItemsSet.gt.arraySize,"Too many elements set")
float value,result;
result = compute(value);
How do we handle the case where the user passes a negative number?
float compute(float val)
{
float result;
if (val<0) { /* then what? */
} else
result = ... sqrt(val) ... /* some computation */
return result;
}
We could print an error message and deliver some result, but the message may go unnoticed, and the
calling environment does not really receive any notification that something has gone wrong.
The following approach is more flexible:
int compute(float val,float *result)
{
  if (val<0) {
    return -1;
  } else {
    *result = ... sqrt(val) ... /* some computation */
  }
  return 0;
}
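The return code can then be checked by the caller, typically through a macro; a sketch of what such a macro could look like (the name CHECK_FOR_ERROR is only an illustration):
#define CHECK_FOR_ERROR(ierr) \
  if (ierr!=0) { \
    printf("Error %d in file %s line %d\n",ierr,__FILE__,__LINE__); \
    return ierr; }

ierr = compute(value,&result); CHECK_FOR_ERROR(ierr);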
Note that this macro not only prints an error message, but also does a further return. This means that, if
you adopt this use of error codes systematically, you will get a full backtrace of the calling tree if an error
occurs. (In the Python language this is precisely the wrong approach since the backtrace is built-in.)
10.2.1.1 C
The C language has arrays, but they suffer from ‘pointer decay’: they behave largely like pointers in
memory. Thus, bounds checking is hard, other than with external tools like Valgrind.
10.2.1.2 C++
C++ has the containers such as std::vector which support bound checking:
vector<float> x(25);
x.at(26) = y; // throws an exception
On the other hand, the C-style x[26] does not perform such checks.
10.2.1.3 Fortran
Fortran arrays are more restricted than C arrays, so compilers often support a flag for activating runtime
bounds checking. For gfortran that is -fbounds-check.
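The discussion below refers to a code fragment that is not included in this excerpt; a minimal sketch of the kind of loop that is meant (purely illustrative) is:
for (int i=0; i<1000; i++) {
  double *array = (double*) malloc( 100*sizeof(double) );
  /* ... use array, but never call free(array) ... */
}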
The block of memory is allocated in each iteration, but the allocation of one iteration is no longer available
in the next. A similar example can be made with allocating inside a conditional.
It should be noted that this problem is far less serious in Fortran, where memory is deallocated automat-
ically as a variable goes out of scope.
There are various tools for detecting memory errors: Valgrind, DMALLOC, Electric Fence; for valgrind, see section 11.8. If such a tool is available you should certainly make use of it. (The GNU C library has a function mcheck, declared in mcheck.h, that serves a similar purpose.)
If you write in C, you will probably know the malloc and free calls:
int *ip;
ip = (int*) malloc(500*sizeof(int));
if (ip==0) {/* could not allocate memory */}
..... do stuff with ip .....
free(ip);
Such checking can be packaged in a macro that replaces the direct call to malloc, used as
int *ip;
MYMALLOC(ip,500,int);
Runtime checks on memory usage (either by compiler-generated bounds checking, or through tools like
valgrind or Rational Purify) are expensive, but you can catch many problems by adding some functionality
to your malloc. What we will do here is to detect memory corruption after the fact.
We allocate a few integers to the left and right of the allocated object (line 1 in the code below), and put
a recognizable value in them (line 2 and 3), as well as the size of the object (line 2). We then return the
pointer to the actually requested memory area (line 4).
#define MEMCOOKIE 137
#define MYMALLOC(a,b,c) { \
char *aa; int *ii; \
aa = malloc(b*sizeof(c)+3*sizeof(int)); /* 1 */ \
ii = (int*)aa; ii[0] = b*sizeof(c); \
ii[1] = MEMCOOKIE; /* 2 */ \
aa = (char*)(ii+2); a = (c*)aa ; /* 4 */ \
aa = aa+b*sizeof(c); ii = (int*)aa; \
ii[0] = MEMCOOKIE; /* 3 */ \
}
Now you can write your own free, which tests whether the bounds of the object have not been written
over.
#define MYFREE(a) { \
  char *aa; int n,*ii; ii = (int*)a; \
  if (*(--ii)!=MEMCOOKIE) printf("object corrupted\n"); \
  n = *(--ii); aa = (char*)a+n; ii = (int*)aa; \
  if (*ii!=MEMCOOKIE) printf("object corrupted\n"); \
  free( (int*)a-2 ); /* free the original allocation, including the guard words */ \
  }
You can extend this idea: in every allocated object, also store two pointers, so that the allocated memory
areas become a doubly linked list. You can then write a macro CHECKMEMORY which tests all your allocated
objects for corruption.
Such solutions to the memory corruption problem are fairly easy to write, and they carry little overhead.
There is a memory overhead of at most 5 integers per object, and there is practically no performance
penalty.
(Instead of writing a wrapper for malloc, on some systems you can influence the behavior of the system routine. On Linux, malloc calls hooks that can be replaced with your own routines; see http://www.gnu.org/s/libc/manual/html_node/Hooks-for-Malloc.html.)
10.3 Testing
There are various philosophies for testing the correctness of a code.
• Correctness proving: the programmer draws up predicates that describe the intended behavior of
code fragments and proves by mathematical techniques that these predicates hold [10, 5].
• Unit testing: each routine is tested separately for correctness. This approach is often hard to do
for numerical codes, since with floating point numbers there is essentially an infinity of possible
inputs, and it is not easy to decide what would constitute a sufficient set of inputs.
• Integration testing: test whether subsystems work correctly together.
• System testing: test the whole code. This is often appropriate for numerical codes, since we often
have model problems with known solutions, or there are properties such as bounds that need to
hold on the global solution.
• Test-driven design: the program development process is driven by the requirement that testing
is possible at all times.
With parallel codes we run into a new category of difficulties with testing. Many algorithms, when exe-
cuted in parallel, will execute operations in a slightly different order, leading to different roundoff behav-
ior. For instance, the parallel computation of a vector sum will use partial sums. Some algorithms have an
inherent damping of numerical errors, for instance stationary iterative methods (section HPC book, sec-
tion 5.5.1), but others have no such built-in error correction (nonstationary methods; section HPC book,
section 5.5.8). As a result, the same iterative process can take different numbers of iterations depending
on how many processors are used.
Some general advice on writing tests:
• Global state in your program makes it hard to test, since it carries information between tests.
• Tests should not reproduce the logic of your code: if the program logic is faulty, the test will be
too.
• Tests should be short, and obey the single-responsibility principle. Naming your tests is good to
keep them focused.
Debugging
Debugging is like being the detective in a crime movie where you are also the murderer.
(Filipe Fortes, 2013)
When a program misbehaves, debugging is the process of finding out why. There are various strategies
of finding errors in a program. The crudest one is debugging by print statements. If you have a notion of
where in your code the error arises, you can edit your code to insert print statements, recompile, rerun,
and see if the output gives you any suggestions. There are several problems with this:
• The edit/compile/run cycle is time consuming, especially since
• often the error will be caused by an earlier section of code, requiring you to edit, compile, and
rerun repeatedly. Furthermore,
• the amount of data produced by your program can be too large to display and inspect effectively,
and
• if your program is parallel, you probably need to print out data from all processors, making the
inspection process very tedious.
For these reasons, the best way to debug is by the use of an interactive debugger, a program that allows
you to monitor and control the behavior of a running program. In this section you will familiarize yourself
with gdb and lldb, the open source debuggers of the GNU and clang projects respectively. Other debuggers
are proprietary, and typically come with a compiler suite. Another distinction is that gdb is a commandline
debugger; there are graphical debuggers such as ddd (a frontend to gdb) or DDT and TotalView (debuggers
for parallel codes). We limit ourselves to gdb, since it incorporates the basic concepts common to all
debuggers.
In this tutorial you will debug a number of simple programs with gdb and valgrind. The files can be found
in the repository in the directory code/gdb.
Usually, you also need to lower the compiler optimization level: a production code will often be compiled
with flags such as -O2 or -xHost that try to make the code as fast as possible, but for debugging you need
to replace this by -O0 (‘oh-zero’). The reason is that higher levels will reorganize your code, making it
hard to relate the execution to the source1 .
tutorials/gdb/c/hello.c
#include <stdlib.h>
#include <stdio.h>
int main() {
printf("hello world\n");
return 0;
}
%% cc -g -o hello hello.c
# regular invocation:
%% ./hello
hello world
# invocation from gdb:
%% gdb hello
GNU gdb 6.3.50-20050815 # ..... [version info]
Copyright 2004 Free Software Foundation, Inc. .... [copyright info] ....
(gdb) run
Starting program: /home/eijkhout/tutorials/gdb/hello
Reading symbols for shared libraries +. done
hello world
1. Typically, actual code motion is done by -O3, but at level -O2 the compiler will inline functions and make other simplifications.
Important note: the program was compiled with the debug flag -g. This causes the symbol table (that is,
the translation from machine address to program variables) and other debug information to be included
in the binary. This will make your binary larger than strictly necessary, but it will also make it slower, for
instance because the compiler will not perform certain optimizations2 .
To illustrate the presence of the symbol table do
%% cc -g -o hello hello.c
%% gdb hello
GNU gdb 6.3.50-20050815 # ..... version info
(gdb) list
For a program with commandline input we give the arguments to the run command (Fortran users use
say.F):
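In generic form (the actual example program is not shown in this excerpt):
%% gdb ./myprogram
(gdb) run firstargument secondargument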
2. Compiler optimizations are not supposed to change the semantics of a program, but sometimes do. This can lead to the
nightmare scenario where a program crashes or gives incorrect results, but magically works correctly when compiled with debug options and run in a debugger.
11.3.1 C programs
The following code has several errors. We will use the debugger to uncover them.
// square.c
int nmax,i;
float *squares,sum;
fscanf(stdin,"%d",nmax);
for (i=1; i<=nmax; i++) {
squares[i] = 1./(i*i); sum += squares[i];
}
printf("Sum: %e\n",sum);
%% cc -g -o square square.c
%% ./square
5000
Segmentation fault
The segmentation fault (other messages are possible too) indicates that we are accessing memory that we
are not allowed to, making the program exit. A debugger will quickly tell us where this happens:
%% gdb square
(gdb) run
50000
Apparently the error occurred in a function __svfscanf_l, which is not one of ours, but a system func-
tion. Using the backtrace (or bt, also where or w) command we display the call stack. This usually allows
us to find out where the error lies:
Displaying a stack trace:
    gdb:   (gdb) where
    lldb:  (lldb) thread backtrace
(gdb) where
#0 0x00007fff824295ca in __svfscanf_l ()
#1 0x00007fff8244011b in fscanf ()
#2 0x0000000100000e89 in main (argc=1, argv=0x7fff5fbfc7c0) at square.c:7
We take a close look at line 7, and see that we need to change nmax to &nmax.
There is still an error in our program:
(gdb) run
50000
We investigate further:
(gdb) print i
$1 = 11237
(gdb) print squares[i]
Cannot access memory at address 0x10000f000
(gdb) print squares
$2 = (float *) 0x0
We take a close look at the code and see that we did not allocate squares properly.
Memory errors can also occur if we have a legitimate array, but we access it outside its bounds. The following program fills an array, forward, and reads it out, backward. However, there is an indexing error in the second loop.
// up.c
int nlocal = 100,i;
double s, *array = (double*) malloc(nlocal*sizeof(double));
for (i=0; i<nlocal; i++) {
double di = (double)i;
array[i] = 1/(di*di);
}
s = 0.;
for (i=nlocal-1; i>=0; i++) {
double di = (double)i;
s += array[i];
}
You see that the index where the debugger finally complains is quite a bit larger than the size of the array.
Exercise 11.1. Can you think of a reason why indexing out of bounds is not immediately fatal?
What would determine where it does become a problem? (Hint: how is computer memory
structured?)
In section 11.8 you will see a tool that spots any out-of-bound indexing.
Often the error in a program is sufficiently obscure that you need to investigate the program run in detail.
Compile the following program
// roots.c
#define _GNU_SOURCE   /* needed for feenableexcept on glibc */
#include <stdio.h>
#include <math.h>
#include <fenv.h>
float root(int n)
{
  float r;
  r = sqrt(n);
  return r;
}
int main() {
  feenableexcept(FE_INVALID | FE_OVERFLOW);
  int i;
  float x=0;
  for (i=100; i>-100; i--)
    x += root(i+5);
  printf("sum: %e\n",x);
  return 0;
}
but before you run the program, you set a breakpoint at main. This tells the execution to stop, or ‘break’,
in the main program.
(gdb) break main
Breakpoint 1 at 0x100000ea6: file root.c, line 14.
Now the program will stop at the first executable statement in main:
(gdb) run
Starting program: tutorials/gdb/c/roots
Reading symbols for shared libraries +. done
If execution is stopped at a breakpoint, you can do various things, such as issuing the step command:
Breakpoint 1, main () at roots.c:14
14 float x=0;
(gdb) step
15 for (i=100; i>-100; i--)
(gdb)
16 x += root(i);
(gdb)
(if you just hit return, the previously issued command is repeated). Do a number of steps in a row by
hitting return. What do you notice about the function and the loop?
Switch from doing step to doing next. Now what do you notice about the loop and the function?
Set another breakpoint: break 17 and do cont. What happens?
Rerun the program after you set a breakpoint on the line with the sqrt call. When the execution stops
there do where and list.
11.6 Breakpoints
If a problem occurs in a loop, it can be tedious to keep typing cont and inspecting the variable with print. Instead you can add a condition to an existing breakpoint:
condition 1 n<0
means that breakpoint 1 only becomes active when the condition n<0 is encountered.
Set a breakpoint:
    gdb:   break foo.c:12          (with a condition: break foo.c:12 if n>0)
    lldb:  breakpoint set [ -f foo.c ] -l 12
Such a condition can for instance be used to stop on a NaN result, with something like break 12 if x!=x, using the fact that NaN is the only number not equal to itself.
Another possibility is to use ignore 1 50, which will not stop at breakpoint 1 the next 50 times.
Remove the existing breakpoint, redefine it with the condition n<0 and rerun your program. When the
program breaks, find for what value of the loop variable it happened. What is the sequence of commands
you use?
You can set a breakpoint in various ways:
• break foo.c to stop when code in a certain file is reached;
• break 123 to stop at a certain line in the current file;
• break foo to stop at subprogram foo
• or various combinations, such as break foo.c:123.
Information about breakpoints:
• If you set many breakpoints, you can find out what they are with info breakpoints.
• You can remove breakpoints with delete n where n is the number of the breakpoint.
• If you restart your program with run without leaving gdb, the breakpoints stay in effect.
• If you leave gdb, the breakpoints are cleared but you can save them: save breakpoints <file>.
Use source <file> to read them in on the next gdb run.
• In languages with exceptions, such as C++, you can set a catchpoint:
Set a breakpoint for exceptions:
    gdb:   catch throw
    lldb:  break set -E C++
Finally, you can execute commands at a breakpoint:
break 45
command
print x
cont
end
This states that at line 45 variable x is to be printed, and execution should immediately continue.
If you want to run repeated gdb sessions on the same program, you may want to save and reload breakpoints. This can be done with
save breakpoints filename
source filename
After the conditional, the allocated memory is not freed, but the pointer that pointed to it has gone away. This last type especially can be hard to find. Memory leaks may only surface when your program runs out of memory; that in turn is detectable because your allocation will fail. It is a good idea to always check the return result of your malloc or allocate statement!
As a first example, consider out of bound addressing, also known as buffer overflow:
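A minimal sketch of such an out-of-bound access (hypothetical; the repository snippet corruptbound is not reproduced here, but the valgrind output below suggests a std::vector of ten floats being read one element past its end):
#include <vector>
#include <iostream>
int main() {
  std::vector<float> x(10);
  // operator[] performs no bounds check: element 10 is one past the end
  float y = x[10];
  std::cout << y << '\n';
  return 0;
}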
This is unlikely to crash your code, but the results are unpredictable, and this is certainly a failure of your
program logic.
Valgrind indicates that this is an invalid read, what line it occurs on, and where the block was allocated:
==9112== Invalid read of size 4
==9112== at 0x40233B: main (outofbound.cpp:10)
==9112== Address 0x595fde8 is 0 bytes after a block of size 40 alloc'd
==9112== at 0x4C2A483: operator new(unsigned long) (vg_replace_malloc.c:344)
==9112== by 0x4023CD: allocate (new_allocator.h:111)
==9112== by 0x4023CD: allocate (alloc_traits.h:436)
==9112== by 0x4023CD: _M_allocate (stl_vector.h:296)
==9112== by 0x4023CD: _M_create_storage (stl_vector.h:311)
==9112== by 0x4023CD: _Vector_base (stl_vector.h:260)
==9112== by 0x4023CD: _Vector_base (stl_vector.h:258)
==9112== by 0x4023CD: vector (stl_vector.h:415)
==9112== by 0x4023CD: main (outofbound.cpp:9)
Remark 14 Buffer overflows are a well-known security risk, typically associated with reading string input
from a user source. Buffer overflows can be largely avoided by using C++ constructs such as cin and string
instead of sscanf and character arrays.
Valgrind is informative but cryptic, since it works on the bare memory, not on variables. Thus, these error
messages take some exegesis. They state that line 10 reads a 4-byte object immediately after a block of 40
bytes that was allocated. In other words: the code is reading outside the bounds of an allocated array.
The next example performs a read on an array that has already been free’d. In this simple case you will
actually get the expected output, but if the read comes much later than the free, the output can be anything.
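A minimal sketch of such a read-after-free (hypothetical; the repository snippet corruptfree is not reproduced here):
#include <iostream>
int main() {
  double *array = new double[100];
  for (int i=0; i<100; i++) array[i] = i;
  delete[] array;
  // reading freed memory: often still returns the old value, but is invalid
  std::cout << array[50] << '\n';
  return 0;
}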
Valgrind again states that this is an invalid read; it gives both where the block was allocated and where it
was freed.
On the other hand, if you forget to free memory you have a memory leak (just imagine allocation, and
not free’ing, in a loop)
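A minimal sketch of such a leak (hypothetical; sized to match the 40,000 lost bytes in the report below):
int main() {
  // 5,000 doubles = 40,000 bytes, allocated but never deleted
  double *array = new double[5000];
  array[0] = 1.;
  return 0;
}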
which valgrind reports on:
==283234== LEAK SUMMARY:
==283234== definitely lost: 40,000 bytes in 1 blocks
==283234== indirectly lost: 0 bytes in 0 blocks
==283234== possibly lost: 0 bytes in 0 blocks
==283234== still reachable: 8 bytes in 1 blocks
==283234== suppressed: 0 bytes in 0 blocks
Memory leaks are much more rare in C++ than in C because of containers such as std::vector. However,
in sophisticated cases you may still do your own memory management, and you need to be aware of the
danger of memory leaks.
If you do your own memory management, there is also a danger of writing to an array pointer that has
not been allocated yet:
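A minimal sketch of such a write through an unallocated pointer (hypothetical; the repository snippet corruptinit is not reproduced here):
int main() {
  double *array;    // pointer is never allocated or initialized
  array[0] = 3.14;  // write through the invalid pointer
  return 0;
}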
The behavior of this code depends on all sorts of things: if the pointer variable is zero, the code will crash.
On the other hand, if it contains some random value, the write may succeed; provided you are not writing
too far from that location.
Valgrind will diagnose this as an invalid write; if the program crashes, the operating system additionally prints its own diagnostic, typically a segmentation fault.
Suppose your program has an out-of-bounds error. Running with gdb, this error may only become appar-
ent if the bounds are exceeded by a large amount. On the other hand, if the code is linked with libefence,
the debugger will stop at the very first time the bounds are exceeded.
Parallel debugging
When a program misbehaves, debugging is the process of finding out why. There are various strategies
of finding errors in a program. The crudest one is debugging by print statements. If you have a notion of
where in your code the error arises, you can edit your code to insert print statements, recompile, rerun,
and see if the output gives you any suggestions. There are several problems with this:
• The edit/compile/run cycle is time consuming, especially since
• often the error will be caused by an earlier section of code, requiring you to edit, compile, and
rerun repeatedly. Furthermore,
• the amount of data produced by your program can be too large to display and inspect effectively,
and
• if your program is parallel, you probably need to print out data from all processors, making the
inspection process very tedious.
For these reasons, the best way to debug is by the use of an interactive debugger, a program that allows you
to monitor and control the behavior of a running program. In this section you will familiarize yourself
with gdb, which is the open source debugger of the GNU project. Other debuggers are proprietary, and
typically come with a compiler suite. Another distinction is that gdb is a commandline debugger; there
are graphical debuggers such as ddd (a frontend to gdb) or DDT and TotalView (debuggers for parallel
codes). We limit ourselves to gdb, since it incorporates the basic concepts common to all debuggers.
In this tutorial you will debug a number of simple programs with gdb and valgrind. The files can be found
in the repository in the directory tutorials/debug_tutorial_files.
There are few low-budget solutions to parallel debugging. The main one is to create an xterm for each
process. We will describe this next. There are also commercial packages such as DDT and TotalView, that
offer a GUI. They are very convenient but also expensive. The Eclipse project has a parallel package, Eclipse
PTP, that includes a graphic debugger.
Debugging in parallel is harder than sequentially, because you will run into errors that are only due to interaction of processes, such as deadlock; see section HPC book, section 2.6.3.6.
As an example, consider this segment of MPI code:
MPI_Init(0,0);
// set comm, ntids, mytid
for (int it=0; ; it++) {
double randomnumber = ntids * ( rand() / (double)RAND_MAX );
printf("[%d] iteration %d, random %e\n",mytid,it,randomnumber);
if (randomnumber>mytid && randomnumber<mytid+1./(ntids+1))
MPI_Finalize();
}
MPI_Finalize();
Each process computes random numbers until a certain condition is satisfied, then exits. However, con-
sider introducing a barrier (or something that acts like it, such as a reduction):
for (int it=0; ; it++) {
double randomnumber = ntids * ( rand() / (double)RAND_MAX );
printf("[%d] iteration %d, random %e\n",mytid,it,randomnumber);
if (randomnumber>mytid && randomnumber<mytid+1./(ntids+1))
MPI_Finalize();
MPI_Barrier(comm);
}
MPI_Finalize();
Now the execution will hang, and this is not due to any particular process: each process has a code path
from init to finalize that does not develop any memory errors or other runtime errors. However as soon as
one process reaches the finalize call in the conditional it will stop, and all other processes will be waiting
at the barrier.
Figure 12.1 shows the main display of the Allinea DDT debugger (http://www.allinea.com/products/ddt) at the point where this code stops. Above the source panel you see that there are 16 processes, and
that the status is given for process 1. In the bottom display you see that out of 16 processes 15 are calling
MPI_Barrier on line 19, while one is at line 18. In the right display you see a listing of the local variables:
the value specific to process 1. A rudimentary graph displays the values over the processors: the value of ntids is constant, that of mytid is linearly increasing, and that of it (the iteration counter) is constant except for one process.
Exercise 12.1. Make and run ring_1a. The program does not terminate and does not crash. In
the debugger you can interrupt the execution, and see that all processes are executing a
receive statement. This is probably a case of deadlock. Diagnose and fix the error.
Exercise 12.2. The author of ring_1c was very confused about how MPI works. Run the pro-
gram. While it terminates without a problem, the output is wrong. Set a breakpoint at
the send and receive statements to figure out what is happening.
The most low-budget solution is to have mpirun create a number of xterm windows, each of which executes the commandline gdb ./program. And because these xterms have been started with mpirun, they actually form a communicator.
Problem 1. This program has every process independently generate random numbers, and if the number
meets a certain condition, stops execution. There is no problem with this code as such, so let’s suppose
you simply want to monitor its execution.
• Compile abort.c. Don’t forget about the -g -O0 flags; if you use the makefile they are included
automatically.
• Run the program with DDT, you’ll see that it concludes successfully.
• Set a breakpoint at the Finalize statement in the subroutine, by clicking to the left of the line
number. Now if you run the program you’ll get a message that all processes are stopped at a
breakpoint. Pause the execution.
• The ‘Stacks’ tab will tell you that all processes are at the same point in the code, but they are not in
fact in the same iteration.
• You can for instance use the ‘Input/Output’ tabs to see what every process has been doing.
• Alternatively, use the variables pane on the right to examine the it variable. You can do that
for individual processes, but you can also control click on the it variable and choose View as
Array. Set up the display as a one-dimensional array and check the iteration numbers.
• Activate the barrier statement and rerun the code. Make sure you have no breakpoints. Reason
that the code will not complete, but just hang.
• Hit the general Pause button. Now what difference do you see in the ‘Stacks’ tab?
Problem 2. Compile problem1.c and run it in DDT. You’ll get a dialog warning about an error condition.
• Pause the program in the dialog. Notice that only the root process is paused. If you want to inspect
other processes, press the general pause button. Do this.
• In the bottom panel click on Stacks. This gives you the ‘call stack’, which tells you what the
processes were doing when you paused them. Where is the root process in the execution? Where
are the others?
• From the call stack it is clear what the error was. Fix it and rerun with File > Restart Session.
Language interoperability
Most of the time, a program is written in a single language, but in some circumstances it is
necessary or desirable to mix sources in more than one language for a single executable. One such case
is when a library is written in one language, but used by a program in another. In such a case, the library
writer will probably have made it easy for you to use the library; this section is for the case that you find
yourself in the place of the library writer. We will focus on the common case of interoperability between
C/C++ and Fortran or Python.
This issue is complicated by the fact that both languages have been around for a long time, and various
recent language standards have introduced mechanisms to facilitate interoperability. However, there is
still a lot of old code around, and not all compilers support the latest standards. Therefore, we discuss
both the old and the new solutions.
13.1 C/Fortran interoperability
After compilation you can use nm to investigate the binary object file:
%% nm fprog.o
0000000000000000 T _foo_
....
%% nm cprog.o
0000000000000000 T _foo
....
You see that internally the foo routine has different names: the Fortran name has an underscore appended.
This makes it hard to call a Fortran routine from C, or vice versa. The possible name mismatches are:
• The Fortran compiler appends an underscore. This is the most common case.
• Sometimes it can append two underscores.
• Typically the routine name is lowercase in the object file, but uppercase is a possibility too.
Since C is a popular language to write libraries in, this means that the problem is often solved in the C
library by:
• Appending an underscore to all C function names; or
• Including a simple wrapper call:
int SomeCFunction(int i,float f)
{
// this is the actual function
}
int SomeCFunction_(int i,float f)
{
return SomeCFunction(i,f);
}
The complex data types in C/C++ and Fortran are compatible with each other. Here is an example of a C++
program linking to Lapack’s complex vector scaling routine zscal.
// zscale.cxx
extern "C" {
void zscal_(int*,double complex*,double complex*,int*);
}
complex double *xarray,*yarray, scale=2.;
xarray = new double complex[n]; yarray = new double complex[n];
zscal_(&n,&scale,xarray,&ione);
%% ifort -c fbind.F90
%% nm fbind.o
.... T _s
.... C _x
The Fortran2003 standard, by now widely supported, has mechanisms for interfacing to C.
• There is a module iso_c_binding that contains named kinds, so that after
use iso_c_binding
one can declare
INTEGER(KIND=C_SHORT) :: i
• Fortran pointers are more complicated objects, so passing them to C is hard; Fortran2003 has a
mechanism to deal with C pointers, which are just addresses.
• Fortran derived types can be made compatible with C structures.
If you compile this and inspect the output with nm you get:
$ gcc -c foochar.c && nm foochar.o | grep bar
0000000000000000 T _bar
That is, apart from a leading underscore the symbol name is clear.
On the other hand, the identical program compiled as C++ gives
$ g++ -c foochar.c && nm foochar.o | grep bar
0000000000000000 T __Z3barPc
Why is this? Well, because of polymorphism, and the fact that methods can be included in classes, you can
not have a unique linker symbol for each function name. Instead this mangled symbol includes enough
information to make the symbol unique.
You can retrieve the meaning of this mangled symbol a number of ways. First of all, there is a demangling
utility c++filt:
c++filt __Z3barPc
bar(char*)
A C++ function can be given C linkage, so that it can be called from C and Fortran, by declaring it extern "C". In a header file shared between C and C++ this is typically written as
#ifdef __cplusplus
extern "C" {
#endif
/* function declarations */
#ifdef __cplusplus
}
#endif
You again get the same linker symbols as for C, so that the routine can be called from both C and Fortran.
If your main program is in C, you can use the C++ compiler as linker. If the main program is in Fortran,
you need to use the Fortran compiler as linker. It is then necessary to link in extra libraries for the C++
system routines. For instance, with the Intel compiler -lstdc++ -lc needs to be added to the link line.
The use of extern is also needed if you link other languages to a C++ main program. For instance, a
Fortran subprogram foo should be declared as
extern "C" {
void foo_();
}
13.3 Strings
Programming languages differ widely in how they handle strings.
• In C, a string is an array of characters; the end of the string is indicated by a null character, that
is the ascii character zero, which has an all zero bit pattern. This is called null termination.
• In Fortran, a string is an array of characters. The length is maintained in an internal variable, which
is passed as a hidden parameter to subroutines.
• In Pascal, a string is an array with an integer denoting the length in the first position. Since only
one byte is used for this, strings can not be longer than 255 characters in Pascal.
As you can see, passing strings between different languages is fraught with peril. This situation is made
even worse by the fact that passing strings as subroutine arguments is not standard.
Example: the main program in Fortran passes a string
Program Fstring
character(len=5) :: word = "Word"
call cstring(word)
end Program Fstring
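The C routine being called is not shown in this excerpt. A minimal sketch (assuming the common, but compiler-dependent, convention that the Fortran compiler appends an underscore to the routine name and passes the string length as an extra hidden argument by value):
#include <stdio.h>
void cstring_( char *txt, int txtlen ) {
  printf("length = %d\n",txtlen);
  printf("<<");
  for (int i=0; i<txtlen; i++) printf("%c",txt[i]);
  printf(">>\n");
}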
which produces:
length = 5
<<Word >>
Recently, the ‘C/Fortran interoperability standard’ has provided a systematic solution to this.
A C routine that returns a dynamically allocated array or pointer can not directly be called from Fortran77. There is a hack to get around this (check out the Fortran77 interface to the Petsc routine VecGetValues) and with more cleverness you can use POINTER variables for this.
1. With a bit of cleverness and the right compiler, you can have a program that says print *,7 and prints 8 because of this.
13.5 Input/output
Both languages have their own system for handling input/output, and it is not really possible to meet in
the middle. Basically, if Fortran routines do I/O, the main program has to be in Fortran. Consequently, it
is best to isolate I/O as much as possible, and use C for I/O in mixed language programming.
3. You need to declare what the types are of the C routines in python:
test_add = mylib.test_add
test_add.argtypes = [ctypes.c_float, ctypes.c_float]
test_add.restype = ctypes.c_float
test_passing_array = mylib.test_passing_array
test_passing_array.argtypes = [ctypes.POINTER(ctypes.c_int), ctypes.c_int]
test_passing_array.restype = None
13.6.1 Boost
Another way to let C and python interact is through the Boost library.
Let’s start with a C/C++ file that was written for some other purpose, and with no knowledge of Python
or interoperability tools:
char const* greet()
{
return "hello, world";
}
With it, you should have a .h header file with the function signatures.
Next, you write a C++ file that uses the Boost tools:
#include <boost/python.hpp>
#include "hello.h"
BOOST_PYTHON_MODULE(hello_ext)
{
using namespace boost::python;
def("greet", greet);
}
The crucial step is compiling both C/C++ files together into a dynamic library:
icpc -shared -o hello_ext.so hello_ext.o hello.o \
-Wl,-rpath,/pythonboost/lib -L/pythonboost/lib -lboost_python39 \
-Wl,-rpath,/python/lib -L/python/lib -lpython3
You can now import this library in python, giving you access to the C function:
import hello_ext
print(hello_ext.greet())
Bit operations
In most of this book we consider numbers, such as integer or floating point representations of real num-
bers, as our lowest building blocks. Sometimes, however, it is necessary to dig deeper and consider the
actual representation of such numbers in terms of bits.
Various programming languages have support for bit operations. We will explore the various options. For
details on C++ and Fortran, see Introduction to Scientific Programming book, section 5.2.1 and Introduction
to Scientific Programming book, section 30.7 respectively.
14.1.1 C
The printf format specifiers %o and %x give octal and hexadecimal representation respectively, but there is no standard format specifier for binary. Instead use the following bit of magic:
void printBits(size_t const size, void const * const ptr)
{
unsigned char *b = (unsigned char*) ptr;
unsigned char byte;
for (int i=size-1; i>=0; i--) {
for (int j=7; j>=0; j--) {
byte = (b[i] >> j) & 1;
printf("%u", byte);
}
}
}
/* ... */
printBits(sizeof(i),&i);
14.1.2 Python
• The python int function converts a string to int. A second argument can indicate what base the
string is to be interpreted in:
five = int('101',2)
maxint32 = int('0xffffffff',16)
Some systems also provide a routine that allocates Nbytes of memory, where the first byte has an address that is a multiple of aligned_bits.
15.2 A gentle introduction to LaTeX
Originally, the latex compiler would output a device independent file format, named dvi, which could
then be translated to PostScript or PDF, or directly printed. These days, many people use the pdflatex
program which directly translates .tex files to .pdf files. This has the big advantage that the generated
PDF files have automatic cross linking and a side panel with table of contents. An illustration is found
below.
Let us do a simple example.
\documentclass{article}
\begin{document}
Hello world!
\end{document}
Exercise 15.1. Create a text file minimal.tex with the content as in figure 15.1. Try the com-
mand pdflatex minimal or latex minimal. Did you get a file minimal.pdf in the
first case or minimal.dvi in the second case? Use a pdf viewer, such as Adobe Reader,
or dvips respectively to view the output.
Things to watch out for. If you make a typo, TEX can be somewhat unfriendly. If you get
an error message and TEX is asking for input, typing x usually gets you out, or Ctrl-C.
Some systems allow you to type e to go directly into the editor to correct the typo.
\documentclass{article}
\begin{document}
\end{document}
The ‘documentclass’ line needs a class name in between the braces; typical values are ‘article’ or ‘book’.
Some organizations have their own styles, for instance ‘ieeeproc’ is for proceedings of the IEEE.
All document text goes between the \begin{document} and \end{document} lines. (Matched ‘begin’
and ‘end’ lines are said to denote an ‘environment’, in this case the document environment.)
The part before \begin{document} is called the ‘preamble’. It contains customizations for this particular
document. For instance, a command to make the whole document double spaced would go in the preamble.
If you are using pdflatex to format your document, you want a line
\usepackage{hyperref}
here.
Have you noticed the following?
• The backslash character is special: it starts a LATEX command.
• The braces are also special: they have various functions, such as indicating the argument of a
command.
• The percent character indicates that everything to the end of the line is a comment.
Exercise 15.2. Create a file first.tex with the content of figure 15.1 in it. Type some text in
the preamble, that is, before the \begin{document} line and run pdflatex on your file.
Intended outcome. You should get an error message because you are not allowed to
have text in the preamble. Only commands are allowed there; all text has to go after
\begin{document}.
Exercise 15.3. Edit your document: put some text in between the \begin{document} and
\end{document} lines. Let your text have both some long lines that go on for a while,
and some short ones. Put superfluous spaces between words, and at the beginning or end
of lines. Run pdflatex on your document and view the output.
Intended outcome. You notice that the white space in your input has been collapsed in
the output. TEX has its own notions about what space should look like, and you do not
have to concern yourself with this matter.
Exercise 15.4. Edit your document again, cutting and pasting the paragraph, but leaving a blank
line between the two copies. Paste it a third time, leaving several blank lines. Format, and
view the output.
Intended outcome. TEX interprets one or more blank lines as the separation between
paragraphs.
Exercise 15.5. Add \usepackage{pslatex} to the preamble and rerun pdflatex on your
document. What changed in the output?
Intended outcome. This should have the effect of changing the typeface from the default
to Times Roman.
Things to watch out for. Typefaces are notoriously unstandardized. Attempts to use dif-
ferent typefaces may or may not work. Little can be said about this in general.
Add the following line before the first paragraph:
\section{This is a section}
and a similar line before the second. Format. You see that LATEX automatically numbers the sections, and that it handles indentation differently for the first paragraph after a heading.
Exercise 15.6. Replace article by artikel3 in the documentclass declaration line and refor-
mat your document. What changed?
Intended outcome. There are many documentclasses that implement the same commands
as article (or another standard style), but that have their own layout. Your document
should format without any problem, but get a better looking layout.
Things to watch out for. The artikel3 class is part of most distributions these days,
but you can get an error message about an unknown documentclass if it is missing or
if your environment is not set up correctly. This depends on your installation. If the file
seems missing, download the files from http://tug.org/texmf-dist/tex/latex/ntgclass/ and put them in your current directory; see also section 15.2.9.
15.2.3 Math
Purpose. In this section you will learn the basics of math typesetting
One of the goals of the original TEX system was to facilitate the setting of mathematics. There are two
ways to have math in your document:
• Inline math is part of a paragraph, and is delimited by dollar signs.
• Display math is, as the name implies, displayed by itself.
Exercise 15.7. Put $x+y$ somewhere in a paragraph and format your document. Put \[x+y\]
somewhere in a paragraph and format.
Intended outcome. Formulas between single dollars are included in the paragraph where
you declare them. Formulas between \[...\] are typeset in a display.
For display equations with a number, use an equation environment. Try this.
Here are some common things to do in math. Make sure to try them out.
• Subscripts and superscripts: $x_i^2$. If the sub or superscript is more than a single symbol, it
needs to be grouped: $x_{i+1}^{2n}$. If you need a brace in a formula, use $\{ \}$.
15.2.4 Referencing
Purpose. In this section you will see TEX’s cross referencing mechanism in action.
So far you have not seen LATEX do much that would save you any work. The cross referencing mechanism
of LATEX will definitely save you work: any counter that LATEX inserts (such as section numbers) can be
referenced by a label. As a result, the reference will always be correct.
Start with an example document that has at least two section headings. After your first section heading,
put the command \label{sec:first}, and put \label{sec:other} after the second section heading.
These label commands can go on the same line as the section command, or on the next. Now put
As we will see in section~\ref{sec:other}.
in the paragraph before the second section. (The tilde character denotes a non-breaking space.)
Exercise 15.9. Make these edits and format the document. Do you see the warning about an
undefined reference? Take a look at the output file. Format the document again, and
check the output again. Do you have any new files in your directory?
Intended outcome. On a first pass through a document, the TEX compiler will gather all
labels with their values in a .aux file. The document will display a double question mark
for any references that are unknown. In the second pass the correct values will be filled
in.
Things to watch out for. If after the second pass there are still undefined references, you
probably made a typo. If you use the bibtex utility for literature references, you will
regularly need three passes to get all references resolved correctly.
Above you saw that the equation environment gives displayed math with an equation number. You can
add a label to this environment to refer to the equation number.
Exercise 15.10. Write a formula in an equation environment, and add a label. Refer to this
label anywhere in the text. Format (twice) and check the output.
Intended outcome. The \label and \ref command are used in the same way for formulas
as for section numbers. Note that you must use \begin/end{equation} rather than
\[...\] for the formula.
15.2.5 Lists
Purpose. In this section you will see the basics of lists.
Exercise 15.11. Add some lists to your document, including nested lists. Inspect the output.
Intended outcome. Nested lists will be indented further and the labeling and numbering
style changes with the list depth.
Exercise 15.12. Add a label to an item in an enumerate list and refer to it.
Intended outcome. Again, the \label and \ref commands work as before.
15.2.7 Graphics
Since you can not immediately see the output of what you are typing, sometimes the output may come
as a surprise. That is especially so with graphics. LATEX has no standard way of dealing with graphics, but
the following is a common set of commands:
\usepackage{graphicx} % this line in the preamble
The figure can be in any of a number of formats, except that PostScript figures (with extension .ps or
.eps) can not be used if you use pdflatex.
Since your figure is often not the right size, the include line will usually have something like:
\includegraphics[scale=.5]{myfigure}
A bigger problem is that figures can be too big to fit on the page if they are placed where you declare them.
For this reason, they are usually treated as ‘floating material’. Here is a typical declaration of a figure:
\begin{figure}[ht]
\includegraphics{myfigure}
\caption{This is a figure.}
\label{fig:first}
\end{figure}
• The [ht] specification declares that the figure has to be placed here if possible, at the top of a page if that’s not possible, and on a page of its own if it is too big to fit on a page with text.
• A caption to be put under the figure, including a figure number;
• A label so that you can refer to the figure number by its label: figure~\ref{fig:first}.
• And of course the figure material. There are various ways to fine-tune the figure placement. For instance
\begin{center}
\includegraphics{myfigure}
\end{center}
centers the figure horizontally on the page.
and format your document two more times. There should now be a bibliography in it, and a correct
citation. You will also see that files mydocument.bbl and mydocument.blg have been created.
15.3.1 Listings
The ‘listings’ package makes it possible to have source code included, with coloring and indentation
automatically taken care of.
\documentclass{article}

\usepackage[pdftex]{hyperref}
\usepackage{pslatex}

%%%%
%%%% Import the listings package
%%%%
\usepackage{listings,xcolor}

%%%%
%%%% Set a basic code style
%%%% (see documentation for more options)
%%%%
\lstdefinestyle{reviewcode}{
  belowcaptionskip=1\baselineskip,
  breaklines=true, frame=L,
  xleftmargin=\parindent, showstringspaces=false,
  basicstyle=\footnotesize\ttfamily,
  keywordstyle=\bfseries\color{blue},
  commentstyle=\color{red!60!black},
  identifierstyle=\slshape\color{black},
  stringstyle=\color{green!60!black},
  columns=fullflexible,
  keepspaces=true,tabsize=8,
}
\lstset{style=reviewcode}

\lstset{emph={ %% MPI commands
  MPI_Init,MPI_Initialized,MPI_Finalize,
  MPI_Finalized,MPI_Abort,
  MPI_Comm_size,MPI_Comm_rank,
  MPI_Send,MPI_Isend,MPI_Rsend,MPI_Irsend,
  MPI_Ssend,MPI_Issend,
  MPI_Recv,MPI_Irecv,MPI_Mrecv,
  MPI_Sendrecv,MPI_Sendrecv_replace,
  },emphstyle={\color{red!70!black}\bfseries}
}
\lstset{emph={[2] %% constants
  MPI_COMM_WORLD,MPI_STATUS_IGNORE,
  MPI_STATUSES_IGNORE,MPI_STATUS_SIZE,
  MPI_INT,MPI_INTEGER,
  },emphstyle={[2]\color{green!40!black}}
}
\lstset{emph={[3] %% types
  MPI_Aint,MPI_Comm,MPI_Count,MPI_Datatype,
  MPI_Errhandler,MPI_File,MPI_Group,
  },emphstyle={[3]\color{yellow!30!brown}\bfseries},
}

\begin{document}
\title{SSC 335: listings demo}
\author{Victor Eijkhout}
\date{today}
\maketitle

\section{C examples}

\lstset{language=C}
\begin{lstlisting}
int main() {
  MPI_Init();
  MPI_Comm comm = MPI_COMM_WORLD;
  if (x==y)
    MPI_Send( &x,1,MPI_INT,0,0,comm);
  else
    MPI_Recv( &y,1,MPI_INT,1,1,comm,MPI_STATUS_IGNORE);
  MPI_Finalize();
Victor Eijkhout
today
1 This is a section
This is a test document, used in [2]. It contains a discussion in section 2.
Exercise 1. Left to the reader.
Exercise 2. Also left to the reader, just like in exercise 1
This is a formula: $a \Leftarrow b$.
\[ x_i^{(k)} \leftarrow y_{ij}\cdot x_j \qquad(1) \]
Text: $\int_0^1 \sqrt x\,dx$
\[ \int_0^1 \sqrt x\,dx \]
As I showed in the introductory section 1, in the paper [1], it was shown that equation (1)
• There is an item.
Contents
1 This is a section 1
2 This is another section 1
List of Figures
1 this is the only figure 1
References
[1] Loyce M. Adams and Harry F. Jordan. Is SOR color-blind? SIAM J. Sci. Stat. Comput.,
7:490–506, 1986.
[2] Victor Eijkhout. Short LATEX demo. SSC 335, oct 1, 2008.
Victor Eijkhout
today
1 Two graphs
[Rendered page showing two bar charts wrapped in filler (lorem ipsum) text: one chart plots #Average Marks against Students Name (Tom, Jack, Hary, Liza, Henry), the other plots #Annual Growth Percentage.]
1 C examples
int main() {
MPI_Init();
MPI_Comm comm = MPI_COMM_WORLD;
if (x==y)
MPI_Send( &x,1,MPI_INT,0,0,comm);
else
MPI_Recv( &y,1,MPI_INT,1,1,comm,MPI_STATUS_IGNORE);
MPI_Finalize();
}
2 Fortran examples
Program myprogram
Type(MPI_Comm) :: comm = MPI_COMM_WORLD
call MPI_Init()
if (.not. x==y ) then
call MPI_Send( x,1,MPI_INTEGER,0,0,comm);
else
call MPI_Recv( y,1,MPI_INTEGER,1,1,comm,MPI_STATUS_IGNORE)
end if
call MPI_Finalize()
End Program myprogram
Much of the teaching in this book is geared towards enabling you to write fast code, whether this is
through the choice of the right method, or through optimal coding of a method. Consequently, you some-
times want to measure just how fast your code is. If you have a simulation that runs for many hours, you’d
think just looking on the clock would be enough measurement. However, as you wonder whether your
code could be faster than it is, you need more detailed measurements. This tutorial will teach you some
ways to measure the behavior of your code in more or less detail.
Here we will discuss
• timers: ways of measuring the execution time (and sometimes other measurements) of a particular
piece of code, and
• profiling tools: ways of measuring how much time each piece of code, typically a subroutine,
takes during a specific run.
16.1 Timers
There are various ways of timing your code, but mostly they come down to calling a timer routine twice
that tells you the clock values:
tstart = clockticks()
....
tend = clockticks()
runtime = (tend-tstart)/ticks_per_sec
192
16.1. Timers
16.1.1 Fortran
For instance, in Fortran there is the system_clock routine:
implicit none
INTEGER :: rate, tstart, tstop
REAL :: time
real :: a
integer :: i
call system_clock(count_rate=rate); print *,"Clock frequency:",rate
call system_clock(tstart)
! ... the computation being timed (not shown in this excerpt) ...
call system_clock(tstop)
time = real(tstop-tstart)/real(rate)
with output
Clock frequency: 10000
1.000000 813802544 813826097 2.000000
16.1.2 C
In C there is the clock function, with output:
clock resolution: 1000000
res: 1.000000e+00
start/stop: 0.000000e+00,2.310000e+00
Time: 2.310000e+00
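The C snippet that produced this output is not included in this excerpt; a minimal sketch of timing with clock (illustrative only, not the original demo program) is:
#include <stdio.h>
#include <time.h>
int main() {
  clock_t tstart,tend; double runtime;
  tstart = clock();
  /* ... the code being timed ... */
  tend = clock();
  runtime = (double)(tend-tstart)/CLOCKS_PER_SEC;
  printf("Time: %e\n",runtime);
  return 0;
}
Note that clock measures time in ticks of 1/CLOCKS_PER_SEC seconds, stored in an integer type, so it can wrap around for long runs.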
Do you see a difference between the Fortran and C approaches? Hint: what happens in both cases when
the execution time becomes long? At what point do you run into trouble?
16.1.3 C++
While C routines are available in C++, there is also a new chrono library that can do many things, including
handling different time formats.
std::chrono::system_clock::time_point start_time;
start_time = std::chrono::system_clock::now();
// ... code ...
auto duration =
std::chrono::system_clock::now()-start_time;
auto millisec_duration =
std::chrono::duration_cast<std::chrono::milliseconds>(duration);
std::cout << "Time in milli seconds: "
<< .001 * millisec_duration.count() << endl;
For more details, see Introduction to Scientific Programming book, section 24.8.
There are unix system calls that can be used for timing: getrusage
#include <sys/resource.h>
double time00(void)
{
struct rusage ruse;
getrusage(RUSAGE_SELF, &ruse);
return( (double)(ruse.ru_utime.tv_sec+ruse.ru_utime.tv_usec
/ 1000000.0) );
}
and gettimeofday
#include <sys/time.h>
double time00(void)
{
struct timeval tp;
gettimeofday(&tp, NULL);
return( (double) (tp.tv_sec + tp.tv_usec/1000000.0) ); /* wall-clock time */
}
These timers have the advantage that they can distinguish between user time and system time, that is,
exclusively timing program execution or giving wallclock time including all system activities.
However, this approach of using processor-specific timers is not portable. For this reason, the PAPI pack-
age (http://icl.cs.utk.edu/papi/) provides a uniform interface to hardware counters. You can see
this package in action in the codes in appendix HPC book, section 31.
In addition to timing, hardware counters can give you information about such things as cache misses
and instruction counters. A processor typically has only a limited number of counters, but they can be
assigned to various tasks. Additionally, PAPI has the concept of derived metrics.
16.4.1 gprof
The profiler of the GNU compiler, gprof, requires recompilation with an extra flag:
% gcc -g -pg ./srcFile.c -o MyProgram
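After running the instrumented program, which writes a file gmon.out in the current directory, the profile is typically displayed with
gprof MyProgram gmon.out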
16.4.2 perf
Coming with most Linux distributions, perf does not require any recompilation or instrumentation.
Run:
perf record yourprogram options
perf record --call-graph fp yourprogram options
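The recorded data is written to a file perf.data, which can then be inspected interactively with perf report.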
The MPI library has been designed to make it easy to profile. See Parallel Programming book, section 15.6.
16.5 Tracing
In profiling we are only concerned with aggregate information: how many times a routine was called,
and with what total/average/min/max runtime. However sometimes we want to know about the exact
timing of events. This is especially relevant in a parallel context when we care about load unbalance and
idle time.
Tools such as Vampir can collect trace information about events and in particular messages, and render them in displays such as figure 16.1.
TAU
The TAU tool [20] (see http://www.cs.uoregon.edu/research/tau/home.php for the official documentation) uses instrumentation to profile and trace your code. That is, it adds profiling and trace calls to your code. You can then inspect the output after the run.
Profiling is the gathering and displaying of bulk statistics, for instance showing you which routines take the most time, or whether communication takes a large portion of your runtime. When you get concerned about performance, a good profiling tool is indispensable.
Tracing is the construction and displaying of time-dependent information on your program run, for instance showing you if one process lags behind others. For understanding a program’s behavior, and the reasons behind profiling statistics, a tracing tool can be very insightful.
• You can have the instrumentation added at compile time. For this, you need to let TAU take over
the compilation in some sense.
1. TAU has its own makefiles. The names and locations depend on your installation, but typi-
cally it will be something like
export TAU_MAKEFILE=$TAU_HOME/lib/Makefile.tau-mpi-pdt
2. Now you can invoke the TAU compilers tau_cc.sh, tau_cxx.sh, tau_f90.sh.
When you run your program you need to tell TAU what to do:
export TAU_TRACE=1
export TAU_PROFILE=1
export TRACEDIR=/some/dir
export PROFILEDIR=/some/dir
17.2 Instrumentation
Unlike such tools as VTune which profile your binary as-is, TAU works by adding instrumentation to your
code: in effect it is a source-to-source translator that takes your code and turns it into one that generates
run-time statistics.
This instrumentation is largely done for you; you mostly need to recompile your code with a script that
does the source-to-source translation, and subsequently compiles that instrumented code. You could for
instance have the following in your makefile:
ifdef TACC_TAU_DIR
CC = tau_cc.sh
else
CC = mpicc
endif
% : %.c
<TAB>${CC} -o $@ $^
If TAU is to be used (which we detect here by checking for the environment variable TACC_TAU_DIR), we
define the CC variable as one of the TAU compilation scripts; otherwise we set it to a regular MPI compiler.
Fortran note (cpp includes). If your source contains
#include "something.h"
17.3 Running
You can now run your instrumented code; trace/profile output will be written to file if environment vari-
ables TAU_PROFILE and/or TAU_TRACE are set:
export TAU_PROFILE=1
export TAU_TRACE=1
A TAU run can generate many files: typically at least one per process. It is therefore advisable to create a directory for your tracing and profiling information. You declare them to TAU by setting the environment variables PROFILEDIR and TRACEDIR.
mkdir tau_trace
mkdir tau_profile
export PROFILEDIR=tau_profile
export TRACEDIR=tau_trace
TACC note. At TACC, use ibrun without a processor count; the count is derived from the queue submis-
sion parameters.
While this example uses two separate directories, there is no harm in using the same for both.
17.4 Output
The tracing/profiling information is spread over many files, and hard to read as such. Therefore, you need
some further programs to consolidate and display the information.
You view profiling information with paraprof
paraprof tau_profile
If you skip the tau_timecorrect step, you can generate the slog2 file by:
tau2slog2 tau.trc tau.edf -o yourprogram.slog2
17.6 Examples
17.6.1 Bucket brigade
Let’s consider a bucket brigade implementation of a broadcast: each process sends its data to the next
higher rank.
int sendto =
( procno<nprocs-1 ? procno+1 : MPI_PROC_NULL )
;
int recvfrom =
( procno>0 ? procno-1 : MPI_PROC_NULL )
;
MPI_Recv( leftdata,1,MPI_DOUBLE,recvfrom,0,comm,MPI_STATUS_IGNORE);
myvalue = leftdata;
MPI_Send( myvalue,1,MPI_DOUBLE,sendto,0,comm);
We implement the bucket brigade with blocking sends and receives: each process waits to receive from
its predecessor, before sending to its successor.
// bucketblock.c
if (procno>0)
MPI_Recv(leftdata, N, MPI_DOUBLE,recvfrom,0, comm, MPI_STATUS_IGNORE);
for (int i=0; i<N; i++)
myvalue[i] = (procno+1)*(procno+1) + leftdata[i];
if (procno<nprocs-1)
MPI_Send(myvalue,N, MPI_DOUBLE,sendto,0, comm);
The TAU trace of this is in figure 17.1, using 4 nodes of 4 ranks each. We see that the processes within
each node are fairly well synchronized, but there is less synchronization between the nodes. However,
the bucket brigade then imposes its own synchronization on the processes because each has to wait for
its predecessor, even if it posted the receive operation early.
Next, we introduce pipelining into this operation: each send is broken up into parts, and these parts are
sent and received with non-blocking calls.
// bucketpipenonblock.c
MPI_Request rrequests[PARTS];
for (int ipart=0; ipart<PARTS; ipart++) {
  MPI_Irecv
    ( leftdata+partition_starts[ipart],partition_sizes[ipart],
      MPI_DOUBLE,recvfrom,ipart,comm,rrequests+ipart);
}
> request,
> ierr )
call mpi_send( sum,
> 1,
> dp_type,
> reduce_exch_proc(i),
> i,
> mpi_comm_world,
> ierr )
sum = sum + d
enddo
We recognize this structure in the TAU trace: figure 17.3. Upon closer examination, we see how this
particular algorithm induces a lot of wait time. Figures 17.5 and 17.6 show a whole cascade of processes
waiting for each other.
Figure 17.5: Four stages of processes waiting caused by a single lagging process
Figure 17.6: Four stages of processes waiting caused by a single lagging process
Chapter 18
SLURM
Supercomputer clusters can have a large number of nodes, but not enough to let all their users run si-
multaneously, and at the scale that they want. Therefore, users are asked to submit jobs, which may start
executing immediately, or may have to wait until resources are available.
The decision when to run a job, and what resources to give it, is not done by a human operator, but by
software called a batch system. (The Stampede cluster at TACC ran close to 10 million jobs over its lifetime,
which corresponds to starting a job every 20 seconds.)
This tutorial will cover the basics of such systems, and in particular Simple Linux Utility for Resource
Management (SLURM).
18.2 Queues
Jobs often can not start immediately, because not enough resources are available, or because other jobs
may have higher priority (see section 18.7). It is thus typical for a job to be put on a queue, scheduled, and
started, by a batch system such as SLURM.
Batch systems do not put all jobs in one big pool: jobs are submitted to any of a number of queues, that
are all scheduled separately.
Queues can differ in the following ways:
• If a cluster has different processor types, those are typically in different queues. Also, there may
be separate queues for the nodes that have a Graphics Processing Unit (GPU) attached. Having
multiple queues means you have to decide what processor type you want your job to run on,
even if your executable is binary compatible with all of them.
• There can be ‘development’ queues, which have restrictive limits on runtime and node count, but
where jobs typically start faster.
• Some clusters have ‘premium’ queues, which have a higher charge rate, but offer higher priority.
• ‘Large memory nodes’ are typically also in a queue of their own.
• There can be further queues for jobs with large resource demands, such as large core counts, or
longer-than-normal runtimes.
For slurm, the sinfo command can tell you much about the queues.
# what queues are there?
sinfo -o "%P"
# what queues are there, and what is their status?
sinfo -o "%20P %.5a"
Exercise 18.2. Enter these commands. How many queues are there? Are they all operational at
the moment?
All options regarding the job run are contained in the script file, as we will now discuss.
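As a point of reference, here is a minimal sketch of such a script; the queue name, resource counts, and program name are assumptions that you need to adapt to your cluster:
#!/bin/bash
#SBATCH -J myjob             # job name
#SBATCH -p development       # queue (partition) to submit to
#SBATCH -N 2                 # number of nodes
#SBATCH -n 32                # total number of MPI tasks
#SBATCH -t 00:10:00          # wall clock time limit
#SBATCH -o myjob.o%j         # output file; %j expands to the job id
ibrun ./yourprogram          # generic SLURM systems use srun instead
You submit this script with sbatch:
sbatch myjob.script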
As a result of your job submission you get a job id. After submission you can query your job with squeue:
squeue -j 123456
The squeue command reports various aspects of your job, such as its status (typically pending or running);
and if it is running, the queue (or ‘partition’) where it runs, its elapsed time, and the actual nodes where
it runs.
squeue -j 5807991
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
5807991 development packingt eijkhout R 0:04 2 c456-[012,034]
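To list all of your own jobs rather than one specific job id, you can use the standard -u flag:
squeue -u $USER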
If you discover errors in your script after submitting it, including when it has started running, you can
cancel your job with scancel:
scancel 1234567
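The same flag works for scancel, so you can cancel all of your queued and running jobs at once:
scancel -u $USER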
TACC note. This option can not be used to request arbitrary memory: jobs always have access
to all available physical memory, and use of shared memory is not allowed.
See https://slurm.schedmd.com/sbatch.html for a full list.
Exercise 18.3. Write a script that executes the date command twice, with a sleep in between.
Submit the script and investigate the output.
18.4.2 Environment
Your job script acts like any other shell script when it is executed. In particular, it inherits the calling
environment with all its environment variables. Additionally, slurm defines a number of environment
variables, such as the job ID, the hostlist, and the node and process count.
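For instance, your script could record where it is running; these are standard variables that SLURM sets for every job:
echo "job $SLURM_JOB_ID uses $SLURM_JOB_NUM_NODES nodes: $SLURM_JOB_NODELIST"
echo "total number of tasks: $SLURM_NTASKS"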
It would be possible to specify only the node count or the core count, but that takes away flexibility:
• If a node has 40 cores, but your program stops scaling at 10 MPI ranks, you would use:
#SBATCH -N 1
#SBATCH -n 10
• If your processes use a large amount of memory, you may want to leave some cores unused. On
a 40-core node you would either use
#SBATCH -N 2
#SBATCH -n 40
or
#SBATCH -N 1
#SBATCH -n 20
Rather than specifying a total core count, you can also specify the core count per node with --ntasks-per-node.
Exercise 18.4. Go through the above examples and replace the -n option by an equivalent
--ntasks-per-node value.
Python note. Python MPI programs Python programs using mpi4py should be treated like other MPI
programs, except that instead of an executable name you specify the python executable and the
script name:
ibrun python3 mympi4py.py
While your job is running, you can ssh into its compute nodes; normally, compute nodes are off-limits. This is
useful if you want to run top to see how your processes are doing.
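A typical sequence, with a node name that is purely an example, would be:
squeue -u $USER        # find the nodes your job is running on
ssh c456-012           # log in to one of them
top                    # watch your processes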
18.9 Examples
Very sketchy section.
#!/bin/sh
you get the hostname of the login node from which your job was submitted.
Exercise 18.10. Which of these are shared with other users when your job is running:
• Memory;
• CPU;
• Disc space?
Exercise 18.11. What is the command for querying the status of your job?
• sinfo
• squeue
• sacct
Exercise 18.12. On 4 nodes with 40 cores each, what is the largest run you can do, measured in
• MPI ranks;
• OpenMP threads?
Chapter 19
SimGrid
Many readers of this book will have access to some sort of parallel machine so that they can run simu-
lations, maybe even some realistic scaling studies. However, not many people will have access to more
than one cluster type so that they can evaluate the influence of the interconnect. Even then, for didactic
purposes one would often wish for interconnect types (fully connected, linear processor array) that are
unlikely to be available.
In order to explore architectural issues pertaining to the network, we then resort to a simulation tool,
SimGrid.
Installation
Compilation
You write plain MPI files, but compile them with the SimGrid compiler smpicc.
Running
SimGrid has its own version of mpirun: smpirun. You need to supply this with options:
• -np 123456 for the number of (virtual) processors;
• -hostfile simgridhostfile which lists the names of these processors. You can basically
make these names up, but they have to match the hosts defined in the platform file of the next option;
• -platform arch.xml which defines the connectivity between the processors.
For instance, with a hostfile of 8 hosts, a linearly connected network would be defined as:
<?xml version='1.0'?>
<!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid/simgrid.dtd">
<platform version="4">
</platform>
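Putting this together, a compile-and-run cycle could look as follows; the source file and platform file names are placeholders:
smpicc -O2 -o yourprogram yourprogram.c
smpirun -np 8 -hostfile simgridhostfile -platform arch.xml ./yourprogram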
Bibliography
[1] Alfred V. Aho, Brian W. Kernighan, and Peter J. Weinberger. The Awk Programming Language.
Addison-Wesley Series in Computer Science. Addison-Wesley Publ., 1988. ISBN 020107981X,
9780201079814. [Cited on page 35.]
[2] L.S. Blackford, J. Choi, A. Cleary, E. D’Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling,
G. Henry, A. Petitet, K. Stanley, D. Walker, and R.C. Whaley. ScaLAPACK Users’ Guide. SIAM, 1997.
[Cited on page 116.]
[3] Netlib.org BLAS reference implementation. http://www.netlib.org/blas. [Cited on page 116.]
[4] Jaeyoung Choi, Jack J. Dongarra, Roldan Pozo, and David W. Walker. Scalapack: a scalable linear
algebra library for distributed memory concurrent computers. In Proceedings of the fourth symposium
on the frontiers of massively parallel computation (Frontiers ’92), McLean, Virginia, Oct 19–21, 1992,
pages 120–127, 1992. [Cited on page 116.]
[5] Edsger W. Dijkstra. Programming as a discipline of mathematical nature. Am. Math. Monthly, 81:608–
612, 1974. [Cited on page 145.]
[6] Jack J. Dongarra, Jeremy Du Croz, Sven Hammarling, and Iain Duff. A set of level 3 basic linear
algebra subprograms. ACM Transactions on Mathematical Software, 16(1):1–17, March 1990. [Cited on
page 116.]
[7] Jack J. Dongarra, Jeremy Du Croz, Sven Hammarling, and Richard J. Hanson. An extended set of
FORTRAN basic linear algebra subprograms. ACM Transactions on Mathematical Software, 14(1):1–
17, March 1988. [Cited on page 116.]
[8] Dale Dougherty and Arnold Robbins. sed & awk. O’Reilly Media, 2nd edition. Print ISBN:
978-1-56592-225-9 , ISBN 10:1-56592-225-5; Ebook ISBN: 978-1-4493-8700-6, ISBN 10:1-4493-8700-4.
[Cited on page 35.]
[9] Victor Eijkhout. The Science of TEX and LATEX. lulu.com, 2012. [Cited on page 39.]
[10] C. A. R. Hoare. An axiomatic basis for computer programming. Communications of the ACM, pages
576–580, October 1969. [Cited on page 145.]
[11] Helmut Kopka and Patrick W. Daly. A Guide to LATEX. Addison-Wesley, first published 1992. [Cited on
page 190.]
[12] L. Lamport. LATEX, a Document Preparation System. Addison-Wesley, 1986. [Cited on page 190.]
[13] C. L. Lawson, R. J. Hanson, D. R. Kincaid, and F. T. Krogh. Basic linear algebra subprograms for
fortran usage. ACM Trans. Math. Softw., 5(3):308–323, September 1979. [Cited on page 116.]
[14] Robert Mecklenburg. Managing Projects with GNU Make. O’Reilly Media, 3rd edition, 2004.
Print ISBN:978-0-596-00610-5 ISBN 10:0-596-00610-1 Ebook ISBN:978-0-596-10445-0 ISBN 10:0-596-
10445-6. [Cited on page 52.]
[15] Sandra Mendez, Sebastian Lührs, Volker Weinberg, Dominic Sloan-Murphy, and Andrew
Turner. Best practice guide - parallel i/o. https://prace-ri.eu/training-support/
best-practice-guides/best-practice-guide-parallel-io/, 02 2019. [Cited on page 133.]
[16] Frank Mittelbach, Michel Goossens, Johannes Braams, David Carlisle, and Chris Rowley. The LATEX
Companion, 2nd edition. Addison-Wesley, 2004. [Cited on page 190.]
[17] NASA Advanced Supercomputing Division. NAS parallel benchmarks. https://www.nas.nasa.
gov/publications/npb.html. [Cited on page 203.]
[18] Tobi Oetiker. The not so short introduction to LATEX. http://tobi.oetiker.ch/lshort/. [Cited on
pages 177 and 190.]
[19] Jack Poulson, Bryan Marker, Jeff R. Hammond, and Robert van de Geijn. Elemental: a new framework
for distributed memory dense matrix computations. ACM Transactions on Mathematical Software.
submitted. [Cited on page 116.]
[20] S. Shende and A. D. Malony. The TAU parallel performance system. International Journal of High
Performance Computing Applications, 20:287–331, 2006. [Cited on page 199.]
[21] TEX frequently asked questions. [Cited on page 190.]
[22] R. van de Geijn, Philip Alpatov, Greg Baker, Almadena Chtchelkanova, Joe Eaton, Carter Edwards,
Murthy Guddati, John Gunnels, Sam Guyer, Ken Klimkowski, Calvin Lin, Greg Morrow, Peter Nagel,
James Overfelt, and Michelle Pal. Parallel linear algebra package (PLAPACK): Release r0.1 (beta)
users’ guide. 1996. [Cited on page 116.]
[23] Robert A. van de Geijn. Using PLAPACK: Parallel Linear Algebra Package. The MIT Press, 1997. [Cited
on page 116.]
[24] Greg Wilson, D. A. Aruliah, C. Titus Brown, Neil P. Chue Hong, Matt Davis, Richard T. Guy, Steven
H. D. Haddock, Kathryn D. Huff, Ian M. Mitchell, Mark D. Plumbley, Ben Waugh, Ethan P. White,
and Paul Wilson. Best practices for scientific computing. PLOS Biology, 12(1):1–7, 01 2014. [Cited on
page 6.]