path. The full path always starts with a forward slash (/) which represents the root
of the Unix directory structure. For instance, the pwd command returns the full path
to the current location. A relative path is relative to your current location within the
directory structure. It often starts with the name of a directory (moving down in the
directory tree) or with ../ (moving up in the directory tree).
The path to a directory or file can be added to many Unix commands. For instance,
assuming the current location within the directory tree is /home/students/rjones/,
listing the contents of the directory projects/sst can be achieved using the relative
path
ls -l projects/sst
or the full path
ls -l /home/students/rjones/projects/sst
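The relationship between relative and full paths can also be sketched in a few lines of Python. This is only an illustration: the helper name to_full_path is hypothetical, and the current location is the one assumed in the text above.

```python
import posixpath

# Current location within the directory tree (taken from the example in the text)
CURRENT_LOCATION = "/home/students/rjones"

def to_full_path(relative_path, cwd=CURRENT_LOCATION):
    """Resolve a relative path against the current location."""
    return posixpath.normpath(posixpath.join(cwd, relative_path))

print(to_full_path("projects/sst"))  # moving down in the directory tree
print(to_full_path("../"))           # moving one level up
print(to_full_path("./"))            # the current directory itself
```

The sketch only manipulates path strings; it does not check whether the directories exist.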
Some commands such as the copying command cp even take two paths, the first
being the path to the input file or directory (source) and the second being the path
to the output file or directory (destination).
The (relative) path to the current directory is a dot (.) or a dot followed by a forward
slash (./) which can be used to copy a file from another directory into the current
directory as shown in the following example.
cp dir1/dir2/myfile.csv ./
The dot (.) and tilde (~) are shortcuts for paths pointing to the current directory
and the home directory of the user, respectively. The double-dot (..) represents one
level up within the directory tree, and the forward slash (/) separates the directory
levels within a path.
The asterisk (*) and the question mark (?) are useful for listing or searching for specific
file or directory name patterns. The asterisk matches any number of characters and the
question mark matches exactly one character. For instance, the following example lists all files that begin
with Engelstaedter, end with the file extension .pdf and that are located in a directory
called papers that sits at the root of the home directory. The command could be
executed from anywhere within the Unix directory tree.
ls -l ~/papers/Engelstaedter*.pdf
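How the two wildcards match filenames can be tried out with Python's fnmatch module, which follows the same basic rules for * and ?. The filenames below are made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical filenames in the papers directory
filenames = ["Engelstaedter2006.pdf", "Engelstaedter_notes.txt", "Jones2010.pdf"]

# The asterisk matches any number of characters
pdf_matches = [name for name in filenames if fnmatchcase(name, "Engelstaedter*.pdf")]
print(pdf_matches)

# The question mark matches exactly one character
print(fnmatchcase("file1.txt", "file?.txt"))    # one character between "file" and ".txt"
print(fnmatchcase("file10.txt", "file?.txt"))   # two characters, so no match
```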
Text files can be created in various ways. First, a new text file can be created by
using a text editor as shown in Section 3.4.3. Second, the touch command followed
by a filename can be used. The actual purpose of the touch command is to update
the access and modification time of a file but it also creates an empty text file if the
specified filename does not correspond to an existing file. Third, a text file can be
created by redirecting the text output from a Unix command to a file as explained in
Section 3.5.x.
Unix is very picky about white spaces as they are interpreted by the Unix
system as the end of a command, path or filename. Do not use spaces in file
or directory names. Use underscores instead of spaces if needed. If white
spaces exist in filenames then double-quotes can be used to make the Unix
system aware of parts belonging together.
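The effect of white spaces and double-quotes can be illustrated with Python's shlex module, which splits a command line using shell-like rules. The filename "my file.txt" is a made-up example.

```python
import shlex

# Without quotes the white space splits the filename into two separate arguments
unquoted = shlex.split('ls -l my file.txt')
print(unquoted)

# With double-quotes the filename is kept together as one argument
quoted = shlex.split('ls -l "my file.txt"')
print(quoted)
```

In the first case the ls command would look for two files called my and file.txt.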
The mkdir (make directory) command can be used to create single or multiple
directories as shown in Table 3.5.1.1. More details about the mkdir command can
be found in the man pages (Section 3.4.2).
Command                 Description
mkdir dir1              Create a directory called dir1.
mkdir dir1/dir2         Create a sub-directory dir2 inside directory dir1.
mkdir dir1 dir2 dir3    Create three directories called dir1, dir2 and dir3 in one go.
mkdir -p dir1/dir2      Create directory dir2 and parent directory dir1 in one go.
touch myfile.txt        Create an empty text file.
Table 3.5.2.1: Examples for listing files and directories using the ls command.
Command          Description
ls               Simple list of files and directories.
ls -l            Long listing format.
ls -lt           Sort list by modification time (newest first).
ls -ltr          Sort list by modification time in reverse order (oldest first).
ls -lh           Show size in human-readable format.
ls -lhS          Sort by size and show size in human-readable format.
ls -la           List all files including hidden files (that start with a dot).
ls -p | grep /   List directories only.
ls -l dir1/      List the contents of directory dir1.
Table 3.5.3.1: Examples for moving around in the directory tree using the cd command.
Command        Description
cd             Jump to the root of the home directory.
cd dir1        Move into directory dir1.
cd dir1/dir2   Move two levels down into sub-directory dir2.
cd ..          Move one level up in the directory tree.
cd ../..       Move two levels up in the directory tree.
Unix 43
Table 3.5.4.1: Examples for copying (cp), moving (mv), renaming (mv) and deleting (rm) files and
directories.
Command                     Description
cp <ifile> <ofile>          Copy a file (generic syntax).
cp file.txt dir/file1.txt   Create a copy of file.txt named file1.txt in the directory dir.
cp ~/test.txt ./            Copy a file from the root of the home directory to the current directory without changing the file's name.
cp -R <idir> <odir>         Copy a directory recursively (generic syntax).
mv <ifile> <ofile>          Move or rename a file or directory (generic syntax).
mv file.txt dir/            Move file file.txt into the directory dir.
mv file.txt ../             Move file file.txt one level up in the directory tree.
mv file1.txt file2.txt      Rename file1.txt to file2.txt.
rm file.txt                 Delete file.txt.
rm -f file.txt              Delete file file.txt without confirmation (force delete).
rm -r dir                   Remove directory dir and all its content.
When copying and moving files care should be taken not to overwrite
existing files as they may be over-written without warning.
The head and tail commands can be used to dump lines from the beginning or the
end of a text file into the terminal window, respectively.
The differing behaviour of the commands is best tested with a long text file
(longfile.txt). Examples are shown in Table 3.6.1.1. For additional command options,
such as how to search file content with the less command, the man pages can be
consulted.
Table 3.6.1.1: Examples for examining text files using the cat, less, head and tail commands.
Command                   Description
cat longfile.txt          Print the contents of the file to the terminal window.
less longfile.txt         Scroll and search through file.
head longfile.txt         Dump the first 10 lines of the file to the terminal window.
head -n 20 longfile.txt   Dump the first 20 lines of the file to the terminal window.
tail longfile.txt         Dump the last 10 lines of the file to the terminal window.
tail -n 20 longfile.txt   Dump the last 20 lines of the file to the terminal window.
A description of the file properties in the example above is shown in Table 3.6.2.1. The
first part (-rwxr-xr-x) shows the file permissions. They are explained in more detail in
Section 3.6.3. The second part (1) is the number of hard links to the file (can be ignored
most of the time). The third and fourth parts (rjones and climate) are the username of
the file owner and the name of the group associated with the file. When a Unix account is created by
the system administrator the owner is placed into a group for management purposes.
The last three parts (1.2K, Apr 19 16:32 and file.txt) show the file size, the time of
last modification and the filename, respectively.
Part           Description
-rwxr-xr-x     File permissions.
1              Number of hard links to this file.
rjones         Username of file owner.
climate        Name of the group the file belongs to.
1.2K           File size (in human-readable format).
Apr 19 16:32   Date and time of last modification.
file.txt       Filename.
In some cases a plus symbol (+) is shown as an eleventh character indicating that
extended file permissions have been set using Access Control Lists (not covered in
this book).
The permissions part can be set using either symbolic or octal notation. The symbolic
notation will be discussed first.
In its simplest form, the symbolic permissions part is made up of three characters. The
first character defines the group for which permissions are intended to be changed
(u for user, g for group, o for others or a for all). The second character defines
whether permission is intended to be granted or to be removed (+ for granting or -
for removing permissions). The third character defines which permissions are being
modified (r for read, w for write or x for execute). For more options consult the manual
pages.
For example, executing the command ls -l script.sh may return the following file
properties.
Only the file owner rjones and members of the group climate can read and edit the
file. Others (everyone else) can not access the file. No one can execute the file.
The following command adds execute permissions for the file owner (needed to run
a Shell script).
The ls -l script.sh command now returns the following updated file properties.
The octal notation uses a three-digit octal number to set the permissions. An easy
way to identify the octal number for a specific set of permissions is to use one of the
online Unix permission calculators¹⁸.
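As an alternative to an online calculator, the link between octal digits and the permission string shown by ls -l can be sketched with Python's stat module. Each octal digit is the sum of 4 (read), 2 (write) and 1 (execute) for the owner, the group and others, respectively. The helper name permission_string is hypothetical, and the mode 660 is an assumption matching the description of script.sh above (owner and group can read and edit, nobody can execute).

```python
import stat

def permission_string(octal_digits, is_directory=False):
    """Return the ls -l style permission string for a three-digit octal mode."""
    mode = int(octal_digits, 8)
    file_type = stat.S_IFDIR if is_directory else stat.S_IFREG
    return stat.filemode(file_type | mode)

print(permission_string("755"))                     # owner rwx, group r-x, others r-x
print(permission_string("760", is_directory=True))  # the leading d marks a directory

# Adding execute permission for the owner (u+x) to a file with mode 660
before = 0o660
after = before | stat.S_IXUSR
print(format(before, "03o"), permission_string(format(before, "03o")))
print(format(after, "03o"), permission_string(format(after, "03o")))
```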
Some examples for changing file and directory permissions using symbolic and octal
notation are given in Table 3.6.4.1.
Table 3.6.4.1: Examples for changing file and directory permissions using octal and symbolic
notation.
Command              Description
chmod 755 file.txt   Set file permissions to -rwxr-xr-x (often the default).
chmod -R 760 dir1    Set permissions to drwxrw---- for a directory recursively.
chmod +r file.txt    Give all groups read permission to a file.
chmod g-w file.txt   Remove write permission to a file for members of the group.
¹⁸http://permissions-calculator.org
The owner and group of the file or directory can be changed using the chown (change
owner) and chgrp (change group) commands, respectively. Some examples for the use
of the chown and chgrp commands are given in Table 3.6.4.2.
Table 3.6.4.2: Examples for changing file and directory ownership and group information.
Command                        Description
chown jking script.sh          Change the owner of the file script.sh to jking.
chgrp students script.sh       Change the group associated with the file script.sh to students.
chown -R jking:students data   Change the owner and group of the directory data recursively in one command.
The new password should be used the next time you log into the Unix account.
If the old password is unknown then only the system administrator can reset
the password.
If the file to which the output is redirected does not exist then it will be created.
If the file to which the output is redirected already exists then the file will be
overwritten.
In order to append the redirected output to the content of an already existing file two
joined greater-than symbols (>>) can be used instead of one (>).
The path can be either a dot (.) indicating that the search should start at the current
location or any other full or relative path (see Section 3.4.4) pointing to a directory on
the server. Expressions are where the power of the find command lies as they can be
used to determine the search pattern. Some examples that are frequently used with
the find command are shown in Table 3.6.7.1.
Table 3.6.7.1: Examples for finding files and directories using the find command.
Command                           Description
find . -iname 'myFile.txt'        Find the file myFile.txt, case-insensitive.
find . -name '*.pro' -print       Find all files and directories ending with .pro, case-sensitive.
find . -type f -iname '*ipcc*'    Find files (ignore directories) that contain ipcc in the file name.
find . -not -iname '*.dat'        Find all files that do not end with .dat.
find . -user abcd1234             Find all files owned by a user called abcd1234.
find . -type f -size +100M        Find files that are larger than 100 MB.
find . -type f -size -100M        Find files that are smaller than 100 MB.
find . -maxdepth 1 -name '*.py'   Find all files ending with .py in the current directory only.
The find command returns an unsorted list of files. In order to generate a sorted
list the find command output can be passed on to the ls command using backticks.
For instance, the following command searches for files with the file extension .ppt
starting the search at the current location within the directory tree. The whole find
command construct is put between backticks. The ls -l command is placed at the
beginning of the line.
Note the difference between a single quote and a backtick in the above
command. The backtick can normally be found on the keyboard just below
the Esc key.
and time stamps. It tends to have the file extension .tar. The name TAR was derived
from tape archive as the method was originally developed to write data to tape.
Both formats are quite commonly used to archive data and are sometimes used
together to create tarred zip files.
The compression tool GZIP (the G stands for GNU) is commonly combined with the tar
format to generate compressed tar archives with the file extension .tar.gz. Some
examples of working with .zip, .tar and .tar.gz files are given in Table 3.6.8.1.
Command                         Description
unzip -l file.zip               List the content of file.zip.
zip -r docs.zip docs            Create a zipped archive docs.zip containing all files in the directory docs.
unzip docs.zip                  Extract the files from docs.zip.
tar -ztvf out.tar.gz            List the content of out.tar.gz without extracting.
tar -cf out.tar <infiles>       Create a tar file out.tar containing several input files.
tar -xf out.tar                 Extract files from out.tar.
tar -czf out.tar.gz <infiles>   Create a tar file out.tar.gz with GZIP compression in one go.
tar -xzf out.tar.gz             Extract files from a GZIP tar file out.tar.gz in one go.
locally and, if it does, to only download the file if the remote version of the file is
newer than the local version.
wget -N https://crudata.uea.ac.uk/cru/data/temperature/absolute.nc
While the download command is executed some information appears in the terminal
window including which server the file is downloaded from (crudata.uea.ac.uk) and
the file size (62K). The download progress is shown as well as the download speed.
It is possible to rename the download file on the fly within the same command using
the -O option followed by the new filename. In the following example the download
file is renamed to CRU_Sfc_T.nc.
To create a Screen session the screen command is used followed by the option -S and
the name of the Screen session. In the following command a Screen session named
era5 is created.
screen -S era5
To check which Screen sessions are currently set up the screen -ls command can be
used. All running Screen sessions will be listed including information about the Screen
sessions’ names and associated ID numbers as well as the current connection status
(Attached or Detached). The output from the screen -ls command may look similar
to the following.
Multiple Screen sessions with the same name can be created. They are
distinguishable via the associated process ID number (64521 in the example above)
which can also be used as part of the Screen session name (64521.era5).
To detach from a Screen session either the keyboard shortcut Ctrl-a d can be
used or the terminal window can be closed using the mouse.
To re-attach to a Screen session the command screen -dR followed by the session
name can be used. The -d option detaches any open connections (e.g., in another
terminal or on another machine). The -R option re-attaches to the Screen session.
To terminate a Screen session while being attached the keyboard shortcut Ctrl-a k
(kill) can be used or the Unix exit command can be executed on the command line.
Executing the Unix command exit on the command line while being
attached to a Screen session will terminate the session.
Some of the more frequently used Screen commands are listed in Table 3.7.1.1.
Command           Description
screen -S era5    Start a Screen session named era5.
screen -R era5    Reconnect to the Screen session named era5.
screen -dR era5   Close any open connections to the era5 session and reconnect.
Ctrl-a d          Keyboard shortcut for detaching from a session.
Ctrl-a k          Keyboard shortcut for terminating (killing) a session.
4. Multi-dimensional Gridded Datasets
4.1 The Earth’s Coordinate System and Realms
In order to understand how models represent the spherical nature of our planet it
is important to be familiar with the terminology that is used to describe Earth’s
horizontal and vertical space. In a simplified conceptual model planet Earth takes
the form of a sphere. Our planet has one geographic pole in the north and one
in the south. Lines connecting the two poles are called meridians. Each meridian
is associated with a constant longitude value. Longitude values are expressed in
degrees west and east from the prime meridian. By convention the prime meridian
passes through the Royal Observatory in Greenwich, UK, and is associated with 0°
longitude. Longitude values decrease westwards from the prime meridian
halfway around the Earth down to -180° and increase eastwards halfway around the
Earth up to 180°.
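Some datasets count longitude eastwards from 0° to 360° instead of using the -180° to 180° convention described above. A minimal sketch of converting between the two conventions is given below; the function names are hypothetical.

```python
def to_0_360(lon):
    """Convert a longitude from the -180..180 convention to the 0..360 convention."""
    return lon % 360

def to_pm180(lon):
    """Convert a longitude from the 0..360 convention to the -180..180 convention."""
    return (lon + 180) % 360 - 180

print(to_0_360(-177.5))   # a point just east of the 180° meridian
print(to_pm180(357.5))    # a point just west of the prime meridian
print(to_pm180(180.0))    # 180° and -180° describe the same meridian
```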
The line located at equal distance from both poles circling in the east-west direction
around the globe is called the Equator. Lines parallel to the Equator towards the
north and south are called parallels. Their position on the planet is determined by
the angle from the horizontal Equator plane. This angle is referred to as latitude. The
latitude value of the parallels increases from the Equator northwards up to 90° at the
north pole and decreases from the Equator southwards down to -90° at the south pole.
Each point on the Earth’s surface can be identified by a pair of latitude and longitude
coordinates.
Our planet is surrounded by a layer of gases (primarily nitrogen, oxygen, argon and
carbon dioxide) that make up the atmosphere (known as air). A planet without an
atmosphere does not have weather. Most of these gases are within 16 km of the
surface. Air pressure and density decrease with distance from the land or sea surface.
Mean sea-level pressure (MSLP) is the average air pressure at mean sea-level. The
global average MSLP is 1013.25 hPa. In addition, large parts of the Earth’s surface
are covered by water forming large ocean basins (e.g., Atlantic, Pacific and Indian
Ocean) and some smaller, shallower seas (e.g., Mediterranean, North Sea). Water
pressure increases with ocean depth. Both the global oceans and the atmosphere
constitute the vertical component of the Earth climate system above the surface.
Figure 4.2.1: Schematic of surface grid cells (2D) and atmospheric grid boxes (3D) of an atmospheric
general circulation model (AGCM).
Earth’s surface and in its atmosphere. AGCMs therefore compute the climate state
for regularly spaced points around the planet. These model grid points are generally
referred to as the model grid. These points are located at the centre of grid cells. For
surface variables such as precipitation or 2 m air temperature they do not have a
vertical representation but follow the surface topography (Figure 4.2.1).
In contrast, for atmospheric variables such as air temperature, humidity or winds the
data points are located at the centre of horizontally distributed and vertically stacked
3-dimensional grid boxes (Figure 4.2.1). As indicated in Figure 4.2.1 AGCMs compute
horizontal and vertical exchanges between the surface and the lowest atmospheric
level of grid boxes as well as between the atmospheric grid boxes based on the
thermodynamic equations. Data values associated with data points are meant to
represent an average value for the grid cell area (surface variables) or grid box volume
(atmospheric variables). The data values change with every model timestep.
The horizontal distance between data points is referred to as the model horizontal or
spatial resolution. AGCM horizontal resolutions typically range between 1° and 5°.
For regional area models and weather forecast models the distance between data points
may be much smaller. If the horizontal distance between the data points is large then the model
resolution is referred to as coarse or low. If the horizontal distance between the data
points is small then the model resolution is referred to as fine or high. The higher
the model resolution the larger the number of data points for which climate variables
have to be computed. Therefore, computing time and processing power requirements
increase rapidly with increasing model resolution: halving the horizontal distance
between data points roughly quadruples the number of horizontal data points. The longitudinal distance
between data points may be different from the latitudinal distance.
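How quickly the number of data points grows with resolution can be sketched with a small calculation. The example assumes a global grid with the same spacing in the longitude and latitude direction and one data point per grid cell; the function name is hypothetical.

```python
def n_grid_points(resolution_deg):
    """Number of grid cells in longitude, latitude and in total for a global grid."""
    n_lon = round(360 / resolution_deg)
    n_lat = round(180 / resolution_deg)
    return n_lon, n_lat, n_lon * n_lat

for resolution in (5.0, 2.5, 1.0):
    n_lon, n_lat, total = n_grid_points(resolution)
    print(f"{resolution}° grid: {n_lon} x {n_lat} = {total} data points per level")
```

The count applies to a single surface field or atmospheric layer; it has to be multiplied by the number of vertical levels for 3-dimensional fields.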
The lowest set of horizontally distributed grid boxes creates a layer around the planet
that interacts with land and ocean surfaces. In the vertical domain additional layers
stacked on top of each other make up the atmosphere (Figure 4.2.1). The layers
are referred to as model levels. The vertical distance between data points (vertical
resolution) is more complex and will be discussed in more detail in Section xxx.
Figure 4.2.2: Schematic of projecting regularly spaced longitude and latitude grid cells wrapped
around the globe onto a horizontal plane using the cylindrical projection cut along a) the prime
meridian and b) the 180° (-180°) meridian.
The surface grid cell boundaries and by extension the horizontal atmospheric grid box
boundaries follow the Earth’s meridians and parallels (Figure 4.2.1). In the context
of the model grid the term zonal is used to describe phenomena associated with
changes between grid boxes aligned in the east-west direction. Zonally aligned grid
boxes are bound by one parallel to the north and one to the south (Figure 4.2.1). The
term meridional is used to describe phenomena associated with changes between
grid boxes aligned in the north-south direction. Meridionally aligned grid boxes are
bound by one meridian to the west and one to the east. In climate science the terms
zonal and meridional are used to describe directional climate variables or statistics
such as meridional wind, which refers to the v-component (north-south) of the wind, or
zonally averaged global surface temperature, which refers to the surface temperature
averaged around the Earth between two specified latitudes.
While on a regularly spaced grid around the Earth the meridional grid cell width will
always be the same from one pole to the other, the zonal grid cell width decreases
with distance from the Equator towards the poles (Figure 4.2.1 and Figure 4.2.2).
Therefore, the area covered by each grid cell will always be the same in the zonal
direction whereas the cell area decreases in the meridional direction away from the
Equator towards the poles. For example, for meridians with a 1° longitudinal spacing
the distance from one meridian to the next is about 111 km near the Equator, 96 km
at 30° latitude and 56 km at 60° latitude.
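The decrease of the zonal grid cell width towards the poles follows the cosine of the latitude. A minimal sketch of the calculation is given below, assuming a spherical Earth with a mean radius of 6371 km (an assumption not stated in the text); the function name is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def zonal_spacing_km(lat_deg, dlon_deg=1.0):
    """Distance between two meridians dlon_deg apart along the parallel at lat_deg."""
    return math.radians(dlon_deg) * EARTH_RADIUS_KM * math.cos(math.radians(lat_deg))

for lat in (0, 30, 60):
    print(f"{lat}° latitude: {zonal_spacing_km(lat):.0f} km")
```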
For the purpose of climate computations the grid wrapped around the spherical
Earth is transposed to a regular grid where all grid boxes have the same size. For
illustration purposes the cylindrical projection may be used whereby the grid cells
are first projected onto a cylinder by an imaginary light source at the centre of the
Earth, after which the cylinder is cut along a meridian and ‘unfolded’ into
a 2-dimensional plane. Cutting the cylinder along the prime meridian results in a
Pacific-centred map (Figure 4.2.2a) whereas cutting the cylinder along the 180° (-180°)
meridian results in an Africa-centred map (Figure 4.2.2b). Each grid cell represents
a single data value. It is important to note that while the transposed grid cells all
have the same size the data value associated with each grid box still represents the
real-world grid cell area which changes with distance from the Equator as discussed
above.
Both represent 2-dimensional fields of the same size with the same number of data
points. Therefore, both are treated the same way during model data analysis. The
only difference between the two fields is with regards to what they represent (surface
field vs. atmospheric layer).
The surface and single atmospheric layer data fields shown in Figure 4.3.1a and 4.3.1b
are global fields with a 5° by 5° spatial resolution represented by 72 grid cells or
grid boxes in the longitude direction and 36 grid cells or grid boxes in the latitude
direction. The northernmost boundary of the field is at 90° latitude, representing the
north pole. The southernmost boundary is at -90° latitude, representing the south
pole. The westernmost boundary of the global field is at -180° (or 0°) longitude. The
easternmost boundary is at 180° (or 360°) longitude.
Figure 4.3.1: Schematic of projecting regularly spaced longitude and latitude grid cells wrapped
around the globe onto a horizontal plane using the cylindrical projection cut along a) the prime
meridian and b) the 180° (-180°) meridian.
Indices are used to refer to a specific grid cell or its associated data value. The
position of each grid cell or grid box within the 2-dimensional grids shown in Figure
4.3.1 can be specified by a pair of index values. The first index value specifies
the latitudinal position and the second specifies the longitudinal position. In the
latitudinal direction the northernmost grid cell has the index 0. The index then
increases in steps of 1 towards the south until index value 35 for the southernmost
grid cell. Note that for 36 grid cells in the latitude direction the index runs from
0 to 35 (Figure 4.3.1). Similarly, in the longitudinal direction the westernmost grid
cell has the index 0 increasing in steps of 1 towards the east until index 71 for the
easternmost grid cell. Note that for 72 grid cells in the longitude direction the index
goes from 0 to 71 (Figure 4.3.1).
By using this system the grid cells and grid boxes that make up the four corners of
the global field can be specified by the index pairs [0, 0] for the northwestern corner,
[0, 71] for the northeastern corner, [35, 0] for the southwestern corner and [35, 71]
for the southeastern corner. All other grid cells can be specified by their respective
indices accordingly in the same way for surface fields (Figure 4.3.1a) and single level
fields (Figure 4.3.1b).
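The index pairs can be tried out with a NumPy array that has the same shape as the 5° by 5° field. This is a sketch with dummy data values (each cell simply holds its running number); the [latitude, longitude] order is the one used in the text.

```python
import numpy as np

n_lat, n_lon = 36, 72
# Dummy field: cells are numbered row by row starting in the northwestern corner
field = np.arange(n_lat * n_lon).reshape(n_lat, n_lon)

corners = {
    "northwest": field[0, 0],
    "northeast": field[0, 71],
    "southwest": field[35, 0],
    "southeast": field[35, 71],
}
print(field.shape)
print(corners)
```

Using an index of 36 for latitude or 72 for longitude would raise an IndexError because the indices start at 0.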
The magnified grid cell in Figure 4.3.1a has the grid cell boundaries -85° and -90°
latitude and -180° (or 0°) and -175° (or 5°) longitude. The data point associated with
this grid cell is geographically located at the centre of the cell at -87.5° latitude
and -177.5° (or 2.5°) longitude. The concept of centred data points is similar for a
single atmospheric layer grid box. The magnified grid box shown in Figure 4.3.1b
has the grid box boundaries -85° and -90° latitude and 175° (or 355°) and 180° (or 360°)
longitude. Accordingly, the geographical location of the data point is at -87.5° latitude
and 177.5° (or 357.5°) longitude. In addition, the data point here is vertically raised
to the middle of the depth of the atmospheric layer. Different types of atmospheric
levels will be discussed in more detail in Section x.x.
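The location of a centred data point can be computed from the index pair of its grid cell. The sketch below assumes the 5° by 5° grid described above with the latitude index starting in the north; the function name and its default arguments are hypothetical. The west argument is -180 or 0 depending on the longitude convention of the field.

```python
def cell_centre(i_lat, i_lon, resolution=5.0, north=90.0, west=-180.0):
    """Latitude and longitude of the data point at the centre of a grid cell."""
    lat = north - (i_lat + 0.5) * resolution
    lon = west + (i_lon + 0.5) * resolution
    return lat, lon

print(cell_centre(35, 0))             # southwestern corner cell
print(cell_centre(35, 71))            # southeastern corner cell
print(cell_centre(35, 0, west=0.0))   # same index pair on a 0° to 360° grid
```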
Stacked atmospheric layers make up the atmosphere in the model and thereby
introduce a third dimension to the data field (Figure 4.3.2). As a result three indices
are now required to reference a grid box. The spatial indices associated with
longitude and latitude are the same as for the surface and single atmospheric layer
data field (Figure 4.3.1). In addition, a third index is added, usually located at the
first position within the index triplet. In the example depicted in Figure 4.3.2 the
index associated with the highest vertical layer in the atmosphere is 0 and the index
associated with the lowest atmospheric layer is 16. Note that for 17 vertical layers the
index goes from 0 to 16. The grid box associated with the westernmost, southernmost
and highest atmospheric level has the index triplet [0, 35, 0]. Similarly, the grid box
associated with the easternmost, northernmost and lowest atmospheric level has the
index triplet [16, 0, 71]. Every single grid box or its associated data value within
this 3-dimensional data field can be referenced by using this system of index triplets
(Figure 4.3.2).
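The index triplets can be illustrated with a 3-dimensional NumPy array. This is a sketch with dummy values, assuming the [level, latitude, longitude] order and the 17 levels of the example above.

```python
import numpy as np

n_lev, n_lat, n_lon = 17, 36, 72
field = np.zeros((n_lev, n_lat, n_lon))

# Mark the two grid boxes discussed in the text with dummy values
field[0, 35, 0] = 1.0    # highest level, southernmost, westernmost
field[16, 0, 71] = 2.0   # lowest level, northernmost, easternmost

lowest_layer = field[16]    # one atmospheric layer (2-dimensional)
column = field[:, 35, 0]    # all levels above one surface grid cell
print(field.shape, lowest_layer.shape, column.shape)
```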
Data fields are stored in data files, most likely in netCDF format (see Section 2.5.4).
It is important to note that the order in which the dimensions and indices (longitude,
latitude, levels) are stored may vary and differ from the examples presented in Figure
4.3.1 and Figure 4.3.2. While the index for the longitude dimension will always start at
the westernmost position with 0 and increase towards the east the order of indices for
the latitude dimension and vertical levels may be reversed. This means the latitude
index may start at the southernmost position with 0 and increase towards the north
and the index for the vertical levels may start with 0 at the lowest level and increase
with each level upwards.
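Because the order of the latitude index may differ between files, it is safer to look up an index from the latitude values than to assume an order. The following sketch uses dummy data for the 5° grid; the helper name lat_index is hypothetical.

```python
import numpy as np

# Latitudes of the cell centres stored from north to south (as in Figure 4.3.1)
lat_n2s = 87.5 - 5.0 * np.arange(36)
field_n2s = np.arange(36 * 72).reshape(36, 72)   # dummy data values

# The same field stored from south to north: the latitude axis is reversed
lat_s2n = lat_n2s[::-1]
field_s2n = field_n2s[::-1, :]

def lat_index(lat_values, target):
    """Index of the cell centre closest to the target latitude."""
    return int(np.argmin(np.abs(lat_values - target)))

i_a = lat_index(lat_n2s, -87.5)
i_b = lat_index(lat_s2n, -87.5)
print(i_a, i_b)   # different indices that point at the same grid cells
```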
In addition, the index positions within the index pairs or index triplets may change.
In the examples presented here the index position is [latitude, longitude] for the
2-dimensional fields (Figure 4.3.1) and [level, latitude, longitude] for the 3-dimensional
field (Figure 4.3.2). How the order of indices and their position within the index pair
or index triplet can be identified in a data file will be discussed in more detail in
Section xx.
Figure 4.5.1: Schematic of the time dimension in gridded datasets. Timesteps are indicated as T1 to
Tn corresponding to the indices 0 to n-1.
In addition to two (longitude and latitude) or three (longitude, latitude and vertical)
spatial dimensions a dataset can have a time dimension. In that case a 2- or
3-dimensional data field will be associated with each timestep. An example is shown
in Figure 4.5.1 for a field with three spatial dimensions. The values associated with
each individual small grid box are likely to change between one timestep and the
next. Similar to the indexing of the spatial and vertical dimensions, the index of the
first timestep is also 0.
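A dataset with a time dimension can be pictured as an array with one additional index. The sketch below uses dummy values and assumes that time is the first dimension and that there are 12 timesteps; both are assumptions for illustration only.

```python
import numpy as np

n_time, n_lev, n_lat, n_lon = 12, 17, 36, 72   # 12 timesteps is an arbitrary choice
data = np.zeros((n_time, n_lev, n_lat, n_lon))

first_field = data[0]            # 3-dimensional field of the first timestep
last_field = data[n_time - 1]    # the last timestep has the index n_time - 1
series = data[:, 16, 0, 0]       # values of one grid box for all timesteps
print(data.shape, first_field.shape, series.shape)
```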
coordinates (orthogonal coordinate system). On a Gaussian grid the grid points in the
zonal direction (along each parallel) are equally spaced. This means that the distance
between two adjacent degrees of longitude is the same for a given latitude. The grid
points in the meridional direction (along each meridian) are unequally spaced. This
means that the distance between adjacent degrees of latitude varies with distance
between the Equator and the poles. The unequal spacing between grid points in the
meridional direction is determined by Gaussian quadrature³ calculations. Gaussian
grids have no grid point at the poles or on the Equator. However, the distances
between lines of latitude are symmetrical about the Equator.
There are two types of Gaussian grids. First, the full Gaussian grid (also referred to
as regular Gaussian grid). On a full (regular) Gaussian grid the number of zonal grid
points (grid points along each parallel) is always the same regardless of the latitude.
Second, the reduced Gaussian grid (also referred to as thinned or quasi-regular
Gaussian grid). On a reduced (thinned or quasi-regular) Gaussian grid the number
of zonal grid points (grid points along each parallel) decreases towards the poles.
Gaussian grids are labelled using the N value whereby N is the number of latitude
grid points between the Equator and the poles. The total number of latitude grid
points between the poles is, therefore, 2N. The total number of longitude grid points
is usually 4N for a full Gaussian grid as well as for latitude grid points located close
to the Equator on a reduced Gaussian grid.
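The properties of a full Gaussian grid can be checked with a short sketch. It assumes that the Gaussian latitudes are the arcsine of the roots of the Legendre polynomial of degree 2N, which NumPy provides through its Gauss-Legendre quadrature routine; the function name gaussian_grid is hypothetical.

```python
import numpy as np

def gaussian_grid(n):
    """Latitudes (degrees, north to south) and point counts of a full N<n> Gaussian grid."""
    n_lat = 2 * n    # latitude grid points between the poles
    n_lon = 4 * n    # longitude grid points on a full Gaussian grid
    roots, _ = np.polynomial.legendre.leggauss(n_lat)
    lats = np.degrees(np.arcsin(roots))[::-1]
    return lats, n_lat, n_lon

lats, n_lat, n_lon = gaussian_grid(80)
print(n_lat, n_lon)          # number of latitude and longitude grid points
print(lats[:3])              # northernmost latitudes (none at the pole)
print(np.diff(lats)[:3])     # the spacing is close to, but not exactly, constant
```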
Table 4.5.2.1 illustrates the concepts described above for an N80 Gaussian grid (e.g.,
ERA-40 surface fields). Similar tables provided by ECMWF can be found for N320⁴,
N640⁵ and N1280⁶.
³https://en.wikipedia.org/wiki/Gaussian_quadrature
⁴https://confluence.ecmwf.int/display/FCST/Gaussian+grid+with+320+latitude+lines+between+pole+and+equator
⁵https://confluence.ecmwf.int/display/FCST/Gaussian+grid+with+640+latitude+lines+between+pole+and+equator
⁶https://confluence.ecmwf.int/display/FCST/Gaussian+grid+with+1280+latitude+lines+between+pole+and+equator
Multi-dimensional Gridded Datasets 69
Table 4.5.2.1: Example of full and reduced N80 Gaussian grid points (adapted from BADC).
differences between some of the more common vertical coordinate systems will be
discussed in the following sub-sections.
Figure 4.7.2.1: Hybrid sigma-pressure levels used by the ECMWF model. (a) The elevation of the
model levels (green lines; the example shows levels from the 31 level model; level indices k in
green) changes with surface pressure (black curve at the bottom). The data value for a given
pressure value p can be located at different levels in the grid (the red line marks the location
of p = 600 hPa). (b) Example of how the surface orography affects the vertical displacement
of the grid points in a vertical section. (Source: Three-dimensional visualization of ensemble
weather forecasts - Part 1: The visualization tool Met.3D (version 1.0) - Scientific Figure on
ResearchGate. Available from: https://www.researchgate.net/figure/Hybrid-sigma-pressure-levels-used-by-the-ECMWF-model-a-The-elevation-of-the-model_fig9_307835524 [accessed 5 Aug, 2019];
available via license: Creative Commons Attribution 4.0 International)
pressure levels and sigma levels as a vertical coordinate system referred to as hybrid
levels or hybrid sigma-pressure levels.
Figure 4.7.3.1 shows hybrid sigma-pressure levels (blue) for a version of the ECMWF
forecast model system that has 91 levels. Close to the surface the levels are terrain-
following hybrid sigma-pressure levels. At approximately midway through the
atmosphere the levels transition to pure pressure levels.
Figure 4.7.3.1: The 91 Sigma levels used in ENS configuration of the atmospheric model.
The 137 level configuration is similarly distributed but with a relatively higher vertical
resolution. Sigma levels are terrain-following at lower levels and become constant
pressure levels for the upper troposphere and above. (Source: ECMWF. Available from:
https://confluence.ecmwf.int/display/FUG/Grid+point+Resolution [accessed 7 Aug, 2019])
5. The netCDF File Format
5.1 Introduction to the netCDF File Format
The netCDF file format (Section 2.5.4) has become the most commonly used data
file format for saving gridded climate data in recent years. The first step in climate
data analysis after obtaining access to data files is to get a good understanding of the
contents of the file. It is essential to understand how the data stored within netCDF
files are organised and what the data represent as this is the basis for any subsequent
data operations. The most important questions to ask of a data file are as follows:
• What temporal and spatial dimensions are associated with the data fields?
• What is the spatial resolution and what spatial domain is covered?
• What is the temporal resolution and what time period is covered?
• Which data variables are saved in the file?
• What units are the data variables saved in?
The variable names and variable dimensions are especially important as these are
needed to read the data correctly into analysis software packages such as Python.
In addition, it may be helpful to find out what the time unit and the reference
time used is (discussed later in more detail). All the information needed to answer
the above questions is stored in the netCDF file headers, sometimes also called file
metadata. The netCDF file headers describe most aspects of the data the file contains,
which is why this data format is referred to as self-describing.
just read netCDF file headers but their use is described here only with regards to
that purpose. The difference between the three tools lies in the way the file header
information is presented. Which tool to use depends on personal preference as well as
the information one is interested in. For instance, ncdump displays a well-structured
overview of the file header in the terminal window whereas the CDO package is more
useful for looking at date and time information because it automatically converts
the timestamps into a more sensible ‘human-readable’ format. ncview is useful for
visually inspecting the geographical domain, spatial pattern in the data and data
value ranges of the fields stored in the netCDF file allowing, for instance, the user to
quickly check if a CDO file operation has produced the expected results.
ncdump -h data.nc
Executing the above command will generate text output inside the terminal similar
to the following.
Example output from a ncdump -h command
netcdf data {
dimensions:
    longitude = 480 ;
    latitude = 241 ;
    time = UNLIMITED ; // (408 currently)
variables:
    float longitude(longitude) ;
        longitude:standard_name = "longitude" ;
        longitude:long_name = "longitude" ;
        longitude:units = "degrees_east" ;
        longitude:axis = "X" ;
    float latitude(latitude) ;
        latitude:standard_name = "latitude" ;
        latitude:long_name = "latitude" ;
        latitude:units = "degrees_north" ;
        latitude:axis = "Y" ;
    double time(time) ;
        time:standard_name = "time" ;
        time:long_name = "time" ;
        time:units = "hours since 1900-01-01 00:00:00" ;
        time:calendar = "standard" ;
    double t2m(time, latitude, longitude) ;
        t2m:long_name = "2 metre temperature" ;
        t2m:units = "K" ;
        t2m:_FillValue = -32767. ;
// global attributes:
        :CDI = "Climate Data Interface version 1.6.0 (http://code.zmaw.de/projects/cdi)" ;
        :Conventions = "CF-1.0" ;
        :history = "Fri May 09 16:49:11 2014: cdo monmean erai_t2m_1979_2012.nc ./erai_mm_t2m_1979_2012.nc\n",
            "Tue Nov 26 15:36:25 2013: cdo -b F64 -mergetime erai_t2m_00.nc erai_t2m_06.nc erai_t2m_12.nc erai_t2m_18.nc erai_t2m_1979_2012.nc\n",
            "2013-11-26 15:19:53 GMT by mars2netcdf-0.92" ;
        :CDO = "Climate Data Operators version 1.6.0" ;
}
¹https://www.unidata.ucar.edu/software/netcdf/netcdf-4/newdocs/netcdf/ncdump.html
The file name is indicated in line 1. Lines 2 to 6 show the dimensions of the data
in the file. In this example the file has three dimensions: two spatial
dimensions (longitude and latitude) and one time dimension (time). The longitude
dimension has 480 data points and the latitude dimension has 241 data points. Setting
the time dimension to UNLIMITED is quite common as it allows additional
timesteps to be appended to the netCDF file. The current number of timesteps is 408.
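When many files need to be checked, the dimensions block of an ncdump -h header can also be read programmatically. The following is a minimal sketch using only the Python standard library; the function name parse_dimensions and the embedded header text are illustrative assumptions, and the pattern only covers the simple layout shown in the example output.

```python
import re

# Header text as printed by `ncdump -h` (abridged copy of the example above)
HEADER = """\
netcdf data {
dimensions:
    longitude = 480 ;
    latitude = 241 ;
    time = UNLIMITED ; // (408 currently)
variables:
    float longitude(longitude) ;
"""

DIM_PATTERN = re.compile(
    r"(\w+)\s*=\s*(\d+|UNLIMITED)\s*;(?:\s*//\s*\((\d+) currently\))?"
)

def parse_dimensions(header_text):
    """Return {dimension name: current length} from ncdump -h style text."""
    dims = {}
    in_dims = False
    for line in header_text.splitlines():
        stripped = line.strip()
        if stripped == "dimensions:":
            in_dims = True
        elif in_dims and stripped.endswith(":"):
            break  # reached the next section, e.g. "variables:"
        elif in_dims:
            match = DIM_PATTERN.match(stripped)
            if match:
                name, size, current = match.groups()
                if size == "UNLIMITED":
                    # Use the current length reported in the trailing comment
                    dims[name] = int(current) if current else None
                else:
                    dims[name] = int(size)
    return dims

print(parse_dimensions(HEADER))
```

For the unlimited time dimension the sketch reports the current number of timesteps rather than the word UNLIMITED.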
Lines 7 to 26 provide information about the variables included in the file. First, details
about the variables associated with the three dimensions are listed in lines 8 to 22
including the variables longitude (line 8), latitude (line 13) and time (line 18). These
variables are associated with the dimensions and are also referred to as coordinate
variables. Following the coordinate variables, details about the data variable are
shown in lines 23 to 26 (netCDF files can hold multiple data variables). The data
variable in this example is called t2m (line 23).
The general format in which variable information is presented is the following. First,
an indented single line shows the data type, the variable name and in brackets the
dimension(s) associated with the variable. Second, further indented, a list of variable
attributes and their values is presented for each variable. In the following paragraphs
the variables and their attributes will be discussed in some more detail.
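The layout described above (a declaration line followed by further indented attribute lines) is regular enough to be picked apart with two patterns. The sketch below is an assumption-laden illustration rather than a full ncdump parser: the names parse_variables, DECLARATION and ATTRIBUTE are hypothetical, and only quoted string attributes in the simple form shown in the example are handled.

```python
import re

# Abridged variables section in the layout printed by `ncdump -h`
VARIABLES_SECTION = """\
    float longitude(longitude) ;
        longitude:units = "degrees_east" ;
    float latitude(latitude) ;
        latitude:units = "degrees_north" ;
    double time(time) ;
        time:units = "hours since 1900-01-01 00:00:00" ;
    double t2m(time, latitude, longitude) ;
        t2m:units = "K" ;
"""

# "<type> <name>(<dim>, <dim>, ...) ;"
DECLARATION = re.compile(r"^(\w+)\s+(\w+)\(([^)]*)\)\s*;$")
# "<name>:<attribute> = <value> ;"
ATTRIBUTE = re.compile(r"^(\w+):(\w+)\s*=\s*(.+?)\s*;$")

def parse_variables(text):
    """Return {variable: {"dtype", "dimensions", "attributes"}}."""
    variables = {}
    for line in text.splitlines():
        stripped = line.strip()
        decl = DECLARATION.match(stripped)
        if decl:
            dtype, name, dims = decl.groups()
            variables[name] = {
                "dtype": dtype,
                "dimensions": tuple(d.strip() for d in dims.split(",")),
                "attributes": {},
            }
            continue
        attr = ATTRIBUTE.match(stripped)
        if attr and attr.group(1) in variables:
            var, key, value = attr.groups()
            variables[var]["attributes"][key] = value.strip('"')
    return variables

for name, info in parse_variables(VARIABLES_SECTION).items():
    print(name, info["dtype"], info["dimensions"], info["attributes"])
```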
Line 8 shows that the coordinate variable longitude is of the data type float (floating
point) and that it is associated with a single dimension named longitude. Note that for
coordinate variables the variable name and the name of the associated dimension
are often the same; the two should not be confused. The longitude variable contains 480
longitude values. In this example the longitude variable has four attributes (lines
9 to 12) named standard_name, long_name, units and axis which provide additional
information about the variable. The standard_name and long_name attributes are both
set to longitude and the units attribute is set to degrees_east. The longitude variable
represents the X axis on a map.
The latitude variable information (lines 13 to 17) looks very similar to that of the
longitude variable. The main difference is that the units attribute of the latitude
variable is degrees_north and that the latitude variable represents the Y axis on a
map.
The variable time (line 18) is associated with the time dimension which means the
time variable stores 408 time values of the data type double (double precision). The
time variable attributes (lines 19 to 22) show that the standard_name and long_name
attributes are both set to ‘time’.
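The numbers stored in the time variable only become dates once they are combined with the units attribute shown in the example header (hours since 1900-01-01 00:00:00). A minimal sketch of that conversion with the Python standard library follows; the function name decode_time is hypothetical, and the sketch assumes a reference time written as YYYY-MM-DD hh:mm:ss and dates for which the standard calendar coincides with Python's Gregorian calendar.

```python
from datetime import datetime, timedelta

SECONDS_PER_UNIT = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}

def decode_time(value, units):
    """Convert a numeric time value using a '<unit> since <reference>' string."""
    unit, _, reference = units.partition(" since ")
    reference_time = datetime.strptime(reference, "%Y-%m-%d %H:%M:%S")
    return reference_time + timedelta(seconds=value * SECONDS_PER_UNIT[unit])

UNITS = "hours since 1900-01-01 00:00:00"
print(decode_time(692496, UNITS))  # 1979-01-01 00:00:00
```

The value 692496 is used here purely as an illustration: it is the number of hours between the reference time and 1 January 1979.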
Executing the above command will produce output in the terminal window that may
look similar to the following.
Example output from a cdo sinfon command
Lines 1 to 3 display general file information in the form of a table with the table
headers in line 2 and the associated values in line 3. The table headers include
institute (Institute), data source (Source), type of statistical processing (Ttype),
number of levels (Levels), z-axis number (Num), horizontal grid size (Gridsize), grid
size number (Num), data type (Dtype) and parameter identifier (Parameter name). From
the information in line 3 it can be deduced that the file contains data on a single level,
that the horizontal grid has a total of 115680 grid points, that the data are saved as
64-bit floating point values and that the parameter name is t2m.
Note that a netCDF file may contain more than one variable.
Lines 4 to 7 list details about the horizontal grid. Line 5 shows that the grid type
is lonlat meaning the data are on a regular longitude/latitude grid (see Section xxx
for more details on netCDF grid types). The number of grid boxes in the longitude
direction (nx = 480) and in the latitude direction (ny = 241) reveals that there are
115680 data points. Lines 6 and 7 show the range of the longitude and latitude
variables, respectively, as well as their associated spatial resolution. The data field
is on a global grid at a spatial resolution of 0.75° by 0.75°. Note that the first
longitude value is 0° and the last is 359.25°, indicating a Pacific-centred global field
(see Figure 4.2.2a).
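The grid numbers quoted above are easy to cross-check by rebuilding the two coordinate axes from their first value, increment and size. The helper names regular_axis and wrap_longitude in this sketch are hypothetical, and the conversion to the range -180° to 180° is shown only as a common convenience, not as something CDO does here.

```python
def regular_axis(first, step, count):
    """Return `count` equally spaced coordinate values starting at `first`."""
    return [first + i * step for i in range(count)]

def wrap_longitude(lon):
    """Map a longitude from the 0..360 range to the -180..180 range."""
    return (lon + 180.0) % 360.0 - 180.0

lons = regular_axis(0.0, 0.75, 480)    # nx = 480
lats = regular_axis(90.0, -0.75, 241)  # ny = 241

print(len(lons) * len(lats))           # 115680
print(lons[0], lons[-1])               # 0.0 359.25
print(lats[0], lats[-1])               # 90.0 -90.0
print(wrap_longitude(lons[-1]))        # -0.75
```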
Useful information displayed at the beginning of the output includes the grid type
lonlat (regular lat/lon grid), the number of longitude (480) and latitude (241) grid
boxes, the first and last longitude (0° to 359.25°) and latitude (90° to -90°) values, the
spatial resolution (0.75° by 0.75°) and the number of timesteps (408). In addition, the
date/time information for each timestep is listed. The final line provides information
about the number of variables that were processed (1 named t2m) and the time the
processing took (0.1s).
Note that the date/time information includes details about the hour, minutes
and seconds (hh:mm:ss). The example file used here contains monthly
fields. Therefore, the details of time from days down to seconds should be
ignored as they are artefacts of the way CDO handles time information.
In contrast to the output of the sinfon operator (short list) the output of the infon
operator (long list) looks somewhat different. For each timestep some information
is provided. The Minimum, Mean and Maximum values are useful to quickly check
whether the range of data makes sense (for instance, after the file was manipulated
with a CDO operator).
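The same kind of range check can be reproduced in Python once a field has been read in. The sketch below is illustrative only: the function names, the two tiny sample fields and the plausibility limits of 180 K to 340 K for 2 metre temperature are assumptions made for this example, and the mean is a simple unweighted average over grid points with no area weighting.

```python
def field_summary(values):
    """Return (minimum, mean, maximum) of a flat sequence of grid-point values."""
    values = list(values)
    return min(values), sum(values) / len(values), max(values)

def looks_plausible(values, lower=180.0, upper=340.0):
    """Check that all values lie within an assumed physical range (in K)."""
    minimum, _, maximum = field_summary(values)
    return lower <= minimum and maximum <= upper

# Hypothetical 2 metre temperature values (K) for two timesteps
timesteps = {
    "1979-01": [250.0, 300.0, 275.0],
    "1979-02": [-23.15, 26.85, 1.85],  # accidentally in degrees Celsius
}

for label, field in timesteps.items():
    minimum, mean, maximum = field_summary(field)
    print(label, minimum, mean, maximum, looks_plausible(field))
```

In this made-up case the second timestep fails the check because its values are on the Celsius scale, which is the sort of problem the Minimum, Mean and Maximum columns help to spot.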
Some other useful operators that provide information about the data content in a
netCDF file are listed in Table 5.2.2.1.
Table 5.2.2.1: Some additional CDO operators that provide useful netCDF file information.
ncview data.nc
The resulting graphical windows will look similar to the ones shown in Figure
5.2.3.1. The main window allows a variable to be selected which will open in a new
window showing, for instance, a map. Clicking a specific location on the map will
open another window showing the time series for that location. Colour maps can be
adjusted.
²http://meteora.ucsd.edu/~pierce/ncview_home_page.html
Figure 5.2.3.1: Ncview graphical windows. Image from David Pierce’s webpage at UC San Diego’s
Scripps Institution of Oceanography (http://meteora.ucsd.edu/~pierce/ncview_home_page.html;
accessed 25 Apr, 2020).
While ncview is a useful tool to have a quick look at netCDF data files it is
too limited in its functionality to allow serious data analysis or plotting.
To find out if a netCDF file contains packed or unpacked data values the netCDF
utility tool ncdump can be used (see Section 5.2.1). If a netCDF file is packed then the
output of ncdump -h will show a scale_factor and add_offset attribute listed as part
of the variable attributes similar to the example below.
Most software packages automatically detect whether a netCDF file is packed or not
and convert the data fields accordingly when the file is read in. In this case there is
no need to worry about it. However, sometimes when applying CDO commands to
packed netCDF files the -b <bits> option (see Section xxx) needs to be used whereby
<bits> is either F32 or F64, which sets the precision of the output data to 32-bit or
64-bit floating point values, respectively. Without the -b option the CDO command may
return an error message. The output file generated by the CDO command will be an unpacked
netCDF file.
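The conversion that software performs when reading a packed variable follows the usual netCDF convention: the stored integer is multiplied by scale_factor and add_offset is then added. The sketch below illustrates this round trip; the function names and the values chosen for scale_factor and add_offset are hypothetical and are not taken from a real file.

```python
def unpack(packed_value, scale_factor, add_offset):
    """Convert a stored (packed) integer to its physical value."""
    return packed_value * scale_factor + add_offset

def pack(value, scale_factor, add_offset):
    """Convert a physical value to the nearest packed integer."""
    return round((value - add_offset) / scale_factor)

# Hypothetical packing attributes for a temperature variable in K
SCALE_FACTOR = 0.002
ADD_OFFSET = 270.0

stored = pack(288.15, SCALE_FACTOR, ADD_OFFSET)
restored = unpack(stored, SCALE_FACTOR, ADD_OFFSET)
print(stored, restored)
```

Because the packed value is rounded to an integer, the restored value can differ from the original by up to half of scale_factor, which is the price paid for the smaller file size.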
100K+ Python packages have been developed over the years for all kinds of purposes.
Some are well supported and being actively developed while others are not. The
latter tend to not stand the test of time. For the purpose of climate computations
and visualisation only a small number of well-supported Python packages is needed
with each package serving a specific purpose (Table 6.2.1.1). For instance, the NumPy
package allows computations with multi-dimensional number arrays while the
Matplotlib package provides functionality for everything related to plotting data.
Table 6.2.1.1: Some of the Python packages commonly used in climate computing and visualisation.
Package       Purpose
Cartopy²      Geospatial data processing for creating maps and other geospatial data analyses.
CIS Tools³    Analysing, comparing and visualising earth system data.
IPython⁴      Powerful shell for interactive computing.
Iris⁵         Powerful, format-agnostic interface for working with multi-dimensional earth science data.
Matplotlib⁶   Cross-platform 2D plotting library and interactive environments.
MetPy⁷        Reading, visualizing, and performing calculations with weather data.
netCDF4⁸      Object-oriented Python interface to the netCDF version 4 library.
NumPy⁹        Powerful scientific computing on N-dimensional arrays.
Pandas¹⁰      Data analysis and manipulation tool.
²https://scitools.org.uk/cartopy/docs/latest/
³http://www.cistools.net
⁴https://ipython.org
⁵https://scitools.org.uk/iris/docs/latest
⁶https://matplotlib.org
⁷https://unidata.github.io/MetPy/latest/index.html
⁸https://anaconda.org/anaconda/netcdf4
⁹https://numpy.org/
¹⁰https://pandas.pydata.org