An Online Infrastructure With Ubuntu 20.04 LTS LAMP

by Bart Besseling

AN ONLINE INFRASTRUCTURE WITH UBUNTU 20.04 LTS LAMP


Copyright © 2020 by Quarium, Inc.

All rights reserved. Published in the United States of America. No part of this
book may be used or reproduced in any manner whatsoever without written
permission except in the case of brief quotations embodied in critical articles
or reviews.

For information contact [email protected]

Book and Cover design by Bart Besseling

ISBN: 9-798681-890270
First Edition: September 2020

10 9 8 7 6 5 4 3 2 1

To the best partner ever, my dear Atsuko.


Introduction
AN ONLINE BUSINESS consists of five components:
1. A product or service that fills a market need.
2. A business strategy and an organization that can provide the product or
service to the market.
3. An online service software application that presents and sells the
product or service on the Internet.
4. Client software applications that access specialized functions of the
product or service from customer iPhone, Android, Windows, MacOS,
Linux or IoT devices.
5. A hardware and software infrastructure that supports the online service,
its clients and its organization.

This book addresses the fifth component: the custom infrastructure needed to
support the online service, its clients and its organization. The two other
books in the series address the third and fourth components.
Chapter 1 - Design

Technologies
The Internet
Since the early 1960s, the RAND Corporation had been researching systems
that could survive nuclear war based on the idea of distributed adaptive
message block switching. Independently, in 1966, work began on the
ARPANET to share the then-scarce research computers among
geographically dispersed scientists.

Funding and management for the ARPANET came from the US military for
the first two decades of its operation, until the transition in 1990 to the
National Science Foundation (the successor to the Office of Scientific Research
and Development, which ran the Manhattan Project) and, ultimately, the Internet.

IPV4
Internet Protocol Version 4 (IPV4, RFC 791) was adopted by the ARPANET
in 1983 and is today the most widely used communications protocol in
history. It divides communications into packets between 20 and 2¹⁶-1 =
65,535 bytes. Each packet consists of a 20 to 60-byte header section and a
data section. The header section contains a 4-byte source address and a 4-byte
destination address. This gives the protocol potentially 2³² ≈ 4.3 billion
addresses, divided into public, private and administrative ranges.

On January 31st 2011, the last free blocks of public IPV4 addresses were
allocated, while only about 14% of them were actually in use. To counteract
this address exhaustion, more and more public addresses are being split into
smaller blocks and assigned to network address translating routers (NATs),
from behind which large and small private networks can still reach all other
public addresses.
When systems on private networks want to communicate directly with each
other (for example in IP telephony, multi-player games and file sharing
applications) elaborate NAT traversal protocols such as Session Traversal
Utilities for NAT (STUN, RFC 5389) are used.

Ethernet
The TCP/IP protocols are named “Internet” protocols because they operate
end-to-end on top of whatever local data links exist between communicating
systems. The vast majority of such data links now use the Ethernet protocol, a
Carrier Sense Multiple Access with Collision Detection (CSMA/CD)
protocol introduced in 1980. Its design was inspired by the 1971 ALOHANet,
the first publicly demonstrated wireless packet switching network, developed
at the University of Hawaii.

Instead of the low-cost radio of ALOHANet, Ethernet first used a coaxial
cable with vampire taps for each node and later much cheaper point-to-point
twisted-pair telephony cables with hubs or switches in the StarLAN
configuration. Today, Ethernet has returned to its ALOHANet roots as Wi-Fi.

Ethernet has a Media Access Control (MAC) address of 48 bits that is large
enough to accommodate a unique address for each manufactured device
interface up to a density of one for each six square feet of the land surface of
the Earth or 30,000 for every human alive today.

IPV6
By 1992 it was clear that IPV4 would not have sufficient public addresses for
future needs and work started on a new protocol. In 1998 Internet Protocol
Version 6 (IPV6) became a draft standard and, 25 years after the work started,
in 2017 it finally became an Internet Standard (RFC 8200).

IPV6 uses 128-bit addresses, divided into a 64-bit network part and a 64-bit
device part. This is sufficient to automatically generate a public address from
the Ethernet MAC address of each interface. This raises all kinds of privacy
concerns, since it would then be possible to track each unique stationary or
mobile device and its user. This problem was fixed in the “Secure Neighbor
Discovery” protocol (SEND, RFC 3971, 4861 and 6494) by allowing devices
to choose cryptographically random addresses that are still globally unique.

Public address ranges are allocated as 2⁴⁸ very large blocks of 2⁸⁰ addresses.
Each human currently alive can be allocated 30,000 of such blocks, or a total
of 3.96 × 10²⁸ addresses each.

Our Design Choices


Our online service must be accessible on the public Internet. This means we
must request the allocation of a range of at least 8 public IPV4 addresses (a
“/29”) and a static public “/48” IPV6 network block of 2⁸⁰ addresses from the
Internet service provider of each link to each of our facilities.

Colocation and cloud hosting providers usually allocate single addresses at a
time from within larger blocks allocated to them. Unfortunately, at the time
of this writing in 2020, even many business Internet service providers (ISPs)
do not assign static IPV6 addresses yet and we may have to rely on IPV4
NAT solutions. If we have at least a small number of the increasingly scarce
public IPV4 addresses we can ignore IPV6 for the moment.

We must also develop a strategy for numbering our private subnets, our
gateways and the well-known internal servers attached to them. IPV4
allocates three private address ranges: 10.0.0.0/8, 172.16.0.0/12 and
192.168.0.0/16. Many home and small office networks use the
192.168.0.0/16 range.

We will think big and use the 10.0.0.0/8 range. We will allocate one address
byte value to up to 256 physical locations. Each location can then have up to
256 local area networks, virtual machine bridge networks and virtual private
networks, each with up to 253 devices, a gateway address and a broadcast
address.

Our first office will be location 0 and it will use the range of addresses
between 10.0.0.0 and 10.0.255.255. Our first LAN will be subnet 0 and be
10.0.0.0/24. We will use the common convention that on each of our private
networks .0 is not used, .1 is a gateway towards the Internet and .255 is a
broadcast address. Effectively we will have up to 65,536 internal networks
10.x.y.0/24.
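
As an illustration of this numbering plan (the specific subnets and the interface name below are examples of ours, not prescriptions from the book):

# Worked example of the 10.<location>.<subnet>.0/24 plan:
#   10.0.0.0/24   location 0 (first office), LAN 0
#   10.0.1.0/24   location 0, first virtual machine bridge network
#   10.1.0.0/24   location 1 (first colocation facility), LAN 0
# On every subnet .1 is the gateway and .255 the broadcast address, e.g.:
ip addr add 10.0.0.1/24 dev br0    # hypothetical gateway interface name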

We do not have this problem with the IPV6 protocol. Links are always
automatically assigned a link-local address constructed of the “fe80::/64”
prefix and the modified IEEE 64-bit Extended Unique Identifier (modified
EUI-64) version of the interface Ethernet MAC address through stateless
address auto configuration (SLAAC, RFC 4862). A MAC address is turned
into its modified EUI-64 version by inserting “ff:fe” in the middle and
flipping its “universal/local” bit, for example a device with a MAC address of
“52:54:00:14:79:4a” always has at
least the IPV6 address “fe80::5054:ff:fe14:794a/64”. But unless we use IPV6
for communication on the public Internet, there is no point in using IPV6
locally, other than in preparation for some distant future.
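
The following short shell script reproduces that conversion; it is our own illustration (the file name and the default MAC address are arbitrary), not part of the original text:

#!/bin/bash
# eui64.sh - derive the SLAAC link-local address from an Ethernet MAC address
# by flipping the universal/local bit and inserting ff:fe (illustrative sketch).
mac="${1:-52:54:00:14:79:4a}"
IFS=: read -r b1 b2 b3 b4 b5 b6 <<<"$mac"
b1=$(( 0x$b1 ^ 0x02 ))                        # flip the universal/local bit
w1=$(printf '%x' $(( (b1 << 8) | 0x$b2 )))    # build 16-bit words, dropping
w2=$(printf '%x' $(( (0x$b3 << 8) | 0xff )))  # leading zeros as IPV6 allows
w3=$(printf '%x' $(( 0xfe00 | 0x$b4 )))
w4=$(printf '%x' $(( (0x$b5 << 8) | 0x$b6 )))
echo "fe80::${w1}:${w2}:${w3}:${w4}/64"       # prints fe80::5054:ff:fe14:794a/64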

Ubuntu Linux
UNIX
The UNIX operating system was developed in the late 1960s and early 1970s at
the computing research center of Bell Laboratories, a research and scientific
development company founded by Western Electric and AT&T as a
successor to Alexander Graham Bell’s original laboratory.

UNIX was first presented formally to the outside world at the 1973 Symposium
on Operating Systems Principles, where Dennis Ritchie and Ken Thompson
delivered a paper on “The UNIX Timesharing System”. According to them
“Perhaps the most important achievement of UNIX is to demonstrate that a
powerful operating system for interactive use need not be expensive either in
equipment or in human effort: UNIX can run on hardware costing as little as
$40,000, and less than two man years were spent on the main system
software.”

We can argue that adding resources to a project makes it less likely to
succeed: By 2000, the addition of armies of lawyers and so-called standards
engineers to dozens of incompatible versions had all but killed UNIX as a
viable operating system. In the meantime, the hardware cost to run it has been
reduced from $240,000 in today’s dollars to less than $50.

Linux
Due to the high cost of UNIX licenses and the onerous terms of its license
agreements, the GNU project had been working on a free UNIX-like operating
system since 1983. In 1991 Linus Torvalds released the first source code for
his personal version, “Linux”. By 1998, Linux was perceived as a major
threat by Microsoft, the main commercial provider of operating systems. By
2012, the aggregate Linux server market revenue exceeded that of the rest of
the UNIX market.

Today Linux is the single most popular operating system in the world,
running on more than 2 billion devices. The exception is what has always been
the weak spot of both UNIX and Linux: personal productivity computers with
graphical user interfaces. Unfortunately, until the rise of smart phones, this
constituted the bulk of the general purpose computer market and, due to lack
of a compelling single standard Linux GUI, Windows and MacOS retain firm
control there. Even the smart phone market is sharply divided between the
Android and iOS GUI variants. In the meantime Android and Raspberry Pi IoT
devices have Linux at their core, while MacOS and iOS are built on their own
UNIX derivative.

Ubuntu
In 2020, the Linux market is more fragmented than the UNIX market ever was,
with an uncountable number of different distributions (“distros”). Most are
related to one of the major branches: Debian, RedHat and Slackware.

Ubuntu is a major free sub-branch of Debian maintained by Canonical Ltd,
which also provides commercial support services for it. Although the
statistics are unreliable, Debian is reportedly used in two-thirds of all web
server installations and has 40 million users with Ubuntu accounting for half
of those.

Alternatives are Fedora, a free sub-branch of the commercial RedHat branch
maintained by RedHat Inc. with about 1.2 million users (including reportedly
Linus Torvalds himself) and various free distros maintained by more or less
active volunteers.

Our Design Choices


One of our reasons for using Ubuntu is its support for a server version that
does not require a GUI for installation and administration (like the original
UNIX). This is very useful in installing rack-hosted and cloud servers. Another
reason is the very professional but free update and version management by
Canonical. Even-numbered production versions are “Long Term Support”
(LTS) versions with a predictable update time line.

This book is based on the most recent version, Ubuntu 20.04 LTS (released in
the 4th month of 2020) which offers significant improvements over the
previous LTS versions 16.04 and 18.04 and which will be supported until
April 2025. We have been using Ubuntu as our main deployment platform
since version 14.04 LTS and have always been able to upgrade to newer
versions with minimal changes.

Tools
The largest expense in the vast majority of software projects is people. If you
are doing business, you should use the tools and components that have the
largest commercial user base. You should then hire the people that have the
most experience in delivering products using those tools and components, so
they can get the job done the fastest and with the highest quality.

This chapter is about the tools we have in our own software development
toolkit. There are many alternatives but over the years these are the ones that
we have used most.

Development Techniques
Tools are useless without a proper understanding of how they are applied to a
problem. Knowledge itself is always our first and most important tool.

Computer Science and Engineering


In computer science, there are two schools of thought. One came out of
mathematics and the other out of electrical engineering.

Mathematicians prefer everything to be formal and correct, and if at all
possible, static. An equation is an equation, a diagram is a diagram. To them,
quality arises from elegance. Electrical engineers got their start in lighting
houses and in saving the lives of people on sinking ships. They build
something that will only have to last until they build something better. To
them, quality arises from getting things done. Both sides need each other and
need to learn from each-other and that is why today we actually have an
iPhone and an Android phone and next year we’ll have all of our personal
computing power in our pocket.

This division also shows in our programming languages: There are
mathematically elegant languages like Lisp, Smalltalk, Prolog and Scala that
are influential in driving the quality of the science. There are practical
engineering languages like C/C++, HTML and PHP that show the scientists
what is needed in the real world. And there are teaching languages like Basic,
Pascal and Java that show underlying concepts without distractions, but that
are generally not intended for real-time production applications. And just as
influence from Enzo Ferrari’s design elegance makes Henry Ford’s practical
vehicles nicer to look at and nicer to drive, so do the mathematical and the
teaching languages make the practical languages easier to use and therefore
more productive.

And, since we are here to sell information to people so they can improve their
lives and so that we can make money for ourselves, it is time to buy a Ford F-
150 pickup truck, the most popular motor vehicle of all time and cry about
that decision all the way to the bank. In the online commerce world, that is
the venerable open-source “LAMP” (Linux, Apache, MySQL and PHP)
system that we have been installing since the 1990s (except “P” stood for
“Perl” back then).
But to do this effectively, we need a proper education in the backgrounds of
our profession.

No programmer should claim to be a true professional unless they have read
the Classics, which in our field are (in no particular order and definitely
incompletely) Edsger Dijkstra, Harlan Mills, D.L. Parnas, John Backus, Peter
Naur, Niklaus Wirth, Per Brinch Hansen, Fred Brooks, Andrew Tanenbaum,
Donald Knuth, C.A.R. Hoare, C. Böhm and G. Jacopini, F.T. Baker, Brian
Kernighan, Dennis Ritchie, Ken Thompson and Bill Plauger, Alfred V. Aho
and Jeffrey D. Ullman.

And no programmer should claim to be a true professional unless they know
what goes on inside the chips they use, where their objects go once they stop
using them, where their own income comes from and perhaps most important
of all, what the roses smell like today.

Top-Down Structured Design


Top-down structured design is the most powerful and generally applicable
technique to come out of computer science.

Dijkstra suggested, and Böhm and Jacopini proved, that all procedural
problems could be solved by simple nested “sequences”, “decisions” and
“loops” in “structured code” instead of “spaghetti code”.

Even more generally, we can take literally any problem in any field and
divide it into a tree of sub-problem sequences, decisions and loops. We repeat
this process until the solution to each leaf problem is trivial. Then we
assemble the solution to the original problem from the leaves back to the
trunk, taking care to prove that each sub-assembly is working properly and
that it is only attached to the rest of the tree at one point (its interface). We
can then work on each sub-problem and sub-assembly in isolation, possibly
even in parallel in a development team, easily within our “Human Conceptual
Limits” [George Miller].

Lean and Agile Development


There is no point in building something that is not going to be used. Every
project should start with a “minimum viable product” (MVP) that can be
tested against an actual problem in an actual market as quickly as possible.

We should build this initial version and all subsequent improvements in short
development “sprints”, carefully limiting ourselves to things that can be built
and tested and integrated within the time of one sprint. We should use formal
project management tools, such as Kanban, to document everything that
needs to be done and its current state. This is nothing but top-down structured
design, formally applied to a development process.

In real-estate, the three most important aspects of a property are “location”,
“location” and “location”. In software development, as in most engineering,
the three most important aspects are “process”, “process” and “process”.
Breaking process is a waste of time and waste of time is a form of suicide.

Unit and Functional Testing


We should go even further than Lean and Agile Top-Down Structured Design
during actual development: At all times, our code should be able to run. At
all times, we should be able to demonstrate that all completed sub-assemblies
are still working properly.

We do this by constructing unit and functional tests for each sub-assembly.


Whenever new code is added to a project, all tests are run again and only if
all tests pass, all sub-assemblies are trusted again. It is even better if the tests
are run automatically by a “continuous integration build server”.

If anybody “breaks the build”, that person does not eat or sleep until the build
is working again.

Software Development Tools


Originally, we developed software on large machines that punched holes in
stiff paper cards. (“Do not fold, spindle or mutilate.”) Later, we used the
venerable Teletype model 33 to enter programs into a remote IBM, Philips,
HP or PDP time-share mainframe or minicomputer that we never got to see.
We could even “save” the program on a paper tape produced by the terminal
and later read it back in. We used a “job control language” consisting of
three-letter commands. (One of the first “hacks” was to tell fellow students to
type “DELIVER” to submit their homework.)

When we got the first “glass tty”, we could “edit” (“ed”) the consecutive lines
of our code without wasting miles of paper and ribbons of ink. Some
programmers worked better if they kept their entire program organized in
their head, like some chess players do with their games. Others limited each
routine to a length they could see in its entirety on their 24-line VT100 smart
terminal.

Software development productivity improved dramatically when we got the
first real programs to “visually inspect” (“vi”) our entire code by scrolling a
file up and down.

After that, the “what you see is what you get” (WYSIWYG) and “graphical
user interface” (GUI) fashions never really caught on in programming, but
syntax highlighting, auto-completion and real-time syntax checking have
again dramatically improved programmer productivity.

There are a number of computer science papers from the 1950s onwards that
demonstrate that visual pattern recognition and compile-time checks have the
largest impact on software quality. These lessons have clearly been ignored
by the creators of late-binding languages (Java, Objective-C) and languages
where white space is a critical semantic element (Python, YAML).

All of our current “integrated development environments” (IDE) are
remarkably similar: We have a display of a file or component tree in one
window, a display of the text of one or more files in an editor window and the
output of various compilation and debugging tools in another window. It is
annoying but not difficult to start using a new IDE, just as driving an
unfamiliar rental car after a long airline flight is annoying but not difficult.

Microsoft Visual Studio


Most likely because of the vast installed base of the Windows operating
system and the rapid evolution of the open PC platform, we have always
developed most of our software using Microsoft Visual Studio. Like all
Microsoft products, this is the result of a long history of backwards-
compatible incremental evolution. We can still edit, compile, debug and run
“C” software we originally published for 16-bit Windows 3.1 on our latest
64-bit Windows 10 systems!

Visual Studio supports most programming languages, for example PHP using
a “PHP Tools for Visual Studio” plug-in published by “DevSense”, and most
CSS, HTML, JavaScript and SQL variations. The debugger and the
performance tools are some of the best available and we frequently take a
peek at some binary file using the binary file editor. We can perform all git
version control functions without switching to another environment.

The bulk of our code is always independent of our platforms, servers,
desktops, game consoles, browsers or phones. Aside from minor operating
system and browser tweaks, we can develop and debug all of our software in
Visual Studio on a single laptop, offline, including the entire server and client
code described in the books in this series.

Another excellent counterpart of Visual Studio was the “Microsoft Developer
Network” (MSDN) subscription which literally included a copy of every
software product and every document that Microsoft published. Before
google.com and stackoverflow.com this was our main go-to resource.

Apple XCode
Apple is not nearly as good as Microsoft in providing consistency and
general usability to its developers. Periodically we have to rewrite all of our
Apple software in a completely new language (First Basic, then Pascal, then
HyperCard and MPW Script, then C and C++, then Objective-C and now
Swift, but minus OpenGL soon). Then we have to periodically rewrite all of
our software for different processors and operating systems (68000,
PowerPC, Intel and now ARM, through NeXT and now a UNIX-derived Darwin
with proprietary drivers and a proprietary GUI that changes often). In many cases it is even
literally impossible to provide customers with a minor update to software that
was developed only recently. The expensive Apple hardware also has a nasty
habit of obsoleting itself: With every Apple operating system update a
generation of Apple hardware turns into e-waste.

And finally, long after Microsoft has given up on its “Metro” desktop UI,
every Apple desktop will now be made to look like a mobile device. With
Jobs and Ive gone, we can only wait in terror to see what that will look like.
But, as an Apple support person once literally told us, “we should not want to
know that”.

Every new update of XCode is always a surprise. Apple also maintains a QA
stranglehold on its iPhone App Store (for good reasons, which we fully
support – see the seminal book “Game Over” [David Sheff] for reasons why),
but in doing so, they continually change the ways in which software is
prepared, even just for debugging.

For the most part, after the initial delight of an update when our complex
build-and-sign process has to be changed completely overnight, again,
XCode is very similar to Visual Studio. It fully supports git and all of the
languages currently approved by Apple for use on Apple systems.

Android Developer Studio


Originally we developed Android code on a version of Eclipse, a Java-based
open source IDE which has since fallen into disrepair.

Currently we use Android Developer Studio, a custom version of the Java-
based IntelliJ IDEA IDE which works identically on Windows, MacOS and
Linux. Like any other IDE these days, it fully caters to any display and
command idiosyncrasies its users may have and we can operate git without
leaving the environment.

Our only peeve is that there are so many updates of its different components
and libraries that the first development of the day typically requires some
patience. These changes also frequently break the way a less frequently
updated package such as Cordova generates Android projects, especially the
“Gradle” build files, which, to its credit, the IDE then knows how to re-factor.

Download and install Android Developer Studio from the
“developer.android.com” web site. Make sure to use a “bundle” version and
not a “stand-alone IDE” version, even if you must use an older version. Any
older version will update itself anyway. The default settings should be
sufficient for our purpose. Install all suggested updates.

MySQL Workbench
The purchase of the free open-source database MySQL by Sun Microsystems
which was then itself purchased by Oracle was not a smooth transition.

As a result, we now have a second open source database named after
Widenius's other daughter (the two daughters are named My and Maria).

Fortunately, the situation now appears to have stabilized and Oracle publishes
the excellent MySQL Workbench as free software.

This is essentially an IDE for the MySQL database, which can even migrate
basic schemas to and from Oracle databases. The package supports modern
“secure shell tunneling” so we can operate on database servers that are hidden
behind a firewall with SSH access.

Photoshop
Photoshop has long been our graphics IDE of choice. Unfortunately, its
publisher Adobe has jumped on the subscription bandwagon and now you
can only rent a copy if you keep paying for it over and over and over again,
even during times when you do not use it.

For software where frequent updates are essential to its operation, for
example a virus checker or a tax preparation program, this business model
makes sense. For a mature piece of software that does not evolve a lot, this is
not a business model that aligns with the needs of the customer.

Photoshop used to be a standard part of each of our development
workstations. Today we only use it on an artist’s workstation when a
customer pays for us to use it creatively. We now use GIMP and
ImageMagick for our routine day-to-day graphics work.

Version Control Tools


Version control is not new: In the 1970s PDP-11 “RSTS” operating system,
each successive copy of a file already had a revision number. Another early
example was “Microsoft Visual SourceSafe”. Over the years, dozens of these
systems have come and gone. We have worked on projects for large
corporations where we used four different version control systems at the
same time. One of our favorites was the commercial package “Perforce”
which was very fast.

Git
Git was developed to manage the distributed development of Linux and it is
currently the most popular version control system with a reported market
share above 90%.

A major advantage of git is that it can be used in a network of loosely
connected cooperating developers. Older version control systems required a
network connection to a central server for each operation.

A typical use case of git consists of a project directory with a “.git” sub-
directory. The sub-directory is the “repository” or “repo” and the project
directory is the “working tree”. On version control servers, projects are
typically stored in “bare” repositories (without a working tree) in a
“someproject.git” directory.

Users can “fetch”, “pull” and “push” updates between cooperating
repositories using secure connections and they can create “branches” for
different concurrent states of a project, for example “master”, “staging” and
“live” or “v1” and “v2”. Each repository maintains an audit trail of all
changes, typically with a comment that relates it to a design document or a
bug report.

Different servers can automatically pull updates and, for example, perform
continuous integration builds of applications or automatically deploy
development, staging or live web sites.

All of our projects always begin with the creation of a new empty bare
repository on our version control server, which is then “cloned” to each of
our development workstations, continuous integration build servers and
development, staging and publication web servers.
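
A minimal sketch of that workflow, with a host name and project name that are our own placeholders rather than values from the book:

# On the version control server: create the empty bare repository.
ssh quarium@git.quarium.com "git init --bare /srv/git/someproject.git"
# On a development workstation: clone it, commit something and publish branches.
git clone quarium@git.quarium.com:/srv/git/someproject.git
cd someproject
echo "Some Project" > README.md
git add README.md
git commit -m "Initial commit"
git push --set-upstream origin master
git checkout -b staging                    # additional branches, e.g. staging
git push --set-upstream origin staging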

One of our strict process rules is that no file may ever be deployed to any
server or published to any customer unless it comes out of a git repository.

TortoiseGit
TortoiseGit is a free open-source GUI wrapper for the git version control
system for Windows. It tightly integrates with the Windows File Explorer
and allows users to control directory and file versions using right-click menus
directly in File Explorer. This makes its use highly intuitive and eliminates
any excuse not to use git.

Atlassian SourceTree
Atlassian SourceTree is a free client for git, available for MacOS and
Windows. It consists of a full-featured graphical user interface. This makes it
easier to show people who are not familiar with version control what the
structure and development history of the project is.

Another major advantage is that development operations are identical on
MacOS and on Windows, which reduces the learning curve and reduces
errors.

Project Management Tools


Just as with version control tools, there is no shortage of project management
tools. The main thing that we look for is how easy it is to keep track of tasks
and bugs. If a developer needs to start up an application or open a web site
for everything they do, in addition to the version control operations, and if
this takes any time at all, it tends not to get done.

Over the years we have used many different systems. The ones that stand out
are “Bugzilla”, originally published as an open source web application by
Netscape, and “DevTrack” by “TechExcel”, an excellent and fast client-server
application.

Atlassian Jira
Atlassian, the publisher of the “SourceTree” git GUI client, also publishes
“Jira”, a web-based project planning application. It can operate in several
popular modes, including as a “ticket” tracker and as a “Kanban” planning
board. While it is not as fast and lightweight as “DevTrack”, it is remarkably
usable. It is properly integrated with git and email. It has an import facility
which allows new projects of a common structure to be set up from a
spreadsheet template very quickly.

The development of all software in these books was organized using Jira.
Later chapters in this book describe in detail how you can set up a system to
host Jira for yourself.

MediaWiki
Not too long ago, every engineer worth his salary (paid in salt, in Roman
times) kept engineering notebooks. These were invaluable, not only for
remembering how you did something a few weeks ago, but also for back-
tracking out of a dead-end in the solution to a new problem. The problem of
course is that you cannot edit or search or share a paper notebook very well.

That is why today we use the robust MediaWiki software. This is the same
PHP software that powers Wikipedia. It is free and it is easy to set up on a
LAMP server. There is a set of useful plug-ins that do everything from giving
the wiki a more corporately branded look, to providing a WYSIWYG editor,
to generating a table-of-contents tree. The books in this series were all written
originally as engineering notebooks using MediaWiki.
Later chapters in this book describe in detail how you can set up a
MediaWiki system for yourself.

Components
We will design our infrastructure to satisfy the following requirements:

The infrastructure must be highly reliable.
It must be highly scalable both geographically and in performance.
It must appear to users as a single seamless service.
It must consist of many common cheap components.
It must be easy to deploy, operate and upgrade.
It must have a large pool of available technical talent.

At the top level, the infrastructure consists of the following components:

Initially, we will leanly develop our infrastructure using virtual machines
hosted on physical machines in our main office. Over time, we will expand
our capacity and geographic reach with virtual machines hosted on physical
machines in colocation facilities and cloud hosts.

Our customers will be using mobile and desktop machines with browsers and
application-specific software. We will develop and serve the first phases of
the service from the first corporate offices.

In some parts of the world, where cloud services are commercially not
available or are politically not permitted to do business, we will co-locate a
similar set of physical servers.

Where cloud services are available, we can scale up by trading off
depreciating hardware costs for recurring service fees.

Our virtual servers will consist of two classes:

1. Individual servers for central services like DNS, DHCP, LDAP, Email,
Wiki, a project management database and services monitoring, and
2. Groups of virtual machines operating as application clusters.

The physical machines will act as hosts for the virtual machines, providing
them with processing, memory, mass storage and network interfaces. As
far as the virtual machines are aware, they are all connected to a local area
network behind a firewall and they receive requests forwarded to them by the
physical machine.
Chapter 2 – Implementation

Machine Types
Physical Servers
For physical servers we face the two eternal dilemmas: how many of what
kind do we need and do we make them or buy them?

The quantity versus performance question is easy: Buy more units of the
cheapest hardware that will do the job. This reduces the cost of failure and
improves scalability. Hosting of Linux virtual machines on large mainframes
like the IBM Z only makes business sense for certain load characteristics.
The make or buy decision is mostly a question of time-to-market and cost. If
we look at some typical modern components:

2U Rack mount Server Chassis - $70.
Micro ATX LGA-1151 64GB Server Motherboard - $250.
Dual Gigabit Ethernet included on Motherboard - $0.
Intel Xeon E-2124 Coffee Lake 3.3 GHz Processor - $220.
64GB Unbuffered ECC UDIMM DDR4 2400 MHz - $350.
4TB SATA 6.0Gb/s SSD Drive - $250.

This configuration costs around $1,500 plus the labor costs of buying the
parts, putting them together and testing the system. The configuration will be
unique and difficult to reproduce but at least you can follow the latest trends
in hardware.
For less than twice as much we can quickly get a similar pre-assembled 1U or
2U server from a reputable system manufacturer delivered almost anywhere
in the world from a production series that will likely last several more years.
The price difference is equivalent to about 10 installation and test hours of an
in-house build, which is quite competitive. There are still many territories in
the world that are under-served by the large cloud hosting companies and it is
good to have a physical server selected that can go into a remote colocation
facility at short notice.

Still, for a lean but scalable startup it is best to start with two in-house-built
servers for the first office and then, when you have more money and less
time, scale up with cloud servers or pre-assembled servers.

Virtual Machines
We could start out with just cloud-hosted virtual machines and this seems
like the cheapest option until we consider the needs of the corporate offices:
We will at least need some local firewall, some local storage and some
redundancy on both. This can quickly add up to the cost of two of the
physical servers described above. So we will develop and host our initial
virtual machines on physical machines in our office and then scale up by
cloning fully configured and tested virtual machines to a cloud hosting
service or run them on physical machines in some remote colocation facility.

For all corporate and application functions, virtual machines are the way to
go except for the hosting of the virtual machines themselves.
Application Containers
Application containers such as Docker rely on application isolation
mechanisms in the host operating system to provide large numbers of light
virtual machines that do not need a lot of administration themselves. This
technology is useful in the higher-scale smaller-configuration stages of a
mid-size to large company.

There are now also container orchestration tools like Kubernetes that can
manage very large installations. We are not discussing Kubernetes in this
version of this book. We are focusing on the installation and configuration of
the individual services used by the infrastructure. Until we have proven that
the product sells there is no need to actually scale it up. It is sufficient to
know the implementation is flexible enough to scale when it has to.

Actual virtual machines have the advantage over containers that they can be
hosted without change or limitation on any type of physical system, for
example on Windows or MacOS development systems or large Windows
hosts. They use more storage than containers, but at $20 per terabyte retail,
storage is a lot cheaper than development and maintenance time.

Cloud Hosting
Cloud hosting of virtual machines is an excellent way for a small company to
scale up to mid-size.

After the customer base and the profitability of a company grow to a certain
level, it has a choice between vetting and trusting its own employees and
facilities or letting the faceless hordes of the hosting company (some of
whom actually work for the NSA and others who work for China) hold on to its
data. That the data in such facilities is almost always encrypted is only a
protection from people that do not have physical access to the underlying
internal system communications. Cloud hosting makes the most sense for
large quantities of static data, for example video or game data files.
Installation Types
We are going to need several different types of installations of Ubuntu for
different purposes.

Fortunately, the procedure for the basic server installation of Ubuntu is the
same for the first three configurations and we can prototype a first installation
or an upgrade very easily on, for example, a Windows development system
with a virtual machine host such as the free Oracle VirtualBox software.
After each installation step we clone the virtual disk image so we can quickly
recover from any installation mistakes and we base the more complicated
installations on the simpler ones.

Basic Virtual Machine


We need a basic virtual machine configuration for non-application functions
such as name servers, email servers etc. We’ll create one image of this
installation which we will clone as needed. We will initially allocate 1
processor, 2048MB RAM and 100GB virtual disk storage and 1 NIC. This
storage will not all be allocated immediately. The virtual disk image file on
the host will gradually grow.
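
As a hedged sketch of what this allocation might look like on a KVM host (the chapter on virtualization below covers the real procedure; the image path, bridge name, ISO location and OS variant here are assumptions of ours):

# Create the basic virtual machine with 1 CPU, 2048MB RAM, 100GB disk and 1 NIC.
virt-install \
  --name ubuntu20base \
  --vcpus 1 \
  --memory 2048 \
  --disk path=/var/lib/libvirt/images/ubuntu20base.qcow2,size=100,format=qcow2 \
  --network bridge=br0 \
  --graphics vnc \
  --cdrom /var/lib/libvirt/images/ubuntu-20.04-live-server-amd64.iso \
  --os-variant ubuntu20.04   # check available names with "osinfo-query os"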

Application Virtual Machine


We need a virtual machine configuration for web and database application
services. We’ll create one image of this installation which we will clone as
needed. We could split this configuration into two separate ones, but this
would increase our development, deployment and maintenance complexity.
We will initially allocate 1 processor, 4096MB RAM and 100GB virtual disk
storage and 1 NIC.

Hosting Physical Machine


We need a basic physical machine installation whose only functions are to
protect and serve a set of virtual machines. We need to do this installation
separately on each physical machine. The ease with which we can clone and
remotely install and update virtual machines without having to physically
reinstall them is a major reason for using them for everything else.

Management Physical Machine


We need a basic physical machine installation with an installed GUI for
development and management purposes.

Specific Installations
We are going to construct our world-wide business infrastructure using the
following specific installations:

Location Infrastructure
Each business office and each colocation facility has redundant or high-
availability connections to the Internet. The main office and all colocation
facilities have static IPV4 addresses.

The business needs at least one domain name, for example “quarium.com”
and a wild card security certificate for each domain, for example
“*.quarium.com”. The certificate will be based on one corporate master
private key.

Location Physical Servers


Each business office and each colocation facility has at least two main
physical machines acting as Internet firewalls and as hosts for a set of virtual
machines. These machines are configured with at least 64GB RAM, 4TB
SSD and Ubuntu 20.04. The main office uses the desktop configuration (the
management physical machine installation above). All other offices and co-
locations use the server configuration (the hosting physical machine
installation above).

Both machines have NICs (network interface cards, although these days the
hardware is actually integrated on motherboards) for Internet and local
network connections configured with dual bridged interfaces and
masquerading forwarding firewall settings.
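
As a rough sketch of what that masquerading amounts to (this is only a conceptual illustration, not the book's actual firewall configuration; the interface names “eno1” for the Internet side and “br0” for the virtual machine bridge are assumptions of ours):

# Enable IPV4 forwarding and masquerade traffic from the bridge to the Internet.
sysctl -w net.ipv4.ip_forward=1
iptables -t nat -A POSTROUTING -o eno1 -j MASQUERADE
iptables -A FORWARD -i br0 -o eno1 -j ACCEPT
iptables -A FORWARD -i eno1 -o br0 -m state --state RELATED,ESTABLISHED -j ACCEPT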

Both machines allow remote access via SSH. Both machines provide
virtualization via KVM. Both machines run a satellite configuration of
Postfix to forward administrative email. No other software is run on these
machines directly and no data other than virtual machine disk images is
stored on these machines directly.

Location Name Servers


Each business office runs two virtual machines as name servers. Each virtual
machine is hosted on a separate physical machine. Both machines are
configured with Ubuntu 20.04 in server configuration (clones of the basic
virtual machine installation above). Both machines each have a single
network interface configured for a static internal IPV4 address and firewall
settings. Both machines allow remote access via SSH. One machine runs a
master DNS server and the other runs a slave DNS server. Both machines
receive external DNS requests forwarded by their physical host machines.
Both machines run DHCP servers in a fail-over configuration. One machine
runs a master LDAP server and the other runs a slave LDAP server. Both
machines run a satellite configuration of Postfix to forward administrative
email. No other software is run on these machines.

Location Email Server


Each business office runs one virtual machine as an email server. The machine
is configured with Ubuntu 20.04 in server configuration (a clone of the basic
virtual machine installation above). The machine has a single network
interface configured for a static internal IPV4 address and firewall settings.
The machine allows remote access via SSH. The machine allows login by all
corporate LDAP users. The machine runs a Postfix email server. The
machine receives external public SMTP, secure SMTP and secure POP3
requests forwarded by its physical host machine.

Location Application Clusters


The business runs as many application clusters of four virtual machines as the
geographical distribution and load of the customer base requires. Initially, the
main office runs one production cluster, one staging cluster and at least one
development cluster. Clusters consist of two front-end machines and two
back-end machines. One front-end and one back-end are run together on one
physical host machine. All four machines are configured with Ubuntu 20.04
in server configuration (clones of the application virtual machine installation
above). All four machines each have a single network interface configured
for static internal IPV4 addresses in fail-over configurations and firewall
settings. All four machines allow remote access via SSH and OpenVPN. The
front-end machines run the Apache HTTP server with PHP and the business
server application. The front-end machines receive external HTTP and secure
HTTP requests forwarded by their physical host machines. The back-end
machines run the MySQL database server in multi-master replication mode.
All machines run a satellite configuration of Postfix to forward administrative
email. No other software is run on these machines.

Management Servers
The business also runs a number of the application virtual machine
installations to host Jira project databases, MediaWiki documentation
databases and Icinga service monitoring applications.
Chapter 3 – The Operating System
In this chapter we describe the installation and configuration of the operating
systems for each of the installation types. It is only necessary to do an
installation once for each virtual machine installation type.

Each physical machine must be installed separately, which is why we will
install them once and then do all of our other work on virtual machines which
can be cloned instead of having to be reinstalled.

Ubuntu Server
Installation
This process is used to prepare the basic, application and hosting installation
types above.

Download the most recent server installation image from “ubuntu.com”, for
example “ubuntu-20.04-live-server-amd64.iso”. For installation on physical
machines this image must be copied to a USB storage device or to an optical
disk. Ubuntu recommends the free application “Rufus” for creating bootable
USB sticks on Windows. On the Desktop version of Ubuntu you can use an
application called “startup disk creator”. For installation on virtual machines
the downloaded image file can be used directly.
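
On an existing Linux system the USB medium can also be written from the command line; a hedged example (the target device name is a placeholder and must be double-checked, because it will be overwritten):

# Write the installer image to a USB stick; replace /dev/sdX with the actual
# device (check with "lsblk" first). Everything on that device will be erased.
dd if=ubuntu-20.04-live-server-amd64.iso of=/dev/sdX bs=4M status=progress conv=fsync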

On a physical machine, connect a display, a mouse and a keyboard, at least
temporarily or through a keyboard, video and mouse (KVM) switch. Make
sure “Virtualization Technology” is enabled in the BIOS and boot the
machine from the USB storage device or the optical disk.

On a new virtual machine created on a host system with a GUI (for example
desktop Ubuntu or Windows with VirtualBox), mount the image file on the
virtual optical drive and boot the machine. The first few installer questions
are about the user interface:

Select the default “English” as the installation language. Choose to update the
installer that was included with the distribution ISO.

Select the default “English (US)” keyboard layout and variant.

Select a network interface that can be used for updates (usually “eth0”).

Select a network proxy if necessary. Select an update mirror server.

Select the default “Use an Entire Disk” partitioning. Set up this disk as an
LVM group.

Select the default disk to install to, review the suggested partitioning as a
single mount point “/” and confirm it.

Then we create a first (probably administrative and only) user account: Enter
a full name for a user for example “Quarium Administrator”, a server base
name “ubuntu20base”, a user name “quarium”, a password and a password
confirmation.

Choose to install the OpenSSH server but do not install an SSH identity at
this point.

Do not add any software packages at this point.

This is all of the configuration we are able to do at this point: Observe the
installation progress. This is relatively quick. Then observe the online update.
This may take considerably more time.

Reboot the machine. When the installer complains, remove the installation
medium or disconnect the installation file from the virtual machine.

Use the administrator account to log in.

For a virtual machine installation it may be necessary to shut down the
machine after installation and “virt-sparsify” the “.qcow2” disk image. See
the chapter on virtualization below.
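
For reference, a hedged one-liner (the image path is an example of ours; requires the “libguestfs-tools” package):

# Compact the freshly installed disk image in place.
virt-sparsify --in-place /var/lib/libvirt/images/ubuntu20base.qcow2
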
Ubuntu GUI Desktop
Installation
This process is only used to prepare for the management installation type
above.

Download the most recent desktop installation image from “ubuntu.com”, for
example “ubuntu-20.04-desktop-amd64.iso”. For installation on physical
machines this image must be copied to a USB storage device or to an optical
disk. Ubuntu recommends the free application “Rufus” for creating bootable
USB sticks on Windows. On the Desktop version of Ubuntu you can use an
application called “startup disk creator”. For installation on virtual machines
the downloaded image file can be used directly.

On a physical machine with a display, a mouse and a keyboard, make sure
“Virtualization Technology” is enabled in the BIOS and boot the machine
from the USB storage device or from the optical disk.

On a new virtual machine created on a host system with a GUI (for example
desktop Ubuntu or Windows with VirtualBox), mount the image file on the
virtual optical drive and boot the machine. The GUI requires more RAM, so
initially allocate 1 processor, 4096MB RAM and 100GB disk storage. This
installation will also use more of the allocated disk storage. An initial
installation will use about 8.6GB.

Use the default language “English” and select “Install Ubuntu”.

The first few installer questions are about the user interface:

Select the default “English (US)” keyboard layout and variant.

Select a “Normal installation” for workstations or “Minimal Installation” for
servers, and “Download updates while installing Ubuntu”, but do not install any
3rd-party software at this point.
Then we configure the storage interface(s): Select the default “Erase disk
and install Ubuntu” option. Enable “Use LVM” so that the mass storage
partitions can be more easily re-sized later. Continue with the installation.

Then we create a first (probably administrative and only) user: Set the
appropriate time zone. Typically desktop systems operate in the local time
zone of the user and in this installation that also affects language selection.
Enter a full name for a user for example “Quarium Administrator”, a fully
qualified server name “ubuntu20desktop.quarium.com”, a user name
“quarium”, a password and a password confirmation. In Ubuntu 20.04, DO
NOT select “require my password to log in”: when this version was released,
the installer did not install or configure the login screen correctly. Install
with automatic login and then change this setting after installation in the
“Users” settings.

This is all of the configuration we are able to do at this point. Any network
interfaces will be configured automatically by NetworkManager. Observe the
installation progress. This is relatively quick.

Restart the system. It may be necessary to remove the installation medium or
to disconnect the installation file from the virtual machine.

Use the administrator account to log in. If after login the screen appears
garbled, try booting the system with the display disconnected and then log in
to the lower resolution default screen.

In the settings interface selected from the top-right corner of the screen,
disable power saving.

In the settings “About” tab, select “Check for updates”.

To start a terminal window application, click on the “Show Applications”
button in the bottom-left corner and click on the “Terminal” icon.

Common Configurations
Before configuring an Ubuntu system for any particular services, we will
make some changes to default settings that improve the manageability and
operations of servers. We do this same work on all installation types.

A very important point is, wherever possible, not to modify original
distribution files. The modifications will either get overwritten by distribution
updates, or our servers will retain problems fixed by those updates. Most
packages these days have one or more “conf.d” directories in which local
configuration files can be added without fear of being overwritten, for
example as a file “quarium.conf”. On the other hand there are still a number
of packages that do not implement this convention.

Common File Structures


Most system packages store configuration files in a directory
“/etc/<package>”.

Files with default settings (for example daemon startup options) are located in
“/etc/default”.

Application files can be located in “/var/lib/<package>”,
“/var/cache/<package>” and “/var/spool/<package>” while log files are
located in “/var/log/<package>”, or minor variations thereof.

Service files are located in (or symbolically linked to) “/etc/systemd/system”,
for example:
[Unit]
Description=Quarium Game Service
Documentation=http://www.quarium.com
[Service]
Type=forking
ExecStart=/usr/sbin/qserver
PIDFile=/var/run/qserver.pid
StandardOutput=null
Restart=on-failure
[Install]
WantedBy=multi-user.target
Common Service Operations
Ubuntu 20.04 uses “systemd” to control services instead of the older UNIX
“init” application and “inittab” files or the more recent “service” commands.
Services can be controlled using the “systemctl” command. Some common
commands are:
systemctl enable <name>.service # permanently enable a service to start on boot
systemctl list-units # list all active services and other units
systemctl list-unit-files # list all known services and other units
systemctl start <name>.service # start a service
systemctl status <name>.service # display the status of a service
systemctl restart <name>.service # stop and then start a service
systemctl reload <name>.service # tell a service to reload its configuration
systemctl stop <name>.service # stop a service
systemctl disable <name>.service # permanently disable a service to not start on boot
systemctl daemon-reload # reload the systemd daemon after config changes

The system log can be read using the “journalctl” command. Some common
operations are:
journalctl --follow # display log file entries as they occur
journalctl --since "2015-01-10" --until "2015-01-11 03:00"
journalctl --since yesterday
journalctl --unit=<name>.service # display all log entries for a service
journalctl --list-boots # list all system boot times
journalctl -b --unit=<name>.service # display log entries for a service since boot

Other commands in this same group are “networkctl”, “timedatectl”,
“hostnamectl” and “machinectl”.

Sudo Settings
On all Linux systems it is common to prohibit direct login as the super user
(“root”). Instead, we add certain user accounts to a group “sudo” in
“/etc/group”. If they execute a “sudo” command, for example “sudo bash” or
“sudo poweroff”, the system in its default configuration asks for the user
password and then executes the command with super user permissions. On
our servers we only have one administrative user account and we only use it
to get super user access to commands and files. We do not use passwords for
remote login. If the administrative user has a valid access key but no
password, “sudo” may be configured not to ask for it. This can be done by
using the “visudo” command to change the matching line in “/etc/sudoers” to:
%sudo ALL=(ALL:ALL) NOPASSWD:ALL

Now all “sudo” commands will be executed immediately. Most of the
procedures in this book must be performed with super user privileges unless
otherwise noted. After login, first use “sudo bash” before trying the example
commands.

Software Updates
Before we do anything else to a new or cloned Ubuntu installation we must
obtain the latest updates using these commands, which we could save in a file
“~/Scripts/update_system.sh”:
#!/bin/bash
set -x
apt update
apt full-upgrade --assume-yes
apt autoremove --assume-yes
apt clean --assume-yes
update-grub

Run the script and reboot the system. If the upgrade process is interrupted it
may be necessary to manually correct the package data using:
dpkg --configure -a

Then we must tell the system to install updates automatically. The package
“unattended-upgrades” is installed by default. Review the file
“/etc/apt/apt.conf.d/50unattended-upgrades” and un-comment the following:
"${distro_id}:${distro_codename}-updates";
...
Unattended-Upgrade::Mail "root";
...
Unattended-Upgrade::MailReport "only-on-error";
...
Unattended-Upgrade::Automatic-Reboot "true";
...
Unattended-Upgrade::Automatic-Reboot-Time "02:00"; // NOTE: this time is interpreted in
// the server time zone, which we set to UTC below; for a reboot at 2AM local time
// enter, for example, "10:00"
To enable automatic updates, edit the file “/etc/apt/apt.conf.d/10periodic” and
set the appropriate apt configuration options:
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Download-Upgradeable-Packages "1";
APT::Periodic::AutocleanInterval "7";
APT::Periodic::Unattended-Upgrade "1";
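
To verify the configuration without waiting for the timer, the package ships a command that can simulate a run (our own suggestion):

# Show what an unattended upgrade run would do, without changing anything.
unattended-upgrade --dry-run --debug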

It is not recommended, but you could upgrade to a new Ubuntu main release
using:
do-release-upgrade

System Name
When we create a single virtual machine disk image, we need to change the
system name on each copy we put into service. Recent versions of Ubuntu
automatically install and activate the package “cloud-init”. This is supposed
to make configurations of new virtual machines easier. Unfortunately this
package is still documented poorly and any scripts written for it must be
adapted to the latest software updates before they are run for the first time
(alternatively they must be re-tested almost daily). This is an excellent way to
introduce all kinds of untraceable automated configuration changes at the
critical time of new installation. We won’t be using this mechanism in this
book and we must prevent “cloud-init” from changing our carefully
configured host name on every reboot by making the following change:
echo "preserve_hostname: true" >/etc/cloud/cloud.cfg.d/99_hostname.cfg

Officially, the fully-qualified domain name (FQDN) of the machine must be set with the command:
hostnamectl set-hostname <fqdn>

In addition, we should add the system name to the “/etc/hosts” file:
127.0.1.1 <fqdn> <name>

Unfortunately, the system name is hard-coded in a number of different configuration files and it is tedious to have to remember and change all of them. It is easier to use the following script (with possibly some changes to the file names for a particular situation), for example “~/Scripts/configure_hostname.sh”:
#!/bin/bash
OLDBASE=`hostname --short`
NEWBASE=$1
# refuse to run without a new name, otherwise sed would delete the old name from the files
[ -z "${NEWBASE}" ] && echo "usage: $0 <new-short-name>" && exit 1
sed -i -- "s/${OLDBASE}/${NEWBASE}/g" \
/etc/hostname \
/etc/hosts \
/etc/mailname \
/etc/postfix/main.cf \
/etc/apache2/sites-av*/* \
/etc/awstats/awstats.conf.local
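For example, to rename a cloned virtual machine to “us11a” we could run the script as super user, set the official FQDN and then reboot so all services pick up the new name:
bash ~/Scripts/configure_hostname.sh us11a
hostnamectl set-hostname us11a.lan.quarium.com
reboot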

System Clock and Time Zone Settings


We should think big and assume that our infrastructure will grow into a
world-wide network of servers.

When servers exchange files and databases, we must make sure that we never need to translate dates and times between different time zones and that we never appear to copy files back in time: if a file created in London on June 2nd 2018 at 6AM local time is immediately copied to a server in Los Angeles, it arrives there at 10PM local time the day before. If both servers were set to local time, its creation timestamp would indicate it came from the future! Instead we will configure all our servers to operate on Coordinated Universal Time (UTC), the successor to Greenwich Mean Time (GMT). This way all servers will agree which events occurred before or after which other events. Whenever a date or time must be presented to a user, the display application will convert it to local time on the client.

In Ubuntu 20.04, the system clock and the hardware clock are managed by
the “systemd-timesyncd.service”. We can check the current settings with:
timedatectl

If the timezone is set incorrectly, we can obtain a (long) list of supported timezones with:
timedatectl list-timezones
Set the correct timezone on all servers with:
timedatectl set-timezone UTC

Make sure the local hardware real-time clock (RTC) is set to UTC as well:
timedatectl set-local-rtc 0

We must also make sure that all servers agree on the current time. For this we
connect them to the world-wide atomic clock service using the Network Time
Protocol (NTP, RFC 5905). If we see that our server is not set up to
synchronize its clock we can correct that with:
timedatectl set-ntp on

These are the only operations we need to keep all clocks of a world-wide
network of servers in sync. On desktop systems these operations can also be
performed with the settings GUI.
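For a new server these settings can, for example, be collected in one small script; the file name “~/Scripts/configure_time.sh” is only a suggestion:
#!/bin/bash
set -x
# run all servers on UTC, keep the hardware clock on UTC and enable NTP
timedatectl set-timezone UTC
timedatectl set-local-rtc 0
timedatectl set-ntp on
# report the resulting clock settings
timedatectl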

User Accounts
Each user account on a production server is a potential security hole.

We will protect the servers from most random account and password
guessing attacks by only allowing encrypted remote login with access keys
that are stored on the server. In our service we will also only have a single
administrative account on each server (the account created during
installation). This account will have the public security keys of all users that
are allowed to manage the server. This is a compromise between auditability
of the system (who has logged in) and obscuring which users are allowed
access (who is allowed to log in, i.e. which account do we need to
compromise to get in). We believe preventing damage is more likely to
succeed than auditing who caused it. After all, once logged on to a system it
is relatively easy to erase malicious footsteps.

That being said, new user accounts can be created using:
adduser <account>
and removed with:
deluser <account>

Regular user accounts have account directories in “/home”. Each account directory should have access mode 750:
chmod 750 /home/*

We should also restrict the default permissions for new user-created files by
changing a setting in “/etc/login.defs”:
UMASK 027

This causes new user files and directories to be inaccessible to accounts that
are not in the same user group by default. It is also useful to inspect the
“/etc/passwd” file and set or correct the full username (GECOS) field of the
root and the administrative users.

If you will be using the “vi” or “vim” editors to paste-in indented content it
may be useful to add the following line to “/etc/vim/vimrc”:
set paste

In the later chapter on LDAP we will show how to configure development and office systems to allow automatic creation of authorized user accounts.

Bootstrap Error Recovery


Under certain circumstances, for example after a power failure, the default
configuration of Ubuntu waits for user input before rebooting. This can be
very inconvenient if a headless (no display or keyboard) server is installed in
a remote colocation facility. We can tell the system it is ok to proceed by
changing the following parameters in the file “/etc/default/grub”:
GRUB_TIMEOUT=2
GRUB_RECORDFAIL_TIMEOUT=2

We can also disable the IPV6 protocol in the same file:


GRUB_CMDLINE_LINUX_DEFAULT="ipv6.disable=1"
GRUB_CMDLINE_LINUX="ipv6.disable=1"

After this we must update the boot program using:


update-grub

Reduced Disk Activity


The UNIX and Linux operating systems use system memory to cache mass
storage blocks to speed up file operations and they use a swap partition (or in
Ubuntu 20.04, a swap file) to extend the amount of memory available to
applications. The default settings of Ubuntu for this mass storage cache and
swap are too aggressive and result in too many disk operations for server
systems.

Servers should be configured with enough system memory for all standard
operations and they should almost never swap. We add a few parameters in
the file “/etc/sysctl.conf”:
vm.swappiness = 10
vm.dirty_ratio = 40
vm.dirty_background_ratio = 10
vm.dirty_writeback_centisecs = 6000

We also add some parameters to the root file system in “/etc/fstab”:


<id> / ext4 defaults,errors=remount-ro,commit=60 0 1

This dramatically reduces mass storage activity in idle systems.
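These settings take effect at the next reboot; to apply them to a running system right away we could, for example, reload the kernel parameters and remount the root file system so the “commit=60” option is picked up:
sysctl -p
mount -o remount /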

Determine System Configuration


A number of commands can be used to display the configuration of a system.
We can install the packages that are not part of the default installation using:
apt install inxi hwinfo

We can then record a comprehensive report on the current system configuration using “~/Scripts/extract_system.sh”:
#!/bin/bash
HOSTNAME=`hostname --short`
echo "========== inxi ${HOSTNAME} =========="
inxi -Flmprsu -c 0
echo; echo
echo "========== df ${HOSTNAME} =========="
df -h
echo; echo
echo "========== fdisk ${HOSTNAME} =========="
fdisk -l
echo; echo
echo "========== hdparm ${HOSTNAME} =========="
hdparm -i /dev/sda
echo; echo
echo "========== mount ${HOSTNAME} =========="
mount
echo; echo
echo "========== free ${HOSTNAME} =========="
free
echo; echo
echo "========== lscpu ${HOSTNAME} =========="
lscpu
echo; echo
echo "========== lshw ${HOSTNAME} =========="
lshw -short -quiet
echo; echo
echo "========== hwinfo ${HOSTNAME} =========="
hwinfo --short
echo; echo
echo "========== lspci ${HOSTNAME} =========="
lspci
echo; echo
echo "========== lsusb ${HOSTNAME} =========="
lsusb -v
echo; echo
echo "========== lsblk ${HOSTNAME} =========="
lsblk -aO
echo; echo

Other Strange Things


On some hardware configurations with ASPEED, Matrox or Nvidia display
adapters, the graphics display of a desktop installation shows “hsync tearing”
after login. Editing “/etc/gdm3/custom.conf” and removing the comment in
front of “WaylandEnable=false” seems to solve the problem. See
https://bugs.launchpad.net/ubuntu/+source/gnome-shell/+bug/1730796. This
does not seem to be unique to Ubuntu: see
https://bugzilla.redhat.com/show_bug.cgi?id=1498336.
Chapter 4 - Networking

Netplan, Networkd and NetworkManager


In the early days of TCP/IP on UNIX and Linux, we had to configure network
interfaces using shell scripts, obscure “ifconfig” or “ip” commands (or their
limited X-windows GUI front-ends) and the infuriatingly poorly documented
“/etc/sysconfig/network-scripts/” directory.

Then the Debian branch of Linux started to adopt the “/etc/network/interfaces” text file and configuring things became a lot simpler.

Ubuntu 20.04 uses the network configuration package “netplan”.

Installation
The standard server and desktop installations of Ubuntu 20.04 LTS both
include the netplan package. The “Network Manager” package is
automatically installed in the desktop installation.

File Structure
The netplan interface configuration YAML files are located in the directory
“/etc/netplan/”. These files are then “rendered” by either the “networkd”
service on servers or the “NetworkManager” service on workstations into
files in the “/run/systemd/network” directory.

Unfortunately, the directory structure is inconsistent: each installer variation seems to use its own naming for files in this directory. It would have been better to adhere to the “/etc/netplan/netplan.conf” plus “/etc/netplan/conf.d” convention.
Service Operations
The “ifupdown” tools are no longer installed by default but the “ip” tools can
still be used. The preferred tool now is “networkctl”:
ip addr
ip route
networkctl list
networkctl status
arp -a
nmap -sT -O 10.0.0.0/24

DHCP Client Configuration


The default network configuration in “/etc/netplan/*” uses DHCP and is
intended for the most common case, a system connected to a LAN. We will
store our own configurations in whatever-name file “/etc/netplan/*” the
installer happens to create:
network:
  version: 2
  renderer: networkd
  ethernets:
    eth0:
      addresses: []
      dhcp4: true
      optional: true

IPV6 addresses are automatically configured and do not require a DHCP server; in our infrastructure we do not use them anyway.

Note that this new standardization has given the maintainers of the “udev” device manager new freedom and it is no longer easy to predict what the names of network devices are going to be. Worse, the names now seem to change from version to version on the same hardware. This unnecessarily complicates documentation and scripting. Historically these devices were named “eth[0-9]”. In these examples we use “eth[0-9]” for clarity but you should substitute the names used in your various installations as needed.

Fixed Address Configuration


A range of public IPV4 addresses is specified by a couple of pieces of information: primarily it is specified as a dotted and slashed quad, for example 54.77.31.24/29. This range specifies 2^(32-29) = 8 public addresses from 54.77.31.24 through 54.77.31.31.

We also need to know which of these addresses is allocated to our ISP router. Some ISPs use the first usable address in the range (.25) and some use the last (.30). The lowest address in the range (.24) is the network address and the highest (.31) is the broadcast address, so each such /29 range has 6 usable addresses.

The ISP typically also provides two IP addresses (not in this range) for their
DNS servers. We will be using our own DNS servers but it is useful to have
these extra DNS servers as a backup configuration.

Many ISP-provided routers also serve the DHCP protocol on our static IP
range. It should be possible to disable or ignore this since the DHCP protocol
contains a mechanism for making sure allocated addresses are not already in
use, either by fixed allocation or by another DHCP server.

With this information we can now configure a “/etc/netplan/*” file for a physical server with a public interface “eth0” and a private interface “eth1”:
network:
  version: 2
  renderer: networkd
  ethernets:
    eth0:
      addresses: [54.77.31.25/29]
      gateway4: 54.77.31.30
      nameservers:
        addresses: [54.77.54.54, 54.77.54.55]
    eth1:
      addresses: [10.0.0.10/24]
      nameservers:
        addresses: [10.0.0.53, 10.0.0.54]
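After editing the file we can check and apply the configuration; “netplan try” rolls the change back automatically unless we confirm it, which is safer on a remote server:
netplan try
netplan apply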

Bridged Configuration
This static configuration is not sufficient if we use a physical machine to host
a set of virtual machines. In that case we must configure both the network
“ethernets” and “bridges” for each:
network:
  version: 2
  renderer: networkd
  ethernets:
    eth0:
      dhcp4: false
    eth1:
      dhcp4: false
  bridges:
    br0:
      interfaces: [eth0]
      addresses: [50.255.38.81/29, 50.255.38.83/29]
      gateway4: 50.255.38.86
      parameters:
        stp: false
        forward-delay: 0
    br1:
      interfaces: [eth1]
      addresses: [10.0.0.10/24]
      nameservers:
        addresses: [10.0.0.30, 10.0.0.31]
      parameters:
        stp: false
        forward-delay: 0

The settings are applied to the bridges which then carry them forward into
their associated interfaces.
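Once the bridges are up we can, for example, verify that the physical interfaces are enslaved correctly and that each bridge carries the expected addresses (the “brctl” tool is part of the “bridge-utils” package installed in the virtualization chapter):
networkctl status br0
brctl show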
Chapter 5 - Firewall
A firewall protects a server and any local area networks behind it from
unauthorized incoming Internet traffic while permitting and facilitating
outgoing traffic.

A basic principle of our infrastructure is that no system should assume it can trust the network it is connected to or any other systems that can access it. Each system should start by prohibiting all access and then only enable types of access that are absolutely necessary for it to perform its specific function.

Netfilter, Iptables and Ufw


In the Linux kernel, network packets are processed by “netfilter” modules.
These modules are controlled by the “iptables” utility, which gives the
mechanism its name. Packets are processed by sequentially traversing sets of
“rules” in “chains”. A rule in a chain can cause a “goto” or “jump” (really a
“subroutine call”) to another chain, and this can be repeated to whatever level
of nesting is desired. In future releases iptables will be replaced by “nftables”.

Iptables is quite flexible but it is also quite complicated. Most systems would
benefit from a simple standard set of rules. Ubuntu provides a front-end
called the “uncomplicated firewall” (ufw) and its configuration files which
are stored in “/etc/ufw”.

Installation
The standard server and desktop installations of Ubuntu 20.04 LTS both
include the netfilter, iptables and ufw firewall software packages.

File Structure
The ufw package stores its configuration files in “/etc/ufw” and there is an additional configuration file “/etc/default/ufw”.

Service Operations
We can control and determine the status of the firewall using:
ufw enable
ufw status
ufw show raw
ufw logging on
ufw logging off
ufw disable

To list the current iptables configuration in more detail use:


iptables --list
iptables --table nat --list

To determine which applications use which ports use:


ss -tulpn

Selectively Allowing Incoming Requests


Initially, an Ubuntu server installation only listens on TCP port 22 for the Secure Shell (SSH) and on TCP and UDP ports 53 for the Domain Name System (DNS).

An Ubuntu desktop installation initially does not listen on TCP port 22 for
the “Secure Shell” (SSH), but it does listen on TCP and UDP ports 53 for the
“Domain Name System” (DNS), UDP port 68 for the “Dynamic Host
Configuration Protocol” (DHCP) client, UDP port 5353 for Avahi mDNS and
TCP and UDP ports 631 for the “Common Unix Printing System” (CUPS).
We will not use the mDNS and the CUPS protocols and we will let the
firewall block access to these ports.

We can use a simple script, for example “~/Scripts/configure_ufw.sh”, to configure ufw for the ports a particular system will be allowed to be contacted on. Enable the relevant entries for each server by removing the “#?” comment markers:
#!/bin/bash
set -x
# disable and return firewall to default settings
ufw disable
ufw reset
# allow SSH access from anywhere via tcp only
ufw allow 22/tcp
# allow HTTP and HTTPS access from anywhere via tcp only (if we serve web pages or REST)
#? ufw allow 80/tcp
#? ufw allow 443/tcp
# allow SMTP, encrypted SMTP and POP3 access from anywhere via tcp only (if we serve email)
#? ufw allow 25/tcp
#? ufw allow 587/tcp
#? ufw allow 995/tcp
# allow DNS access from anywhere via udp and tcp (if we serve DNS)
#? ufw allow 53
# allow DHCP and DHCP failover access from anywhere (if we serve DHCP)
#? ufw allow 67/udp
#? ufw allow 68/udp
#? ufw allow 647/tcp
#? ufw allow 847/tcp
# allow NTP access from anywhere via udp only (if we serve NTP)
#? ufw allow 123/udp
# allow OpenLDAP access from anywhere via tcp and udp (if we serve OpenLDAP)
#? ufw allow 389
# allow OpenVPN access from anywhere via udp only (if we serve OpenVPN)
#? ufw allow 1194/udp
# allow MySQL access from internal networks only
#? ufw allow proto tcp from 10.0.0.0/8 to any port 3306
# start using the firewall and report any issues
ufw enable
# report the current rules used by the firewall
ufw status verbose

We run this script once when a new machine is installed and from then on
ufw will protect the system from all other access and from a set of malformed
or malicious network packets.

Periodically we can verify if the firewall is still configured properly:


ufw status verbose

We can add custom “iptables” commands as well, by placing them in “/etc/ufw/before.rules”. One problem with this file is that it is restored to its initial contents by “ufw reset”, so this command must only be executed once, at the beginning of the configuration process.

If you have created the “~/Scripts/configure_ufw.sh” script above to maintain the permitted incoming addresses, comment-out the reset command after running the script for the first time.

Always keep a backup copy of the modified “before.rules” file.

Masquerading Outgoing Requests


We want to give the machines on an internal network access to the Internet,
for example for automatic Ubuntu software and security updates. We do this
by configuring the firewall of the gateway machines as a network address
translator (a traditional NAT with port translation per RFCs 2663 and 4787).
Setting up the firewall as a NAT requires a number of cooperating
configuration steps:

First we must allow the gateway kernel to forward packets between the
network interfaces or bridges in “/etc/ufw/sysctl.conf”:
net/ipv4/ip_forward=1

Then we must tell the firewall that we permitted this in “/etc/default/ufw”:


DEFAULT_FORWARD_POLICY="ACCEPT"

Finally we add a “nat” table to “/etc/ufw/before.rules” before the first “*filter” rules to do the actual address translation:
# nat Table rules
*nat
:PREROUTING ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
-F
# Place additional forwarding rules here…
# Forward traffic from br1 through br0.
-A POSTROUTING -s 10.0.0.0/8 -o br0 -j MASQUERADE
COMMIT
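The “before.rules” file is only read when the firewall is (re)loaded, so after editing it we reload ufw and can then, for example, inspect the resulting “nat” table directly:
ufw reload
iptables --table nat --list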
Bridging Virtual Machine Requests
On a physical server that will host virtual machines we must tell the firewall
that we are using bridging in “/etc/ufw/before.rules”, as the final command
before the final “COMMIT”:
# configure bridge networking
-I FORWARD -m physdev --physdev-is-bridged -j ACCEPT

Forwarding Requests to Internal Servers


In our infrastructure we have physical machines whose only role is to protect
and serve virtual machines which are dedicated to specific functions. The
physical machines act as firewalls and gateways for the virtual machines and
are assigned public IPV4 addresses. The virtual machines only have internal
addresses so the physical machine must also forward certain incoming
requests to the proper virtual machine.

If for example our infrastructure offers web and REST API services it does so
on ports 80 and 443 of a public IPV4 address. The physical machine must
forward packets received on these ports to the equivalent ports on the internal
address of a virtual machine dedicated to that service. Then, response packets
from the virtual machine must be returned to the proper public client address
(where it may be forwarded again to an internal workstation). Two common
cases are forwarding TCP connections, for example SMTP:
-A PREROUTING -i br0 -p tcp -d 50.255.38.82 --dport 25 -j DNAT --to-destination 10.0.0.34:25
-A POSTROUTING -o br1 -p tcp -d 10.0.0.34 --dport 25 -j SNAT --to-source 10.0.0.20
Or connectionless UDP, for example DNS:
-A PREROUTING -i br0 -p udp -d 50.255.38.82 --dport 53 -j DNAT --to-destination 10.0.0.31:53
-A PREROUTING -i br0 -p tcp -d 50.255.38.82 --dport 53 -j DNAT --to-destination 10.0.0.31:53
-A POSTROUTING -o br1 -p tcp -d 10.0.0.31 --dport 53 -j SNAT --to-source 10.0.0.20
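These lines go into the same “nat” table in “/etc/ufw/before.rules” as the masquerading rules above; after a reload we can, for example, confirm that the translations are in place with:
ufw reload
iptables --table nat --list PREROUTING --numeric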

For more details, see the later chapter on “Global Load Balancing”.
Chapter 6 - Remote Access

OpenSSH Secure Shell


The Secure Shell (SSH, RFC 4251) application “ssh” and service
“ssh.service” allow us to create ad-hoc encrypted communication channels
between systems. The main application for this is remote login, but it is also
used for scripted communications among systems, for example for backup
file transfers. SSH can also be used to tunnel X11 graphics console
communications and arbitrary TCP communications.

SSH replaces the older “telnet”, “ftp”, “rlogin” and “rsh” applications which
are not secure and which should never be installed or used in any production
environment.

An alternative to SSH could be “OpenVPN”, which can create more permanent encrypted communication channels between systems.

Installation
The base Ubuntu 20.04 server installation automatically installs the
“OpenSSH” client and server packages. On the desktop installation we must
install the server package manually. We can make sure the packages are
properly installed using:
apt install openssh-client openssh-server
apt list --installed | grep openssh

File Structure
The configuration files of the package are found in the “/etc/ssh” directory.
The client configuration is found in the file “/etc/ssh/ssh_config” with
overrides in the “/etc/ssh/ssh_config.d” directory.

The server configuration is found in the file “/etc/ssh/sshd_config” with overrides in the “/etc/ssh/sshd_config.d” directory.

Additional server configuration options can be added to the file “/etc/default/ssh”.

Service Operations
Server operations are managed by the “ssh.service”. Some common
operations are:
systemctl start ssh
systemctl restart ssh
systemctl stop ssh

We can observe ongoing server operations using


systemctl status ssh
journalctl --unit=ssh
journalctl --follow --unit=ssh

Server Configuration
The server configuration is found in the file “/etc/ssh/sshd_config”. We will
add our own rules in a file “/etc/ssh/sshd_config.d/quarium.conf” which will
override the default settings. We will require that all logins are only done
through less predictable account names:
PermitRootLogin no

We will require that users may only log in if their public SSH key is present in the “.ssh/authorized_keys” file of the account they are logging in to:
PasswordAuthentication no

The normal security updates of our systems are continuously changing the
encryption cyphers, message authentication codes and key exchange
algorithms used by SSH. We can list the ones supported by our current
version with:
sshd -T | grep "\(ciphers\|macs\|kexalgorithms\)"

We can debug a particular configuration by running “sshd” as a regular application and turning on debugging:
systemctl stop ssh
/usr/sbin/sshd -D -d -d -d

This is useful if a connection that worked before stops working after a security update disables a security method both sides relied on. We may want to accommodate some of the older, less secure methods if they are still supported but are not enabled by default:
KexAlgorithms +diffie-hellman-group1-sha1
Ciphers +3des-cbc,aes128-cbc,aes192-cbc,aes256-cbc

In older SSH clients like F-Secure it may be necessary to explicitly deny SSH1 connections.

Client Keys and Configuration


Each account that allows login has a home directory “/home/<account>” that
should have permissions 750.

Each account that allows login using SSH should have a subdirectory “.ssh”
with permissions 700.

In this directory we must create a file “authorized_keys” with permissions 400 which contains one line for each access key that is allowed to login using SSH:
ssh-rsa <key> <keyname>

Each key is binary, encoded using Base64: 3 binary bytes are encoded as 4 characters, so a 2048-bit key is encoded to about 342 characters. The key name is used to recognize incoming connection requests and is not related to (but often the same as) the account name of the owner of the key. We can create a new 2048-bit SSH private/public key pair with the command:
ssh-keygen -b 2048 -C <keyname>

Use an empty password to protect the SSH keys if asked and select a key
name that describes its purpose, for example a user name or a host name.
This command produces two files in the “.ssh” directory, by default “id_rsa”
and “id_rsa.pub”. This last file consists of the single line suitable for
appending to the “authorized_keys” files of accounts that may be logged in to
using this key pair.

Some other SSH implementations require public keys to be in RFC 4716 SSH Public Key File Format. We can convert an OpenSSH public key into this format using the command:
ssh-keygen -e

Another option “-i” allows for the conversion of RFC 4716 SSH Public Key
File Format keys into OpenSSH key format.

Tunneling
Our infrastructure consists of (among others) a set of physical servers which
host virtual machines. The virtual machines perform services based on
packets that are forwarded to them by the “iptables” “ufw” configuration of
the physical machines. The virtual machines do listen on SSH port 22, but
only on their internal bridge network. To access a virtual machine, we first
have to login to the physical machine on port 22 of its public address, and
then we have to login to the virtual machine on port 22 of its internal bridge
network address.

This can be tedious and fortunately the SSH client application can use a
configuration file “~/.ssh/config” to set up complex connection forwarding
options. In the following example the local private key “~/.ssh/id_rsa”
authenticates a user “quarium” on a cluster consisting of servers “us1” and
“us11a” where only “us1” has a public IP address and “us11a” can be
reached through its internal bridge network only:
Host us1
Hostname us1.quarium.com
User quarium
Host us11a
Hostname us11a.lan.quarium.com
User quarium
ProxyCommand ssh -A -q us1 nc -q0 us11a.lan.quarium.com 22

The local user can connect to each of the systems by their short names
without having to specify a user name, for example:
ssh us11a
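The same configuration also lets us tunnel arbitrary TCP ports through the chain, for example to reach a MySQL server that only listens on the internal network (the port numbers here are only an illustration):
ssh -L 3306:localhost:3306 us11a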

To allow other system users (for example crontab) to use the aliases, store the
file in “/etc/ssh/ssh_config.d” instead.
Chapter 7 - Virtualization
In early computers, physical processors were directly connected to physical
memory, i.e. the same program address always accessed the same physical
memory location. When a program needed more memory than available, it
had to resort to complex overlay methods to run.

In the early 1960’s, mainframe computers were constructed with memory management units (MMUs). This allowed multiple programs to access the
same virtual memory location, which for each program was mapped to a
different physical memory location. This also allowed computers to separate
their operating system and user programs to protect each other from
erroneous or malicious accesses. If there was not enough physical memory,
pages or segments of memory could be swapped to disk storage to simulate
more memory.

The UNIX and Linux operating systems make heavy use of virtual memory and
can theoretically use entire mass storage devices as extensions of physical
memory. In 1979, the Motorola 68000 microprocessor was one of the first to
have a MMU. This meant that the UNIX operating system could run on it. In
1985, the Intel x86 processor line caught up with the 80386. Today, the ARM
processors in every smart phone and every “Internet of Things” (IoT) device
run operating systems which use virtual memory.

In 1967, on mainframe computers, the virtual memory concept was further generalized with the introduction of machine virtualization. A hypervisor
program now allowed multiple operating systems to share the same physical
computer, each in their own virtual machine. In practice, the hypervisor is
part of a host operating system which then runs several guest virtual
machines, each with their own guest operating system. In 2005, the
microprocessor operating systems also started using machine virtualization.

As an offshoot, UNIX and Linux systems also implemented operating-system-level virtualization, first as “chroot” and now for example as “Docker
containers”. This allows programs to share the resources of a host operating
system more securely, but such containers can only be used fully if they run
the same operating system as the host.

KVM Kernel-Based Virtual Machine


In our infrastructure we will use the kernel based virtual machine (KVM)
which became part of the Linux kernel in 2007. Internally, for development,
we also use Oracle VirtualBox to run virtual machines on Windows, and we
use Parallels to run virtual machines on MacOS. The latter allows us to
construct a continuous-integration server for all platforms we support on a
single Macintosh computer.

Installation
The virtualization packages are not installed by default and should only be
installed on physical Ubuntu 20.04 server or desktop installations. The base
packages are installed with:
apt install qemu qemu-kvm libvirt-daemon libvirt-clients virtinst bridge-utils
libguestfs-tools

On desktop installations, the virtualization GUI can be installed with:


apt install virt-manager

In the virt-manager preferences, enable it as a system tray icon. In the desktop sidebar, add it to favorites. In preferences, go to the VM Details tab
and set Graphical Console Scaling to “Always”. This will always scale the
graphical console of virtual machines to the size of the virt-manager window.
Also select VNC (and not Spice) to remotely access graphics on a virtual
machine.

Add all users that may control virtual machines (for example “quarium”) to the “libvirt” group in “/etc/group”.
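This can be done, for example, with:
adduser quarium libvirt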

Check if “vhost.net” is installed using:


modprobe vhost_net
followed by:
lsmod | grep vhost

You should see “vhost_net” in the output of grep. Tell the kernel to load the
module at boot time using:
echo vhost_net >>/etc/modules

File Structure
The disk images for virtual machines are stored in “/var/lib/libvirt/images”.

Service Operations
Server operations are managed by the “libvirtd.service”. Some common operations are:
systemctl start libvirtd
systemctl restart libvirtd
systemctl stop libvirtd

We can observe ongoing server operations using


systemctl status libvirtd
journalctl --unit=libvirtd
journalctl --follow --unit=libvirtd

The status of the configured virtual machines can be determined with:


machinectl list --all
virsh list --all --title

To get a list of just the virtual machines in a particular state for scripting
purposes, use:
virsh list --state-shutoff --name
virsh list --state-running --name
virsh list --state-paused --name

Managing Virtual Machines in the GUI


In the “Virtual Machine Manager” desktop application, select “File” -> “New
Virtual Machine” from the application menu.

Choose a local install image and use the ISO image of the software
distribution medium.

Set the amount of RAM allocated to the virtual machine, for example to 2048 MB, and set the number of processors, for example to 1.
Create a mass storage image for the virtual machine, for example 100GB.

Set the name of the virtual machine, for example “ubuntu20base”, select to
customize the configuration before install and finish the installation.

In the “Overview” tab, set the “Title” to something meaningful, for example
“Ubuntu 20 Base” and “Apply” the setting.

In the “Boot Options” tab, select “Start virtual machine on host boot up” and
“Apply” the setting.

In the “NIC” tab, select “Specify shared device name” and enter “br1” and
“Apply” the setting.

Then “Begin Installation”. See the earlier chapters for configuration details.

Once a virtual machine is properly configured, dump its configuration into a file that can be used to create the virtual machine on other physical servers using:
virsh dumpxml somehost > somehost.xml

Managing Virtual Machines in the Shell


On a physical machine with an Ubuntu 20.04 server installation, first copy a
prepared virtual disk image for example “somehost.qcow2” and the virtual
machine specification for example “somehost.xml” file into the
“/var/lib/libvirt/images” directory and then either create and start a temporary
virtual machine using:
virsh create somehost.xml

or create and start a permanent one with:


virsh define somehost.xml
virsh start somehost

List the configured virtual machines using:


virsh list --all --title
Start a virtual machine using:
virsh start somehost

Stop a virtual machine using:


virsh shutdown somehost
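To match the “Start virtual machine on host boot up” option from the GUI, a defined virtual machine can also be marked for (or removed from) automatic start with:
virsh autostart somehost
virsh autostart --disable somehost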

Unfortunately, the newer “systemd”-compatible command “machinectl” is not yet quite up to the task of replacing all functions of the older “virsh” command we use daily.

Managing Virtual Disk Space


Over time, virtual disk images grow as software is installed and updated and
as data files grow. The virtual disk image will also accumulate some unused
space after files are deleted. It is possible to release the unused space back to
the host file system using:
virt-sparsify --quiet <source>.qcow2 <target>.qcow2

We use this utility to periodically make backup images of our (shut down)
production virtual machines which are then copied to other physical
machines for use in a standby virtual machine. We then resume production
operations with the sparsified disk image.

It is also possible to change the size of a virtual disk image. This can be done
using the utilities “fdisk”, “qemu-img”, “pvs”, “lvs” and “resize2fs”. This is a
complicated operation and it is probably easier to allocate new virtual
machines with a large amount of unused disk space to begin with (for
example 100GB). The unused space will only be allocated on the physical
disk if it is actually used.

Duplicating Virtual Disk Images


It is possible to copy a virtual disk image (or any file really) which contains
long sequences of zero bytes to a sparse file using:
cp --sparse=always source destination

Virtual disk images can be copied from one host server to another while
preserving sparse disk allocation using:
rsync -a -e ssh --sparse --compress --progress source destination

It is also possible to convert a virtual disk image from one format to another.
Some useful conversions are from the old “img” format to “qcow2” and from
“qcow2” to “vmdk”. Use the following command for conversion:
qemu-img convert -f raw -O qcow2 source.img target.qcow2
Chapter 8 - DNS
In the early days of the ARPANET, there was one centralized file
“HOSTS.TXT” that contained all mappings of system names to their network
addresses. The file was maintained manually at the Network Information
Center of the Stanford Research Institute and updates had to be
communicated to it by telephone, during business hours. In the early 1980’s
this mechanism became slow and unwieldy and the mechanism was replaced
with the Domain Name Service (DNS, RFC 882 and 883, currently 1034 and
1035).

Today on the Internet, there are two main types of applications: In one type, a
“server” listens for incoming requests from “clients” and sends responses to
those clients. In the other type, two “peers” exchange notifications, requests
and responses with each other.

In both cases one side of the application can “wait for something to happen”
while the other side must determine which remote IP address and port it must
“cause something to happen to”. Most peer-to-peer applications use some
kind of peer discovery server, so ultimately for every network application a
client must map a DNS name to an IP address and TCP or UDP port, and a
server must, mostly for management purposes, be able to map an IP address
and port back to a DNS name.

All Linux, Windows and MacOS systems still have a “hosts” file that is used
to map names, mostly locally, in case DNS servers cannot be reached. This
file can be used to map DNS names to IP addresses and back.

Bind and Resolver


But in most cases, an application will call a function in a “resolver” library
which implements the client side of a DNS client-server application. Such a
DNS client will need a server to connect to.
A local resolver library typically receives the addresses of two or more DNS
servers from the local system, which has received them in turn either through
DHCP or in a static address configuration.

In 1984, the Berkeley Internet Name Domain (BIND) was the first
implementation of DNS for the UNIX system. One of its components was the
name daemon “named”. Although it has been revised a number of times (we
use bind version 9) it is still the most widely used DNS software on the
Internet.

Domains are registered with an Internet domain registrar, who publishes the
addresses of the authoritative DNS servers as part of a higher-level domain,
for example “.com”.

Typical domains have two or more DNS servers. One server is the “master”
and this server contains and serves the authoritative copies of the domain
“zone files”. One or more additional DNS servers act as “slaves” and serve
copies of the zone files obtained from the master.

Typical infrastructures have “internal” and “external” zones. The external zone contains the public IP addresses served by the infrastructure. The
internal zone or zones contain local IP addresses only visible inside the
firewall. Often the internal zone can be updated by a cooperating DHCP
server (see below) to dynamically configure names for local systems.

Installation
The default Ubuntu 20.04 LTS installation includes a DNS client resolver
library.

The DNS server software is not installed automatically in either the server or
the desktop distributions of Ubuntu 20.04 LTS. In our infrastructure, we will
install a DNS master and slave server on our office and development
networks. Office and development workstations will resolve names using these DNS servers. Install the DNS server software on both master and
slave servers but not on client systems using:
apt install bind9 dnsutils

File Structure
The DNS configuration files are stored in the “/etc/bind” directory. The
primary configuration file is “/etc/bind/named.conf”. Files for dynamic zones
can be found in “/var/lib/bind”. On slave servers, files are stored in
“/var/cache/bind”.

Service Operations
The name of the package is “bind9”. This is confusing because to “systemctl”
and “journalctl” the associated service is known as “named”. Also, the
associated directories are named “bind”. Operate the service with the
following:
systemctl enable named
systemctl start named
systemctl restart named
systemctl stop named
systemctl disable named

We can observe the activities of the service using:


systemctl status named
journalctl --unit=named
journalctl --follow --unit=named

On Linux and MacOS, a host record can be tested with:


dig email.quarium.com
dig email.lan.quarium.com

or:
dig @server host

An MX record can be tested with:


dig domain MX
On Windows, a host record can be tested with either:
nslookup host

or:
nslookup host server

On Windows, an MX record can be tested in interactive mode:


nslookup
set type=MX
quarium.com

There are a number of useful public free DNS check services on the Internet.
They will test a set of standard configuration criteria and report any problems
found for a specified domain.

Master Configuration
The “/etc/bind” directory contains the server configuration files and the zone
files of the externally visible domains that are maintained manually. The
“/var/lib/bind” directory contains internally visible domains that can be
updated automatically by for example a DHCP server as systems are being
added to or removed from the internal networks.

We will place the master configuration in the file “/etc/bind/named.conf”. (There is a “named.conf.local” file, but unfortunately in this configuration we need to use “views” and “options” and the standard installation won’t let us put them in the “.local” file.)

We will begin by specifying a few access control lists (“acl”) to limit access
to certain resources. The examples assume we will be using the IPV4
addresses 10.0.0.30 and 10.0.0.31 for our DNS servers. We'll have a few
systems “ph[12]” for “physical host” and “ns[12]” for “name server”. All IP
address values, domain names and security keys are examples only and
should be changed to the actual values used in your infrastructure:
acl slaves {
10.0.0.31; // ns2.lan.quarium.com
};
acl internals {
50.255.38.80/29; // Main office ISP
10.0.0.0/24; // Main Office LAN
127.0.0/24; // localhost
};

In our infrastructure we are going to allow our DHCP servers to update certain zones dynamically. To allow that, we need to generate a shared secret string for each DHCP server:
ddns-confgen -s ns1.lan.quarium.com
ddns-confgen -s ns2.lan.quarium.com

And then we add the generated “.private” values to the “/etc/bind/named.conf” file:
key "ddns-key.ns1.lan.quarium.com" {
algorithm hmac-sha256;
secret "<a Base64 string>";
};
key "ddns-key.ns2.lan.quarium.com" {
algorithm hmac-sha256;
secret "<a different Base64 string>";
};

Next, we’ll add a few options that limit our server to a certain level of
security. Clients can request individual records or transfers of entire zones.
The latter is not safe except for our own slave servers so we will block that
option:
options {
directory "/var/cache/bind";
forwarders {
75.75.75.75; // Main office ISP primary DNS
75.75.76.76; // Main office ISP secondary DNS
};
dnssec-enable no;
dnssec-validation no;
auth-nxdomain no; # conform to RFC1035
listen-on-v6 { none; };
allow-transfer { slaves; };
also-notify { 10.0.0.31; };
version "restricted";
rate-limit {
responses-per-second 10;
// log-only yes;
};
allow-recursion { none; };
additional-from-cache no;
recursion no;
};

Next, we add two “views”, one for our “internal” networks and one for our
“external” networks:
view "internal" {
match-clients { internals; };
allow-query { internals; };
allow-recursion { internals; };
additional-from-cache yes;
recursion yes;
/* insert zones visible internally here */
include "/etc/bind/named.conf.default-zones";
};
view "external" {
match-clients { any; };
/* insert zones visible externally here */
include "/etc/bind/named.conf.default-zones";
};

Then we add a “zone” specification for both our main internal and external
domains in the “internal” view. Note that we only allow private zones to be
updated:
zone "lan.quarium.com" {
type master;
file "/var/lib/bind/lan.quarium.com.zone";
allow-update {key "ns1.lan.quarium.com";};
};
zone "0.0.10.in-addr.arpa" {
type master;
file "/var/lib/bind/0.0.10.in-addr.arpa.zone";
allow-update {key "ns1.lan.quarium.com";};
};
zone "quarium.com" {
type master;
file "/etc/bind/quarium.com.zone";
};
zone "38.255.50.in-addr.arpa" {
type master;
file "/etc/bind/38.255.50.in-addr.arpa.zone";
};

Finally, we add only the external domains to the “external” view:


zone "quarium.com" {
type master;
file "/etc/bind/quarium.com.zone";
};
zone "38.255.50.in-addr.arpa" {
type master;
file "/etc/bind/38.255.50.in-addr.arpa.zone";
};

Public Domain Zone


We describe our main external domain “quarium.com” (how our public
names map to our public IP addresses) in a file
“/etc/bind/quarium.com.zone”:
$TTL 1d
@ IN SOA ph1.quarium.com. hostmaster.quarium.com. (
2020052701 ; the serial number
1h ; slave refresh cycle
15m ; slave retry cycle
1w ; slave cache TTL
2h ) ; negative caching TTL
NS ph1.quarium.com.
NS ph2.quarium.com.
MX 100 ph1.quarium.com.
TXT "v=spf1 mx -all"
A 50.99.46.19
$ORIGIN @
ph1 A 50.99.46.18
ph2 A 50.99.46.19
www CNAME @
email CNAME ph2
lan.quarium.com. NS ph2.quarium.com.

Set the ownership and access modes of all zone files:


chown root:bind /etc/bind/*.zone
chmod 644 /etc/bind/*.zone
chown bind:bind /var/lib/bind/*.zone
chmod 644 /var/lib/bind/*.zone
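Before (re)starting the service it is prudent to let bind validate the configuration and the zone files; the check tools are installed together with the server:
named-checkconf /etc/bind/named.conf
named-checkzone quarium.com /etc/bind/quarium.com.zone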

Public Address Reverse Zone


Likewise, we describe our reverse domain (how our public IP addresses map
to our public names) in a file “/etc/bind/46.99.50.in-addr.arpa.zone”:
$TTL 1d
@ IN SOA ph1.quarium.com. hostmaster.quarium.com. (
2020052701 ; the serial number
1h ; slave refresh cycle
15m ; slave retry cycle
1w ; slave cache TTL
2h ) ; negative caching TTL
NS ph1.quarium.com.
NS ph2.quarium.com.
$ORIGIN @
18 PTR ph1.quarium.com.
19 PTR ph2.quarium.com.

Private Domain Zone


We describe our internal sub domain “lan.quarium.com” (how our private
names map to private IP addresses) in a file
“/var/lib/bind/lan.quarium.com.zone”:
$ORIGIN .
$TTL 1800 ; 30 minutes
lan.quarium.com IN SOA ph1.lan.quarium.com. hostmaster.quarium.com. (
2020052701 ; serial
1800 ; refresh (30 minutes)
900 ; retry (15 minutes)
2592000 ; expire (4 weeks 2 days)
900 ; minimum (15 minutes)
)
NS ns1.lan.quarium.com.
NS ns2.lan.quarium.com.
A 10.0.0.10
$ORIGIN lan.quarium.com.
ph1 A 10.0.0.10
ph2 A 10.0.0.20
ns1 A 10.0.0.30 ; a vm
ns2 A 10.0.0.31 ; a vm
www A 10.0.0.32 ; a vm
email A 10.0.0.33 ; a vm

Private Address Reverse Zone


We describe our private reverse domain (how our private IP addresses map to
our private names) in a file “/var/lib/bind/0.0.10.in-addr.arpa.zone”:
$ORIGIN .
$TTL 1800 ; 30 minutes
0.0.10.in-addr.arpa IN SOA ph1.quarium.com. hostmaster.quarium.com. (
2020052701 ; serial
1800 ; refresh (30 minutes)
900 ; retry (15 minutes)
2592000 ; expire (4 weeks 2 days)
900 ; minimum (15 minutes)
)
NS ph1.quarium.com.
NS ph2.quarium.com.
$ORIGIN 0.0.10.in-addr.arpa.
10 PTR ph1
20 PTR ph2
30 PTR ns1 ; a vm
31 PTR ns2 ; a vm
32 PTR www ; a vm
33 PTR email ; a vm

Other Public Domain Zone


If we serve any other public domains from the same server, for example
“quarium.net”, we can add a file “/etc/bind/others.zone”:
$TTL 1d
@ IN SOA ph1.quarium.com. hostmaster.quarium.com. (
2020052701 ; the serial number
1h ; slave refresh cycle
15m ; slave retry cycle
1w ; slave cache TTL
2h ) ; negative caching TTL
NS ph1.quarium.com.
NS ph2.quarium.com.
MX 100 ph1.quarium.com.
TXT "v=spf1 mx -all"
A 50.99.46.18
$ORIGIN @
ph1 A 50.99.46.18
ph2 A 50.99.46.19
www CNAME @
email CNAME ph2

In that case we also need to add the following zone specification to the
“internal” and “external” views in “/etc/bind/named.conf”:
zone "quarium.net" {
type master;
file "/etc/bind/others.zone";
};

Slave Configuration
On slave servers, overall the “/etc/bind/named.conf” file looks very similar, except for the masters list and the access control lists:
masters masters {
10.0.0.30; // ns1.lan.quarium.com
};
acl internals {
50.255.38.80/29; // Main Office ISP
10.0.0.0/24; // Main Office LAN
127.0.0/24; // localhost
};
key "ddns-key.ns1.lan.quarium.com" {
algorithm hmac-sha256;
secret "<a Base64 string>";
};
key "ddns-key.ns2.lan.quarium.com" {
algorithm hmac-sha256;
secret "<a different Base64 string>";
};
options {
directory "/var/cache/bind";
forwarders {
75.75.75.75; // Main Office ISP primary DNS
75.75.76.76; // Main Office ISP secondary DNS
};
dnssec-enable no;
dnssec-validation no;
auth-nxdomain no; # conform to RFC1035
listen-on-v6 { none; };
allow-transfer { "none"; };
notify no;
version "restricted";
rate-limit {
responses-per-second 1;
log-only yes;
};
allow-recursion { none; };
additional-from-cache no;
recursion no;
};
view "internal" {
match-clients { internals; };
allow-query { internals; };
allow-recursion { internals; };
additional-from-cache yes;
recursion yes;
/* insert zones visible internally here */
include "/etc/bind/named.conf.default-zones";
};
view "external" {
match-clients { any; };
/* insert zones visible externally here */
include "/etc/bind/named.conf.default-zones";
};

The actual zone descriptions are different. For the “internal” zones, we
specify:
zone "lan.quarium.com" {
type slave;
file "internal.lan.quarium.com.zone";
masters { masters; };
};
zone "0.0.10.in-addr.arpa" {
type slave;
file "internal.0.0.10.in-addr.arpa.zone";
masters { masters; };
};
zone "quarium.com" {
type slave;
file "internal.quarium.com.zone";
masters { masters; };
};
zone "38.255.50.in-addr.arpa" {
type slave;
file "internal.38.255.50.in-addr.arpa.zone";
masters { masters; };
};

And for the “external” zones we specify different zone file names, so zone
transfers will not conflict:
zone "quarium.com" {
type slave;
file "external.quarium.com.zone";
masters { masters; };
};
zone "38.255.50.in-addr.arpa" {
type slave;
file "external.38.255.50.in-addr.arpa.zone";
masters { masters; };
};

Once the slave server is enabled and started, we should see log file entries in
“/var/log/named/bind.log” describing the zone transfers:
20-Oct-2018 04:49:20.555 general: info: zone quarium.com/IN/internal: Transfer
started.
20-Oct-2018 04:49:20.556 xfer-in: info: transfer of 'quarium.com/IN/internal' from
10.0.0.40#53: connected using 10.0.0.31#52581
20-Oct-2018 04:49:20.558 general: info: zone quarium.com/IN/internal: transferred
serial 2018101901
20-Oct-2018 04:49:20.558 xfer-in: info: transfer of 'quarium.com/IN/internal' from
10.0.0.40#53: Transfer status: success
20-Oct-2018 04:49:20.558 xfer-in: info: transfer of 'quarium.com/IN/internal' from
10.0.0.40#53: Transfer completed: 1 messages, 13 records, 318 bytes, 0.002 secs
(159000 bytes/sec)

We should see copies of the zone files getting cached in “/var/cache/bind”. Unfortunately these files are no longer merely exact copies of the original zone text files and so problems are harder to diagnose.

Zone transfers are especially tricky: while there is no particular reason to prohibit them, other than holding off a denial-of-service attack if zones are
large, we are prohibiting everything except simple record requests from
outside. We are setting up the master server to notify slaves of any changes in
zone files and we allow slave servers to transfer zones from masters without
notification cascades.

Client Configuration
Clients are configured through their “/etc/netplan/*” file. See network
configuration above.
Chapter 9 - DHCP
Originally, all network interfaces on all systems had to be manually
configured with an IP address. This was tedious and error-prone, so in 1993,
the Dynamic Host Configuration Protocol (DHCP, currently RFC 2131) was
introduced.

In this protocol, a system broadcasts a UDP request to “discover” if a DHCP server exists on a particular network. It is possible that multiple servers exist
and one or all may return a UDP “offer” containing an IP address. The client
then sends a “request” for the offer it decides to accept and the offering server
finalizes the transaction with an “acknowledgement”. (This is known as the
DORA sequence.) The client then uses the Address Resolution Protocol
(ARP, RFC 826) to verify that no other client is using the same IP address
(possibly allocated to it by a different DHCP server).

DHCP Client and Server


Installation
The default Ubuntu 20.04 LTS installation includes a DHCP client.

In our infrastructure, we will install a DHCP primary server and a failover server on our office and development networks. Office and development
workstations will obtain their IP addresses from these DHCP servers.

Ubuntu uses the Internet Systems Consortium (ISC) DHCP server, which
implements a failover protocol that allows two DHCP servers to redundantly
manage one pool of IP addresses. We can install the DHCP server package
using:
apt install isc-dhcp-server
File Structure
The configuration of the IPV4 DHCP server is stored in the file
“/etc/dhcp/dhcpd.conf” and the configuration for IPV6 is stored in
“/etc/dhcp/dhcpd6.conf”. We will only configure the IPV4 version in our
infrastructure.

Service Operations
Server operations are managed by the “isc-dhcp-server.service”. Some
common operations are:
systemctl enable isc-dhcp-server
systemctl start isc-dhcp-server
systemctl restart isc-dhcp-server
systemctl stop isc-dhcp-server
systemctl disable isc-dhcp-server

We can observe ongoing server operations using


systemctl status isc-dhcp-server
journalctl --unit=isc-dhcp-server
journalctl --follow --unit=isc-dhcp-server

We can combine log messages from multiple modules into one stream for
clarity:
journalctl --follow --unit=isc-dhcp-server --unit=named

We can observe the current leases of the DHCP server using:


dhcp-lease-list --parsable

or more directly by:


cat /var/lib/dhcp/dhcpd.leases

We can clear the current leases by stopping the DHCP server, emptying the
contents of “dhcpd.leases” and deleting the file “dhcpd.leases~” and then
restarting the DHCP server.
Automatic Allocation
We configure a range of IP addresses for a particular subnet by adding the
following to the configuration file:
subnet 10.0.0.0 netmask 255.255.255.0 {
authoritative;
range dynamic-bootp 10.0.0.100 10.0.0.200;
default-lease-time 3600;
max-lease-time 3600;
option routers 10.0.0.1;
option subnet-mask 255.255.255.0;
option nis-domain "lan.quarium.com";
option domain-name "lan.quarium.com";
option domain-name-servers 10.0.0.30, 10.0.0.31;
option time-offset -28800; # PST
option ntp-servers 10.0.0.30;
}

Static Allocation
Within a subnet we can statically allocate an IP address to a particular MAC
address by adding:
host somehost {
option host-name "somehost.lan.quarium.com";
hardware ethernet 00:17:88:13:66:0e;
fixed-address 10.0.0.201;
}

Dynamic DNS
We can tell a DHCP server to update a DNS zone and the corresponding
reverse zone with the IP addresses and host names it allocates by adding:
ddns-update-style interim;
ddns-domainname "lan.quarium.com";
ddns-rev-domainname "0.0.10.in-addr.arpa";
ignore client-updates;
update-static-leases on;
key "ddns-key.ns1.lan.quarium.com" {
algorithm hmac-sha256;
secret "<a Base64 string>";
};
zone lan.quarium.com. {
primary 10.0.0.30;
key ddns-key.ns1.lan.quarium.com;
}
zone 0.0.10.in-addr.arpa. {
primary 10.0.0.30;
key ddns-key.ns1.lan.quarium.com;
}

The secret string must match the key generated earlier on the DNS master with “ddns-confgen”, including its “hmac-sha256” algorithm, so that the DHCP server and the DNS servers share the same key.
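We can also test the dynamic update path by hand with “nsupdate”, assuming the generated key clause was saved to a file (the path and the test record below are only examples):
nsupdate -k /etc/bind/ddns-key.ns1.lan.quarium.com.key <<EOF
server 10.0.0.30
zone lan.quarium.com
update add test.lan.quarium.com. 300 A 10.0.0.99
send
EOF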

Failover Configuration
The failover mechanism relies on the two system clocks being closely
synchronized. All systems in our infrastructure should be configured as NTP
clients of network clocks, so that should not be a problem. First, we need
another key to allow the servers to communicate without outside interference:
dnssec-keygen -a HMAC-MD5 -b 512 -n USER DHCP_OMAPI

To configure the failover primary:


failover peer "failover-partner" {
primary;
address 10.0.0.40;
port 647;
peer address 10.0.0.41;
peer port 847;
max-response-delay 60;
max-unacked-updates 10;
mclt 3600;
split 128;
load balance max seconds 3;
}
key omapi_key
{
algorithm hmac-md5;
secret "<your own secret string here>";
};
omapi-port 7911;
omapi-key omapi_key;

And in the subnet on the primary:


pool {
failover peer "failover-partner";
range 10.0.0.150 10.0.0.200;
}

To configure the failover secondary:


failover peer "failover-partner" {
secondary;
address 10.0.0.41;
port 847;
peer address 10.0.0.40;
peer port 647;
max-response-delay 60;
max-unacked-updates 10;
load balance max seconds 3;
}
key omapi_key
{
algorithm hmac-md5;
secret "<your own secret string here>";
};
omapi-port 7911;
omapi-key omapi_key;

And in the secondary subnet:


pool {
failover peer "failover-partner";
range 10.0.0.150 10.0.0.200;
}

Observe how the servers behave in the system logs while shutting down and
restarting the primary.
Chapter 10 - LDAP
The Lightweight Directory Access Protocol, or LDAP, is a protocol for
querying and modifying a X.500-based directory service running over
TCP/IP. The current version is LDAPv3 (RFC 4510, a subset of X.500) and
the implementation used in Ubuntu is OpenLDAP.

X.500 is one of those old ISO protocols that were designed by committee (the
International Telecommunications Union (ITU) in the 1980’s), intending to
solve all possible problems for each member of the committee. LDAP was an
attempt to extract a useful subset of functions that can be used over TCP/IP
but even this protocol is essentially obsolete. Unfortunately there is no easily-configured REST replacement and it is still used heavily by MacOS,
Windows and Linux.

OpenLDAP Client and Server


Installation
OpenLDAP is not installed by default on either the server or the desktop
version of Ubuntu 20.04. The LDAP server requires that the system has a
static IP address and that the “/etc/hosts” file contains the fully-qualified
domain name of the server.

Install the LDAP server software using:


apt install slapd ldap-utils

This will ask for a new password for an LDAP administrator account.

File Structure
The installation and the database are stored in “/etc/ldap” and
“/etc/ldap/slapd.d” and by default include the “core”, “cosine”,
“inetorgperson” and “nis” database schemas. We will use the “inetorgperson”
and “nis” schemas for our LDAP directory and we use draft RFC 2307bis for
mapping to Linux authentication.

Typical use will not require modification of any of the configuration files
directly. Most administrative procedures are performed using command line
utilities and may even be performed remotely.

Service Operations
Server operations are managed by the “slapd.service”. Some common
operations are:
systemctl enable slapd
systemctl start slapd
systemctl restart slapd
systemctl stop slapd
systemctl disable slapd

We can observe ongoing server operations using


systemctl status slapd
journalctl --unit=slapd
journalctl --follow --unit=slapd
journalctl --follow --unit=isc-dhcp-server --unit=named --unit=slapd

Test if the database has been installed correctly on a server using:


slapcat

Test if the server is accessible and working properly from any system using:
ldapsearch -h <server> -x -b '<dn>'

The default installation already allows us to add users and user groups to the
LDAP database. To do this conveniently, download the free administration
tool “LDAPAdmin” for Windows from “http://www.ldapadmin.org”. This is
a much better way to maintain an LDAP directory than using command-line
tools and “.ldif” files. If this tool is used to set user passwords, use the “SHA-
512 Crypt” hash setting. Manually place the application in a directory, for
example “C:\Program Files (x86)\LDAPAdmin” and create a shortcut from
there to the desktop.

Log in to host “ns1.lan.quarium.com” with base “dc=lan, dc=quarium,


dc=com”. Log in as administrator with “cn=admin, dc=lan, dc=quarium,
dc=com” and the password provided during installation of the package. We
will store groups under “ou=groups, dc=lan, dc=quarium, dc=com” and users
under “ou=people, dc=lan, dc=quarium, dc=com”. As in UNIX and Linux,
users and groups are identified by their attributes “uidNumber” and
“gidNumber”. Other fields match the fields in the standard UNIX and Linux
authentication “/etc/password”, “/etc/shadow” and “/etc/group” files as
proposed in draft RFC 2307bis.

There are two common methods for associating users with groups: Each user has a primary group identified by the single user attribute “gidNumber”, which must match the “gidNumber” of an existing group. In addition, groups can have zero or more “memberUid” attributes, each of which must match the “uid” (login name) of an existing user. As in UNIX and Linux, groups are typically not nested, and many applications that use LDAP cannot authenticate against nested groups. All of the following command-line operations are much easier to perform using the “LDAPAdmin” tool described above; a sample entry pair is shown below.
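As a minimal sketch (all names and numbers are only examples), a user and a matching primary group that also lists the user as a member could be described in an “.ldif” file like this:
dn: uid=jdoe,ou=people,dc=lan,dc=quarium,dc=com
objectClass: inetOrgPerson
objectClass: posixAccount
objectClass: shadowAccount
uid: jdoe
cn: John Doe
sn: Doe
uidNumber: 10001
gidNumber: 10000
homeDirectory: /home/jdoe
loginShell: /bin/bash

dn: cn=developers,ou=groups,dc=lan,dc=quarium,dc=com
objectClass: posixGroup
cn: developers
gidNumber: 10000
memberUid: jdoe

Such a file could be loaded with “ldapadd -x -W -D 'cn=admin,dc=lan,dc=quarium,dc=com' -f users.ldif”.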

If you are not using the above Windows tool, you could also use the
following command-line tool to change user passwords:
ldappasswd -x -W -D 'cn=admin,dc=lan,dc=quarium,dc=com' -S 'uid=<username>,ou=people,dc=lan,dc=quarium,dc=com'

This prompts for the new <username> password twice, and then prompts for the administrator password. Note that you must first configure the LDAP server to use, for example, SHA-512 hashing for passwords as the default, since the command does not have a parameter to specify the hash scheme.

Use the following to delete users:


ldapdelete -x -W -D 'cn=admin,dc=lan,dc=quarium,dc=com' 'uid=
<username>,ou=people,dc=lan,dc=quarium,dc=com'

Use the following to delete groups:


ldapdelete -x -W -D 'cn=admin,dc=lan,dc=quarium,dc=com' 'cn=
<groupname>,ou=groups,dc=lan,dc=quarium,dc=com'

Primary Peer Configuration


We can configure two LDAP servers to mirror their databases by installing
the following file “server1_sync.ldif” on the first server and a slight
modification on the second server:
#Load the syncprov module.
dn: cn=module{0},cn=config
changetype: modify
add: olcModuleLoad
olcModuleLoad: syncprov
#Set syncprov parameters
dn: olcOverlay=syncprov,olcDatabase={1}mdb,cn=config
objectClass: olcOverlayConfig
objectClass: olcSyncProvConfig
olcOverlay: syncprov
olcSpSessionLog: 100
#Set a unique server id
dn: cn=config
changetype: modify
replace: olcServerID
olcServerID: 101
#Configure the mirror parameters
dn: olcDatabase={1}mdb,cn=config
changetype: modify
add: olcSyncRepl
olcSyncRepl: rid=001
provider=ldap://ns2.lan.quarium.com
bindmethod=simple
binddn="cn=admin,dc=fc,dc=quarium,dc=com"
credentials=<password>
searchbase="dc=fc,dc=quarium,dc=com"
scope=sub
schemachecking=on
type=refreshAndPersist
retry="30 5 300 3"
interval=00:00:05:00
-
add: olcMirrorMode
olcMirrorMode: TRUE
-
add: olcDbIndex
olcDbIndex: entryCSN eq
-
add: olcDbIndex
olcDbIndex: entryUUID eq
dn: olcOverlay=syncprov,olcDatabase={1}mdb,cn=config
changetype: add
objectClass: olcOverlayConfig
objectClass: olcSyncProvConfig
olcOverlay: syncprov

Note that LDIF is one of those unfortunate formats, like Python and YAML, where whitespace and line indentation are significant. Make sure that blank lines do not have spaces on them and that continuation lines with dashes do not have spaces around them. We can apply this file with:
ldapadd -Y EXTERNAL -H ldapi:/// -f ~quarium/Scripts/server1_sync.ldif

We can determine if the “syncprov” module has been properly loaded with:
ldapsearch -LLL -Q -Y EXTERNAL -H ldapi:/// -b cn=module{0},cn=config

Secondary Peer Configuration


We can then apply a very similar file “server2_sync.ldif” on a second LDAP
server:
#Load the syncprov module.
dn: cn=module{0},cn=config
changetype: modify
add: olcModuleLoad
olcModuleLoad: syncprov
#Set syncprov parameters
dn: olcOverlay=syncprov,olcDatabase={1}mdb,cn=config
objectClass: olcOverlayConfig
objectClass: olcSyncProvConfig
olcOverlay: syncprov
olcSpSessionLog: 100
#Set a unique server id
dn: cn=config
changetype: modify
replace: olcServerID
olcServerID: 102
#Configure the mirror parameters
dn: olcDatabase={1}mdb,cn=config
changetype: modify
add: olcSyncRepl
olcSyncRepl: rid=001
provider=ldap://ns1.lan.quarium.com
bindmethod=simple
binddn="cn=admin,dc=lan,dc=quarium,dc=com"
credentials=password
searchbase="dc=lan,dc=quarium,dc=com"
scope=sub
schemachecking=on
type=refreshAndPersist
retry="30 5 300 3"
interval=00:00:05:00
-
add: olcMirrorMode
olcMirrorMode: TRUE
-
add: olcDbIndex
olcDbIndex: entryCSN eq
-
add: olcDbIndex
olcDbIndex: entryUUID eq
dn: olcOverlay=syncprov,olcDatabase={1}mdb,cn=config
changetype: add
objectClass: olcOverlayConfig
objectClass: olcSyncProvConfig
olcOverlay: syncprov

Using the command:


ldapadd -Y EXTERNAL -H ldapi:/// -f ~quarium/Scripts/server2_sync.ldif

Extensions
We could create a file “openssh-lpk_openldap.ldif” to allow the addition of
public SSH keys to user accounts:
cat <<EOF >~/Scripts/openssh-lpk_openldap.ldif
dn: cn=openssh-lpk_openldap,cn=schema,cn=config
objectClass: olcSchemaConfig
cn: openssh-lpk_openldap
olcAttributeTypes: {0}( 1.3.6.1.4.1.24552.500.1.1.1.13 NAME 'sshPublicKey' DESC
'MANDATORY: OpenSSH Public key' EQUALITY octetStringMatch SYNTAX
1.3.6.1.4.1.1466.115.121.1.40 )
olcObjectClasses: {0}( 1.3.6.1.4.1.24552.500.1.1.2.0 NAME 'ldapPublicKey' DESC
'MANDATORY: OpenSSH LPK objectclass' SUP top AUXILIARY MAY ( sshPublicKey $uid ) )
EOF

Add the schema to the database using:


ldapadd -Y EXTERNAL -H ldapi:/// -f ~/Scripts/openssh-lpk_openldap.ldif

Client Configuration
Some machines may allow users with an LDAP account to log in, for
example to retrieve email or access a version control database like git. Begin
by installing the required packages:
apt install libnss-ldap

During installation you will be asked to provide the URI to your LDAP
server which will be stored in “/etc/ldap.conf”. Multiple servers can be listed
separated by a space:
ldap://ns1.lan.quarium.com ldap://ns2.lan.quarium.com

Next, enter the distinguished name of the user database:


dc=lan,dc=quarium,dc=com

Select protocol version 3. Do not make the local root a database administrator
(or a password will be saved on the machine in plaintext). Since we set up
our LDAP server on our private network, we do not need to login to it to
authenticate users. This information can be updated later by executing:
dpkg-reconfigure ldap-auth-config
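After the installation questions, “/etc/ldap.conf” should contain lines similar to the following sketch (the values are the examples used above):
base dc=lan,dc=quarium,dc=com
uri ldap://ns1.lan.quarium.com ldap://ns2.lan.quarium.com
ldap_version 3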

Now configure the LDAP profile for NSS in “/etc/nsswitch.conf”:


passwd: files systemd ldap
group: files systemd ldap
shadow: files ldap

Configure the system to use LDAP for authentication in “/etc/pam.d”:


pam-auth-update

From the menu, choose LDAP and any other authentication mechanisms you
need. You should now be able to log in to the machine using valid LDAP
credentials. The setup can be tested by logging in as a local administrator and
listing all visible LDAP and local accounts:
getent passwd
getent group
Chapter 11 - Email
Electronic mail or “email” is almost as old as operating systems. Early shared
computer systems allowed messages to be exchanged between users that
were both logged in to the same system at the same time, or allowed them to
be stored until a message recipient logged in to a terminal attached to the
system.

As soon as computer systems were connected to each other using dedicated


circuits or dial-up modems, email was being stored and forwarded,
sometimes over multiple hops between computers. Initially, such networks
were supposed to be used for ethical scientific and personal purposes only.
Commerce, especially in the form of unsolicited messages, was seen as
inappropriate. But direct advertising via physical mail was already a large
industry and it was inevitable that the reduced cost of electronic message
delivery would lead to the first spam on the ARPANET in 1978.

Since studies indicate that around 1 in 5 spam recipients has purchased


something based on a spam message, this annoyance will not go away
anytime soon. This irresponsible human behavior is one of the most
important vectors of identity theft, computer security breaches and viruses.

For legal purposes, since the actual delivery path of email cannot be
guaranteed over the Internet, in the USA email is treated as “interstate
communications” and is therefore subject to Federal law. In many cases
unencrypted email is even routed through potentially adversarial foreign
servers, which is why its use between, for example, customers and medical,
financial and government professionals is extremely limited.

Organizations can use commercial email services (including advertising-


funded and politically compromised gmail) to protect themselves from the
worst security risks associated with email, or they can run their own email
server. This should only be done with great care by experienced
professionals. In the 1970’s, users of early UNIX systems communicated
email, files and public news (USENET) postings over UUCP (Unix-to-Unix
Copy).
In 1982, the simple mail transfer protocol (SMTP) was defined in RFC 821,
later updated to RFC 5321. In 1984 followed the post office protocol (POP1)
in RFC 918, later updated to POP3 in RFC 1939, 2449 and 1734. In 1986
followed the internet message access protocol (IMAP2) in RFC 1064, later
updated to IMAP4 in RFC 3501.

Today, a complete “email system” consists of a number of typically separate


software packages. Users compose messages using a message user agent
(MUA) which then communicates them via SMTP to a message transfer
agent (MTA). The MTA forwards the messages (possibly via SMTP through
a number of intermediate MTAs) to a message delivery agent (MDA), which
stores them until they are retrieved by the recipient MUA using POP3 or
IMAP.

After UUCP, the “sendmail” package was used for many years as an MTA to
store and forward email on UNIX and Linux systems. This package was
incredibly flexible and had a very powerful configuration language.
Unfortunately this language was very much like Perl a write-only language
and it was incredibly difficult to create anything but a very basic server with
any level of verifiable security.

As of July, 2020, sendmail only retained a 3.74% market share of 3.8 million
accessible email servers. The market is now split between “Exim” at 56.97%
and “Postfix” at 35.32% market share respectively. The remaining 3.97% is
split among a very large number of lesser-known packages, including
“Microsoft mail” at 0.44%.

There are a very large number of smarter and less scrupulous people in the
world than this author, the authors of the software described here and our
readers and users. We include the following instructions therefore with the
usual software caveat that they are “for entertainment purposes only”:

Postfix
This book describes how to install and configure Postfix due to its simpler
configuration, better adaptation to the Debian/Ubuntu way of handling
configuration files, better security partitioning of the applications and better
queuing of large volumes of mail. Postfix is the default MTA for Ubuntu.

File Structure
Postfix configurations are stored in “/etc/postfix”. The main setting file is
“/etc/postfix/main.cf”. Another file “/etc/postfix/master.cf” controls
scheduling and parameters of various postfix applications. There does not
seem to be a “/etc/default/postfix”. Email is stored in “/var/spool/postfix” and
additional working files are stored in “/var/lib/postfix”.

Another important configuration file is “/etc/aliases” which should forward


specific email to the main hostmaster, the postmaster and the webmaster
(according to RFC 2142):
root: [email protected]
hostmaster: [email protected]
postmaster: [email protected]
webmaster: [email protected]
abuse: hostmaster
info: postmaster
marketing: postmaster
noc: hostmaster
sales: postmaster
security: hostmaster
support: postmaster

Each time this file is updated, execute:


newaliases

One thing that you should never do (for example using this file) is to forward
email received at a local address out of the server. Such behavior will
immediately be exploited for spam and will almost as quickly land your
server on a blacklist. This means that none of your legitimate local senders
will be able to communicate with people using common email services like
gmail. The server should only do two things: Forward email from an
authenticated local sender to anywhere, including other local mailboxes, and
receive external email for delivery to a local mailbox.

Once your service grows in popularity you should consider the services of an
email spam filtering service.

Service Operations
Server operations are managed by the “postfix.service”. Some common
operations are:
systemctl start postfix
systemctl restart postfix
systemctl stop postfix

We can observe ongoing server operations using


postfix status
systemctl status postfix
journalctl --unit=postfix
journalctl --follow --unit=postfix

To show all settings in “/etc/postfix/main.cf” that are different from the


default settings, execute:
comm -23 <(postconf -n|sort) <(postconf -d|sort)

Replace the “-23” with “-12” to show settings that duplicate default settings.

To check if Postfix is configured properly, execute:


postfix check

To display all entries in the mail queue:


mailq

To display the contents of a specific entry in the mail queue:


postcat -vq XXXXXXXXXX

To delete all entries in the mail queue:


postsuper -d ALL
postsuper -d ALL deferred
SMTP Satellite Server Installation
The following setup is relatively safe, since the email software will only
accept messages that originate on the machine it is installed on.

On server installations and minimal desktop installations, add Postfix to


forward local administrative email to a hostmaster:
apt install postfix mailutils

Select a “satellite system” and enter the FQDN of the system and of the
server that will forward SMTP. This does not completely configure a satellite
system. Configure the remaining parameters using:
dpkg-reconfigure postfix

This allows for the setting of a root and postmaster mail recipient, for
example “[email protected]”, more domain names the system may
be known at, for example internal LAN names and other settings for which
the defaults are sufficient.
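A quick way to confirm that the satellite relays administrative mail (using the “mailutils” package installed above; the recipient is only an example) is:
echo "satellite test from $(hostname -f)" | mail -s "Postfix satellite test" root
mailq
journalctl --follow --unit=postfix

The message should appear briefly in the local queue and then be handed to the configured relay host.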

SMTP Email Server Installation


The following setup is relatively unsafe. While it follows established
security practices, it exposes the machine to messages generated outside it.

We especially attempt to stop the service from forwarding messages


generated by sources outside the domains it serves, but this is one of the most
risky areas, since doing this incorrectly can land your server on spam
blacklists. One of the ways to do this is to insist that all sources of email that
must be forwarded outside the server must use encrypted connections and
that they must authenticate themselves. This works both ways: the servers
can trust the senders and the senders can trust that they are submitting email
to the correct server.

Install the email security key, certificate and authority chain in the usual
location and with the usual ownerships and permissions in “/etc/ssl” (see the
chapter on HTTP certificates).

If postfix has not been installed on the server (for example as a satellite),
make sure all packages needed for an email server are installed:
apt install postfix postfix-pcre postfix-ldap

Configure the server as an “internet site”. For now, accept the suggested
“mail name”.

Configure the email server firewall to permit access on ports 25 (SMTP), 110
(POP3), 143 (IMAP), 995 (POP3S) and 587 (ESMTPS). If the email server
runs on a dedicated internal virtual machine, also configure the corporate
firewall server to forward traffic on these ports to the internal email server
address.
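As a sketch, assuming the “ufw” firewall used elsewhere in this book, the listed ports could be opened on the email server with:
ufw allow 25/tcp
ufw allow 110/tcp
ufw allow 143/tcp
ufw allow 587/tcp
ufw allow 995/tcp
ufw status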

If postfix was previously installed on the server, execute


dpkg-reconfigure postfix

Select the “Internet Site” configuration.

Set the system mail name to “quarium.com”.

Do not create a root account mailbox since the “/etc/aliases” file will take
care of forwarding.

Do not add any other email destinations at this time.

Do not force synchronous updates to the mail queue since our traffic volume
will be light.

Do not change the local networks at this time.

Do not set a mailbox limit at this time, which is indicated by “0”.

Do not set a local address extension character.

Select the “ipv4”, “ipv6” or “all” protocols depending on your situation.


Configure Postfix SMTP authentication to use dovecot-sasl. This will also
install Dovecot email reader package which we will configure below:
apt install mail-stack-delivery

If email will already be running on a separate server, you can disable the
chroot environment by changing a line in “/etc/postfix/master.cf”:
# service type private unpriv chroot wakeup maxproc command + args
smtp inet n - n - - smtpd

Replace the entire “/etc/postfix/main.cf” file with the following:


# main.cf -- postfix configuration
# ----- local identity -----
# we accept email for these domains:
mydestination =
quarium.com, quarium.net,
$myhostname, localhost.$mydomain, localhost
# ----- user authentication -----
# we use SASL to authenticate users:
smtpd_sasl_auth_enable = yes
# use the same mechanism as dovecot to authenticate users:
smtpd_sasl_type = dovecot
smtpd_sasl_path = private/dovecot-auth
# if authenticated users receive email, report their actual name:
smtpd_sasl_authenticated_header = yes

# ----- relay and other permissions -----


# wait to reject HELO until we have a matching command:
smtpd_delay_reject = yes
# require a HELO or EHLO on each connection
smtpd_helo_required = yes
strict_rfc821_envelopes = yes
disable_vrfy_command = yes
# only accept connections from servers that look real:
#smtpd_helo_restrictions =
# reject_unknown_helo_hostname,
# permit
# we are a virtual machine getting forwarded smtp connections, so we trust only
authenticated users to forward email outside:
smtpd_relay_restrictions =
reject_unauth_pipelining,
reject_non_fqdn_recipient,
reject_unknown_recipient_domain,
permit_sasl_authenticated,
reject_unauth_destination
# reject unverified or non-local destinations:
smtpd_recipient_restrictions =
reject_unknown_sender_domain,
reject_unknown_recipient_domain,
reject_non_fqdn_recipient,
check_sender_access hash:/etc/postfix/sender_checks,
reject_unauth_pipelining,
# permit_mynetworks,
permit_sasl_authenticated,
reject_unauth_destination,
reject_rhsbl_helo dbl.spamhaus.org,
reject_rhsbl_sender dbl.spamhaus.org,
reject_rbl_client cbl.abuseat.org,
reject_rbl_client sbl-xbl.spamhaus.org,
reject_rbl_client bl.spamcop.net,
reject_rhsbl_sender dsn.rfc-ignorant.org
# reject email from people who don't say who they are exactly:
smtpd_sender_restrictions =
reject_non_fqdn_sender,
reject_sender_login_mismatch,
reject_unknown_sender_domain
smtpd_sender_login_maps = hash:/etc/postfix/smtpd_sender_login_maps
# reject out-of-sequence requests, i.e. wait for authentication to succeed or fail
before accepting data:
smtpd_data_restrictions =
reject_unauth_pipelining
# ----- general settings -----
# we only use /etc/aliases and not nis:mail.aliases:
alias_maps = hash:/etc/aliases
# there are no local users on the email server, so do not notify any:
biff = no
# there might be a few clients out there that use an obsolete version of the AUTH
command:
broken_sasl_auth_clients = yes
# remote users will pick up their email using dovecot from here:
home_mailbox = Maildir/
# we are still not quite ready to grow up yet:
inet_protocols = ipv4
# we use the latest version settings:
compatibility_level=2
# increase the maximum size of email
message_size_limit = 20480000
# ----- dovecot integration -----
# use dovecot to deliver email
mailbox_command = /usr/lib/dovecot/deliver -c /etc/dovecot/dovecot.conf -m
"${EXTENSION}"
# ----- TLS encryption -----
# we offer optional TLS encryption (which we will make mandatory on port 587):
tls_random_source = dev:/dev/urandom
smtp_use_tls = yes
smtpd_use_tls = yes
smtpd_tls_security_level = may
smtpd_sasl_security_options = noanonymous
smtpd_sasl_local_domain = $myhostname
smtpd_tls_received_header = yes
# we only offer authentication over encrypted connections:
smtpd_tls_auth_only = yes
# define our local TLS identity:
smtpd_tls_key_file = /etc/ssl/private/email.quarium.com.key
smtpd_tls_cert_file = /etc/ssl/certs/email.quarium.com.crt
# we optimize TLS connections by allowing sessions to resume:
smtp_tls_session_cache_database = btree:${data_directory}/smtp_scache
# for now, as a debugging option, log successful TLS negotiations:
smtpd_tls_loglevel = 1

The rules in the “_restriction” settings are evaluated in the order specified and
the first rule that matches wins.

In “/etc/postfix/master.cf”, enable ESMTPS email submission and only allow


authenticated users to relay mail from port 587 (again disabling chroot):
submission inet n - n - - smtpd
-o syslog_name=postfix/submission
-o smtpd_tls_security_level=encrypt
# -o smtpd_sasl_auth_enable=yes
# -o smtpd_reject_unlisted_recipient=no
# -o smtpd_client_restrictions=$mua_client_restrictions
# -o smtpd_helo_restrictions=$mua_helo_restrictions
# -o smtpd_sender_restrictions=$mua_sender_restrictions
# -o smtpd_recipient_restrictions=
# -o smtpd_relay_restrictions=permit_sasl_authenticated,reject
# -o milter_macro_daemon_name=ORIGINATING

Verify the configuration using:


postfix check
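
To check from a client machine that the submission port negotiates TLS before offering authentication (a sketch; substitute your own mail host), open a connection with:
openssl s_client -starttls smtp -connect email.quarium.com:587

After the handshake, typing “EHLO client.example.com” should show an AUTH capability, which must not appear on an unencrypted port 25 connection.
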
It is also a good idea to keep an eye on the daily operations of an email server
by installing a small executable script “/etc/cron.daily/0pflogsumm”:
#!/bin/bash
perl /usr/sbin/pflogsumm -d yesterday /var/log/mail.log | mail -s "Daily Email
Report" postmaster

The command itself can be installed with:


apt install pflogsumm

After modifying “/etc/postfix/main.cf” or “/etc/postfix/master.cf”, be sure to


run
postfix reload

The differences with the default settings will now be:


alias_maps = hash:/etc/aliases
biff = no
broken_sasl_auth_clients = yes
compatibility_level = 2
disable_vrfy_command = yes
home_mailbox = Maildir/
inet_protocols = ipv4
mailbox_command = /usr/lib/dovecot/deliver -c /etc/dovecot/dovecot.conf -m
"${EXTENSION}"
message_size_limit = 20480000
mydestination =
quarium.com,
quarium.net,
$myhostname,
localhost.$mydomain,
localhost
smtpd_data_restrictions = reject_unauth_pipelining
smtpd_helo_required = yes
smtpd_recipient_restrictions =
reject_unknown_sender_domain,
reject_unknown_recipient_domain,
reject_non_fqdn_recipient,
check_sender_access hash:/etc/postfix/sender_checks,
reject_unauth_pipelining,
permit_sasl_authenticated,
reject_unauth_destination,
reject_rhsbl_helo dbl.spamhaus.org,
reject_rhsbl_sender dbl.spamhaus.org,
reject_rbl_client cbl.abuseat.org,
reject_rbl_client sbl-xbl.spamhaus.org,
reject_rbl_client bl.spamcop.net,
reject_rhsbl_sender dsn.rfc-ignorant.org
smtpd_relay_restrictions =
reject_unauth_pipelining,
reject_non_fqdn_recipient,
reject_unknown_recipient_domain,
permit_sasl_authenticated,
reject_unauth_destination
smtpd_sasl_auth_enable = yes
smtpd_sasl_authenticated_header = yes
smtpd_sasl_local_domain = $myhostname
smtpd_sasl_path = private/dovecot-auth
smtpd_sasl_type = dovecot
smtpd_sender_login_maps = hash:/etc/postfix/smtpd_sender_login_maps
smtpd_sender_restrictions =
reject_non_fqdn_sender,
reject_sender_login_mismatch,
reject_unknown_sender_domain
smtpd_tls_auth_only = yes
smtpd_tls_cert_file = /etc/ssl/certs/email.quarium.com.crt
smtpd_tls_key_file = /etc/ssl/private/email.quarium.com.key
smtpd_tls_loglevel = 1
smtpd_tls_received_header = yes
smtpd_tls_security_level = may
smtpd_use_tls = yes
smtp_tls_session_cache_database = btree:${data_directory}/smtp_scache
smtp_use_tls = yes
strict_rfc821_envelopes = yes

Dovecot
Dovecot is a mail delivery agent (MDA). Of 3.8 million email servers
accessible in July of 2020, it has an installed base of 76.22%. No other MDA
has a double-digit-percentage installed base. Microsoft Exchange only has a
1.28% installed base. Of course these numbers are server counts, not end-user
counts.

Installation
Install or complete the installation of the dovecot packages:
apt install dovecot-core dovecot-imapd dovecot-pop3d

File Structure
The configuration files are stored in “/etc/dovecot”. The main configuration
file is “/etc/dovecot/dovecot.conf”. There are a number of specific
configuration files in “/etc/dovecot/conf.d”. There is also a
“/etc/default/dovecot” configuration file for the service.

Service Operations
Server operations are managed by the “dovecot.service”. Some common
operations are:
systemctl start dovecot
systemctl restart dovecot
systemctl stop dovecot

We can observe ongoing server operations using


systemctl status dovecot
journalctl --unit=dovecot
journalctl --follow --unit=dovecot

POP3 and IMAP Email Server


If you are not supporting IPV6, change the following in
“/etc/dovecot/dovecot.conf”:
listen = *

In the file “/etc/dovecot/conf.d/10-auth.conf” set


disable_plaintext_auth = no

auth_mechanisms = plain login
For debugging of the connections, it may be useful to change
“/etc/dovecot/conf.d/10-logging.conf” to:
auth_verbose = yes

verbose_ssl = yes
In the file “/etc/dovecot/conf.d/10-ssl.conf” set the locations of your actual
security keys and certificates:
ssl_cert = </etc/ssl/certs/email.quarium.com.crt
ssl_key = </etc/ssl/private/email.quarium.com.key
...
ssl_ca = </etc/ssl/certs/DV_NetworkSolutionsDVServerCA2.crt
...
ssl_cipher_list = ALL:!LOW:!SSLv2:ALL:!aNULL:!ADH:!eNULL:!EXP:RC4+RSA:+HIGH:+MEDIUM
Add the file “/etc/dovecot/conf.d/99-mail-stack-delivery.conf”, which is no longer shipped by the package:
# Some general options
# Installed protocols are now auto-included by /etc/dovecot/dovecot.conf
# Since mail-stack-delivery depends on them it is more flexible to not
# explicitly list them here, but achieves the same.
# protocols = imap pop3
disable_plaintext_auth = yes
# Since 18.04 basic SSL enablement is set up by dovecot-core and configured
# in /etc/dovecot/conf.d/10-ssl.conf.
# So by default basic enablement is no more done here. The old section is kept
# as comment for reference to the old defaults.
#
# ssl = yes
# ssl_cert = </etc/dovecot/dovecot.pem
# ssl_key = </etc/dovecot/private/dovecot.pem
#
# If you keep a formerly used custom SSL enablement in this file it will (as
# before) continue to overwrite the new defaults in 10-ssl.conf as this file is
# sorted later being 99-*.conf
#
# If you choose to take the new defaults (no ssl config in this file) please
# make sure you have also chosen the package defaults for 10-ssl.conf (to enable
# it there) when dovecot-core configures. Also check that the links for cert/key
# set up there got created correctly (they would not be created if they conflict
with your
# old keys done by mail-stack-delivery).
#
mail_location = maildir:~/Maildir
auth_username_chars =
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ01234567890.-_@
# IMAP configuration
protocol imap {
mail_max_userip_connections = 10
imap_client_workarounds = delay-newmail
}
# POP3 configuration
protocol pop3 {
mail_max_userip_connections = 10
pop3_client_workarounds = outlook-no-nuls oe-ns-eoh
}
# LDA configuration
#protocol lda {
# postmaster_address = postmaster
# mail_plugins = sieve
# quota_full_tempfail = yes
# deliver_log_format = msgid=%m: %$
# rejection_reason = Your message to <%t> was automatically rejected:%n%r
#}
# Plugins configuration
#plugin {
# sieve=~/.dovecot.sieve
# sieve_dir=~/sieve
#}

# Authentication configuration
auth_mechanisms = plain login
service auth {
# Postfix smtp-auth
unix_listener /var/spool/postfix/private/dovecot-auth {
mode = 0660
user = postfix
group = postfix
}
}

After this, the non-default settings (“doveconf –n”) will be:


# 2.2.33.2 (d6601f4ec): /etc/dovecot/dovecot.conf
# Pigeonhole version 0.4.21 (92477967)
# OS: Linux 4.15.0-101-generic x86_64 Ubuntu 18.04.4 LTS
auth_mechanisms = plain login
auth_verbose = yes
mail_location = maildir:~/Maildir
mail_privileged_group = mail
managesieve_notify_capability = mailto
managesieve_sieve_capability = fileinto reject envelope encoded-character vacation
subaddress comparator-i;ascii-numeric relational regex imap4flags copy include
variables body enotify environment mailbox date index ihave duplicate mime
foreverypart extracttext
namespace inbox {
inbox = yes
location =
mailbox Drafts {
special_use = \Drafts
}
mailbox Junk {
special_use = \Junk
}
mailbox Sent {
special_use = \Sent
}
mailbox "Sent Messages" {
special_use = \Sent
}
mailbox Trash {
special_use = \Trash
}
prefix =
}
passdb {
driver = pam
}
plugin {
sieve = ~/.dovecot.sieve
sieve_dir = ~/sieve
}
protocols = " imap sieve pop3"
service auth {
unix_listener /var/spool/postfix/private/dovecot-auth {
group = postfix
mode = 0660
user = postfix
}
}
ssl_ca = </etc/ssl/certs/DV_NetworkSolutionsDVServerCA2.crt
ssl_cert = </etc/ssl/certs/email.quarium.com.crt
ssl_cipher_list = ALL:!LOW:!SSLv2:ALL:!aNULL:!ADH:!eNULL:!EXP:RC4+RSA:+HIGH:+MEDIUM
ssl_client_ca_dir = /etc/ssl/certs
ssl_key = # hidden, use -P to show it
userdb {
driver = passwd
}
protocol lda {
deliver_log_format = msgid=%m: %$
mail_plugins = sieve
postmaster_address = postmaster
quota_full_tempfail = yes
rejection_reason = Your message to <%t> was automatically rejected:%n%r
}
protocol imap {
imap_client_workarounds = delay-newmail
mail_max_userip_connections = 10
}
protocol pop3 {
mail_max_userip_connections = 10
pop3_client_workarounds = outlook-no-nuls oe-ns-eoh
}
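
A hedged end-to-end check of the reader from any client machine is to open an IMAP connection with STARTTLS and type a few IMAP commands interactively (substitute a real account; the host name is an example):
openssl s_client -starttls imap -connect email.quarium.com:143
a1 LOGIN <username> <password>
a2 LIST "" "*"
a3 LOGOUT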

Spamassassin
Spamassassin is a Perl filter for email. It examines email headers and content
in a variety of ways to determine if a message is likely to be spam. It assigns
a score to each message and the MTA can then decide to forward a message,
forward it with its score or discard it.

Installation
Install the spamassassin package using
apt install spamassassin spamc
File Structure
The configuration files are found in “/etc/spamassassin”. The only file that
may need modification is “/etc/spamassassin/local.cf”. Additional work files
are stored in “/var/lib/spamassassin”.

There is also a configuration file for the service “/etc/default/spamassassin”.

Service Operations
Server operations are managed by the “spamassassin.service”. Some common
operations are:
systemctl start spamassassin
systemctl restart spamassassin
systemctl stop spamassassin

We can observe ongoing server operations using


systemctl status spamassassin
journalctl --unit=spamassassin
journalctl --follow --unit=spamassassin

Filtering
Add a filter to “/etc/postfix/master.cf”:
spamassassin unix - n n - - pipe
user=debian-spamd argv=/usr/bin/spamc -f -e
/usr/sbin/sendmail -oi -f ${sender} ${recipient}

In the same file the smtpd daemon must be told to use the filter:
smtp inet n - n - - smtpd
-o content_filter=spamassassin

Turn on updates in “/etc/default/spamassassin”:


CRON=1
Verify the default settings in “/etc/spamassassin/local.cf” and optionally set:
rewrite_header Subject ***** SPAM _SCORE_ *****

Enable and start the service:


systemctl enable spamassassin.service
systemctl start spamassassin.service
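
If the package ships its example messages in the usual documentation directory (an assumption worth verifying on your system), the filter can be exercised directly from the command line:
spamassassin -t < /usr/share/doc/spamassassin/examples/sample-spam.txt | grep X-Spam

The test message should be scored well above the spam threshold and gain an “X-Spam-Status” (and related) header.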

In addition, it may be useful to add free personal or paid commercial domain


blacklisting settings in “/etc/postfix/main.cf”:
smtpd_recipient_restrictions =
...
reject_rhsbl_helo dbl.spamhaus.org,
reject_rhsbl_sender dbl.spamhaus.org

See the “spamhaus.org” web site for restrictions on use.


Chapter 12 - SQL Database

MySQL Database Server


Ubuntu 20.04 ships MySQL 8.0, which is currently published by Oracle. MySQL was named after “My”, the daughter of one of the original authors, Michael “Monty” Widenius, who continues work on a GPL fork named after his other daughter, “Maria”. Oracle also publishes “MySQL Workbench”, a very useful GUI for the development and management of MySQL databases. Both MySQL and MySQL Workbench run on a variety of operating systems including Linux, Windows and MacOS.

Installation
The default server installation of Ubuntu 20.04 does not include MySQL, but
we can install it using:
apt install mysql-server

Note that this also installs the MySQL client application.

File Structure
The configuration files for MySQL are stored in “/etc/mysql”. Unlike for
other Linux subsystems, these files apply both to the server component
“mysqld” and the client command “mysql”. The files still follow the old
Windows “.ini” file format. Server and client settings are differentiated
through “[mysql]” and “[mysqld]” sections in these files.

In Ubuntu 20.04, the main file “/etc/mysql/mysql.cnf” first includes files


from “/etc/mysql/conf.d” and then overrides them with files from
“/etc/mysql/mysql.conf.d”. One would expect that local configurations
should be placed in “conf.d”, but then they get clobbered by the distribution
files in “mysql.conf.d”. The safest way, in Ubuntu 20.04, is to place a file in
“mysql.conf.d” with a name alphabetically after “m”, for example
“quarium.cnf”.

Service Operations
Server operations are managed by the “mysql.service”. Some common
operations are:
systemctl start mysql
systemctl restart mysql
systemctl stop mysql

We can observe ongoing server operations using


systemctl status mysql
journalctl --unit=mysql
journalctl --follow --unit=mysql
ss -tulpn | grep mysql

Configuration
The initial installation does not set a password for the “root” MySQL user
(which is not related to the Linux “root” user account but which has a similar
function) and it installs a number of test features we do not need in a
production installation. One of the first things we must do to secure the
installation is to execute:
mysql_secure_installation

The script will ask to install the “validate password” plugin. This is a good
idea, so reply “y”.

The script will then ask for a password for the MySQL “root” account. It will
also ask for a repeat for confirmation.

The script will then ask to remove anonymous users. This is useful, so reply
“y”.
The script will then ask to disable remote “root” logins. We will always login
locally or through an encrypted SSH tunnel, so reply “y”.

The script then asks to remove the test database. We do not need it, so reply
“y”.

The script then asks to reload the privilege tables to apply the changes. Reply
“y”.

The configuration file “/etc/mysql/mysql.conf.d/mysqld.cnf” contains a


“bind-address = 127.0.0.1” setting which prevents the server from accepting
network requests, which improves security for single-server applications but
does not work for our clustered infrastructure. We will only operate MySQL
on virtual machines that are protected from external access so it is acceptable
to change this setting.

Add a file “/etc/mysql/mysql.conf.d/quarium.cnf” with:


[mysqld]
bind-address = 0.0.0.0
collation-server = utf8mb4_general_ci
character-set-server = utf8mb4

Make sure mysql can read the file:


chmod 644 /etc/mysql/mysql.conf.d/quarium.cnf

When upgrading from some older configurations you may want to add:
lower_case_table_names = 1

Restart MySQL with


systemctl restart mysql

and check if mysql is now listening to all network addresses:


ss -tulpn | grep mysql

It should now be possible to login to the database using:


mysql -u root -p
In Ubuntu systems running MySQL 5.7 (and later versions), the root MySQL
user is set to authenticate using the auth_socket plugin by default rather than
with a password. This allows for some greater security and usability in many
cases, but it can also complicate things when you need to allow an external
program (for example, MySQL Workbench and phpMyAdmin) to access the
user. Determine if this is the case using:
SELECT user,authentication_string,plugin,host FROM mysql.user;
Change the setting using:
ALTER USER 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY
'<password>';
Then execute:
FLUSH PRIVILEGES;

and:
QUIT;

We can return the installation to its original insecure state using:


mysqld --initialize-insecure

In some strange circumstances, the MySQL package will refuse to be


upgraded. This can be repaired by restarting the server and then checking the
upgrade:
mysql_upgrade --defaults-file=/etc/mysql/debian.cnf

After that, comment out the failing “mysql_upgrade” around line 320 in the
file
vi /var/lib/dpkg/info/mysql-server-5.7.postinst

Then re-run the system update.

Common Database Operations


Log in to MySQL using the root password with:
mysql -u root -p
Create a new database using
CREATE DATABASE <db_name>;

Create a responsible user and grant it access using (MySQL 8.0 no longer accepts “GRANT ... IDENTIFIED BY”, so the user must be created first):


CREATE USER '<user_name>'@'localhost' IDENTIFIED BY '<password>';
GRANT ALL PRIVILEGES ON <db_name>.* TO '<user_name>'@'localhost' WITH GRANT OPTION;

This only allows direct access through the command:


mysql -u <user_name> -p

Create another responsible user that may connect over the network using:


CREATE USER '<user_name>'@'%' IDENTIFIED BY '<password>';
GRANT ALL PRIVILEGES ON <db_name>.* TO '<user_name>'@'%' WITH GRANT OPTION;

This allows access via the network and from applications such as
Symfony, Drupal and MediaWiki. Then execute:
FLUSH PRIVILEGES;

and:
QUIT;

To allow root to access the database from all servers execute:


CREATE USER 'root'@'%' IDENTIFIED BY '<password>';
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' WITH GRANT OPTION;
GRANT ALL PRIVILEGES ON *.* TO 'root'@'localhost' WITH GRANT OPTION;
FLUSH PRIVILEGES;

To list all databases use:


SHOW DATABASES;

To list all tables in the current database use:


SHOW TABLES;

To list all fields in a table use:


DESCRIBE [<database>.]<table>

To list all indices on a table use:


SHOW INDEX FROM [<database>.]<table>

To delete a database log in to MySQL as the root user and execute:


DROP DATABASE <db_name>

To describe the contents of a database:


mysqlshow -u root -p --count <database>

To verify and repair the integrity of the database:


mysqlcheck --host=somehost --user=<user> --password=<password> -Acg --auto-repair

Use the root user to verify all schemas or another user to verify only the
subset the user can access. In some cases MySQL table names are case-
sensitive. If such databases are transferred for example from an old system
with a case-insensitive file system to Ubuntu with a case-sensitive file system
tables may need to be renamed, for example with the following script:
#!/bin/bash
# uppercase_tables.sh -- rename all database tables to uppercase
DB_HOST=<host>
DB_SCHEMA=<schema>
DB_USER=<user>
DB_PASSWORD=<password>
EXISTING_TABLES=`echo "show tables;" | mysql -u ${DB_USER} --
password=${DB_PASSWORD} -h ${DB_HOST} --skip-column-names ${DB_SCHEMA}`
for EXISTING_TABLE in ${EXISTING_TABLES}
do
UPPERCASE_TABLE=`echo "${EXISTING_TABLE}" | tr "[:lower:]" "[:upper:]"`
if [ "${EXISTING_TABLE}" != "${UPPERCASE_TABLE}" ]
then
echo "ALTER TABLE ${EXISTING_TABLE} RENAME TO ${UPPERCASE_TABLE};"
fi
done | mysql -u ${DB_USER} --password=${DB_PASSWORD} -h ${DB_HOST} ${DB_SCHEMA}

Replication
The most common replication method uses binary logs. At some point GTIDs
(Global Transaction IDs) will gain popularity.
A cluster consists of two servers in a multi-master replication configuration
plus zero or more remote slaves that access the cluster through an encrypted
VPN connection. Ensure all servers have the same version of MySQL (or
slaves higher than the master).

On all participating systems, if possible, start with no schemas and no users


except root. Verify that all participating systems can access each other
through their firewalls at tcp port 3306:
ufw status

On all servers, create a user “replicator” with a not terribly secret password,
allow it to log in from “%” and grant it only the global privilege
“REPLICATION_SLAVE”. Note that the account name and password are
case sensitive.
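
A sketch of the corresponding SQL (the privilege that GUI tools label “REPLICATION_SLAVE” is spelled “REPLICATION SLAVE” in SQL; “mysql_native_password” is assumed here so that replication does not require additional key-exchange settings):
CREATE USER 'replicator'@'%' IDENTIFIED WITH mysql_native_password BY '<secret>';
GRANT REPLICATION SLAVE ON *.* TO 'replicator'@'%';
FLUSH PRIVILEGES;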

On all servers, stop the database service:


systemctl stop mysql

On all servers, edit “/etc/mysql/mysql.conf.d/quarium.cnf” to set the server id


and to enable binary logging. The server id must be a unique uint32 number
within the replication graph, for example one higher than the previous server
installation or the LSB of the IP address. Do NOT add slave replication
settings here.
[mysqld]
...
log-bin=<my hostname>-bin
relay-log=<the other hostname>-relay-bin
server-id=<ip>
innodb_flush_log_at_trx_commit=1
sync_binlog=1

Make sure mysql can read the file:


chmod 644 /etc/mysql/mysql.conf.d/quarium.cnf

Each server also creates a unique UUID in “/var/lib/mysql/auto.cnf”. If you


clone a virtual machine, just delete this file and restart or replication will fail.

Also clear out any prior binary logs from “/var/lib/mysql” and empty out the
“/var/log/mysql/error.log”.

Reboot the server to make sure everything starts up properly, or just:


systemctl start mysql

On each master in turn, log into mysql and determine the name and location
of the binary log:
SHOW MASTER STATUS;

On each master or slave replicating off that master, start following the binary
log using:
CHANGE MASTER TO MASTER_HOST='10.0.0.<ip>', MASTER_USER='replicator',
MASTER_PASSWORD='<secret>', MASTER_LOG_FILE='<other hostname>-bin.000001',
MASTER_LOG_POS=156;
START SLAVE;

Check if the replication connection has connected successfully:


SHOW SLAVE STATUS;

At this point schemas, tables, records and users created on one server will be
replicated to the other. Also useful is:
STOP SLAVE;

Backup and Restore


To create a backup of all databases use:
mysqldump --all-databases --events -u root -p > fulldump.sql

For a specific database use:


mysqldump -u <user_name> -p <db_name> > <filename>.sql

In some cases, the content of a database is backed up to a version control
system like git. In that case it is more useful to place each database record on
a separate line of SQL text, which requires disabling the multi-row “extended
insert” statements that mysqldump produces by default:
mysqldump -u <user_name> -p --skip-extended-insert <db_name> > <filename>.sql

To import a specific database into MySQL, from the Linux command line
use:
mysql -u <user_name> -p <db_name> < <filename.sql>
Chapter 13 – Version Control

Git Client and Server


Installation
Git is not included in the standard installation of Ubuntu server and desktop.
To install git use:
apt install git

Configure the preferred editor for commit messages and for various settings:
git config --global core.editor vi
git config --global user.name "Bart Besseling"
git config --global user.email [email protected]
git config --global push.default simple

File Structure
Git does not install a server service component and there is no preferred
location for git repositories. A typical use is to provide a repository directory
that can be accessed by server user accounts that are members of a particular
repository user group.

Another typical use is to create a common “git” user or a dedicated repository


user with an SSH “authorized_keys” entry for each remote user. The git user
will run a “/usr/bin/git-shell” on login which restricts the operations remote
users can perform.
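A minimal sketch of creating such a dedicated “git” account (paths and key file names are only examples):
adduser --disabled-password --gecos "" --shell /usr/bin/git-shell git
mkdir -p ~git/.ssh ~git/data
cat developer_key.pub >>~git/.ssh/authorized_keys
chown -R git:git ~git/.ssh ~git/data
chmod 700 ~git/.ssh
chmod 600 ~git/.ssh/authorized_keys

The “git-shell” login shell restricts the holders of the authorized keys to git transport operations.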

Creating a new Repository


On the git server execute:
cd ~git/data
git init --bare Repo.git

On the development client use TortoiseGit to clone the repository into


“C:\Code\Repo” from the URL:
ssh://[email protected]/home/git/data/Repo.git

On the development server, the staging server and the live server execute:
cd /var/www
git clone ssh://[email protected]/home/git/data/Repo.git .
Chapter 14 - HTTP

Apache Server
On Ubuntu 20.04 we use the Apache 2.4 web server. Created in 1995,
Apache became the first web server software to serve more than 100 million
websites in 2009. As of June 2020, it was estimated to serve 25% of 189
million active web sites. Its main competitor is the free nginx web server
serving 37% but nginx does not have the robust open-source history of
Apache. Microsoft’s proprietary server is the third runner up at 11%. All
other competitors serve only single-digit percentages.

Installation
By default, the Apache web server is not installed but it can be added to an
installation with
apt install apache2 apache2-utils w3m

For diagnostics it is also useful to install


apt install curl

File Structure
The configuration of the Apache web server is located in “/etc/apache2”. In
this directory, the main configuration file is “/etc/apache2/apache2.conf”, but
most configuration can be found in three pairs of directories
“/etc/apache2/conf*”, “/etc/apache2/mods*”, “/etc/apache2/sites*” that
contain links to available and enabled settings.

Web sites are typically stored in “/var/www” and must be accessible to the
“www-data” user and the “www-data” group.
Log files are located in “/var/log/apache2”.

Service Operations
Server operations are managed by the “apache2.service”. Some common
operations are:
apachectl start
systemctl start apache2
apachectl restart
apachectl graceful
systemctl restart apache2
apachectl stop
systemctl stop apache2

We can observe ongoing server operations using


systemctl status apache2
journalctl --unit=apache2
journalctl --follow --unit=apache2
ss -tulpn | grep apache2

The configuration files can be verified with:


apachectl configtest

Basic Web Configuration


An initial default HTTP web site is automatically active at “/var/www/html”.

The default site is configured in “/etc/apache2/sites-available/000-


default.conf” and enabled through a symbolic link in “/etc/apache2/sites-
enabled”.

Additional virtual sites can be configured by adding configurations to “sites-


available” and by enabling the correct set in “sites-enabled”. The order of
default and other sites is important (default has to be first), so make sure file
names sort properly, for example “001-alternate.conf”. It is better to use the
following commands:
a2ensite <site>
a2dissite <site>

Many web sites, including Symfony-based sites, Drupal, Wordpress and


MediaWiki, require these modules to be enabled:
a2enmod status
a2enmod rewrite
a2enmod auth_basic
a2enmod authnz_ldap
a2enmod expires

In some cases we want to use “.htaccess” files included with web applications
like Drupal. Add the following to all files “/etc/apache2/sites-available/*”:
<Directory "/var/www/html">
AllowOverride All
</Directory>

If a system is not going to serve its own web site, redirect any browsers to the
main corporate site in “/var/www/html/.htaccess”:
Redirect 301 / http://www.quarium.com

In some cases we want clients not to cache content. Add the following to the
proper files “/etc/apache2/sites-available/*”:
<IfModule mod_expires.c>
ExpiresActive On
ExpiresDefault "access"
</IfModule>

Make sure to set the ownership of all served files correctly:


chown -R www-data:www-data /var/www

Frequently test the web site configuration with:


apachectl configtest

Restart the Apache web server with:


apachectl graceful
systemctl restart apache2
Generating and Installing Security Certificates
The installation of Apache also installs the “OpenSSL” package. This
package contains utilities for the creation and management of security keys
and certificates. Configuration files, certificates and private keys are located
in “/etc/ssl”.

OpenSSL implements the X.509 standard which uses distinguished encoding


rules (DER) files stored in privacy-enhanced electronic mail (PEM) format
described in RFC 1421 through 1424 (Base64 encoded binary according to
RFC 4648).

While there is no clear standard and the purpose of a Base64 blob is


described by its “-----HEADER----”, file names should by convention have
the following extensions: Private key files should be for example
“quarium.key”. Certificate requests should be for example “quarium.csr”.
Certificates should be for example “quarium.crt” or “quarium.pem”.
Although they are not normally used to configure SSL, public keys should be
for example “quarium.pub”. A wild-card request or certificate (for example
“*.quarium.com”) should NOT contain the special “*” character in the
filename, but rather be named for example “star.quarium.com”.

Multiple PEM blobs may be combined into one file in any order, so an SSL
web site really only needs one security file “quarium.pem” which should
contain its private key, its domain certificate and the certificate of the CA
(certificate authority) which has issued the domain certificate. It is identified
to Apache using the “SSLCertificateFile” directive. The disadvantage of this
is that the key file is less secure, so we DO NOT use this method.

Generate a new private key without a passphrase:


openssl genpkey -algorithm RSA -out star.quarium.com.key -pkeyopt
rsa_keygen_bits:2048

Generate a domain certificate request using the private key and some
organizational input. Note the exact spelling of the organization name, with
case and punctuation. Our entire infrastructure should only need one domain
certificate for “*.quarium.com”:
openssl req -new -key star.quarium.com.key -out star.quarium.com.csr
Country Name (2 letter code) [AU]:US
State or Province Name (full name) [Some-State]:California
Locality Name (eg, city) []:San Francisco
Organization Name (eg, company) [Internet Widgits Pty Ltd]:Quarium, Inc.
Organizational Unit Name (eg, section) []:
Common Name (for example server FQDN or YOUR name) []:*.quarium.com
Email Address []:[email protected]
Please enter the following 'extra' attributes to be sent with your certificate
request
A challenge password []:
An optional company name []:

If a domain certificate must be renewed, a request file can be regenerated


from a key and an existing certificate:
openssl x509 -x509toreq -in star.quarium.com.crt -signkey star.quarium.com.key -out
star.quarium.com.csr

Verify the domain certificate request using:


openssl req -in star.quarium.com.csr -text -verify -noout

Provide the request to the CA who will return its own CA certificate file and
the new domain certificate file.

Alternatively, for testing, generate a self-signed CA and domain certificate


from a request:
openssl x509 -req -days 730 -in star.quarium.com.csr -extensions v3_ca -signkey
star.quarium.com.key -out ca.quarium.com.crt
openssl x509 -req -days 730 -in star.quarium.com.csr -extensions v3_usr -CA
ca.quarium.com.crt -CAkey star.quarium.com.key -CAcreateserial -out
star.quarium.com.crt

Verify the certificate authority certificate file and the new user certificate
using:
openssl x509 -in ca.quarium.com.crt -noout -text
openssl x509 -in star.quarium.com.crt -noout -text

Install the private key and the certificates. Since we keep the parts separate (see above), copy them under names that match the “SSLCertificate*” directives used below. For the self-signed test certificate:
cp star.quarium.com.key /etc/ssl/private
cp star.quarium.com.crt /etc/ssl/certs/star.quarium.com.crt
cp ca.quarium.com.crt /etc/ssl/certs/ca.quarium.com.crt
Or for the official domain certificate:
cp star.quarium.com.key /etc/ssl/private
cp star.quarium.com.crt /etc/ssl/certs/star.quarium.com.crt
cp ca.thawte.com.crt /etc/ssl/certs/ca.thawte.com.crt

The domain and CA certificates could also be concatenated into a single file, for example “cat star.quarium.com.crt ca.quarium.com.crt >/etc/ssl/certs/star.quarium.com.pem”, although some older iOS browsers do not like all parts to be in one file.

Correct the ownership and permissions:


chown root:ssl-cert /etc/ssl/private/star.quarium.com.key
chmod 640 /etc/ssl/private/star.quarium.com.key

Make sure that the site configuration includes a proper server name in the
“sites-available” files or some browsers and Java 7 will not negotiate SNI
correctly:
ServerName www.quarium.com
ServerAlias *.quarium.com

Point the SSL virtual host configuration to the security files:


SSLCertificateKeyFile /etc/ssl/private/star.quarium.com.key
SSLCertificateFile /etc/ssl/certs/star.quarium.com.crt
SSLCertificateChainFile /etc/ssl/certs/ca.thawte.com.crt

In some cases a private key may have been protected with a password, which
would require that password to be entered each time a server or service is
restarted. Remove the password from the key file using:
openssl rsa -in protected.star.quarium.com.key -out star.quarium.com.key

For diagnostics, a private key file can be decomposed into its components
using:
openssl rsa -in star.quarium.com.key -text -noout

For diagnostics, a private key file can be used to extract a public key: (Some
applications may need an -RSAPublicKey_out option.)
openssl rsa -in star.quarium.com.key -pubout -out star.quarium.com.pub
In some cases a certificate may be stored in its stricter DER format. Convert
back and forth using:
openssl x509 -in star.quarium.com.crt -outform der -out star.quarium.com.der
openssl x509 -in star.quarium.com.der -inform der -outform pem -out
star.quarium.com.crt

Secure Web Configuration


The installation of Apache also installs the “OpenSSL” package. This
package contains libraries for application use of security keys and
certificates. Configuration files, certificates and private keys are located in
“/etc/ssl”. OpenSSL is not automatically active in Apache. Enable the SSL
module and all its dependencies using “a2enmod” and “a2dismod”:
a2enmod ssl

An initial default HTTPS web site is not automatically active at


“/var/www/html”. It is configured in “/etc/apache2/sites-available/default-
ssl.conf”. Enable it by using “a2ensite” and “a2dissite”:
a2ensite default-ssl

In the file “default-ssl.conf”, point the entries “SSLCertificateFile” and


“SSLCertificateKeyFile” to the correct files in the “/etc/ssl” directory.

In older versions of Ubuntu, in the file “/etc/apache2/mods-enabled/ssl.conf”


change the “SSLProtocol” specification to the following to protect against the
CRIME attack and the Poodle attack on SSL v3:
SSLCompression off
SSLProtocol all -SSLv3 -SSLv2

There are a number of useful, free public HTTP, HTTPS and HTML check
services on the Internet. They will test a set of standard configuration criteria
and report any problems found for a specified domain. Most problems will be
found in the rapidly evolving security area, where all SSL versions and the
TLS versions older than TLS 1.2 are now more or less seriously compromised.
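A quick local sanity check of the certificate that a virtual host actually serves can be done with the OpenSSL client (a sketch; substitute your own host name):
openssl s_client -connect www.quarium.com:443 -servername www.quarium.com </dev/null | openssl x509 -noout -subject -issuer -dates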
Basic Authentication with File, PAM or LDAP
We only use basic authentication (RFC 7617) for secure sites. The encryption
of the secure site is much better than that of digest authentication, and digest
authentication is too complicated for non-browser clients.

For file authentication, create a password file in “/etc/apache2”:


htpasswd -c htpasswd quarium

Create a group file in “/etc/apache2/htgroup”:


ops: quarium other

Add the following to the site configuration:


<Location /awstats>
AuthType basic
AuthName "Quarium Operations"
AuthUserFile /etc/apache2/htpasswd
AuthGroupFile /etc/apache2/htgroup
Require group ops
Require all denied
</Location>

For PAM authentication using “mod_authnz_external” and “pwauth”, which


can be set up to use LDAP:
<Location /awstats>
AuthType basic
AuthName "Quarium Operations"
AuthBasicProvider external
AuthExternal pwauth
Require group ops
Require all denied
</Location>

Or best, use LDAP authentication with “mod_authnz_ldap”:


<Location /awstats>
AuthType basic
AuthName "Quarium Operations"
AuthBasicProvider ldap
AuthLDAPURL "ldap://ns1.lan.quarium.com
ns2.lan.quarium.com:389/ou=people,dc=lan,dc=quarium,dc=com?uid" NONE
AuthLDAPGroupAttribute memberUid
AuthLDAPGroupAttributeIsDN off
Require ldap-group cn=ops,ou=groups,dc=lan,dc=quarium,dc=com
Require all denied
</Location>
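The LDAP authentication module must be enabled first; "a2enmod" also pulls in the underlying "ldap" module as a dependency:
a2enmod authnz_ldap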

Digest Authentication
Do not use digest authentication! It is not sufficiently secure and it is too
complicated for some REST clients. But just to see how it works, create a
password file in “/etc/apache2”:
htdigest -c htdigest "Quarium Ops" quarium

Create a group file "/etc/apache2/htgroup" containing:


ops: quarium other

Add the following to the site configuration:


<Location /awstats>
AuthType digest
AuthName "Quarium Ops"
AuthUserFile /etc/apache2/htdigest
AuthGroupFile /etc/apache2/htgroup
AuthDigestDomain /awstats/
Require group ops
Require all denied
</Location>

The PAM and LDAP variations should be obvious, but should not be used
either.

WebDAV Configuration
Web Distributed Authoring and Versioning (WebDAV, RFC 4918) allows
users to upload files to a web site. Desktop operating systems like
Windows and macOS can mount DAV servers as remote file systems.
WebDAV can operate securely over HTTPS.

Turn on the WebDAV and related modules in Apache using:


a2enmod dav_fs
a2enmod dav_lock
Create the data directory and allow Apache to write to it (this is the directory that the site configuration below uses as its "DocumentRoot"):
mkdir -p /var/www/data
chown -R www-data:www-data /var/www/data
chmod 770 /var/www/data

Create a password file for basic authentication:


cd /etc/apache2
htpasswd htpasswd quariumclients

or use the LDAP authentication method described elsewhere.

Duplicate the default site in "/etc/apache2/sites-available" and make the following changes with the proper DNS names:
...
ServerName data.quarium.com
...
DocumentRoot /var/www/data
...
<Directory /var/www/data/>
Options Indexes MultiViews
Require all granted
</Directory>
<Location />
DAV On
AuthType Basic
AuthName "Quarium"
AuthBasicProvider file
AuthUserFile /etc/apache2/htpasswd
Require user quariumclients
</Location>
...
# <Location /cgi-bin>
# Require all granted
# </Location>
...
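A quick way to verify the DAV site from the command line is curl, which can upload, list and delete a test file (the host name and user are the examples used above; curl prompts for the password):
echo hello >/tmp/davtest.txt
curl -u quariumclients -T /tmp/davtest.txt https://data.quarium.com/davtest.txt
curl -u quariumclients -X PROPFIND -H "Depth: 1" https://data.quarium.com/
curl -u quariumclients -X DELETE https://data.quarium.com/davtest.txt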

Access and Error Log Files


In each regular and secure web site configuration file add the following
(substituting the proper host names). The “-l” option causes the log files to be
rotated at the local midnight of the server. The “-f” and “-c” options cause the
log files to be created even in the absence of traffic to prevent “awstats” error
messages.
ErrorLog "|/usr/bin/rotatelogs -l -f -c ${APACHE_LOG_DIR}/somehost_error.%Y-%m-
%d.log 86400"
CustomLog "|/usr/bin/rotatelogs -l -f -c ${APACHE_LOG_DIR}/somehost_access.%Y-%m-
%d.log 86400" combined

Disable the default log rotation because it interferes with our log file naming
and with awstats:
mv /etc/logrotate.d/apache2 /etc/apache2/logrotate.old

The normal logging level is "warn". This can be increased in "/etc/apache2/apache2.conf":
LogLevel debug

Awstats Log File Analytics


The “awstats” package provides advanced web statistics using a Perl CGI
script. This is not the most modern or secure web analytics available, but it is
easy to install and it provides a reasonable level of reporting on web site visit
statistics. In the default configuration, a cron script builds static web pages
with results. We will change the update frequency to once every five minutes
to avoid having to enable real-time CGI collection.

Installation
Install the awstats package using:
apt install awstats

Also enable CGI:


a2enmod cgi

Make sure the “/var/log/apache2” folder can be read by the “www-data” user:
chmod 755 /var/log/apache2
File Structure
The configuration for awstats is stored in “/etc/awstats”. The main
configuration file is “/etc/awstats/awstats.conf”. Any local parameters can be
placed in “/etc/awstats/awstats.conf.local”. Additional service parameters can
be configured in “/etc/default/awstats”.

Awstats does not have a service daemon. Updates are scheduled through the
"/etc/cron.d/awstats" cron entry. The CGI script itself is "/usr/lib/cgi-bin/awstats.pl". Asset
files (icons etc.) are in "/usr/share/awstats".

Processed data files are in "/var/lib/awstats" and "/var/cache/awstats". To start over with collecting data, just remove everything from these two directories and wait for the cron job to reprocess the current log file.

Configuration
On a server with a single site, edit "awstats.conf.local". On a server with multiple sites, duplicate the configuration file to "awstats.<sitename>.conf", for example "awstats.quarium.com.conf", for each separate site.

Make the following changes to the configuration file(s):


LogFile="/var/log/apache2/quarium_access.%YYYY-0-%MM-0-%DD-0.log"
LogFormat=1
SiteDomain="quarium.com"

Change the "update.sh" line in "/etc/cron.d/awstats" to avoid "missing log file" emails around midnight local time:
5,15,25,35,45,55 * * * * www-data [ -x /usr/share/awstats/tools/update.sh ] &&
/usr/share/awstats/tools/update.sh

Add the following to each "<VirtualHost _default_:443>":


# allow execution of the awstats.pl script
<Directory "/usr/lib/cgi-bin">
AllowOverride None
Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch
Require all granted
</Directory>
Redirect 404 /cgi-bin
# require ldap authentication for awstats reports
<Location /awstats>
AuthType basic
AuthName "Quarium Operations"
AuthBasicProvider ldap
AuthLDAPURL "ldap://ns1.lan.quarium.com
ns2.lan.quarium.com:389/ou=people,dc=lan,dc=quarium,dc=com?uid" NONE
AuthLDAPGroupAttribute memberUid
AuthLDAPGroupAttributeIsDN off
Require ldap-group cn=ops,ou=groups,dc=lan,dc=quarium,dc=com
Require all denied
</Location>
# map to awstats scripts and assets
ScriptAlias /awstats /usr/lib/cgi-bin
Alias /awstatsclasses "/usr/share/awstats/lib/"
Alias /awstats-icon "/usr/share/awstats/icon/"
Alias /awstatscss "/usr/share/doc/awstats/examples/css"

Access the statistics page, for example from:


https://quarium.com/awstats/awstats.pl?config=quarium.com
Chapter 15 – PHP
As of June 2020, the open source language PHP was used on 78.8% of all
web sites for which the language was known. The proprietary Microsoft
ASP.NET was second at 10.3%. No other server-side programming language, including Java, Ruby and Python, had more than a single-digit market share.

Apache PHP Server Module


Ubuntu 20.04 supports PHP version 7.4 but this support is not installed by
default in either the desktop or the server configuration.

The main use of PHP is as a module integrated with Apache. It is also occasionally used as a command-line scripting language due to its portability among operating systems.

Installation
To install the default version of PHP with a few useful extensions, execute
the following:
apt install php libapache2-mod-php php-mysql php-cli
apt install php-curl php-gd php-mbstring php-ldap php-intl php-zip
apt install php-uploadprogress
apt install php-xml

For development systems it may also be useful to install:


apt install php-xdebug

In the rare case when it is necessary to compile PHP extensions from source
we could add:
apt install php-dev
File Structure
The various PHP configuration files are located in “/etc/php/7.4”. Two
subdirectories “apache2” and “cli” separately configure the web server and
the command-line settings. In each, the main configuration file is "php.ini". A
directory “conf.d” is used for both system and user configuration files.

Configuration
Add the following in a “conf.d/99-quarium.ini” file for each environment:
cat <<EOF >/etc/php/7.4/apache2/conf.d/99-quarium.ini
memory_limit = 256M
max_execution_time = 60
date.timezone = "UTC"
date.default_latitude = 37.58417
date.default_longitude = -122.365
EOF
cp /etc/php/7.4/apache2/conf.d/99-quarium.ini /etc/php/7.4/cli/conf.d/99-
quarium.ini
chmod 644 /etc/php/7.4/*/conf.d/99-quarium.ini

You can obtain exact latitude and longitude information for a server location
from Google maps.

Upgrading
To upgrade the default version to a newer version, for example from the version 7.2 used in Ubuntu 18.04 LTS to 7.4, first add some new repositories:
add-apt-repository ppa:ondrej/php
add-apt-repository ppa:ondrej/apache2

We can then perform the usual package updates:


apt update && apt upgrade

Also carry any local modifications from the "/etc/php/7.2/*/php.ini" files to the "7.4" versions. We can then verify which version is now enabled:
php --version

We also need to enable the correct module in Apache:


a2dismod php7.2 && a2enmod php7.4 && apachectl graceful

Composer Configuration
Composer is by far the most popular application-level package manager
utility for PHP libraries, including those published on the Packagist
repository.

Ubuntu 20.04 includes a package "composer" but unfortunately it only installs a version which does not know the command "self-update". Instead, follow the latest instructions from "http://getcomposer.org" using a script "install_composer.sh":
#!/bin/sh
set -x
EXPECTED_CHECKSUM="$(wget -q -O - https://composer.github.io/installer.sig)"
php -r "copy('https://getcomposer.org/installer', 'composer-setup.php');"
ACTUAL_CHECKSUM="$(php -r "echo hash_file('sha384', 'composer-setup.php');")"
if [ "$EXPECTED_CHECKSUM" != "$ACTUAL_CHECKSUM" ]
then
>&2 echo 'ERROR: Invalid installer checksum'
rm composer-setup.php
exit 1
fi
php composer-setup.php
RESULT=$?
rm composer-setup.php
exit $RESULT

Move the resulting command into place and make it executable:


mv composer.phar /usr/local/bin/composer
chmod 755 /usr/local/bin/composer

Make sure the “unzip” command is installed:


apt install unzip

Verify proper installation with:


composer --version
Regularly check for updates using:
composer self-update
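Once installed, composer is typically run inside a project directory to add libraries from the Packagist repository, for example (the package name is only an illustration):
composer require monolog/monolog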

GeoIP configuration
First install the corresponding packages:
apt install geoipupdate geoip-database php-geoip

Install and update the database files in “/var/lib/GeoIP”:


geoipupdate -v

The free databases are not updated frequently anymore, if at all, and updating them now requires a license key. Comment out the update in "/etc/cron.d/geoipupdate":
# 47 6 * * 3 root test -x /usr/bin/geoipupdate && /usr/bin/geoipupdate
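A quick way to verify that the PHP extension can read the installed databases is a one-line lookup from the command line (the address is just an example):
php -r 'var_dump(geoip_country_code_by_name("8.8.8.8"));'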
Chapter 16 - Wiki
In the 1960s, Ted Nelson's project Xanadu was the first to explore
"hypertext" and later "stretchtext". When a complete implementation never
materialized, in the 1980s Tim Berners-Lee was among several others
experimenting with its concepts in the form of a "World Wide Web". In
1995, Ward Cunningham published the first "WikiWikiWeb", a user-editable
website. On Monday 15 January 2001, "wikipedia.org" went online and by
September of the same year it was widely popular with over 10,000 entries.
After 2002, Wikipedia ran on its own PHP wiki software, "MediaWiki",
which was published as free open-source software in 2003. In June 2020,
Wikipedia exceeded 50 million articles across 310 language editions.

MediaWiki Server
Installation
On the git version control server, create an empty repository for a new
project:
git init --bare someproject.git

Make sure the repository has the correct access permissions.

Create a Ubuntu 20.04 LTS application virtual machine with fixed-address lan networking, ufw, SSH, a satellite email server, a MySQL server, an Apache web server and the PHP web server plugin with the suggested PHP libraries.

Make sure the virtual machine that will serve the wiki(s) has an SSH private
key “~root/.ssh/id_rsa” and make sure that the contents of the corresponding
“~root/.ssh/id_rsa.pub” have been added to a user account on the git server
that has access to the repository.
Remove all existing files from “/var/www/html” and clone the git repository
into the “/var/www/html” directory or a subdirectory:
git clone ssh:…someproject.git .

Create a subfolder “backups” and inside it retrieve the current release of the
MediaWiki software (1.34 as of this writing):
mkdir backups
cd backups
wget https://releases.wikimedia.org/mediawiki/1.34/mediawiki-1.34.2.tar.gz

Unpack the software and move it to the top level of the repository:
tar xzvf mediawiki-1.34.2.tar.gz
(cd mediawiki-1.34.2 && find . -print | cpio -pduvm ../..)
rm -rf mediawiki-1.34.2

Make sure the web directory has the ownerships needed for access by the
Apache web server:
chown -R www-data:www-data /var/www/html

Now is a good time to add and commit the initial untouched release files into
the git repository and push them to the git server:
git add --all && git commit && git push

Next, create a MySQL database with a user and password for the wiki.
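A minimal sketch of creating such a database and account from the "mysql" prompt (the database name, user name and password below are placeholders to replace with your own values):
mysql -u root -p
CREATE DATABASE wikidb CHARACTER SET utf8mb4;
CREATE USER 'wikiuser'@'localhost' IDENTIFIED BY 'choose-a-password';
GRANT ALL PRIVILEGES ON wikidb.* TO 'wikiuser'@'localhost';
FLUSH PRIVILEGES;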

File Structure
The configuration of the wiki is stored in a file “LocalSettings.php” in the top
directory of the application. If you are using the same repo for multiple
separate wiki application servers, you can keep these local settings out of the
repo using “.gitignore” and instead make copies of the different
configurations in a "backups" subdirectory, which you can also use to store database backups.
Site Configuration
Point a web browser to the URL that serves the wiki and follow the
configuration instructions. The first page merely says that the wiki has not
been set up yet and provides a link to do so.

Select the languages for the user and the wiki. Both default to “en – English”.

The next screen provides the results of a check whether the server has been set up properly. Follow the instructions provided to make any corrections necessary.

The next two pages ask for the database host (“localhost”), the database
name, the database user and the password.

The next page asks for a wiki name, for example “Company Project Wiki”
and for the account name of the “webmaster”, another password and an email
address.

For basic wikis this is sufficient but for more secure corporate wikis we can
select “private wiki”. This requires all users to log in before reading or
editing any wiki topics.

Set the email return address to an account that is able to receive comments
from users, for example “[email protected]”.

Decide if you want to enable watchlist and user talk page notifications.

Enable the "Special pages" extension "ReplaceText". Enable the "Parser hooks" extensions "CategoryTree" and "ParserFunctions". Enable the "MediaHandler" extension "PdfHandler". Enable file uploads.

The next page will start the wiki database installation.

After the process is complete you will be able to download a new "LocalSettings.php" file that must be placed in the wiki directory on the server to enable operations.

You may install a 135 by 135 pixel RGB or RGBA logo image in the
“images” directory and refer to it from “LocalSettings.php”:
$wgLogo = "$wgResourceBasePath/images/<logo name>.png";

You may enable image resizing on upload by uncommenting the following in "LocalSettings.php":
#$wgUseImageMagick = true;
#$wgImageMagickConvertCommand = "/usr/bin/convert";

Make sure this and all files in the wiki directory have the correct ownership
and permissions:
chown -R www-data:www-data /var/www/html

It should now be possible to log into the wiki using the webmaster account.

Metrolook Skin Extension


This skin modernizes the user interface of a wiki. Use the MediaWiki web site to download, add and push the distribution file of the latest version of the "Metrolook" skin into the "backups" directory of the repository. Unpack the download into a folder "Metrolook" and move it to the "skins" folder using:
tar xzvf Metrolook-REL1_34-48c1b49.tar.gz
rm -rf ../skins/Metrolook
mv Metrolook ../skins

Enable the skin by adding the following to the end of the "LocalSettings.php" file of each wiki:
$wgDefaultSkin = "metrolook";
wfLoadSkin( 'Metrolook' );

Backups and Updates


Periodically, the wiki software should be updated, possibly from the version
control database. The contents of the version control database can be
retrieved using:
git fetch && git pull

Then, the database that stores the wiki content should be backed up and
possibly also stored in the version control database. A backup can be made
using:
mysqldump --host=localhost --user=<user> --password=<password> <database>
>backups/<database>Content.sql

Also back up the configuration file using:


cp LocalSettings.php backups/<database>Settings.php

Keep the top-level “LocalSettings.php” file out of the git repository using a
file “.gitignore” containing:
LocalSettings.php

After this, the database can be updated using:


php maintenance/update.php

Make sure that all files can be accessed by the web server:
chown -R www-data:www-data /var/www/html

Then add the new backup information into the version control database:
git add --all && git commit && git push
Chapter 17 - Blog

WordPress Server
WordPress is an open source content management system (CMS) written in
PHP that is mostly used for blogging. The software was first released in 2003
and as of October 2018 it had an installed base of between 19 and 76 million
sites.

Installation
On the git version control server, create an empty repository for a new
project:
git init --bare someproject.git

Make sure the repository has the correct access permissions.

Create a Ubuntu 20.04 LTS application virtual machine with fixed-address lan networking, ufw, SSH, a satellite email server, a MySQL server, an Apache web server and the PHP web server plugin with the suggested PHP libraries.

Make sure the virtual machine that will serve the blog(s) has an SSH private
key “~root/.ssh/id_rsa” and make sure that the contents of the corresponding
“~root/.ssh/id_rsa.pub” have been added to a user account on the git server
that has access to the repository.

Remove all existing files from “/var/www/html” and clone the git repository
into the “/var/www/html” directory or a subdirectory:
git clone ssh:…someproject.git .

Create a subfolder “backups” and inside it retrieve the current release of the
Wordpress software:
mkdir backups
cd backups
wget https://wordpress.org/latest.tar.gz

Unpack the software and move it to the top level of the repository:
tar xzvf latest.tar.gz
(cd wordpress && find . -print | cpio -pduvm ../..)
rm -rf wordpress
cd ..
cp wp-config-sample.php wp-config.php
mkdir wp-content/upgrade
find . -type d -exec chmod 750 {} \;
find . -type f -exec chmod 640 {} \;

Make sure the web directory has the ownerships needed for access by the
Apache web server:
chown -R www-data:www-data /var/www/html

File Structure
The configuration of the blog is stored in a file “wp-config.php” in the top
directory of the application. If you are using the same repo for multiple
separate blog application servers, you can keep these local settings out of the
repo using “.gitignore” and instead make copies of the different
configurations in a "backups" subdirectory, which you can also use to store database backups.

Site Configuration
Create a MySQL database with a user and password for the blog and add the
values for “DB_NAME”, “DB_USER” and “DB_PASSWORD” to the “wp-
config.php” file.
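The relevant lines in "wp-config.php" look like the following (the database name, user and password are placeholders; "DB_HOST" usually stays "localhost"):
define( 'DB_NAME', 'blogdb' );
define( 'DB_USER', 'bloguser' );
define( 'DB_PASSWORD', 'choose-a-password' );
define( 'DB_HOST', 'localhost' );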

In the file "wp-config.php" add the following to facilitate updates without using FTP:
define('FS_METHOD', 'direct');

Next, obtain a number of unique key values for the installation:


curl -s https://api.wordpress.org/secret-key/1.1/salt/

Replace the placeholder values in the “wp-config.php” file with the output.

Point a web browser to the URL that serves the blog and follow the
configuration instructions.

Select “English (United States)” as the installation language.

Enter the site name and a user name (“webmaster”), password and email
address for an administrative user account.

Backups and Updates


Periodically, the blog software should be updated, in the case of Wordpress
using its built-in mechanism, but possibly also from the version control
database. The contents of the version control database can be retrieved using:
git fetch && git pull

Then, the database that stores the blog content should be backed up and
possibly also stored in the version control database. A backup can be made
using:
mysqldump --host=localhost --user=<user> --password=<password> <database>
>backups/<database>-content.sql

Also back up the configuration file using:


cp wp-config.php backups/<database>-config.php

Make sure that all files can be accessed by the web server:
chown -R www-data:www-data /var/www/html

Then add the new backup information into the version control database:
git add --all && git commit && git push
Chapter 18 – CMS

Drupal Server
Drupal is an open source content management system (CMS) written in PHP.
The project was started in 2001 and as of June 2020 it was up to version 9.0.1
and it had an installed base of 1.2 million sites.

Installation
On the git version control server, create an empty repository for a new
project:
git init --bare someproject.git

Make sure the repository has the correct access permissions.

Create a Ubuntu 20.04 LTS application virtual machine with fixed-address lan networking, ufw, SSH, a satellite email server, a MySQL server, an Apache web server and the PHP web server plugin with the suggested PHP libraries.

Make sure the virtual machine that will serve the CMS has an SSH private
key “~root/.ssh/id_rsa” and make sure that the contents of the corresponding
“~root/.ssh/id_rsa.pub” have been added to a user account on the git server
that has access to the repository.

Remove all existing files from “/var/www/html” and clone the git repository
into the “/var/www/html” directory or a subdirectory:
git clone ssh:…someproject.git .

Create a subfolder “backups” and inside it retrieve the current release of the
Drupal software:
mkdir backups
cd backups
wget https://www.drupal.org/download-latest/tar.gz

Unpack the software and move it to the top level of the repository:
tar xzvf tar.gz
(cd drupal-9.0.1 && find . -print | cpio -pduvm ../..)
rm -rf drupal-9.0.1

Prevent dependent files from being added to the git repository:


cd ..
mv example.gitignore .gitignore

Complete the installation by retrieving the dependent modules with composer:
composer install --no-dev
composer update

Install the Drupal shell (“drush”):


composer global require drush/drush:dev-master

Drush can then be used to determine the status of the installation:


$HOME/.composer/vendor/bin/drush status

Next, create a MySQL database with a user and password for the CMS.

File Structure
The configuration of the content management system is stored in a file
“sites/default/settings.php” in the application. If you are using the same repo
for multiple separate content management system application servers, you
can keep these local settings out of the repo using “.gitignore” and instead
make copies of the different configurations in a "backups" subdirectory, which you can also use to store database backups.

Site Configuration
Point a web browser to the URL that serves the CMS and follow the
configuration instructions.

Select the default language “English”.

Select the “Standard” profile.

Verify the installation requirements are met, ignore only a possible “clean
URL” notice for now and “continue anyway”.

Enter the database schema name and the database user name and password.

Observe the initial installation of the site.

Enter the domain name, "[email protected]", the management user name and password and the default country "United States" for the site.

Observe the initial home page for the site (which will automatically log you
in to the administrative user account) and make sure the URL rewrite
specified in the “.htaccess” file is working by selecting any administrative
menu entry.

Check the status report for any problems with the configuration.

Add the following to the "sites/default/settings.php" file to indicate which hosts are trusted:
$settings['trusted_host_patterns'] = array(
'^quarium\.com$',
'^.+\.quarium\.com$',
'^quarium\.net$',
'^.+\.quarium\.net$',
);

Use the “drupal.org” website to obtain the “tar.gz” URL for the “bootstrap 4”
theme. Then use the “appearance” menu to install and enable it as the default.

Use the “drupal.org” website to obtain the “tar.gz” URL for the “admin
toolbar” module. Then use the “extend” menu to install and enable it and its
sub-modules.
Create a new basic page with title “Access Denied” with URL “/access-
denied” and the content:
We're sorry, but you must have permission to view the page you requested.
If you are already a registered member of this site, please try logging in.
If you are not a member, you need to join us.
If you have any questions about our site or group, please feel free to contact us.

Create a new basic page with title “Page Not Found” with URL “/page-not-
found” and the content:
We're sorry, but the page you were looking for currently does not exist.
We redesign our site frequently and many pages may have changed.
If you are unable to find something on our new site or have a question about our
site or services feel free to contact us.

In Configuration -> System -> Basic Site Settings, inspect and complete the
settings including the “/access-denied” and “/page-not-found” pages.

In Configuration -> People -> Account Settings, set the name of the
anonymous user to “guest”. In Appearance -> Settings, set the logo image
and the favicon.

Create a user with role “Administrator” and one with role “Authenticated
User” and verify that the site can send email with the proper “from” address.

Backups and Updates


Periodically, the CMS software should be updated, in the case of Drupal
using its built-in mechanism, but possibly also from the version control
database. The contents of the version control database can be retrieved using:
git fetch && git pull

Then, the database that stores the CMS content should be backed up and
possibly also stored in the version control database. A backup can be made
using:
mysqldump --host=localhost --user=<user> --password=<password> <database>
>backups/<database>-content.sql
Also back up the configuration file using:
cp sites/default/settings.php backups/<database>-settings.php

Make sure that all files can be accessed by the web server:
chown -R www-data:www-data /var/www/html

Then add the new backup information into the version control database:
git add --all && git commit && git push
Chapter 19 – Framework

Symfony Server
Symfony is a PHP web application framework and a set of reusable PHP
components/libraries published as free software since 2005. It is sponsored
by SensioLabs, a French software developer and professional services
provider. Symfony is currently the second most popular web framework after
Laravel and ahead of CodeIgniter and Zend, but its components offer such major advantages that they have been adopted by other major PHP projects, such as Laravel itself and Drupal, for their internal functionality.

Installation
On the git version control server, create an empty repository for a new
project:
git init --bare someproject.git

Make sure the repository has the correct access permissions.

Create a Ubuntu 20.04 LTS application virtual machine with fixed-address lan networking, ufw, SSH, a satellite email server, a MySQL server, an Apache web server and the PHP web server plugin with the suggested PHP libraries.

Make sure the virtual machine that will serve the site has an SSH private key
“~root/.ssh/id_rsa” and make sure that the contents of the corresponding
“~root/.ssh/id_rsa.pub” have been added to a user account on the git server
that has access to the repository.

On the application server, remove all existing files from "/var/www/html" and clone the git repository into the "/var/www/html" directory:
git clone ssh:…someproject.git .

Create a subfolder “backups” and inside it retrieve the current release of the
Symfony software:
mkdir temp backups
cd temp
composer create-project symfony/skeleton .
find . -print | cpio -pduvm ..
cd ..
rm -rf temp

We use an intermediate directory "temp" to create the initial project because Symfony insists that the project directory be completely empty. The developers could have made an exception for a ".git" directory.

Make sure the web directory has the ownerships needed for access by the
Apache web server:
chown -R www-data:www-data /var/www/html

Unlike the other PHP web applications, Symfony does not serve its top-level
directory, but only its “public” subdirectory. In the “/etc/apache2/sites-
available/*” files, make the following change:
DocumentRoot /var/www/html/public

Tell Apache to pick up the configuration change:


apachectl graceful

File Structure
Typically, the local configuration parameters of a Symfony application are
passed in as environment values by the web server. In our case they would be
stored in the “/etc/apache2/sites-available/*” files. In these cases, the git repo
will not contain any server-specific configuration files but only the
application and its assets.
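A minimal sketch of such a virtual host fragment, assuming the usual Symfony variable names ("APP_ENV", "APP_SECRET", "DATABASE_URL") and placeholder values:
<VirtualHost *:80>
ServerName somehost1.lan.quarium.com
DocumentRoot /var/www/html/public
SetEnv APP_ENV prod
SetEnv APP_SECRET "replace-with-a-random-secret"
SetEnv DATABASE_URL "mysql://appuser:choose-a-password@localhost:3306/appdb"
</VirtualHost>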

Site Configuration
Point a web browser to the URL that serves the site. The initial version of
Symfony will not have any functionality other than an information page.

The second book in this series explains how to turn this initial distribution
into a complete application for serving web pages and a REST API.
Chapter 20 - Global Load Balancing
Our infrastructure consists of one or more firewall servers and one or more
application servers. The purpose of the firewall servers is to isolate the
application servers from the public Internet. The purpose of the application
servers is to run web sites and REST APIs and various supporting DHCP,
DNS, LDAP, SMTP, POP3 and IMAP applications on the internal network.
The purpose of having multiple servers for both firewall functions and
application functions is that we can perform maintenance on any one server
without disabling the application service.

In addition, the configuration described here provides load balancing over all
available physical and virtual machines. This must work across the entire
world, and by preference requests from users in a particular geographic
location should be served by the application servers nearest to them, first in
terms of jurisdiction and then in terms of Internet communication hops and
bandwidth. Some servers will be located in jurisdictions that require
information about their citizens to be stored inside the jurisdiction only.
Other jurisdictions will occasionally or permanently prevent their citizens
from communicating with servers located outside the jurisdiction.

Packet Load Balancing


The firewall servers are connected both to the public Internet and to an
internal network at fixed IP addresses. The application servers are only
connected to the internal network at fixed IP addresses. The fixed public IP
addresses of the firewall servers are each published at
“somehost.quarium.com” and in round-robin fashion at “quarium.com”. The
fixed internal IP addresses of the application servers are managed by an
internal DNS zone as “somehost.lan.quarium.com”.

When a remote system requests resolution of the "quarium.com" address, it receives a set of DNS server addresses for the domain from the parent ".com" domain name servers in no particular order.
The resolver on the remote system then sends a DNS resolution request to
each of these name servers in turn, until it receives a response or until the
requests all time out. If the remote system succeeds in getting a response
from any DNS server for the domain, it receives a list of fixed public IP
addresses of all firewall servers in no particular order.

The remote system then tries to contact each server in turn, using either a
UDP or a TCP protocol, until it succeeds in establishing a connection to one
of the firewall servers. The firewall servers forward all requests from remote
systems to application servers on the internal network.
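The delegation and the round-robin record set can be observed directly with "dig" (the domain is an example):
dig +short NS quarium.com
dig +short A quarium.com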

This load-balanced, redundant mechanism is already part of the standard way TCP/IP and DNS work. Effectively, if the service has a presence in any political jurisdiction that may or may not allow access to the Internet in other jurisdictions, then as long as any DNS server and any application server is running, users will be able to contact an instance of the service.

Protocol Load Balancing


On a firewall server running the Ubuntu distribution of the Linux Operating
System, some requests for supporting protocols, such as DNS, SMTP and
POP3S, are forwarded at a packet level to the proper application server by
“netfilter” modules in the kernel of the firewall server under control of the
“iptables” utility and the “ufw” wrappers.

The netfilter modules perform "network address translation" (NAT). In all packets received from a remote system, they substitute the packet source [A1, P1] with the internal IP address and an unused port of the firewall server [A3, P3] and they substitute the packet destination [A2, P2] with the internal IP address and port of the application server [A4, P4]. The netfilter modules then maintain a table of the resulting address and port mapping:

Cli IP   Cli Port   FW IP1   FW Port1   FW IP2   FW Port2   App IP   App Port
A1       P1         A2       P2         A3       P3         A4       P4
The firewall server may have multiple public and private addresses, so the
table must also remember which of its own addresses A2 and A3 were used.

The tricky part in any NAT is how to allocate port numbers P3. As long as a
particular combination [A3, P3, A4, P4] is unique then a response from an
application server [A4, P4] to the NAT [A3, P3] can be returned to the
correct client [A1, P1]. If this combination (table key) is already allocated
and is still active, another P3 must be calculated. Linux “netfilter” originally
chooses P3 = P1 and for additional connections [A1, P1] it increments a
previously used P3. This predictable behavior makes NAT traversal
algorithms possible.

When a packet is received from an application server, the netfilter modules replace the packet destination [A3, P3] with the IP address and port of the remote system [A1, P1] and the packet source with the IP address and port of the firewall server [A2, P2]. This guarantees that the remote system will be unaware of the existence of a separate application server and its internal IP address and port [A4, P4].

When an application server needs to access a remote server, for example to update its Ubuntu installation or to communicate with another application server located elsewhere, a similar NAT mechanism and the same mapping table are used. The outgoing packets from the application server are "masqueraded" on the firewall before being forwarded to the remote server. Response packets are masqueraded back to the application server. To a remote server, all requests from internal servers appear to originate on the firewall server.

In practice we forward DNS requests that arrive at each different firewall server to a separate redundant DNS application server. We forward all SMTP and POP3S requests arriving at any firewall server to a single SMTP and POP3S application server.
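Under the hood, the forwarding and masquerading amount to netfilter NAT rules like the following sketch (the interface name, addresses and ports are examples only; in practice ufw applies equivalent rules from its configuration files rather than having them typed by hand):
# forward DNS requests arriving on the public interface to an internal DNS server
iptables -t nat -A PREROUTING -i eth0 -p udp --dport 53 -j DNAT --to-destination 10.0.0.11:53
# masquerade outgoing traffic from the internal network
iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -o eth0 -j MASQUERADE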

We could also use this same mechanism to forward all requests for the web
protocols HTTP and HTTPS to a single application server. Instead, we may
want to operate a number of separate web and REST API applications, and in
that case we must forward such requests based on the HTTP header of the
request. This is beyond the capabilities of the netfilter modules and we will
operate a “reverse proxy” Apache web server on each firewall server.

Application Load Balancing


We must configure each reverse proxy Apache web server to only forward requests to application servers and never serve any such requests from files stored on the firewall server. We do want to shield the application servers from the configuration and processing involved in securing HTTPS requests. We want the firewall servers to maintain secure connections to remote systems but we want them to forward all requests on the internal network as plaintext HTTP requests. This is safe since the requests will only be forwarded on the bridge device of the virtual machine host, not on an actual cable exposed in a data center.

We can configure Apache web servers on firewall servers to do this in two ways: we can separate applications by domain name and by URL path.

Separation by Domain
When we separate applications by domain name, we create a separate
“quarium.com” DNS entry that points to our firewall servers for each
application. This domain is then load-balanced as described above.

We must purchase a separate security certificate for each domain such as quarium.com, and possibly even for each "<subdomain>.quarium.com" or wildcard "*.quarium.com", and we must install each such certificate in the configuration of the "openssl" package on each firewall server.

Then we create one Apache configuration file for the HTTP virtual host and a
separate one for the HTTPS virtual host, for example “<nnn>-quarium.conf”
and “<nnn>-quarium-ssl.conf”:
<VirtualHost *:80>
ServerName quarium.com
ServerAlias *.quarium.com

</VirtualHost>
and
<IfModule mod_ssl.c>
<VirtualHost _default_:443>
ServerName quarium.com
ServerAlias *.quarium.com

</VirtualHost>
</IfModule>

The Apache site configuration files are processed in sorted order and the file with the alphabetically first name, for example "000", defines the default virtual host that matches requests which do not specify a configured domain. Requests for domains that are not configured anywhere are also handled by that default virtual host, which can be set up to reject them.

If we configure each virtual host on each firewall server to forward requests to an application server, we satisfy the requirement that no files are to be served locally from any firewall server.

At this point we can consider redirecting all HTTP requests to their corresponding HTTPS endpoints, to force remote systems to use secure connections for all transactions with our application servers. We do this by adding the following to each HTTP configuration file:
Redirect 301 / https://quarium.com

Each application server will have its own Apache web server with its own
configuration files, typically “000-default.conf” and “default-ssl.conf” which
serve the local application. If we let the firewall server handle the secure
channel, we only need to enable the “000-default.conf” file on the application
server.

Separation by Path
In addition to separation by domain, we can separate applications for a single
domain by URL path. We must make sure that these paths do not collide
between applications and in some cases they should not be obvious to users.
We can for example choose randomized paths in the same way we choose
randomized passwords. In other cases they can be obvious, for example
“quarium.com/wiki”.
In this example, we’ll use two paths “quarium.com” and
“quarium.com/babUb4HAWret”. Note that when a web server is asked for
non-file paths like this, it will actually redirect the request to the directory
path “quarium.com/babUb4HAWret/” and then serve one of the files
specified in a “DirectoryIndex” directive, for example “index.php”.

We’ll configure URL paths by creating a separate definition for each path in
both HTTP and HTTPS firewall server configuration files for the domain (or
only in the HTTPS file, if we have redirected all HTTP requests). We can
serve as many different paths as needed. In Apache, each URL application
path is called a “location” and for each location we will perform a reverse
proxy to the proper application server. The Apache web server does this
using the following optional modules:
a2enmod proxy
a2enmod proxy_http
a2enmod proxy_html
a2enmod proxy_balancer
a2enmod substitute

To operate these modules, we add a few directives that are global to each
virtual host. This first directive tells the proxy module to disable its “forward
proxy” and only operate in its “reverse proxy” functions. This is critical for
the security of the firewall:
ProxyRequests off

We then add a set of directives specific to each location in each virtual host
for the “mod_proxy” module:
ProxyPass /babUb4HAWret http://somehost2.lan.quarium.com
<Location /babUb4HAWret>
SetEnv filter-errordocs
ProxyPassReverse http://somehost2.lan.quarium.com
ProxyPassReverseCookieDomain lan.quarium.com quarium.com
ProxyPassReverseCookiePath / /babUb4HAWret/
</Location>
ProxyPass / http://somehost1.lan.quarium.com/
<Location />
SetEnv filter-errordocs
ProxyPassReverse http://somehost1.lan.quarium.com/
ProxyPassReverseCookieDomain lan.quarium.com quarium.com
</Location>

These directives remap each incoming request HTTP header to the proper
application server and if the application server then responds, for example
with a redirect location HTTP header, it maps those locations in reverse.

Note that the use or omission of trailing “/” in the entire section is critically
important but is definitely not always obvious.

Note the inconsistent trailing-slash behavior of "ProxyPass" for the two locations and note that "ProxyPassReverse" does not have a lot of impact on the results in our application since it only modifies the rare "Location" and "Content-Location" response headers.

If we try these by themselves, we’ll notice that any links inside the HTML
content (for example in the <a href=””> tags) still point to the application
server and not to the public URL path. To make this work, we’ll use the
“proxy_html” module. We expand the location to:
ProxyPass /babUb4HAWret http://somehost2.lan.quarium.com
<Location /babUb4HAWret>
SetEnv filter-errordocs
ProxyPassReverse http://somehost2.lan.quarium.com
ProxyPassReverseCookieDomain lan.quarium.com quarium.com
ProxyPassReverseCookiePath / /babUb4HAWret/
SetOutputFilter INFLATE;DEFLATE;
ProxyHTMLEnable on
ProxyHTMLURLMap / /babUb4HAWret/ c
ProxyHTMLURLMap http://somehost2.lan.quarium.com /babUb4HAWret c
</Location>
ProxyPass / http://somehost1.lan.quarium.com/
<Location />
SetEnv filter-errordocs
ProxyPassReverse http://somehost1.lan.quarium.com/
ProxyPassReverseCookieDomain lan.quarium.com quarium.com
SetOutputFilter INFLATE;DEFLATE;
ProxyHTMLEnable on
ProxyHTMLURLMap http://somehost1.lan.quarium.com/ / c
ProxyHTMLURLMap http://somehost1.lan.quarium.com / c
</Location>

The output filter allows us to map URLs in compressed content as well. The
“proxy_html” filter is automatically inserted between the “INFLATE” and
“DEFLATE” filters. And again we want to map both URLs that include a
host name and those that do not.

Strangely, for the “/” location in the secure configuration only, the two
“ProxyHTMLURLMap” directives must be reversed!!!
We do not want to expand just any “/” or URL in the HTML content. Apache
will only rewrite paths inside attributes of elements listed in
“ProxyHTMLLinks” directives. Fortunately, a good default set is part of the
standard configuration of the “proxy_html” module and we do not have to
add any in our own configuration files. This is the standard set in “mods-
available/proxy_html.conf” in Ubuntu 20.04 LTS:
ProxyHTMLLinks a href
ProxyHTMLLinks area href
ProxyHTMLLinks link href
ProxyHTMLLinks img src longdesc usemap
ProxyHTMLLinks object classid codebase data usemap
ProxyHTMLLinks q cite
ProxyHTMLLinks blockquote cite
ProxyHTMLLinks ins cite
ProxyHTMLLinks del cite
ProxyHTMLLinks form action
ProxyHTMLLinks input src usemap
ProxyHTMLLinks head profile
ProxyHTMLLinks base href
ProxyHTMLLinks script src for

The only addition we might like to add to our own site configuration file is:
ProxyHTMLLinks button formaction

Similarly, we need to indicate which scripting events should be mapped.


Fortunately there is a default set for these also:
ProxyHTMLEvents onclick ondblclick onmousedown onmouseup \
onmouseover onmousemove onmouseout onkeypress \
onkeydown onkeyup onfocus onblur onload \
onunload onsubmit onreset onselect onchange

There is a remaining problem: the application server sees all requests as if they are coming from the firewall server, based on the source address of the incoming request packets. This prevents us from doing analytics or application-level geolocation of our users. The proxy modules add an "X-Forwarded-For" request header containing the original client address ("$_SERVER['HTTP_X_FORWARDED_FOR']" in PHP) to each forwarded request; the following directive, used on servers that receive such forwarded requests, tells Apache to take the client address from that header:
RemoteIPHeader X-Forwarded-For

Now all URLs in HTTP headers and HTML content are mapped correctly.
But this is not the case for any supporting “.js”, “.css” and “.json” files or the
JSON output of an API.

You can turn on debugging for proxy operations using:


LogLevel warn proxy:trace3 proxy_html:trace3

If you then request any of these files, you’ll see the message: “Non-HTML
content; not inserting proxy-html filter” This means that the “proxy_html”
filter is only used for HTML content. The documentation is explicit on that
point as well: “Note that the proxy_html filter will only act on HTML data
(Content-Type text/html or application/xhtml+xml) and when the data are
proxied.”

The filter does process CSS and JavaScript, but only if they are embedded in the text of the HTML document! Even this could be very dangerous: in HTML, the "/" character occurs only inside URLs in element attributes or as ordinary text content, while in CSS and JavaScript, comments are delimited by "/*" and "*/".
In JavaScript we can also use “//”. We do not want these “/” characters
interpreted as the root URL and have them expanded into
“/babUb4HAWret*”. This is the reason for the “c” flag at the end of the line:
ProxyHTMLURLMap / /babUb4HAWret/ c

The flag prevents the rule from being applied in embedded CSS and
JavaScript. This does mean that an embedded “window.location.href = ‘/’;”
may not have the intended effect.

Clearly we’ll need some other way to correctly reverse proxy secondary files
that may contain URLs. Fortunately, the “.js” and “.css” files are always used
in conjunction with HTML and we can employ the best practice of only using
relative URLs in these files.

A bigger problem exists with “.json” files (or really any other textual data
files) and with JSON API output. For this we’ll need to add a separate output
filter and then we tell it to substitute the external URLs for the internal URLs.
We could complicate things more by trying to map relative URLs correctly as
well, but instead we recommend the best practice of only embedding absolute
URLs in REST responses. The final reverse proxy location now looks like
this:
ProxyPass /babUb4HAWret http://somehost2.lan.quarium.com
<Location /babUb4HAWret>
SetEnv filter-errordocs
ProxyPassReverse http://somehost2.lan.quarium.com
ProxyPassReverseCookieDomain lan.quarium.com quarium.com
ProxyPassReverseCookiePath / /babUb4HAWret/
SetOutputFilter INFLATE;DEFLATE;
ProxyHTMLEnable on
ProxyHTMLURLMap / /babUb4HAWret/ c
ProxyHTMLURLMap http://somehost2.lan.quarium.com /babUb4HAWret c
AddOutputFilterByType INFLATE;SUBSTITUTE;DEFLATE application/hal+json
application/json
Substitute
"s!http://somehost2.lan.quarium.com!https://www.quarium.com/babUb4HAWret!"
</Location>
ProxyPass / http://somehost1.lan.quarium.com/
<Location />
SetEnv filter-errordocs
ProxyPassReverse http://somehost1.lan.quarium.com/
ProxyPassReverseCookieDomain lan.quarium.com quarium.com
SetOutputFilter INFLATE;DEFLATE;
ProxyHTMLEnable on
ProxyHTMLURLMap http://somehost1.lan.quarium.com/ / c
ProxyHTMLURLMap http://somehost1.lan.quarium.com / c
AddOutputFilterByType INFLATE;SUBSTITUTE;DEFLATE application/hal+json
application/json
Substitute "s!http://somehost1.lan.quarium.com!https://www.quarium.com!"
</Location>

This should now work as expected, but there are a surprising number of
special cases. We need to test if this is operating as expected in each case. To
do this we create a small test application.

First, we'll need a pure HTML file "proxytest1.html" on an application server to see if Apache performs the correct translation of HTTP headers and of URLs in the HTML content without involving the PHP interpreter module yet (substitute the correct "somehost1" or "somehost2"):
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<link rel="icon" type="image/x-icon" href="favicon.ico" />
<title>Proxy Mapping Test 1</title>
<link href="/css/proxytest.css" rel="stylesheet">
</head>
<body>
<h1>Hello</h1>
<p>This is proxy mapping test 1.</p>
<p>We do not want this / or this URL http://somehost.lan.quarium.com to be
proxied.</p>
<p>Click <a href="/proxytest2.php">here</a> to go to the other test page.</p>
<p>Click <a href="http://somehost.lan.quarium.com/proxytest2.php">here</a> to
also go to the other test page.</p>
<p>Click <a href="/">here</a> to go to the root page.</p>
<p>Click <a href="http://somehost.lan.quarium.com">here</a> to also go to the
root page.</p>
<p>Click <a href="http://somehost.lan.quarium.com/">here</a> to also go to the
root page.</p>
<p><img src="/images/proxytest1.png" width="50" height="50" /></p>
<p><img src="http://somehost.lan.quarium.com/images/proxytest1.png" width="50"
height="50" /></p>
<script> /* this is a comment */ </script>
</body>
</html>

Next, we can add a simple PHP file "proxytest2.php" to see if Apache performs the same operations correctly on such files:
<?php
?>
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<link rel="icon" type="image/x-icon" href="favicon.ico" />
<title>Proxy Mapping Test 2</title>
<link href="http://somehost.lan.quarium.com/css/proxytest.css" rel="stylesheet">
</head>
<body>
<h1>Hello</h1>
<p>This is proxy mapping test 2 for <?php echo $_SERVER['HTTP_X_FORWARDED_FOR'];
?>.</p>
<p>Click <a href="/proxytest1.html">here</a> to go to the other test page.</p>
<p>Click <a href="http://somehost.lan.quarium.com/proxytest1.html">here</a> to
also go to the other test page.</p>
<p>Click <a href="/">here</a> to go to the root page.</p>
<p>Click <a href="http://somehost.lan.quarium.com">here</a> to also go to the
root page.</p>
<p>Click <a href="http://somehost.lan.quarium.com/">here</a> to also go to the
root page.</p>
<p><img src="/images/proxytest1.png" width="50" height="50" /></p>
<p><img src="http://somehost.lan.quarium.com/images/proxytest1.png" width="50"
height="50" /></p>
</body>
</html>

We can test if “.css” (and “.js”) files are processed correctly with a new file
“css/proxytest.css” (and some corresponding “images/proxytest1.png” and
“images/proxytest2.jpg”):
/* proxytest.css */
body {
background-image: url('../images/proxytest2.jpg');
}
We can test if URLs in response headers are also mapped correctly with a file
“proxytest3.php”:
<?php
header("Location: http://somehost.lan.quarium.com/proxytest1.html");
http_response_code(307);
exit;
?>

Note that this URL must be absolute.

We can now inspect if the mappings are correct using for example
curl --trace - --location http://www.quarium.com/babUb4HAWret
curl --trace - --location http://www.quarium.com/babUb4HAWret/
curl --trace - --location http://www.quarium.com/babUb4HAWret/proxytest1.html
curl --trace - --location http://www.quarium.com/babUb4HAWret/proxytest2.php
curl --trace - --location http://www.quarium.com/babUb4HAWret/proxytest3.php
curl --trace - --location http://www.quarium.com
curl --trace - --location http://www.quarium.com/
curl --trace - --location http://www.quarium.com/proxytest1.html
curl --trace - --location http://www.quarium.com/proxytest2.php
curl --trace - --location http://www.quarium.com/proxytest3.php

Now that our application servers are behind a reverse proxy, they will see all
HTTP requests arriving from the reverse proxy internal IP address. For
analytics purposes we would like to log the original external client IP. For
this we enable an additional Apache module on each application server:
a2enmod remoteip

We then direct this module to use the "X-Forwarded-For" header as the source of client addresses, but only for requests arriving from the IP address of the reverse proxy (here for example "10.0.0.1"). If we access the application server from some other internal address, that address will be logged correctly too:
<IfModule mod_remoteip.c>
RemoteIPHeader X-Forwarded-For
RemoteIPInternalProxy 10.0.0.1
</IfModule>

We must also change the "%h" field to "%a" in the "combined" log format in the "/etc/apache2/apache2.conf" file:
LogFormat "%a %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined
Load Balancing by Geolocation
We are going to assume that when a new user first contacts our online
service, they are doing so from within their home jurisdiction.

In addition, we must provide mechanisms whereby an increasingly mobile world population can access their information on our servers from anywhere in the world, subject to restrictions by the user and server jurisdictions.

In addition, we must provide mechanisms whereby we can migrate user information if we determine that users mostly contact our service from locations that are geographically far from the server on which their information resides.

When a request from an as-yet unidentified user reaches any of our servers,
that server should determine which <hostname>.quarium.com is
geographically nearest to the IP address from which the request originates. If
the server determines that another server is better situated to serve the
request, it should redirect the user to that server. When such a user then either
identifies themselves, or registers a new account, they will do so on the server
(cluster) that can best serve them.

If the user then identifies themselves as an account that is not stored on the
current server, the server should instigate a search for the account on all other
servers it can reach. If the account is found on another server, the user can be
authenticated and then redirected to the server on which their account resides.

There are obviously many failure modes for this entire global load balancing
mechanism. Users must be carefully educated on what could be wrong when
they are denied access to their account.

We’ll discuss the server application code for this in the next book in this
series.
About the Book
An Online Infrastructure with Ubuntu 20.04 LTS LAMP demonstrates the
detailed development and testing of a server configuration for the Online
Service and the Online Client App described in the companion books of the
series. It can also be used for Symfony, WordPress, MediaWiki, Drupal, Git,
Jira or Icinga deployment. The Online Infrastructure runs on private server
farms or cloud-hosted virtual machines at scales ranging from simple local
sites to large-scale world-wide applications.

About the Author


The author has been active in the field since the days before UNIX, when
magnetic core memory and large spinning drums could barely store one
program at a time. He was educated as an electrical engineer but quickly
figured out that hardware was becoming software (thank you Frank).

He helped build some of the first 6809 UCSD Pascal and 68000 UNIX
microcomputers (thank you Patrick, Henk, Don and Zion).

He built and presented papers on computer networks before Ethernet existed and joined the computer game industry when interactive compact disks ran OS/9 and took a separate department and an hour each to burn (thank you Steve).

He shipped titles (including a Google-Earth-like title called 3D Atlas) for CD/I, 3DO, PS-2, the Apple Pippin, for MacOS and for Windows from 3.1 to
10. He was fortunate enough to learn from the best in the business (thank you
Larry, Bing, John, Scott, Gifford, Rich and Don) and to work on most major
game franchises, from John Madden Football, to Medal of Honor, James
Bond, Lord of the Rings, the Sims, Lara Croft, Star Wars, Red Faction and
even on such minor delights as 3D Atlas, Sesame Street, Anpanman and Deadliest
Catch.
Today he is building world-wide infrastructures like the one described in this
book and their associated desktop and mobile client applications which are
used by millions for entertainment, education (thank you Doug) and
commerce.

Colophon
This book was written using MediaWiki, Microsoft Word and Sigil. Titles are
set in Impact. Headings are set in Adobe Myriad Pro. Text is set in Adobe
Minion Pro. Code examples are set in Ubuntu Mono.

The book is published in eBook and Paperback format. We also sell a "Developer Edition" in PDF with all parameters for your specific application included, for cut-and-paste to follow along with the development and a full
ZIP archive for plug-and-play. We also sell development services to expand
the code for your specific application.

All trademarks and other proprietary information are the property of their respective owners.
