An Online Infrastructure With Ubuntu 20.04 LTS LAMP
by Bart Besseling
All rights reserved. Published in the United States of America. No part of this
book may be used or reproduced in any manner whatsoever without written
permission except in the case of brief quotations embodied in critical articles
or reviews.
This book addresses the fifth component: the custom infrastructure needed to
support the online service, its clients and its organization. The two other
books in the series address the third and fourth components.
Chapter 1 - Design
Technologies
The Internet
Since the early 1960s, the RAND Corporation had been researching systems
that could survive nuclear war based on the idea of distributed adaptive
message block switching. Independently, in 1966, work began on the
ARPANET to share the then-scarce research computers among
geographically dispersed scientists.
Funding and management for the ARPANET came from the US military for
the first two decades of its operation until the transition in 1990 to the
National Science Foundation (the successor to the Office of Scientific Research
and Development, which ran the Manhattan Project) and the Internet.
IPV4
Internet Protocol Version 4 (IPV4, RFC 791) was adopted by the ARPANET
in 1983 and is today the most widely used communications protocol in
history. It divides communications into packets of between 20 and 2^16 - 1 =
65,535 bytes. Each packet consists of a 20 to 60-byte header section and a
data section. The header section contains a 4-byte source address and a 4-byte
destination address. This gives the protocol potentially 2^32 ≈ 4.3 billion
addresses, divided into public, private and administrative ranges.
On January 31st, 2011, the last unallocated blocks of public IPV4 addresses
were handed out, even though only about 14% of the address space was actually
in use. To counteract this address exhaustion, more
and more public addresses are being split into smaller blocks and assigned to
network address translating routers (NATs) from behind which large and
small private networks can still reach all other public addresses.
When systems on private networks want to communicate directly with each
other (for example in IP telephony, multi-player games and file sharing
applications) elaborate NAT traversal protocols such as Session Traversal
Utilities for NAT (STUN, RFC 5389) are used.
Ethernet
The TCP/IP protocols are named “Internet” protocols because they operate
end-to-end on top of whatever local data links exist between communicating
systems. The vast majority of such data links now use the Ethernet protocol, a
Carrier Sense Multiple Access with Collision Detection (CSMA/CD)
protocol introduced in 1980. Its design was inspired by the 1971 ALOHANet,
the first publicly demonstrated packet switching network developed at the
University of Hawaii.
Ethernet has a Media Access Control (MAC) address of 48 bits that is large
enough to accommodate a unique address for each manufactured device
interface up to a density of one for each six square feet of the land surface of
the Earth or 30,000 for every human alive today.
IPV6
By 1992 it was clear that IPV4 would not have sufficient public addresses for
future needs, and work started on a new protocol. In 1998, Internet Protocol
Version 6 (IPV6) became a draft standard, and in 2017, 25 years after the work
started, it finally became an Internet Standard (RFC 8200).
IPV6 uses 128-bit addresses, divided into a 64-bit network part and a 64-bit
device part. This is sufficient to automatically generate a public address from
the Ethernet MAC address of each interface. This has all kinds of privacy
concerns since it would then be possible to track each unique stationary or
mobile device and its user. This problem was fixed in the “Secure Neighbor
Discovery” protocol (SEND, RFC 3971, 4861 and 6494) by allowing devices
to choose cryptographically random addresses that are still globally unique.
Public address ranges are allocated as 2^48 very large blocks of 2^80 addresses.
Each human currently alive can be allocated 30,000 of such blocks, or a total
of 3.96 * 10^28 addresses each.
We must also develop a strategy for numbering our private subnets, our
gateways and the well-known internal servers attached to them. IPV4
allocates three private address ranges: 10.0.0.0/8, 172.16.0.0/12 and
192.168.0.0/16. Many home and small office networks use the
192.168.0.0/16 range.
We will think big and use the 10.0.0.0/8 range. We will allocate one address
byte value to up to 256 physical locations. Each location can then have up to
256 local area networks, virtual machine bridge networks and virtual private
networks, each with up to 253 devices, a gateway address and a broadcast
address.
Our first office will be location 0 and it will use the range of addresses
between 10.0.0.0 and 10.0.255.255. Our first LAN will be subnet 0 and be
10.0.0.0/24. We will use the common convention that on each of our private
networks .0 is not used, .1 is a gateway towards the Internet and .255 is a
broadcast address. Effectively we will have up to 65,536 internal networks
10.x.y.0/24.
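For example, under this scheme (using only the conventions described above):

10.0.0.0/24    location 0, LAN 0        gateway 10.0.0.1    broadcast 10.0.0.255
10.0.1.0/24    location 0, subnet 1     gateway 10.0.1.1    broadcast 10.0.1.255
10.1.0.0/24    location 1, LAN 0        gateway 10.1.0.1    broadcast 10.1.0.255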
We do not have this problem with the IPV6 protocol. Links are always
automatically assigned a link-local address constructed of the “fe80::/64”
prefix and the modified IEEE 64-bit Extended Unique Identifier (modified
EUI-64) version of the interface Ethernet MAC address through stateless
address auto configuration (SLAAC, RFC 4862). A MAC address is turned
into its modified EUI-64 version by inserting “ff:fe” in the middle and
inverting the universal/local bit of the first byte. For example, a device with
a MAC address of “52:54:00:14:79:4a” always has at least the IPV6 address
“fe80::5054:ff:fe14:794a/64”. But unless we use IPV6
for communication on the public Internet, there is no point in using IPV6
locally, other than in preparation for some distant future.
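As a quick illustration of the mapping only (a minimal sketch, using the example MAC address above):

#!/bin/bash
# Derive the modified EUI-64 link-local address from a MAC address (sketch).
mac="52:54:00:14:79:4a"
IFS=: read -r b1 b2 b3 b4 b5 b6 <<<"$mac"
b1=$(printf '%02x' $(( 0x$b1 ^ 0x02 )))   # invert the universal/local bit
printf 'fe80::%x:%x:%x:%x/64\n' \
  $(( 0x$b1$b2 )) $(( 0x${b3}ff )) $(( 0xfe$b4 )) $(( 0x$b5$b6 ))
# prints fe80::5054:ff:fe14:794a/64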
Ubuntu Linux
UNIX
The UNIX operating system was developed in the late 1960s and early 1970s at
the computing research center of Bell Laboratories, a research and scientific
development company founded by Western Electric and AT&T as a
successor to Alexander Graham Bell’s original laboratory.
UNIX was first presented formally to the outside world at the 1973 Symposium
on Operating Systems Principles, where Dennis Ritchie and Ken Thompson
delivered a paper on “The UNIX Timesharing System”. According to them
“Perhaps the most important achievement of UNIX is to demonstrate that a
powerful operating system for interactive use need not be expensive either in
equipment or in human effort: UNIX can run on hardware costing as little as
$40,000, and less than two man years were spent on the main system
software.”
Linux
Due to the high cost of UNIX licenses and the onerous terms of its license
agreements, the GNU project had been working on a free UNIX-like operating
system since 1983. In 1991 Linus Torvalds released the first source code for
his personal version, “Linux”. By 1998, Linux was perceived as a major
threat by Microsoft, the main commercial provider of operating systems. By
2012, the aggregate Linux server market revenue exceeded that of the rest of
the UNIX market.
Today Linux is the single most popular operating system in the world,
running on more than 2 billion devices. The exception is what has always been
the weak spot of both UNIX and Linux: personal productivity computers with
graphical user interfaces. Unfortunately, until the rise of smart phones, this
constituted the bulk of the general purpose computer market, and due to the
lack of a compelling single standard Linux GUI, Windows and MacOS retain firm
control there. Even the smart phone market is sharply divided between the
Android and iOS GUI variants. Meanwhile Android and Raspberry Pi IoT devices
run Linux at their core, while MacOS and iOS are built on a closely related
UNIX foundation.
Ubuntu
In 2020, the Linux market is more fragmented than the UNIX market ever was,
with an uncountable number of different distributions (“distros”). Most are
related to one of the major branches: Debian, Red Hat and Slackware.
This book is based on the most recent version, Ubuntu 20.04 LTS (released in
the 4th month of 2020) which offers significant improvements over the
previous LTS versions 16.04 and 18.04 and which will be supported until
April 2025. We have been using Ubuntu as our main deployment platform
since version 14.04 LTS and have always been able to upgrade to newer
versions with minimal changes.
Tools
The largest expense in the vast majority of software projects is people. If you
are doing business, you should use the tools and components that have the
largest commercial user base. You should then hire the people that have the
most experience in delivering products using those tools and components, so
they can get the job done the fastest and with the highest quality.
This chapter is about the tools we have in our own software development
toolkit. There are many alternatives but over the years these are the ones that
we have used most.
Development Techniques
Tools are useless without a proper understanding of how they are applied to a
problem. Knowledge itself is always our first and most important tool.
And, since we are here to sell information to people so they can improve their
lives and so that we can make money for ourselves, it is time to buy the
equivalent of a Ford F-150 pickup truck, the most popular motor vehicle of all
time, and cry about that decision all the way to the bank. In the online
commerce world, that is
the venerable open-source “LAMP” (Linux, Apache, MySQL and PHP)
system that we have been installing since the 1990s (except “P” stood for
“Perl” back then).
But to do this effectively, we need a proper education in the backgrounds of
our profession.
Dijkstra suggested, and Böhm and Jacopini proved, that all procedural
problems could be solved by simple nested “sequences”, “decisions” and
“loops” in “structured code” instead of “spaghetti code”.
Even more generally, we can take literally any problem in any field and
divide it into a tree of sub-problem sequences, decisions and loops. We repeat
this process until the solution to each leaf problem is trivial. Then we
assemble the solution to the original problem from the leaves back to the
trunk, taking care to prove that each sub-assembly is working properly and
that it is only attached to the rest of the tree at one point (its interface). We
can then work on each sub-problem and sub-assembly in isolation, possibly
even in parallel in a development team, easily within our “Human Conceptual
Limits” [George Miller].
We should build this initial version and all subsequent improvements in short
development “sprints”, carefully limiting ourselves to things that can be built
and tested and integrated within the time of one sprint. We should use formal
project management tools, such as Kanban, to document everything that
needs to be done and its current state. This is nothing but top-down structured
design, formally applied to a development process.
If anybody “breaks the build”, that person does not eat or sleep until the build
is working again.
When we got the first “glass tty”, we could “edit” (“ed”) the consecutive lines
of our code without wasting miles of paper and ribbons of ink. Some
programmers worked better if they kept their entire program organized in
their head, like some chess players do with their games. Others limited each
routine to a length they could see in its entirety on their 24-line VT100 smart
terminal.
After that, the “what you see is what you get” (WYSIWYG) and “graphical
user interface” (GUI) fashions never really caught on in programming, but
syntax highlighting, auto-completion and real-time syntax checking have
again dramatically improved programmer productivity.
There are a number of computer science papers from the 1950s onwards that
demonstrate that visual pattern recognition and compile-time checks have the
largest impact on software quality. These lessons have clearly been ignored
by the creators of late-binding languages (Java, Objective-C) and languages
where white space is a critical semantic element (Python, YAML).
Microsoft Visual Studio
Visual Studio supports most programming languages, for example PHP using
a “PHP Tools for Visual Studio” plug-in published by “DevSense”, and most
CSS, HTML, JavaScript and SQL variations. The debugger and the
performance tools are some of the best available and we frequently take a
peek at some binary file using the binary file editor. We can perform all git
version control functions without switching to another environment.
Apple XCode
Apple is not nearly as good as Microsoft in providing consistency and
general usability to its developers. Periodically we have to rewrite all of our
Apple software in a completely new language (First Basic, then Pascal, then
HyperCard and MPW Script, then C and C++, then Objective-C and now
Swift, but minus OpenGL soon). Then we have to periodically rewrite all of
our software for different processors and operating systems (68000,
PowerPC, Intel and now ARM, through Next and now Linux with proprietary
drivers and a proprietary GUI that changes often). In many cases it is even
literally impossible to provide customers with a minor update to software that
was developed only recently. The expensive Apple hardware also has a nasty
habit of obsoleting itself: With every Apple operating system update a
generation of Apple hardware turns into e-waste.
And finally, long after Microsoft has given up on its “Metro” desktop UI,
every Apple desktop will now be made to look like a mobile device. With
Jobs and Ive gone, we can only wait in terror to see what that will look like.
But, as an Apple support person once literally told us, “we should not want to
know that”.
For the most part, after the initial delight of an update when our complex
build-and-sign process has to be changed completely overnight, again,
XCode is very similar to Visual Studio. It fully supports git and all of the
languages currently approved by Apple for use on Apple systems.
Android Development Studio
Our only peeve is that there are so many updates of its different components
and libraries that the first development session of the day typically requires
some patience. These changes also frequently break the way a less frequently
updated package such as Cordova generates Android projects, especially for
“Gradle”, which, to its credit, the IDE then knows how to re-factor.
Download and install Android Development Studio from the
“developer.android.com” web site. Make sure to use a “bundle” version and
not a “stand-alone IDE” version, even if you must use an older version. Any
older version will update itself anyway. The default settings should be
sufficient for our purpose. Install all suggested updates.
MySQL Workbench
The purchase of the free open-source database MySQL by Sun Microsystems,
which was then itself purchased by Oracle, was not a smooth transition.
Fortunately, the situation now appears to have stabilized and Oracle publishes
the excellent MySQL Workbench as free software.
This is essentially an IDE for the MySQL database, which can even migrate
basic schemas to and from Oracle databases. The package supports modern
“secure shell tunneling” so we can operate on database servers that are hidden
behind a firewall with SSH access.
Photoshop
Photoshop has long been our graphics IDE of choice. Unfortunately, its
publisher Adobe has jumped on the subscription bandwagon and now you can only
rent a copy, paying for it over and over again, even during times when you do
not use it.
For software where frequent updates are essential to its operation, for
example a virus checker or a tax preparation program, this business model
makes sense. For a mature piece of software that does not evolve a lot, this is
not a business model that aligns with the needs of the customer.
Git
Git was developed to manage the distributed development of Linux and it is
currently the most popular version control system with a reported market
share above 90%.
A typical use case of git consists of a project directory with a “.git” sub-
directory. The sub-directory is the “repository” or “repo” and the project
directory is the “working tree”. On version control servers, projects are
typically stored in “bare” repositories (without a working tree) in a
“someproject.git” directory.
Different servers can automatically pull updates and, for example, perform
continuous integration builds of applications or automatically deploy
development, staging or live web sites.
All of our projects always begin with the creation of a new empty bare
repository on our version control server, which is then “cloned” to each of
our development workstations, continuous integration build servers and
development, staging and publication web servers.
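A minimal sketch of that first step, assuming a version control server reachable over SSH at “git.quarium.com” and a repository path of “/srv/git” (both hypothetical):

ssh git@git.quarium.com "git init --bare /srv/git/someproject.git"
git clone git@git.quarium.com:/srv/git/someproject.git
cd someproject
# work, then:
git add .
git commit -m "Initial import"
git push origin master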
One of our strict process rules is that no file may ever be deployed to any
server or published to any customer unless it comes out of a git repository.
TortoiseGit
TortoiseGit is a free open-source GUI wrapper for the git version control
system for Windows. It tightly integrates with the Windows File Explorer
and allows users to control directory and file versions using right-click menus
directly in File Explorer. This makes its use highly intuitive and eliminates
any excuse not to use git.
Atlassian SourceTree
Atlassian SourceTree is a free client for git, available for MacOS and
Windows. It consists of a full-featured graphical user interface. This makes it
easier to show people who are not familiar with version control what the
structure and development history of the project is.
Over the years we have used many different issue tracking and project planning
systems. The ones that stand out are “Bugzilla”, originally published as open
source web application software by Netscape, and “DevTrack” by “TechExcel”, an
excellent and fast client-server application.
Atlassian Jira
Atlassian, the publisher of the “SourceTree” git GUI client, also publishes
“Jira” a web-based project planning application. It can operate in several
popular modes including as a “ticket” tracker and as a “Kanban” planning
board. While it is not as fast and lightweight as “DevTrack”, it is remarkably
usable. It is properly integrated with git and email. It has an import facility
which allows new projects of a common structure to be set up from a
spreadsheet template very quickly.
The development of all software in these books was organized using Jira. The
infrastructure book in this series describes in detail how you can set up a
system to host Jira for yourself.
MediaWiki
Not too long ago, every engineer worth his salary (paid in salt, in Roman
times) kept engineering notebooks. These were invaluable, not only for
remembering how you did something a few weeks ago, but also how to back-
track a dead-end in a solution to a new problem. The problem of course is
that you cannot edit or search or share a paper notebook very well.
Which is why today we use the robust MediaWiki software. This is the same
PHP software that powers Wikipedia. It is free and it is easy to set up on a
LAMP server. There is a set of useful plug-ins that do everything from giving
the wiki a more corporately branded look, to providing a WYSIWYG editor,
to generating a table-of-contents tree. The books in this series were all written
originally as engineering notebooks using MediaWiki.
The infrastructure book in this series describes in detail how you can set up a
MediaWiki system for yourself.
Components
We will design our infrastructure to satisfy the following requirements:
Our customers will be using mobile and desktop machines with browsers and
application-specific software. We will develop and serve the first phases of
the service from the first corporate offices:
In some parts of the world, where cloud services are commercially not
available or are politically not permitted to do business, we will co-locate a
similar set of physical servers:
1. Individual servers for central services like DNS, DHCP, LDAP, Email,
Wiki, a project management database and services monitoring, and
2. Groups of virtual machines operating as application clusters:
The physical machines will act as hosts for the virtual machines, providing
them with processing, memory and mass storage and network interfaces. As
far as the virtual machines are aware, they are all connected to a local area
network behind a firewall and they receive requests forwarded to them by the
physical machine:
Chapter 2 – Implementation
Machine Types
Physical Servers
For physical servers we face the two eternal dilemmas: how many of what
kind do we need and do we make them or buy them?
The quantity versus performance question is easy: Buy more units of the
cheapest hardware that will do the job. This reduces the cost of failure and
improves scalability. Hosting of Linux virtual machines on large mainframes
like the IBM Z only makes business sense for certain load characteristics.
The make or buy decision is mostly a question of time-to-market and cost. If
we look at some typical modern components:
This configuration costs around $1,500 plus the labor costs of buying the
parts, putting them together and testing the system. The configuration will be
unique and difficult to reproduce but at least you can follow the latest trends
in hardware.
For less than twice as much we can quickly get a similar pre-assembled 1U or
2U server from a reputable system manufacturer delivered almost anywhere
in the world from a production series that will likely last several more years.
The price difference is equivalent to about 10 installation and test hours of an
in-house build, which is quite competitive. There are still many territories in
the world that are under-served by the large cloud hosting companies and it is
good to have a physical server selected that can go into a remote colocation
facility at short notice.
Still, for a lean but scalable startup it is best to start with two in-house-built
servers for the first office and then, when you have more money and less
time, scale up with cloud servers or pre-assembled servers.
Virtual Machines
We could start out with just cloud-hosted virtual machines and this seems
like the cheapest option until we consider the needs of the corporate offices:
We will at least need some local firewall, some local storage and some
redundancy on both. This can quickly add up to the cost of two of the
physical servers described above. So we will develop and host our initial
virtual machines on physical machines in our office and then scale up by
cloning fully configured and tested virtual machines to a cloud hosting
service or run them on physical machines in some remote colocation facility.
For all corporate and application functions, virtual machines are the way to
go except for the hosting of the virtual machines themselves.
Application Containers
Application containers such as Docker rely on application isolation
mechanisms in the host operating system to provide large numbers of light
virtual machines that do not need a lot of administration themselves. This
technology is useful in the higher-scale smaller-configuration stages of a
mid-size to large company.
There are now also container orchestration tools like Kubernetes that can
manage very large installations. We are not discussing Kubernetes in this
version of this book. We are focusing on the installation and configuration of
the individual services used by the infrastructure. Until we have proven that
the product sells there is no need to actually scale it up. It is sufficient to
know the implementation is flexible enough to scale when it has to.
Actual virtual machines have the advantage over containers that they can be
hosted without change or limitation on any type of physical system for
example on Windows or MacOS development systems or large Windows
hosts. They use more storage than containers, but at around $20 per terabyte
retail, storage is a lot cheaper than development and maintenance time.
Cloud Hosting
Cloud hosting of virtual machines is an excellent way for a small company to
scale up to mid-size.
After the customer base and the profitability of a company grow to a certain
level, it has a choice between vetting and trusting its own employees and
facilities or letting the faceless hordes of the hosting company (some of whom
actually work for the NSA and others who work for China) hold on to its
data. That the data in such facilities is almost always encrypted is only a
protection from people that do not have physical access to the underlying
internal system communications. Cloud hosting makes the most sense for
large quantities of static data, for example video or game data files.
Installation Types
We are going to need several different types of installations of Ubuntu for
different purposes.
Fortunately, the procedure for the basic server installation of Ubuntu is the
same for the first three configurations and we can prototype a first installation
or an upgrade very easily on for example a Windows development system
with a virtual machine host such as the free Oracle VirtualBox software.
After each installation step we clone the virtual disk image so we can quickly
recover from any installation mistakes and we base the more complicated
installations on the simpler ones.
Specific Installations
We are going to construct our world-wide business infrastructure using the
following specific installations:
Location Infrastructure
Each business office and each colocation facility has redundant or high-
availability connections to the Internet. The main office and all colocation
facilities have static IPV4 addresses.
The business needs at least one domain name, for example “quarium.com”
and a wild card security certificate for each domain, for example
“*.quarium.com”. The certificate will be based on one corporate master
private key.
Both machines have NICs (network interface cards, although these days the
hardware is actually integrated on motherboards) for Internet and local
network connections configured with dual bridged interfaces and
masquerading forwarding firewall settings.
Both machines allow remote access via SSH. Both machines provide
virtualization via KVM. Both machines run a satellite configuration of
Postfix to forward administrative email. No other software is run on these
machines directly and no data other than virtual machine disk images is
stored on these machines directly.
Management Servers
The business also runs a number of the application virtual machine
installations to host Jira project databases, MediaWiki documentation
databases and Icinga service monitoring applications.
Chapter 3 – The Operating System
In this chapter we describe the installation and configuration of the operating
systems for each of the installation types. It is only necessary to do an
installation once for each virtual machine installation type.
Ubuntu Server
Installation
This process is used to prepare the basic, application and hosting installation
types above.
Download the most recent server installation image from “ubuntu.com”, for
example “ubuntu-20.04-live-server-amd64.iso”. For installation on physical
machines this image must be copied to a USB storage device or to an optical
disk. Ubuntu recommends the free application “Rufus” for creating bootable
USB sticks on Windows. On the Desktop version of Ubuntu you can use an
application called “startup disk creator”. For installation on virtual machines
the downloaded image file can be used directly.
On a new virtual machine created on a host system with a GUI (for example
desktop Ubuntu or Windows with VirtualBox), mount the image file on the
virtual optical drive and boot the machine. The first few installer questions
are about the user interface:
Select the default “English” as the installation language. Choose to update the
installer that was included with the distribution ISO.
Select a network interface that can be used for updates (usually “eth0”).
Select the default “Use an Entire Disk” partitioning. Set up this disk as an
LVM group.
Select the default disk to install to, review the suggested partitioning as a
single mount point “/” and confirm it.
Then we create a first (probably administrative and only) user account: Enter
a full name for a user for example “Quarium Administrator”, a server base
name “ubuntu20base”, a user name “quarium”, a password and a password
confirmation.
Choose to install the OpenSSH server but do not install an SSH identity at
this point.
This is all of the configuration we are able to do at this point: Observe the
installation progress. This is relatively quick. Then observe the online update.
This may take considerably more time.
Reboot the machine. When the installer complains, remove the installation
medium or disconnect the installation file from the virtual machine.
Ubuntu Desktop
Download the most recent desktop installation image from “ubuntu.com”, for
example “ubuntu-20.04-desktop-amd64.iso”. For installation on physical
machines this image must be copied to a USB storage device or to an optical
disk. Ubuntu recommends the free application “Rufus” for creating bootable
USB sticks on Windows. On the Desktop version of Ubuntu you can use an
application called “startup disk creator”. For installation on virtual machines
the downloaded image file can be used directly.
On a new virtual machine created on a host system with a GUI (for example
desktop Ubuntu or Windows with VirtualBox), mount the image file on the
virtual optical drive and boot the machine. The GUI requires more RAM, so
initially allocate 1 processor, 4096MB of RAM and 100GB of disk storage. This
installation will also use more of the allocated disk storage. An initial
installation will use about 8.6GB.
The first few installer questions are about the user interface:
Then we create a first (probably administrative and only) user: Set the
appropriate time zone. Typically desktop systems operate in the local time
zone of the user and in this installation that also affects language selection.
Enter a full name for a user for example “Quarium Administrator”, a fully
qualified server name “ubuntu20desktop.quarium.com”, a user name
“quarium”, a password and a password confirmation. In Ubuntu 20.04, DO
NOT select “require my password to log in”. When this version was released,
the installer did not install or configure the login screen correctly. Install
with automatic login and then change this setting after installation in the
“Users” settings.
This is all of the configuration we are able to do at this point. Any network
interfaces will be configured automatically by NetworkManager. Observe the
installation progress. This is relatively quick.
Use the administrator account to log in. If after login the screen appears
garbled, try booting the system with the display disconnected and then log in
to the lower resolution default screen.
In the settings interface selected from the top-right corner of the screen,
disable power saving.
Common Configurations
Before configuring an Ubuntu system for any particular services, we will
make some changes to default settings that improve the manageability and
operations of servers. We do this same work on all installation types.
Configuration files (for example with daemon startup options) are located in
“/etc/default”.
The system log can be read using the “journalctl” command. Some common
operations are:
journalctl --follow # display log file entries as they occur
journalctl --since "2015-01-10" --until "2015-01-11 03:00"
journalctl --since yesterday
journalctl --unit=<name>.service # display all log entries for a service
journalctl --list-boots # list all system boot times
journalctl -b --unit=<name>.service # display log entries for a service since boot
Sudo Settings
On all Linux systems it is common to prohibit direct login as the super user
(“root”). Instead, we add certain user accounts to a group “sudo” in
“/etc/group”. If they execute a “sudo” command, for example “sudo bash” or
“sudo poweroff”, the system in its default configuration asks for the user
password and then executes the command with super user permissions. On
our servers we only have one administrative user account and we only use it
to get super user access to commands and files. We do not use passwords for
remote login. If the administrative user has a valid access key but no
password, “sudo” may be configured not to ask for it. This can be done by
using the “visudo” command to change a line in a configuration file to:
%sudo ALL=(ALL:ALL) NOPASSWD:ALL
Software Updates
Before we do anything else to a new or cloned Ubuntu installation we must
obtain the latest updates using these commands, which we could save in a file
“~/Scripts/update_system.sh”:
#!/bin/bash
set -x
apt update
apt full-upgrade --assume-yes
apt autoremove --assume-yes
apt clean --assume-yes
update-grub
Run the script and reboot the system. If the upgrade process is interrupted it
may be necessary to manually correct the package data using:
dpkg --configure -a
Then we must tell the system to install updates automatically. The package
“unattended-upgrades” is installed by default. Review the file
“/etc/apt/apt.conf.d/50unattended-upgrades” and un-comment the following:
"${distro_id}:${distro_codename}-updates";
...
Unattended-Upgrade::Mail "root";
...
Unattended-Upgrade::MailReport "only-on-error";
...
Unattended-Upgrade::Automatic-Reboot "true";
...
Unattended-Upgrade::Automatic-Reboot-Time "02:00"; // NOTE: interpreted in the
// system time zone, which on our servers is UTC. Set it to the UTC equivalent
// of 2AM local time, for example "10:00" for US Pacific time.
To enable automatic updates, edit the file “/etc/apt/apt.conf.d/10periodic” and
set the appropriate apt configuration options:
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Download-Upgradeable-Packages "1";
APT::Periodic::AutocleanInterval "7";
APT::Periodic::Unattended-Upgrade "1";
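If in doubt, the configuration can be exercised without actually installing anything, using the package's own script:

unattended-upgrade --dry-run --debug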
It is not recommended, but you could upgrade to a new Ubuntu main release
using:
do-release-upgrade
System Name
When we create a single virtual machine disk image, we need to change the
system name on each copy we put into service. Recent versions of Ubuntu
automatically install and activate the package “cloud-init”. This is supposed
to make configurations of new virtual machines easier. Unfortunately this
package is still documented poorly and any scripts written for it must be
adapted to the latest software updates before they are run for the first time
(alternatively they must be re-tested almost daily). This is an excellent way to
introduce all kinds of untraceable automated configuration changes at the
critical time of new installation. We won’t be using this mechanism in this
book and we must prevent “cloud-init” from changing our carefully
configured host name on every reboot by making the following change:
echo "preserve_hostname: true" >/etc/cloud/cloud.cfg.d/99_hostname.cfg
Time
When servers must exchange files and databases, we must make sure that we
do not need to translate dates and times between different time zones, and
that we never appear to copy files back in time: If a file created in London on June 2nd
2018 at 6AM local time is immediately copied to a server in Los Angeles, it
will arrive there the day before at 9PM local time. If both servers were set to
local time its creation timestamp would indicate it came from the future!
Instead we will configure all our servers to operate at Coordinated Universal
Time (UTC), which prior to 1972 was called Greenwich Mean Time (GMT).
This way all servers will agree which events occurred before or after which
other events. Whenever a date or time must be presented to a user, the display
application will convert it to local time on their client display.
In Ubuntu 20.04, the system clock and the hardware clock are managed by
the “systemd-timesyncd.service”. We can check the current settings with:
timedatectl
Make sure the local hardware real-time clock (RTC) is set to UTC as well:
timedatectl set-local-rtc 0
We must also make sure that all servers agree on the current time. For this we
connect them to the world-wide atomic clock service using the Network Time
Protocol (NTP, RFC 5905). If we see that our server is not set up to
synchronize its clock we can correct that with:
timedatectl set-ntp on
These are the only operations we need to keep all clocks of a world-wide
network of servers in sync. On desktop systems these operations can also be
performed with the settings GUI.
User Accounts
Each user account on a production server is a potential security hole.
We will protect the servers from most random account and password
guessing attacks by only allowing encrypted remote login with access keys
that are stored on the server. In our service we will also only have a single
administrative account on each server (the account created during
installation). This account will have the public security keys of all users that
are allowed to manage the server. This is a compromise between auditability
of the system (who has logged in) and obscuring which users are allowed
access (who is allowed to log in, i.e. which account do we need to
compromise to get in). We believe preventing damage is more likely to
succeed than auditing who caused it. After all, once logged on to a system it
is relatively easy to erase malicious footsteps.
We should also restrict the default permissions for new user-created files by
changing a setting in “/etc/login.defs”:
UMASK 027
This causes new user files and directories to be inaccessible to accounts that
are not in the same user group by default. It is also useful to inspect the
“/etc/passwd” file and set or correct the full username (GECOS) field of the
root and the administrative users.
If you will be using the “vi” or “vim” editors to paste-in indented content it
may be useful to add the following line to “/etc/vim/vimrc”:
set paste
Servers should be configured with enough system memory for all standard
operations and they should almost never swap. We add a few parameters in
the file “/etc/sysctl.conf”:
vm.swappiness = 10
vm.dirty_ratio = 40
vm.dirty_background_ratio = 10
vm.dirty_writeback_centisecs = 6000
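The new values take effect at the next boot; to apply and verify them immediately, standard sysctl usage is sufficient:

sysctl -p
cat /proc/sys/vm/swappiness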
Chapter 4 - Networking
Installation
The standard server and desktop installations of Ubuntu 20.04 LTS both
include the netplan package. The “Network Manager” package is
automatically installed in the desktop installation.
File Structure
The netplan interface configuration YAML files are located in the directory
“/etc/netplan/”. These files are then “rendered” by either the “networkd”
service on servers or the “NetworkManager” service on workstations into
files in the “/run/systemd/network” directory.
Note that this new standardization has given the maintainers of the “udev”
device manager new freedom and it is no longer easy to predict what the
name of network devices is going to be. Even worse, the names now seem to
change from version to version on the same hardware. This unnecessarily
complicates documentation and scripting. Historically these devices used to
be named “eth[0-9]”. In these examples we use “eth[0-9]” for clarity but you
should substitute the names used in your various installations as needed.
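The names that udev actually assigned on a given machine can be listed at any time with:

ip link show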
We also need to know which of these addresses is allocated to our ISP router.
Some ISPs use the first usable address in the range (for example .25 in a
.24/29 block) and some use the last (.30). The first and last addresses of the
block itself (.24 and .31 in this example) are the network and broadcast
addresses and cannot be assigned, so each such /29 range has 6 usable
addresses, one of which belongs to the ISP router.
The ISP typically also provides two IP addresses (not in this range) for their
DNS servers. We will be using our own DNS servers but it is useful to have
these extra DNS servers as a backup configuration.
Many ISP-provided routers also serve the DHCP protocol on our static IP
range. It should be possible to disable or ignore this since the DHCP protocol
contains a mechanism for making sure allocated addresses are not already in
use, either by fixed allocation or by another DHCP server.
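A plain static configuration for the gateway's two interfaces might look like this (a sketch in a file such as “/etc/netplan/01-netcfg.yaml”, using the example addresses from this chapter; the file name is arbitrary):

network:
  version: 2
  renderer: networkd
  ethernets:
    eth0:
      dhcp4: false
      addresses: [50.255.38.81/29]
      gateway4: 50.255.38.86
    eth1:
      dhcp4: false
      addresses: [10.0.0.10/24]
      nameservers:
        addresses: [10.0.0.30,10.0.0.31]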
Bridged Configuration
This static configuration is not sufficient if we use a physical machine to host
a set of virtual machines. In that case we must configure both the network
“ethernets” and “bridges” for each:
network:
  version: 2
  renderer: networkd
  ethernets:
    eth0:
      dhcp4: false
    eth1:
      dhcp4: false
  bridges:
    br0:
      interfaces: [eth0]
      addresses: [50.255.38.81/29,50.255.38.83/29]
      gateway4: 50.255.38.86
      parameters:
        stp: false
        forward-delay: 0
    br1:
      interfaces: [eth1]
      addresses: [10.0.0.10/24]
      nameservers:
        addresses: [10.0.0.30,10.0.0.31]
      parameters:
        stp: false
        forward-delay: 0
The settings are applied to the bridges which then carry them forward into
their associated interfaces.
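After editing the file, the configuration can be tested with an automatic rollback and then applied using the standard netplan commands:

netplan try
netplan apply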
Chapter 5 - Firewall
A firewall protects a server and any local area networks behind it from
unauthorized incoming Internet traffic while permitting and facilitating
outgoing traffic.
Iptables is quite flexible but it is also quite complicated. Most systems would
benefit from a simple standard set of rules. Ubuntu provides a front-end
called the “uncomplicated firewall” (ufw), whose configuration files are
stored in “/etc/ufw”.
Installation
The standard server and desktop installations of Ubuntu 20.04 LTS both
include the netfilter, iptables and ufw firewall software packages.
File Structure
The ufw package stores its configuration files in “/etc/ufw” and there is an
additional configuration file “/etc/default/ufw”.
Service Operations
We can control and determine the status of the firewall using:
ufw enable
ufw status
ufw show raw
ufw logging on
ufw logging off
ufw disable
An Ubuntu desktop installation initially does not listen on TCP port 22 for
the “Secure Shell” (SSH), but it does listen on TCP and UDP ports 53 for the
“Domain Name System” (DNS), UDP port 68 for the “Dynamic Host
Configuration Protocol” (DHCP) client, UDP port 5353 for Avahi mDNS and
TCP and UDP ports 631 for the “Common Unix Printing System” (CUPS).
We will not use the mDNS and the CUPS protocols and we will let the
firewall block access to these ports.
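Such a setup script might look like the following (a sketch only; the hypothetical port list must be adjusted to the services each machine actually provides, and on virtual machines behind the gateway it can usually be reduced to SSH alone):

#!/bin/bash
# One-time firewall setup (sketch; adjust ports per machine role).
set -x
ufw default deny incoming
ufw default allow outgoing
ufw allow 22/tcp     # SSH
ufw allow 53         # DNS, on name servers only
ufw allow 80/tcp     # HTTP, on web gateways only
ufw allow 443/tcp    # HTTPS, on web gateways only
ufw --force enable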
We run this script once when a new machine is installed and from then on
ufw will protect the system from all other access and from a set of malformed
or malicious network packets.
First we must allow the gateway kernel to forward packets between the
network interfaces or bridges in “/etc/ufw/sysctl.conf”:
net/ipv4/ip_forward=1
If for example our infrastructure offers web and REST API services it does so
on ports 80 and 443 of a public IPV4 address. The physical machine must
forward packets received on these ports to the equivalent ports on the internal
address of a virtual machine dedicated to that service. Then, response packets
from the virtual machine must be returned to the proper public client address
(where it may be forwarded again to an internal workstation). Two common
cases are forwarding TCP connections, for example SMTP:
-A PREROUTING -i br0 -p tcp -d 50.255.38.82 --dport 25 -j DNAT --to-destination 10.0.0.34:25
-A POSTROUTING -o br1 -p tcp -d 10.0.0.34 --dport 25 -j SNAT --to-source 10.0.0.20
Or connectionless UDP, for example DNS:
-A PREROUTING -i br0 -p udp -d 50.255.38.82 --dport 53 -j DNAT --to-destination 10.0.0.31:53
-A PREROUTING -i br0 -p tcp -d 50.255.38.82 --dport 53 -j DNAT --to-destination 10.0.0.31:53
-A POSTROUTING -o br1 -p tcp -d 10.0.0.31 --dport 53 -j SNAT --to-source 10.0.0.20
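In a default ufw layout such rules typically go in a “*nat” section near the top of “/etc/ufw/before.rules” (a sketch; the addresses are the examples above and the trailing COMMIT line is required):

*nat
:PREROUTING ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
-A PREROUTING -i br0 -p tcp -d 50.255.38.82 --dport 25 -j DNAT --to-destination 10.0.0.34:25
-A POSTROUTING -o br1 -p tcp -d 10.0.0.34 --dport 25 -j SNAT --to-source 10.0.0.20
COMMIT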
For more details, see the later chapter on “Global Load Balancing”.
Chapter 6 - Remote Access
SSH replaces the older “telnet”, “ftp”, “rlogin” and “rsh” applications which
are not secure and which should never be installed or used in any production
environment.
Installation
The base Ubuntu 20.04 server installation automatically installs the
“OpenSSH” client and server packages. On the desktop installation we must
install the server package manually. We can make sure the packages are
properly installed using:
apt install openssh-client openssh-server
apt list --installed | grep openssh
File Structure
The configuration files of the package are found in the “/etc/ssh” directory.
The client configuration is found in the file “/etc/ssh/ssh_config” with
overrides in the “/etc/ssh/ssh_config.d” directory.
Service Operations
Server operations are managed by the “ssh.service”. Some common
operations are:
systemctl start ssh
systemctl restart ssh
systemctl stop ssh
Server Configuration
The server configuration is found in the file “/etc/ssh/sshd_config”. We will
add our own rules in a file “/etc/ssh/sshd_config.d/quarium.conf” which will
override the default settings. We will require that all logins are only done
through less predictable account names:
PermitRootLogin no
We will require that users may only log in if their public SSH key is present in
the “.ssh/authorized_keys” file of the account they are logging in to:
PasswordAuthentication no
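Putting these together, the override file can be created and activated like this (a sketch; “sshd -t” checks the combined configuration for errors before the restart):

cat >/etc/ssh/sshd_config.d/quarium.conf <<'EOF'
PermitRootLogin no
PasswordAuthentication no
EOF
sshd -t
systemctl restart ssh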
The normal security updates of our systems are continuously changing the
encryption cyphers, message authentication codes and key exchange
algorithms used by SSH. We can list the ones supported by our current
version with:
sshd -T | grep "\(ciphers\|macs\|kexalgorithms\)"
Each account that allows login using SSH should have a subdirectory “.ssh”
with permissions 700.
Each key is binary encoded using Base64. 3 binary bytes are encoded as 4
characters, so a 2048-bit key is encoded to 342 characters. The key name is
used to recognize incoming connection requests and is not related to (but
often the same as) the account name of the owner of the key. We can create a
new 2048-bit SSH private/public key pair with the command:
ssh-keygen -b 2048 -C <keyname>
Leave the passphrase empty when asked and select a key name that describes the
key's purpose, for example a user name or a host name.
This command produces two files in the “.ssh” directory, by default “id_rsa”
and “id_rsa.pub”. This last file consists of the single line suitable for
appending to the “authorized_keys” files of accounts that may be logged in to
using this key pair.
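The usual way to install such a public key on a server is to append it to the target account's “authorized_keys” file, for example (a sketch using the host and account names from this book):

cat ~/.ssh/id_rsa.pub | ssh quarium@us1.quarium.com \
  'mkdir -p ~/.ssh && chmod 700 ~/.ssh && cat >>~/.ssh/authorized_keys'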
Another option “-i” allows for the conversion of RFC 4716 SSH Public Key
File Format keys into OpenSSH key format.
Tunneling
Our infrastructure consists of (among others) a set of physical servers which
host virtual machines. The virtual machines perform services based on
packets that are forwarded to them by the “iptables” “ufw” configuration of
the physical machines. The virtual machines do listen on SSH port 22, but
only on their internal bridge network. To access a virtual machine, we first
have to login to the physical machine on port 22 of its public address, and
then we have to login to the virtual machine on port 22 of its internal bridge
network address.
This can be tedious and fortunately the SSH client application can use a
configuration file “~/.ssh/config” to set up complex connection forwarding
options. In the following example the local private key “~/.ssh/id_rsa”
authenticates a user “quarium” on a cluster consisting of servers “us1” and
“us11a” where only “us1” has a public IP address and “us11a” can be
reached through its internal bridge network only:
Host us1
    Hostname us1.quarium.com
    User quarium

Host us11a
    Hostname us11a.lan.quarium.com
    User quarium
    ProxyCommand ssh -A -q us1 nc -q0 us11a.lan.quarium.com 22
The local user can connect to each of the systems by their short names
without having to specify a user name, for example:
ssh us11a
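With the OpenSSH version shipped in Ubuntu 20.04 the same effect can also be achieved with the more compact “ProxyJump” directive (an equivalent sketch):

Host us11a
    Hostname us11a.lan.quarium.com
    User quarium
    ProxyJump us1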
To allow other system users (for example crontab) to use the aliases, store the
file in “/etc/ssh/ssh_config.d” instead.
Chapter 7 - Virtualization
In early computers, physical processors were directly connected to physical
memory, i.e. the same program address always accessed the same physical
memory location. When a program needed more memory than available, it
had to resort to complex overlay methods to run.
The UNIX and Linux operating systems make heavy use of virtual memory and
can theoretically use entire mass storage devices as extensions of physical
memory. In 1979, the Motorola 68000 was one of the first microprocessors
powerful enough to be paired with a memory management unit (MMU), which meant
that the UNIX operating system could run on it. In
1985, the Intel x86 processor line caught up with the 80386. Today, the ARM
processors in every smart phone and every “Internet of Things” (IoT) device
run operating systems which use virtual memory.
Installation
The virtualization packages are not installed by default and should only be
installed on physical Ubuntu 20.04 server or desktop installations. The base
packages are installed with:
apt install qemu qemu-kvm libvirt-daemon libvirt-clients virtinst bridge-utils
libguestfs-tools
Add all users that may control virtual machines (for example “quarium”) to the
“libvirt” group in “/etc/group”.
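For example (a sketch; the check in the last line is what the next sentence refers to):

adduser quarium libvirt        # add the administrative user to the libvirt group
modprobe vhost_net             # load the accelerated virtio network module
lsmod | grep vhost_net         # confirm that the module is present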
You should see “vhost_net” in the output of grep. Tell the kernel to load the
module at boot time using:
echo vhost_net >>/etc/modules
File Structure
The disk images for virtual machines are stored in “/var/lib/libvirt/images”.
Service Operations
Server operations are managed by the “libvirtd.service”. Some common
operations are:
systemctl start libvirtd
systemctl restart libvirtd
systemctl stop libvirtd
To get a list of just the virtual machines in a particular state for scripting
purposes, use:
virsh list --state-shutoff --name
virsh list --state-running --name
virsh list --state-paused --name
Choose a local install image and use the ISO image of the software
distribution medium.
Set the amount of RAM allocated to the virtual machine, for example to 2048
and set the number of processors, for example 1.
Create a mass storage image for the virtual machine, for example 100GB.
Set the name of the virtual machine, for example “ubuntu20base”, select to
customize the configuration before install and finish the installation.
In the “Overview” tab, set the “Title” to something meaningful, for example
“Ubuntu 20 Base” and “Apply” the setting.
In the “Boot Options” tab, select “Start virtual machine on host boot up” and
“Apply” the setting.
In the “NIC” tab, select “Specify shared device name” and enter “br1” and
“Apply” the setting.
Then “Begin Installation”. See the earlier chapters for configuration details.
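Roughly the same machine can be created non-interactively with “virt-install” from the packages installed above (a sketch; the paths, names and os-variant value are examples and depend on the local osinfo database):

virt-install \
  --name ubuntu20base \
  --memory 2048 \
  --vcpus 1 \
  --disk path=/var/lib/libvirt/images/ubuntu20base.qcow2,size=100 \
  --cdrom /var/lib/libvirt/images/ubuntu-20.04-live-server-amd64.iso \
  --network bridge=br1 \
  --os-variant ubuntu20.04 \
  --graphics vnc \
  --autostart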
Once a virtual machine is properly configured, dump its configuration into a
file that can be used to create the virtual machine on other physical servers
using:
virsh dumpxml somehost > somehost.xml
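The utility referred to below is “virt-sparsify” from the “libguestfs-tools” package installed earlier; a typical invocation might be (a sketch, run against a shut-down machine):

virsh shutdown somehost
virt-sparsify --in-place /var/lib/libvirt/images/somehost.qcow2
virsh start somehost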
We use this utility to periodically make backup images of our (shut down)
production virtual machines which are then copied to other physical
machines for use in a standby virtual machine. We then resume production
operations with the sparsified disk image.
It is also possible to change the size of a virtual disk image. This can be done
using the utilities “fdisk”, “qemu-img”, “pvs”, “lvs” and “resize2fs”. This is a
complicated operation and it is probably easier to allocate new virtual
machines with a large amount of unused disk space to begin with (for
example 100GB). The unused space will only be allocated on the physical
disk if it is actually used.
Virtual disk images can be copied from one host server to another while
preserving sparse disk allocation using:
rsync -a -e ssh --sparse --compress --progress source destination
It is also possible to convert a virtual disk image from one format to another.
Some useful conversions are from the old “img” format to “qcow2” and from
“qcow2” to “vmdk”. Use the following command for conversion:
qemu-img convert -f raw -O qcow2 source.img target.qcow2
Chapter 8 - DNS
In the early days of the ARPANET, there was one centralized file
“HOSTS.TXT” that contained all mappings of system names to their network
addresses. The file was maintained manually at the Network Information
Center of the Stanford Research Institute and updates had to be
communicated to it by telephone, during business hours. In the early 1980s
this mechanism became slow and unwieldy and it was replaced with the Domain
Name System (DNS, RFC 882 and 883, currently 1034 and 1035).
Today on the Internet, there are two main types of applications: In one type, a
“server” listens for incoming requests from “clients” and sends responses to
those clients. In the other type, two “peers” exchange notifications, requests
and responses with each other.
In both cases one side of the application can “wait for something to happen”
while the other side must determine which remote IP address and port it must
“cause something to happen to”. Most peer-to-peer applications use some
kind of peer discovery server, so ultimately for every network application a
client must map a DNS name to an IP address and TCP or UDP port, and a
server must, mostly for management purposes, be able to map an IP address
and port back to a DNS name.
All Linux, Windows and MacOS systems still have a “hosts” file that is used
to map names, mostly locally, in case DNS servers cannot be reached. This
file can be used to map DNS names to IP addresses and back.
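A typical entry simply maps an address to one or more names, one mapping per line (example values from this chapter):

127.0.0.1    localhost
10.0.0.30    ns1.lan.quarium.com    ns1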
In 1984, the Berkeley Internet Name Domain (BIND) was the first
implementation of DNS for the UNIX system. One of its components was the
name daemon “named”. Although it has been revised a number of times (we
use bind version 9) it is still the most widely used DNS software on the
Internet.
Domains are registered with an Internet domain registrar, who publishes the
addresses of the authoritative DNS servers as part of a higher-level domain,
for example “.com”.
Typical domains have two or more DNS servers. One server is the “master”
and this server contains and serves the authoritative copies of the domain
“zone files”. One or more additional DNS servers act as “slaves” and serve
copies of the zone files obtained from the master.
Installation
The default Ubuntu 20.04 LTS installation includes a DNS client resolver
library.
The DNS server software is not installed automatically in either the server or
the desktop distributions of Ubuntu 20.04 LTS. In our infrastructure, we will
install a DNS master and slave server on our office and development
networks. Office and development workstations will obtain their IP addresses
from these DNS servers. Install the DNS server software on both master and
slave servers but not on client systems using:
apt install bind9 dnsutils
File Structure
The DNS configuration files are stored in the “/etc/bind” directory. The
primary configuration file is “/etc/bind/named.conf”. Files for dynamic zones
can be found in “/var/lib/bind”. On slave servers, files are stored in
“/var/cache/bind”.
Service Operations
The name of the package is “bind9”. This is confusing because to “systemctl”
and “journalctl” the associated service is known as “named”, and the
associated directories are named “bind”. Operate the service with the
following:
systemctl enable named
systemctl start named
systemctl restart named
systemctl stop named
systemctl disable named
We can query a name server directly to verify that it is answering, using either:
dig @server host
or:
nslookup host server
There are a number of useful public free DNS check services on the Internet.
They will test a set of standard configuration criteria and report any problems
found for a specified domain.
Master Configuration
The “/etc/bind” directory contains the server configuration files and the zone
files of the externally visible domains that are maintained manually. The
“/var/lib/bind” directory contains internally visible domains that can be
updated automatically by for example a DHCP server as systems are being
added to or removed from the internal networks.
We will begin by specifying a few access control lists (“acl”) to limit access
to certain resources. The examples assume we will be using the IPV4
addresses 10.0.0.30 and 10.0.0.31 for our DNS servers. We'll have a few
systems “ph[12]” for “physical host” and “ns[12]” for “name server”. All IP
address values, domain names and security keys are examples only and
should be changed to the actual values used in your infrastructure:
acl slaves {
10.0.0.31; // ns2.lan.quarium.com
};
acl internals {
50.255.38.80/29; // Main office ISP
10.0.0.0/24; // Main Office LAN
127.0.0/24; // localhost
};
Next, we’ll add a few options that harden the server. Clients can request
individual records or transfers of entire zones. The latter is not safe except
for our own slave servers, so we will block that option:
options {
directory "/var/cache/bind";
forwarders {
75.75.75.75; // Main office ISP primary DNS
75.75.76.76; // Main office ISP secondary DNS
};
dnssec-enable no;
dnssec-validation no;
auth-nxdomain no; # conform to RFC1035
listen-on-v6 { none; };
allow-transfer { slaves; };
also-notify { 10.0.0.31; };
version "restricted";
rate-limit {
responses-per-second 10;
// log-only yes;
};
allow-recursion { none; };
additional-from-cache no;
recursion no;
};
Next, we add two “views”, one for our “internal” networks and one for our
“external” networks:
view "internal" {
match-clients { internals; };
allow-query { internals; };
allow-recursion { internals; };
additional-from-cache yes;
recursion yes;
/* insert zones visible internally here */
include "/etc/bind/named.conf.default-zones";
};
view "external" {
match-clients { any; };
/* insert zones visible externally here */
include "/etc/bind/named.conf.default-zones";
};
Then we add a “zone” specification for both our main internal and external
domains in the “internal” view. Note that we only allow private zones to be
updated:
zone "lan.quarium.com" {
type master;
file "/var/lib/bind/lan.quarium.com.zone";
allow-update { key "ddns-key.ns1.lan.quarium.com"; };
};
zone "0.0.10.in-addr.arpa" {
type master;
file "/var/lib/bind/0.0.10.in-addr.arpa.zone";
allow-update { key "ddns-key.ns1.lan.quarium.com"; };
};
zone "quarium.com" {
type master;
file "/etc/bind/quarium.com.zone";
};
zone "38.255.50.in-addr.arpa" {
type master;
file "/etc/bind/38.255.50.in-addr.arpa.zone";
};
If we also host additional domains, for example “quarium.net”, we need to add a corresponding zone specification to both the “internal” and “external” views in “/etc/bind/named.conf.local”:
zone "quarium.net" {
type master;
file "/etc/bind/others.zone";
};
Slave Configuration
On slave servers, the overall “/etc/bind/named.conf” file looks very similar, except for the access control lists, the masters list and the key definitions:
masters masters {
10.0.0.30; // ns1.lan.quarium.com
};
acl internals {
50.255.38.80/29; // Main Office ISP
10.0.0.0/24; // Main Office LAN
127.0.0/24; // localhost
};
key "ddns-key.ns1.lan.quarium.com" {
algorithm hmac-sha256;
secret "<a Base64 string>";
};
key "ddns-key.ns2.lan.quarium.com" {
algorithm hmac-sha256;
secret "<a different Base64 string>";
};
options {
directory "/var/cache/bind";
forwarders {
75.75.75.75; // Main Office ISP primary DNS
75.75.76.76; // Main Office ISP secondary DNS
};
dnssec-enable no;
dnssec-validation no;
auth-nxdomain no; # conform to RFC1035
listen-on-v6 { none; };
allow-transfer { "none"; };
notify no;
version "restricted";
rate-limit {
responses-per-second 1;
log-only yes;
};
allow-recursion { none; };
additional-from-cache no;
recursion no;
};
view "internal" {
match-clients { internals; };
allow-query { internals; };
allow-recursion { internals; };
additional-from-cache yes;
recursion yes;
/* insert zones visible internally here */
include "/etc/bind/named.conf.default-zones";
};
view "external" {
match-clients { any; };
/* insert zones visible externally here */
include "/etc/bind/named.conf.default-zones";
};
The actual zone descriptions are different. For the “internal” zones, we
specify:
zone "lan.quarium.com" {
type slave;
file "internal.lan.quarium.com.zone";
masters { masters; };
};
zone "0.0.10.in-addr.arpa" {
type slave;
file "internal.0.0.10.in-addr.arpa.zone";
masters { masters; };
};
zone "quarium.com" {
type slave;
file "internal.quarium.com.zone";
masters { masters; };
};
zone "38.255.50.in-addr.arpa" {
type slave;
file "internal.38.255.50.in-addr.arpa.zone";
masters { masters; };
};
And for the “external” zones we specify different zone file names, so zone
transfers will not conflict:
zone "quarium.com" {
type slave;
file "external.quarium.com.zone";
masters { masters; };
};
zone "38.255.50.in-addr.arpa" {
type slave;
file "external.38.255.50.in-addr.arpa.zone";
masters { masters; };
};
Once the slave server is enabled and started, we should see log file entries in
“/var/log/named/bind.log” describing the zone transfers:
20-Oct-2018 04:49:20.555 general: info: zone quarium.com/IN/internal: Transfer
started.
20-Oct-2018 04:49:20.556 xfer-in: info: transfer of 'quarium.com/IN/internal' from
10.0.0.40#53: connected using 10.0.0.31#52581
20-Oct-2018 04:49:20.558 general: info: zone quarium.com/IN/internal: transferred
serial 2018101901
20-Oct-2018 04:49:20.558 xfer-in: info: transfer of 'quarium.com/IN/internal' from
10.0.0.40#53: Transfer status: success
20-Oct-2018 04:49:20.558 xfer-in: info: transfer of 'quarium.com/IN/internal' from
10.0.0.40#53: Transfer completed: 1 messages, 13 records, 318 bytes, 0.002 secs
(159000 bytes/sec)
Client Configuration
Clients are configured through their “/etc/netplan/*” file. See network
configuration above.
Chapter 9 - DHCP
Originally, all network interfaces on all systems had to be manually
configured with an IP address. This was tedious and error-prone, so in 1993,
the Dynamic Host Configuration Protocol (DHCP, currently RFC 2131) was
introduced.
Ubuntu uses the Internet Systems Consortium (ISC) DHCP server, which
implements a failover protocol that allows two DHCP servers to redundantly
manage one pool of IP addresses. We can install the DHCP server package
using:
apt install isc-dhcp-server
File Structure
The configuration of the IPV4 DHCP server is stored in the file
“/etc/dhcp/dhcpd.conf” and the configuration for IPV6 is stored in
“/etc/dhcp/dhcpd6.conf”. We will only configure the IPV4 version in our
infrastructure.
Service Operations
Server operations are managed by the “isc-dhcp-server.service”. Some
common operations are:
systemctl enable isc-dhcp-server
systemctl start isc-dhcp-server
systemctl restart isc-dhcp-server
systemctl stop isc-dhcp-server
systemctl disable isc-dhcp-server
We can combine log messages from multiple modules into one stream for
clarity:
journalctl --follow --unit=isc-dhcp-server --unit=named
We can clear the current leases by stopping the DHCP server, emptying the
contents of “dhcpd.leases” and deleting the file “dhcpd.leases~” and then
restarting the DHCP server.
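As a concrete sketch, assuming the default lease file location used by “isc-dhcp-server” on Ubuntu (“/var/lib/dhcp/dhcpd.leases”):
systemctl stop isc-dhcp-server
truncate --size 0 /var/lib/dhcp/dhcpd.leases
rm -f /var/lib/dhcp/dhcpd.leases~
systemctl start isc-dhcp-server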
Automatic Allocation
We configure a range of IP addresses for a particular subnet by adding the
following to the configuration file:
subnet 10.0.0.0 netmask 255.255.255.0 {
authoritative;
range dynamic-bootp 10.0.0.100 10.0.0.200;
default-lease-time 3600;
max-lease-time 3600;
option routers 10.0.0.1;
option subnet-mask 255.255.255.0;
option nis-domain "lan.quarium.com";
option domain-name "lan.quarium.com";
option domain-name-servers 10.0.0.30, 10.0.0.31;
option time-offset -28800; # PST
option ntp-servers 10.0.0.30;
}
Static Allocation
Within a subnet we can statically allocate an IP address to a particular MAC
address by adding:
host somehost {
option host-name "somehost.lan.quarium.com";
hardware ethernet 00:17:88:13:66:0e;
fixed-address 10.0.0.201;
}
Dynamic DNS
We can tell a DHCP server to update a DNS zone and the corresponding
reverse zone with the IP addresses and host names it allocates by adding:
ddns-update-style interim;
ddns-domainname "lan.quarium.com";
ddns-rev-domainname "0.0.10.in-addr.arpa";
ignore client-updates;
update-static-leases on;
key "ddns-key.ns1.lan.quarium.com" {
algorithm hmac-sha256;
secret "<a Base64 string>";
};
zone lan.quarium.com. {
primary 10.0.0.30;
key ddns-key.ns1.lan.quarium.com;
}
zone 0.0.10.in-addr.arpa. {
primary 10.0.0.30;
key ddns-key.ns1.lan.quarium.com;
}
Failover Configuration
The failover mechanism relies on the two system clocks being closely
synchronized. All systems in our infrastructure should be configured as NTP
clients of network clocks, so that should not be a problem. First, we need
another key to allow the servers to communicate without outside interference:
dnssec-keygen -a HMAC-MD5 -b 512 -n USER DHCP_OMAPI
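The rest of the failover configuration is not reproduced in this text. A minimal sketch of a failover pair declaration in “/etc/dhcp/dhcpd.conf”, using our example name server addresses, might look like the following (use “secondary;” instead of “primary;” on the other server):
failover peer "dhcp-failover" {
primary;                     # "secondary;" on the other server
address 10.0.0.30;
port 647;
peer address 10.0.0.31;
peer port 647;
max-response-delay 60;
max-unacked-updates 10;
mclt 3600;                   # primary only
split 128;                   # primary only
load balance max seconds 3;
}
Inside the subnet declaration, the address range then moves into a pool that references the peer:
pool {
failover peer "dhcp-failover";
range 10.0.0.100 10.0.0.200;
}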
Observe how the servers behave in the system logs while shutting down and
restarting the primary.
Chapter 10 - LDAP
The Lightweight Directory Access Protocol, or LDAP, is a protocol for
querying and modifying an X.500-based directory service running over
TCP/IP. The current version is LDAPv3 (RFC 4510, a subset of X.500) and
the implementation used in Ubuntu is OpenLDAP.
X.500 is one of those old ISO protocols that were designed by committee (the
International Telecommunications Union (ITU) in the 1980’s), intending to
solve all possible problems for each member of the committee. LDAP was an
attempt to extract a useful subset of functions that can be used over TCP/IP
but even this protocol is essentially obsolete. Unfortunately there is no easily-
configured REST replacement and it is still used heavily by MacOS,
Windows and Linux.
Install the OpenLDAP server package and client utilities using “apt install slapd ldap-utils”. The installation will ask for a new password for an LDAP administrator account.
File Structure
The configuration is stored in “/etc/ldap” and “/etc/ldap/slapd.d” (the directory data itself lives in “/var/lib/ldap”) and by default includes the “core”, “cosine”, “inetorgperson” and “nis” database schemas. We will use the “inetorgperson” and “nis” schemas for our LDAP directory and we use draft RFC 2307bis for mapping to Linux authentication.
Typical use will not require modification of any of the configuration files
directly. Most administrative procedures are performed using command line
utilities and may even be performed remotely.
Service Operations
Server operations are managed by the “slapd.service”. Some common
operations are:
systemctl enable slapd
systemctl start slapd
systemctl restart slapd
systemctl stop slapd
systemctl disable slapd
Test if the server is accessible and working properly from any system using:
ldapsearch -h <server> -x -b '<dn>'
The default installation already allows us to add users and user groups to the
LDAP database. To do this conveniently, download the free administration
tool “LDAPAdmin” for Windows from “http://www.ldapadmin.org”. This is
a much better way to maintain an LDAP directory than using command-line
tools and “.ldif” files. If this tool is used to set user passwords, use the “SHA-
512 Crypt” hash setting. Manually place the application in a directory, for
example “C:\Program Files (x86)\LDAPAdmin” and create a shortcut from
there to the desktop.
There are two common methods for associating users with groups: each user has a primary group, identified by a single user attribute “gidNumber” which must match an existing group’s “gidNumber”; in addition, groups can have zero or more “memberUid” attributes, each of which must match a user’s “uid” (login name). As in UNIX and Linux, groups are typically not nested, and many applications that use LDAP cannot authenticate against nested groups. All of the following command-line operations are much easier to perform using the “LDAPAdmin” tool described above.
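For reference, a minimal sketch of a group entry using these attributes (the DN, names and numbers are examples only) might look like:
dn: cn=developers,ou=groups,dc=lan,dc=quarium,dc=com
objectClass: posixGroup
cn: developers
gidNumber: 10100
memberUid: alice
memberUid: bob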
If you are not using the above Windows tool, you could also use the following command-line tool to change user passwords:
ldappasswd -x -W -D 'cn=admin,dc=lan,dc=quarium,dc=com' -S 'uid=<username>,ou=people,dc=lan,dc=quarium,dc=com'
This prompts for the new <username> password twice, and then prompts for the administrator password. Note that you must first configure the LDAP database to use, for example, SHA-512 hashing for passwords as the default, since the command does not have a parameter to specify this.
Note that LDIF is one of those unfortunate formats, like Python and YAML, where whitespace and line indentation are significant. Make sure that blank lines do not have spaces on them and that continuation lines and the separator lines with dashes do not have stray spaces around them. We can apply this file with:
ldapadd -Y EXTERNAL -H ldapi:/// -f ~quarium/Scripts/server1_sync.ldif
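The contents of “server1_sync.ldif” are not reproduced here. As a point of reference, a minimal LDIF that only loads the “syncprov” module into the “cn=config” database (a real synchronization file would also add the overlay and replication settings) might look like:
dn: cn=module{0},cn=config
changetype: modify
add: olcModuleLoad
olcModuleLoad: syncprov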
We can determine if the “syncprov” module has been properly loaded with:
ldapsearch -LLL -Q -Y EXTERNAL -H ldapi:/// -b cn=module{0},cn=config
Extensions
We could create a file “openssh-lpk_openldap.ldif” to allow the addition of
public SSH keys to user accounts:
cat <<EOF >~/Scripts/openssh-lpk_openldap.ldif
dn: cn=openssh-lpk_openldap,cn=schema,cn=config
objectClass: olcSchemaConfig
cn: openssh-lpk_openldap
olcAttributeTypes: {0}( 1.3.6.1.4.1.24552.500.1.1.1.13 NAME 'sshPublicKey' DESC
 'MANDATORY: OpenSSH Public key' EQUALITY octetStringMatch SYNTAX
 1.3.6.1.4.1.1466.115.121.1.40 )
olcObjectClasses: {0}( 1.3.6.1.4.1.24552.500.1.1.2.0 NAME 'ldapPublicKey' DESC
 'MANDATORY: OpenSSH LPK objectclass' SUP top AUXILIARY MAY ( sshPublicKey $ uid ) )
EOF
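The schema file can then be applied in the same way as the other LDIF files, for example:
ldapadd -Y EXTERNAL -H ldapi:/// -f ~/Scripts/openssh-lpk_openldap.ldif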
Client Configuration
Some machines may allow users with an LDAP account to log in, for
example to retrieve email or access a version control database like git. Begin
by installing the required packages:
apt install libnss-ldap
During installation you will be asked to provide the URI to your LDAP
server which will be stored in “/etc/ldap.conf”. Multiple servers can be listed
separated by a space:
ldap://ns1.lan.quarium.com ldap://ns2.lan.quarium.com
Select protocol version 3. Do not make the local root a database administrator
(or a password will be saved on the machine in plaintext). Since we set up
our LDAP server on our private network, we do not need to login to it to
authenticate users. This information can be updated later by executing:
dpkg-reconfigure ldap-auth-config
From the menu, choose LDAP and any other authentication mechanisms you
need. You should now be able to log in to the machine using valid LDAP
credentials. The setup can be tested by logging in as a local administrator and
listing all visible LDAP and local accounts:
getent passwd
getent group
Chapter 11 - Email
Electronic mail or “email” is almost as old as operating systems. Early shared
computer systems allowed messages to be exchanged between users that
were both logged in to the same system at the same time, or allowed them to
be stored until a message recipient logged in to a terminal attached to the
system.
For legal purposes, since the actual delivery path of email cannot be guaranteed over the Internet, email in the USA is treated as “interstate communications” and is therefore subject to Federal law. In many cases unencrypted email is even routed through potentially adversarial foreign servers, which is why its use between, for example, customers and medical, financial and government professionals is extremely limited.
After UUCP, the “sendmail” package was used for many years as an MTA to
store and forward email on UNIX and Linux systems. This package was
incredibly flexible and had a very powerful configuration language.
Unfortunately this language was, much like Perl, a write-only language,
and it was incredibly difficult to create anything but a very basic server with
any level of verifiable security.
As of July 2020, sendmail retained only a 3.74% market share among 3.8 million accessible email servers. The market is now split mainly between “Exim” at 56.97% and “Postfix” at 35.32%. The remaining 3.97% is split among a very large number of lesser-known packages, including “Microsoft mail” at 0.44%.
There are a great many people in the world who are smarter and less scrupulous than this author, the authors of the software described here, and our readers and users. We therefore include the following instructions with the usual software caveat that they are “for entertainment purposes only”:
Postfix
This book describes how to install and configure Postfix due to its simpler
configuration, better adaptation to the Debian/Ubuntu way of handling
configuration files, better security partitioning of the applications and better
queuing of large volumes of mail. Postfix is the default MTA for Ubuntu.
File Structure
Postfix configurations are stored in “/etc/postfix”. The main setting file is
“/etc/postfix/main.cf”. Another file “/etc/postfix/master.cf” controls
scheduling and parameters of various postfix applications. There does not
seem to be a “/etc/default/postfix”. Email is stored in “/var/spool/postfix” and
additional working files are stored in “/var/lib/postfix”.
One thing that you should never do (for example through these configuration files) is to forward
email received at a local address out of the server. Such behavior will
immediately be exploited for spam and will almost as quickly land your
server on a blacklist. This means that none of your legitimate local senders
will be able to communicate with people using common email services like
gmail. The server should only do two things: Forward email from an
authenticated local sender to anywhere, including other local mailboxes, and
receive external email for delivery to a local mailbox.
Once your service grows in popularity you should consider the services of an
email spam filtering service.
Service Operations
Server operations are managed by the “postfix.service”. Some common
operations are:
systemctl start postfix
systemctl restart postfix
systemctl stop postfix
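One useful check compares the explicit settings in “main.cf” with the built-in defaults. The exact command used here is not shown in this text; a sketch using “postconf” and “comm” might be:
comm -23 <(postconf -n | sort) <(postconf -d | sort)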
Replace the “-23” with “-12” to show settings that duplicate default settings.
On systems that only forward their outgoing mail, select a “satellite system” during package installation and enter the FQDN of the system and of the server that will forward SMTP. This does not completely configure a satellite system. Configure the remaining parameters using:
dpkg-reconfigure postfix
This allows setting a root and postmaster mail recipient, for example “[email protected]”, additional domain names the system may be known by, for example internal LAN names, and other settings for which the defaults are sufficient.
Install the email security key, certificate and authority chain in the usual
location and with the usual ownerships and permissions in “/etc/ssl” (see the
chapter on HTTP certificates).
If postfix has not been installed on the server (for example as a satellite),
make sure all packages needed for an email server are installed:
apt install postfix postfix-pcre postfix-ldap
Configure the server as an “internet site”. For now, accept the suggested
“mail name”.
Configure the email server firewall to permit access on ports 25 (SMTP), 110
(POP3), 143 (IMAP), 995 (POP3S) and 587 (ESMTPS). If the email server
runs on a dedicated internal virtual machine, also configure the corporate
firewall server to forward traffic on these ports to the internal email server
address.
Do not create a root account mailbox since the “/etc/aliases” file will take
care of forwarding.
Do not force synchronous updates to the mail queue since our traffic volume
will be light.
If email runs on its own dedicated server, you can disable the chroot environment by changing a line in “/etc/postfix/master.cf”:
# service type private unpriv chroot wakeup maxproc command + args
smtp inet n - n - - smtpd
The rules in the “_restriction” settings are evaluated in the order specified and
the first rule that matches wins.
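These settings are not shown in full here; a minimal sketch of one of them in “/etc/postfix/main.cf” (example values only, not a complete policy) might be:
smtpd_recipient_restrictions =
  permit_mynetworks,
  permit_sasl_authenticated,
  reject_unauth_destination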
Dovecot
Dovecot is a mail delivery agent (MDA). Of 3.8 million email servers
accessible in July of 2020, it has an installed base of 76.22%. No other MDA
has a double-digit-percentage installed base. Microsoft Exchange only has a
1.28% installed base. Of course these numbers are server counts, not end-user
counts.
Installation
Install or complete the installation of the dovecot packages:
apt install dovecot-core dovecot-imapd dovecot-pop3d
File Structure
The configuration files are stored in “/etc/dovecot”. The main configuration
file is “/etc/dovecot/dovecot.conf”. There are a number of specific
configuration files in “/etc/dovecot/conf.d”. There is also a
“/etc/default/dovecot” configuration file for the service.
Service Operations
Server operations are managed by the “dovecot.service”. Some common
operations are:
systemctl start dovecot
systemctl restart dovecot
systemctl stop dovecot
To let Postfix authenticate SMTP clients against Dovecot, add the following to the Dovecot configuration, for example via the files in “/etc/dovecot/conf.d”:
# Authentication configuration
auth_mechanisms = plain login
service auth {
# Postfix smtp-auth
unix_listener /var/spool/postfix/private/dovecot-auth {
mode = 0660
user = postfix
group = postfix
}
}
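The matching Postfix side of this authentication socket is not shown above; a sketch of the corresponding settings in “/etc/postfix/main.cf” (the socket path is relative to the Postfix queue directory) would be:
smtpd_sasl_type = dovecot
smtpd_sasl_path = private/dovecot-auth
smtpd_sasl_auth_enable = yes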
Spamassassin
Spamassassin is a Perl filter for email. It examines email headers and content
in a variety of ways to determine if a message is likely to be spam. It assigns
a score to each message and the MTA can then decide to forward a message,
forward it with its score or discard it.
Installation
Install the spamassassin package using:
apt install spamassassin spamc
File Structure
The configuration files are found in “/etc/spamassassin”. The only file that
may need modification is “/etc/spamassassin/local.cf”. Additional work files
are stored in “/var/lib/spamassassin”.
Service Operations
Server operations are managed by the “spamassassin.service”. Some common
operations are:
systemctl start spamassassin
systemctl restart spamassassin
systemctl stop spamassassin
Filtering
Add a filter to “/etc/postfix/master.cf”:
spamassassin unix - n n - - pipe
  user=debian-spamd argv=/usr/bin/spamc -f -e /usr/sbin/sendmail -oi -f ${sender} ${recipient}
In the same file the smtpd daemon must be told to use the filter:
smtp inet n - n - - smtpd
  -o content_filter=spamassassin
Chapter 12 - MySQL
Installation
The default server installation of Ubuntu 20.04 LTS does not include MySQL, but we can install it using:
apt install mysql-server
File Structure
The configuration files for MySQL are stored in “/etc/mysql”. Unlike for
other Linux subsystems, these files apply both to the server component
“mysqld” and the client command “mysql”. The files still follow the old
Windows “.ini” file format. Server and client settings are differentiated
through “[mysql]” and “[mysqld]” sections in these files.
Service Operations
Server operations are managed by the “mysql.service”. Some common
operations are:
systemctl start mysql
systemctl restart mysql
systemctl stop mysql
Configuration
The initial installation does not set a password for the “root” MySQL user
(which is not related to the Linux “root” user account but which has a similar
function) and it installs a number of test features we do not need in a
production installation. One of the first things we must do to secure the
installation is to execute:
mysql_secure_installation
The script will ask to install the “validate password” plugin. This is a good
idea, so reply “y”.
The script will then ask for a password for the MySQL “root” account. It will
also ask for a repeat for confirmation.
The script will then ask to remove anonymous users. This is useful, so reply
“y”.
The script will then ask to disable remote “root” logins. We will always login
locally or through an encrypted SSH tunnel, so reply “y”.
The script then asks to remove the test database. We do not need it, so reply
“y”.
The script then asks to reload the privilege tables to apply the changes. Reply
“y”.
When upgrading from some older configurations you may want to add the following to the “[mysqld]” section of the MySQL configuration:
lower_case_table_names = 1
After that, comment out the failing “mysql_upgrade” call around line 320 in the file:
vi /var/lib/dpkg/info/mysql-server-5.7.postinst
This allows access via the network and from applications, for example Symfony, Drupal and MediaWiki. Then execute:
FLUSH PRIVILEGES;
and:
QUIT;
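The statements referred to above are not reproduced in this text. Allowing network access typically also involves commenting out the “bind-address = 127.0.0.1” line in “/etc/mysql/mysql.conf.d/mysqld.cnf”, and each application needs its own account; a minimal sketch, assuming a hypothetical “webapp” user and “webdb” schema, would be:
CREATE USER 'webapp'@'%' IDENTIFIED BY '<password>';
GRANT ALL PRIVILEGES ON webdb.* TO 'webapp'@'%';
followed by the “FLUSH PRIVILEGES;” and “QUIT;” shown above.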
Use the root user to verify all schemas or another user to verify only the
subset the user can access. In some cases MySQL table names are case-
sensitive. If such databases are transferred for example from an old system
with a case-insensitive file system to Ubuntu with a case-sensitive file system
tables may need to be renamed, for example with the following script:
#!/bin/bash
# uppercase_tables.sh -- rename all database tables to uppercase
DB_HOST=<host>
DB_SCHEMA=<schema>
DB_USER=<user>
DB_PASSWORD=<password>
EXISTING_TABLES=`echo "show tables;" | mysql -u ${DB_USER} --password=${DB_PASSWORD} -h ${DB_HOST} --skip-column-names ${DB_SCHEMA}`
for EXISTING_TABLE in ${EXISTING_TABLES}
do
UPPERCASE_TABLE=`echo "${EXISTING_TABLE}" | tr "[:lower:]" "[:upper:]"`
if [ "${EXISTING_TABLE}" != "${UPPERCASE_TABLE}" ]
then
echo "ALTER TABLE ${EXISTING_TABLE} RENAME TO ${UPPERCASE_TABLE};"
fi
done | mysql -u ${DB_USER} --password=${DB_PASSWORD} -h ${DB_HOST} ${DB_SCHEMA}
Replication
The most common replication method uses binary logs. At some point GTIDs
(Global Transaction IDs) will gain popularity.
A cluster consists of two servers in a multi-master replication configuration
plus zero or more remote slaves that access the cluster through an encrypted
VPN connection. Ensure all servers have the same version of MySQL (or
slaves higher than the master).
On all servers, create a user “replicator” with a not terribly secret password, allow it to log in from “%” and grant it only the global privilege “REPLICATION SLAVE”. Note that the account name and password are case sensitive.
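A sketch of the corresponding statements, with a placeholder password:
CREATE USER 'replicator'@'%' IDENTIFIED BY '<secret>';
GRANT REPLICATION SLAVE ON *.* TO 'replicator'@'%';
FLUSH PRIVILEGES;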
Also clear out any prior binary logs from “/var/lib/mysql” and empty out the
“/var/log/mysql/error.log”.
On each master in turn, log into mysql and determine the name and location
of the binary log:
SHOW MASTER STATUS;
On each master or slave replicating off that master, start following the binary
log using:
CHANGE MASTER TO MASTER_HOST='10.0.0.<ip>', MASTER_USER='replicator',
MASTER_PASSWORD='<secret>', MASTER_LOG_FILE='<other hostname>-bin.000001',
MASTER_LOG_POS=156;
START SLAVE;
At this point schemas, tables, records and users created on one server will be
replicated to the other. Also useful is:
STOP SLAVE;
To import a specific database into MySQL, from the Linux command line
use:
mysql -u <user_name> -p <db_name> < <filename.sql>
Chapter 13 – Version Control
Install git with “apt install git” if it is not already present, then configure the preferred editor for commit messages and a few other global settings:
git config --global core.editor vi
git config --global user.name "Bart Besseling"
git config --global user.email [email protected]
git config --global push.default simple
File Structure
Git does not install a server service component and there is no preferred
location for git repositories. A typical use is to provide a repository directory
that can be accessed by server user accounts that are members of a particular
repository user group.
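As a sketch, assuming a repository group named “git” and the repository location used in the clone command below:
groupadd git
mkdir -p /home/git/data
cd /home/git/data
git init --bare --shared=group Repo.git
chgrp -R git Repo.git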
On the development server, the staging server and the live server execute:
cd /var/www
git clone ssh://[email protected]/home/git/data/Repo.git .
Chapter 14 - HTTP
Apache Server
On Ubuntu 20.04 we use the Apache 2.4 web server. Created in 1995,
Apache became the first web server software to serve more than 100 million
websites in 2009. As of June 2020, it was estimated to serve 25% of 189
million active web sites. Its main competitor is the free nginx web server
serving 37% but nginx does not have the robust open-source history of
Apache. Microsoft’s proprietary server is in third place at 11%. All
other competitors serve only single-digit percentages.
Installation
By default, the Apache web server is not installed but it can be added to an
installation with
apt install apache2 apache2-utils w3m
File Structure
The configuration of the Apache web server is located in “/etc/apache2”. In
this directory, the main configuration file is “/etc/apache2/apache2.conf”, but
most configuration can be found in three pairs of directories
“/etc/apache2/conf*”, “/etc/apache2/mods*”, “/etc/apache2/sites*” that
contain links to available and enabled settings.
Web sites are typically stored in “/var/www” and must be accessible to the
“www-data” user and the “www-data” group.
Log files are located in “/var/log/apache2”.
Service Operations
Server operations are managed by the “apache2.service”. Some common
operations are:
apachectl start
systemctl start apache2
apachectl restart
apachectl graceful
systemctl restart apache2
apachectl stop
systemctl stop apache2
In some cases we want to use “.htaccess” files included with web applications
like Drupal. Add the following to all files “/etc/apache2/sites-available/*”:
<Directory "/var/www/html">
AllowOverride All
</Directory>
If a system is not going to serve its own web site, redirect any browsers to the
main corporate site in “/var/www/html/.htaccess”:
Redirect 301 / http://www.quarium.com
In some cases we want clients not to cache content. Add the following to the
proper files “/etc/apache2/sites-available/*”:
<IfModule mod_expires.c>
ExpiresActive On
ExpiresDefault "access"
</IfModule>
Multiple PEM blobs may be combined into one file in any order, so an SSL
web site really only needs one security file “quarium.pem” which should
contain its private key, its domain certificate and the certificate of the CA
(certificate authority) which has issued the domain certificate. It is identified
to Apache using the “SSLCertificateFile” directive. The disadvantage of this
is that the key file is less secure, so we DO NOT use this method.
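The following examples assume a private key file “star.quarium.com.key” already exists; if it does not, one can be generated with, for example:
openssl genrsa -out star.quarium.com.key 2048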
Generate a domain certificate request using the private key and some
organizational input. Note the exact spelling of the organization name, with
case and punctuation. Our entire infrastructure should only need one domain
certificate for “*.quarium.com”:
openssl req -new -key star.quarium.com.key -out star.quarium.com.csr
Country Name (2 letter code) [AU]:US
State or Province Name (full name) [Some-State]:California
Locality Name (eg, city) []:San Francisco
Organization Name (eg, company) [Internet Widgits Pty Ltd]:Quarium, Inc.
Organizational Unit Name (eg, section) []:
Common Name (for example server FQDN or YOUR name) []:*.quarium.com
Email Address []:[email protected]
Please enter the following 'extra' attributes to be sent with your certificate
request
A challenge password []:
An optional company name []:
Provide the request to the CA who will return its own CA certificate file and
the new domain certificate file.
Verify the certificate authority certificate file and the new user certificate
using:
openssl x509 -in ca.quarium.com.crt -noout -text
openssl x509 -in star.quarium.com.crt -noout -text
Install the private key and the certificates, either concatenated into one file or as separate files:
cp star.quarium.com.key /etc/ssl/private
cat star.quarium.com.crt ca.quarium.com.crt >/etc/ssl/certs/star.quarium.com.pem
cp star.quarium.com.key /etc/ssl/private
cp star.quarium.com.crt /etc/ssl/certs/star.quarium.com.pem
cp ca.quarium.com.crt /etc/ssl/certs/ca.quarium.com.pem
Or for the official domain certificate:
cp star.quarium.com.key /etc/ssl/private
cat star.quarium.com.crt ca.thawte.com.crt >/etc/ssl/certs/star.quarium.com.pem
cp star.quarium.com.key /etc/ssl/private
cp star.quarium.com.crt /etc/ssl/certs/star.quarium.com.pem
cp ca.thawte.com.crt /etc/ssl/certs/ca.thawte.com.pem
although some older iOS browsers do not like all parts to be in one file.
Make sure that the site configuration includes a proper server name in the
“sites-available” files or some browsers and Java 7 will not negotiate SNI
correctly:
ServerName www.quarium.com
ServerAlias *.quarium.com
In some cases a private key may have been protected with a password, which
would require that password to be entered each time a server or service is
restarted. Remove the password from the key file using:
openssl rsa -in protected.star.quarium.com.key -out star.quarium.com.key
For diagnostics, a private key file can be decomposed into its components
using:
openssl rsa -in star.quarium.com.key -text -noout
For diagnostics, a private key file can be used to extract a public key: (Some
applications may need an -RSAPublicKey_out option.)
openssl rsa -in star.quarium.com.key -pubout -out star.quarium.com.pub
In some cases a certificate may be stored in its stricter DER format. Convert
back and forth using:
openssl x509 -in star.quarium.com.crt -outform der -out star.quarium.com.der
openssl x509 -in star.quarium.com.der -inform der -outform pem -out star.quarium.com.crt
There are a number of useful public free HTTP HTTPS and HTML check
services on the Internet. They will test a set of standard configuration criteria
and report any problems found for a specified domain. Most problems will be
found in the rapidly evolving security area where all SSL and all TLS
protocols except TLS 1.3 are now more or less seriously compromised.
Basic Authentication with File, PAM or LDAP
We only use basic authentication (RFC 7617) for secure sites. The encryption
of the secure site is much better than that of digest authentication, and digest
authentication is too complicated for non-browser clients.
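As a sketch of the file-based variant, assuming a password file created with “htpasswd -c /etc/apache2/htpasswd quarium” and a hypothetical “/ops” location, add the following to the proper “sites-available” file:
<Location /ops>
AuthType Basic
AuthName "Quarium Ops"
AuthUserFile /etc/apache2/htpasswd
Require valid-user
</Location>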
Digest Authentication
Do not use digest authentication! It is not sufficiently secure and it is too
complicated for some REST clients. But just to see how it works, create a
password file in “/etc/apache2”:
htdigest -c htdigest "Quarium Ops" quarium
The PAM and LDAP variations should be obvious, but should not be used
either.
WebDAV Configuration
Web Distributed Authoring and Versioning (WebDAV, RFC 4918) allows
users to POST files up to a web site. Desktop Operating Systems like
Windows and MacOS can mount DAV servers as remote file systems.
WebDAV can operate securely over HTTPS.
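The WebDAV configuration itself is not shown in this text; a minimal sketch, assuming the “dav” and “dav_fs” modules have been enabled with “a2enmod dav dav_fs” and a hypothetical directory “/var/www/dav” owned by “www-data”, might be:
Alias /dav /var/www/dav
<Directory /var/www/dav>
DAV On
AuthType Basic
AuthName "Quarium DAV"
AuthUserFile /etc/apache2/htpasswd
Require valid-user
</Directory>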
Awstats
Awstats analyzes the Apache web server log files and produces usage statistics. Disable the default log rotation because it interferes with our log file naming and with awstats:
mv /etc/logrotate.d/apache2 /etc/apache2/logrotate.old
Installation
Install the awstats package using:
apt install awstats
Make sure the “/var/log/apache2” folder can be read by the “www-data” user:
chmod 755 /var/log/apache2
File Structure
The configuration for awstats is stored in “/etc/awstats”. The main
configuration file is “/etc/awstats/awstats.conf”. Any local parameters can be
placed in “/etc/awstats/awstats.conf.local”. Additional service parameters can
be configured in “/etc/default/awstats”.
Awstats does not have a service daemon. Updates are scheduled through the
“/etc/cron.d/awstats” script. The script is “/usr/lib/cgi-bin/awstats.pl”. Asset
files (icons etc.) are in “/usr/share/awstats”.
Configuration
On a server with a single site, edit “awstats.conf.local”. On a server with multiple sites, duplicate the configuration file to “awstats.<site>.conf”, for example “awstats.quarium.com.conf”, for each separate site.
Chapter 15 - PHP
Installation
To install the default version of PHP (7.4 on Ubuntu 20.04 LTS) with a few useful extensions, execute the following:
apt install php libapache2-mod-php php-mysql php-cli
apt install php-curl php-gd php-mbstring php-ldap php-intl php-zip
apt install php-uploadprogress
apt install php-xml
In the rare case when it is necessary to compile PHP extensions from source
we could add:
apt install php-dev
File Structure
The various PHP configuration files are located in “/etc/php/7.4”. Two
subdirectories “apache2” and “cli” separately configure the web server and
the commandline settings. In each, the main configuration file is “php.ini”. A
directory “conf.d” is used for both system and user configuration files.
Configuration
Add the following in a “conf.d/99-quarium.ini” file for each environment:
cat <<EOF >/etc/php/7.4/apache2/conf.d/99-quarium.ini
memory_limit = 256M
max_execution_time = 60
date.timezone = "UTC"
date.default_latitude = 37.58417
date.default_longitude = -122.365
EOF
cp /etc/php/7.4/apache2/conf.d/99-quarium.ini /etc/php/7.4/cli/conf.d/99-quarium.ini
chmod 777 /etc/php/7.4/*/conf.d/99-quarium.ini
You can obtain exact latitude and longitude information for a server location
from Google maps.
Upgrading
To upgrade a default version to a newer version, for example the version 7.2
used in Ubuntu 18.04 LTS to 7.4, first add some new repositories:
add-apt-repository ppa:ondrej/php
add-apt-repository ppa:ondrej/apache2
Composer Configuration
Composer is by far the most popular application-level package manager
utility for PHP libraries, including those published on the Packagist
repository.
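The Ubuntu package can be installed with, for example:
apt install composer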
GeoIP configuration
First install the corresponding packages:
apt install geoipupdate geoip-database php-geoip
The free databases are no longer updated frequently, if at all, and updating them now requires a license key. Comment out the update in “/etc/cron.d/geoipupdate”:
# 47 6 * * 3 root test -x /usr/bin/geoipupdate && /usr/bin/geoipupdate
Chapter 16 - Wiki
In the 1960’s, Ted Nelson’s Project Xanadu was the first to explore “hypertext” and later “stretchtext”. Although an actual implementation never materialized, in the 1980’s Tim Berners-Lee was among several others experimenting with its concepts in the form of a “World Wide Web”. In 1995, Ward Cunningham published the first “WikiWikiWeb”, a user-editable website. On Monday 15 January 2001, “wikipedia.org” went online and by September of the same year it was widely popular with over 10,000 entries. After 2002, Wikipedia ran on its own PHP wiki software, “MediaWiki”, which was published as free open-source software in 2003. In June 2020, Wikipedia exceeded 50 million articles across 310 language editions.
MediaWiki Server
Installation
On the git version control server, create an empty repository for a new
project:
git init --bare someproject.git
Make sure the virtual machine that will serve the wiki(s) has an SSH private
key “~root/.ssh/id_rsa” and make sure that the contents of the corresponding
“~root/.ssh/id_rsa.pub” have been added to a user account on the git server
that has access to the repository.
Remove all existing files from “/var/www/html” and clone the git repository
into the “/var/www/html” directory or a subdirectory:
git clone ssh:…someproject.git .
Create a subfolder “backups” and inside it retrieve the current release of the
MediaWiki software (1.34 as of this writing):
mkdir backups
cd backups
wget https://releases.wikimedia.org/mediawiki/1.34/mediawiki-1.34.2.tar.gz
Unpack the software and move it to the top level of the repository:
tar xzvf mediawiki-1.34.2.tar.gz
(cd mediawiki-1.34.2 && find . -print | cpio -pduvm ../..)
rm -rf mediawiki-1.34.2
Make sure the web directory has the ownerships needed for access by the
Apache web server:
chown -R www-data:www-data /var/www/html
Now is a good time to add and commit the initial untouched release files into
the git repository and push them to the git server:
git add --all && git commit && git push
Next, create a MySQL database with a user and password for the wiki.
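A sketch of the corresponding MySQL statements (the schema name, user name and password are examples only):
CREATE DATABASE wiki CHARACTER SET utf8mb4;
CREATE USER 'wiki'@'localhost' IDENTIFIED BY '<password>';
GRANT ALL PRIVILEGES ON wiki.* TO 'wiki'@'localhost';
FLUSH PRIVILEGES;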
File Structure
The configuration of the wiki is stored in a file “LocalSettings.php” in the top
directory of the application. If you are using the same repo for multiple
separate wiki application servers, you can keep these local settings out of the
repo using “.gitignore” and instead make copies of the different
configurations in a “backups” subdirectory you can also use to store database
backups.
Site Configuration
Point a web browser to the URL that serves the wiki and follow the
configuration instructions. The first page merely says that the wiki has not
been set up yet and provides a link to do so.
Select the languages for the user and the wiki. Both default to “en – English”.
The next screen provides the results of a check if the server has been set up
properly. Follow the instructions provided to make any corrections necessary.
The next two pages ask for the database host (“localhost”), the database
name, the database user and the password.
The next page asks for a wiki name, for example “Company Project Wiki”
and for the account name of the “webmaster”, another password and an email
address.
For basic wikis this is sufficient but for more secure corporate wikis we can
select “private wiki”. This requires all users to log in before reading or
editing any wiki topics.
Set the email return address to an account that is able to receive comments
from users, for example “[email protected]”.
Decide if you want to enable watchlist and user talk page notifications.
You may install a 135 by 135 pixel RGB or RGBA logo image in the
“images” directory and refer to it from “LocalSettings.php”:
$wgLogo = "$wgResourceBasePath/images/<logo name>.png";
Make sure this and all files in the wiki directory have the correct ownership
and permissions:
chown -R www-data:www-data /var/www/html
It should now be possible to log into the wiki using the webmaster account.
Then, the database that stores the wiki content should be backed up and
possibly also stored in the version control database. A backup can be made
using:
mysqldump --host=localhost --user=<user> --password=<password> <database> >backups/<database>Content.sql
Keep the top-level “LocalSettings.php” file out of the git repository using a
file “.gitignore” containing:
LocalSettings.php
Make sure that all files can be accessed by the web server:
chown -R www-data:www-data /var/www/html
Then add the new backup information into the version control database:
git add --all && git commit && git push
Chapter 17 - Blog
WordPress Server
WordPress is an open source content management system (CMS) written in
PHP that is mostly used for blogging. The software was first released in 2003
and as of October 2018 it had an installed base of between 19 and 76 million
sites.
Installation
On the git version control server, create an empty repository for a new
project:
git init --bare someproject.git
Make sure the virtual machine that will serve the blog(s) has an SSH private
key “~root/.ssh/id_rsa” and make sure that the contents of the corresponding
“~root/.ssh/id_rsa.pub” have been added to a user account on the git server
that has access to the repository.
Remove all existing files from “/var/www/html” and clone the git repository
into the “/var/www/html” directory or a subdirectory:
git clone ssh:…someproject.git .
Create a subfolder “backups” and inside it retrieve the current release of the
WordPress software:
mkdir backups
cd backups
wget https://wordpress.org/latest.tar.gz
Unpack the software and move it to the top level of the repository:
tar xzvf latest.tar.gz
(cd wordpress && find . -print | cpio -pduvm ../..)
rm -rf wordpress
cd ..
cp wp-config-sample.php wp-config.php
mkdir wp-content/upgrade
find . -type d -exec chmod 750 {} \;
find . -type f -exec chmod 640 {} \;
Make sure the web directory has the ownerships needed for access by the
Apache web server:
chown -R www-data:www-data /var/www/html
File Structure
The configuration of the blog is stored in a file “wp-config.php” in the top
directory of the application. If you are using the same repo for multiple
separate blog application servers, you can keep these local settings out of the
repo using “.gitignore” and instead make copies of the different
configurations in a “backups” subdirectory you can also use to store database
backups.
Site Configuration
Create a MySQL database with a user and password for the blog and add the
values for “DB_NAME”, “DB_USER” and “DB_PASSWORD” to the “wp-
config.php” file.
Generate a set of authentication keys and salts, for example from the WordPress secret-key service at “https://api.wordpress.org/secret-key/1.1/salt/”, and replace the corresponding placeholder values in the “wp-config.php” file with the output.
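For example, from the command line (the output can be pasted directly into the file):
curl -s https://api.wordpress.org/secret-key/1.1/salt/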
Point a web browser to the URL that serves the blog and follow the
configuration instructions.
Enter the site name and a user name (“webmaster”), password and email
address for an administrative user account.
Then, the database that stores the blog content should be backed up and
possibly also stored in the version control database. A backup can be made
using:
mysqldump --host=localhost --user=<user> --password=<password> <database> >backups/<database>-content.sql
Make sure that all files can be accessed by the web server:
chown -R www-data:www-data /var/www/html
Then add the new backup information into the version control database:
git add --all && git commit && git push
Chapter 18 – CMS
Drupal Server
Drupal is an open source content management system (CMS) written in PHP.
The project was started in 2001 and as of June 2020 it was up to version 9.0.1
and it had an installed base of 1.2 million sites.
Installation
On the git version control server, create an empty repository for a new
project:
git init --bare someproject.git
Make sure the virtual machine that will serve the CMS has an SSH private
key “~root/.ssh/id_rsa” and make sure that the contents of the corresponding
“~root/.ssh/id_rsa.pub” have been added to a user account on the git server
that has access to the repository.
Remove all existing files from “/var/www/html” and clone the git repository
into the “/var/www/html” directory or a subdirectory:
git clone ssh:…someproject.git .
Create a subfolder “backups” and inside it retrieve the current release of the
Drupal software:
mkdir backups
cd backups
wget https://www.drupal.org/download-latest/tar.gz
Unpack the software and move it to the top level of the repository:
tar xzvf tar.gz
(cd drupal-9.0.1 && find . -print | cpio -pduvm ../..)
rm -rf drupal-9.0.1
Next, create a MySQL database with a user and password for the CMS.
File Structure
The configuration of the content management system is stored in a file
“sites/default/settings.php” in the application. If you are using the same repo
for multiple separate content management system application servers, you
can keep these local settings out of the repo using “.gitignore” and instead
make copies of the different configurations in a “backups” subdirectory you
can also use to store database backups.
Site Configuration
Point a web browser to the URL that serves the CMS and follow the
configuration instructions.
Verify the installation requirements are met, ignore only a possible “clean
URL” notice for now and “continue anyway”.
Enter the database schema name and the database user name and password.
Observe the initial home page for the site (which will automatically log you
in to the administrative user account) and make sure the URL rewrite
specified in the “.htaccess” file is working by selecting any administrative
menu entry.
Check the status report for any problems with the configuration.
Use the “drupal.org” website to obtain the “tar.gz” URL for the “bootstrap 4”
theme. Then use the “appearance” menu to install and enable it as the default.
Use the “drupal.org” website to obtain the “tar.gz” URL for the “admin
toolbar” module. Then use the “extend” menu to install and enable it and its
sub-modules.
Create a new basic page with title “Access Denied” with URL “/access-
denied” and the content:
We're sorry, but you must have permission to view the page you requested.
If you are already a registered member of this site, please try logging in.
If you are not a member, you need to join us.
If you have any questions about our site or group, please feel free to contact us.
Create a new basic page with title “Page Not Found” with URL “/page-not-
found” and the content:
We're sorry, but the page you were looking for currently does not exist.
We redesign our site frequently and many pages may have changed.
If you are unable to find something on our new site or have a question about our
site or services feel free to contact us.
In Configuration -> System -> Basic Site Settings, inspect and complete the
settings including the “/access-denied” and “/page-not-found” pages.
In Configuration -> People -> Account Settings, set the name of the
anonymous user to “guest”. In Appearance -> Settings, set the logo image
and the favicon.
Create a user with role “Administrator” and one with role “Authenticated
User” and verify that the site can send email with the proper “from” address.
Then, the database that stores the CMS content should be backed up and
possibly also stored in the version control database. A backup can be made
using:
mysqldump --host=localhost --user=<user> --password=<password> <database> >backups/<database>-content.sql
Also back up the configuration file using:
cp sites/default/settings.php backups/<database>-settings.php
Make sure that all files can be accessed by the web server:
chown -R www-data:www-data /var/www/html
Then add the new backup information into the version control database:
git add --all && git commit && git push
Chapter 19 – Framework
Symfony Server
Symfony is a PHP web application framework and a set of reusable PHP
components/libraries published as free software since 2005. It is sponsored
by SensioLabs, a French software developer and professional services
provider. Symfony is currently the second most popular PHP web framework, after Laravel and ahead of CodeIgniter and Zend, but it has such major advantages that its components have been adopted by other major PHP frameworks such as Laravel itself and Drupal for their internal functionality.
Installation
On the git version control server, create an empty repository for a new
project:
git init --bare someproject.git
Make sure the virtual machine that will serve the site has an SSH private key
“~root/.ssh/id_rsa” and make sure that the contents of the corresponding
“~root/.ssh/id_rsa.pub” have been added to a user account on the git server
that has access to the repository.
Create subfolders “temp” and “backups”, use Composer to create the current release of the Symfony skeleton in “temp”, and then move it to the top level of the repository:
mkdir temp backups
cd temp
composer create-project symfony/skeleton .
find . -print | cpio -pduvm ..
cd ..
rm -rf temp
Make sure the web directory has the ownerships needed for access by the
Apache web server:
chown -R www-data:www-data /var/www/html
Unlike the other PHP web applications, Symfony does not serve its top-level
directory, but only its “public” subdirectory. In the “/etc/apache2/sites-
available/*” files, make the following change:
DocumentRoot /var/www/html/public
File Structure
Typically, the local configuration parameters of a Symfony application are
passed in as environment values by the web server. In our case they would be
stored in the “/etc/apache2/sites-available/*” files. In these cases, the git repo
will not contain any server-specific configuration files but only the
application and its assets.
Site Configuration
Point a web browser to the URL that serves the site. The initial version of
Symfony will not have any functionality other than an information page.
The second book in this series explains how to turn this initial distribution
into a complete application for serving web pages and a REST API.
Chapter 20 - Global Load Balancing
Our infrastructure consists of one or more firewall servers and one or more
application servers. The purpose of the firewall servers is to isolate the
application servers from the public Internet. The purpose of the application
servers is to run web sites and REST APIs and various supporting DHCP,
DNS, LDAP, SMTP, POP3 and IMAP applications on the internal network.
The purpose of having multiple servers for both firewall functions and
application functions is that we can perform maintenance on any one server
without disabling the application service.
In addition, the configuration described here provides load balancing over all
available physical and virtual machines. This must not only work over the
entire world, but by preference, requests from users in particular geographic
locations should be served by application servers located nearest in terms of
jurisdictions and then Internet communication hops and bandwidth. Some
servers will be located in jurisdictions that require information about their
citizens to be stored inside the jurisdiction only. Other jurisdictions will on
occasion or permanently prevent their citizens from communicating with
servers located outside the jurisdiction.
A remote system that resolves one of our public domain names receives the addresses of all available firewall servers. It then tries to contact each server in turn, using either a UDP or a TCP protocol, until it succeeds in establishing a connection to one of the firewall servers. The firewall servers forward all requests from remote systems to application servers on the internal network.
Each forwarded connection is tracked in a network address translation (NAT) table entry:
Cli IP   Cli Port   FW IP1   FW Port1   FW IP2   FW Port2   App IP   App Port
A1       P1         A2       P2         A3       P3         A4       P4
The firewall server may have multiple public and private addresses, so the
table must also remember which of its own addresses A2 and A3 were used.
The tricky part in any NAT is how to allocate port numbers P3. As long as a
particular combination [A3, P3, A4, P4] is unique then a response from an
application server [A4, P4] to the NAT [A3, P3] can be returned to the
correct client [A1, P1]. If this combination (table key) is already allocated
and is still active, another P3 must be calculated. Linux “netfilter” originally
chooses P3 = P1 and for additional connections [A1, P1] it increments a
previously used P3. This predictable behavior makes NAT traversal
algorithms possible.
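As an illustration, this kind of forwarding is typically configured with netfilter DNAT rules; a sketch, with assumed interface names and an assumed internal mail server address:
# forward inbound SMTP from the public interface to an internal mail server
iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 25 -j DNAT --to-destination 10.0.0.40:25
# rewrite the source address so replies return through this firewall
iptables -t nat -A POSTROUTING -o eth1 -p tcp -d 10.0.0.40 --dport 25 -j MASQUERADE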
We could also use this same mechanism to forward all requests for the web
protocols HTTP and HTTPS to a single application server. Instead, we may
want to operate a number of separate web and REST API applications, and in
that case we must forward such requests based on the HTTP header of the
request. This is beyond the capabilities of the netfilter modules and we will
operate a “reverse proxy” Apache web server on each firewall server.
Separation by Domain
When we separate applications by domain name, we create a separate DNS entry under “quarium.com” for each application that points to our firewall servers. This domain name is then load-balanced as described above.
Then we create one Apache configuration file for the HTTP virtual host and a
separate one for the HTTPS virtual host, for example “<nnn>-quarium.conf”
and “<nnn>-quarium-ssl.conf”:
<VirtualHost *:80>
ServerName quarium.com
ServerAlias *.quarium.com
…
</VirtualHost>
and
<IfModule mod_ssl.c>
<VirtualHost _default_:443>
ServerName quarium.com
ServerAlias *.quarium.com
…
</VirtualHost>
</IfModule>
The Apache configuration files are processed in sorted order, and the virtual host in the file with the alphabetically first name, for example “000”, will match requests that do not specify a configured domain. Requests for domains that are not configured fall through to that default virtual host, which can be set up to reject them.
Each application server will have its own Apache web server with its own
configuration files, typically “000-default.conf” and “default-ssl.conf” which
serve the local application. If we let the firewall server handle the secure
channel, we only need to enable the “000-default.conf” file on the application
server.
Separation by Path
In addition to separation by domain, we can separate applications for a single
domain by URL path. We must make sure that these paths do not collide
between applications and in some cases they should not be obvious to users.
We can for example choose randomized paths in the same way we choose
randomized passwords. In other cases they can be obvious, for example
“quarium.com/wiki”.
In this example, we’ll use two paths “quarium.com” and
“quarium.com/babUb4HAWret”. Note that when a web server is asked for
non-file paths like this, it will actually redirect the request to the directory
path “quarium.com/babUb4HAWret/” and then serve one of the files
specified in a “DirectoryIndex” directive, for example “index.php”.
We’ll configure URL paths by creating a separate definition for each path in
both HTTP and HTTPS firewall server configuration files for the domain (or
only in the HTTPS file, if we have redirected all HTTP requests). We can
serve as many different paths as needed. In Apache, each URL application
path is called a “location” and for each location we will perform a reverse
proxy to the proper application server. The Apache web server does this
using the following optional modules:
a2enmod proxy
a2enmod proxy_http
a2enmod proxy_html
a2enmod proxy_balancer
a2enmod substitute
To operate these modules, we add a few directives that are global to each
virtual host. This first directive tells the proxy module to disable its “forward
proxy” and only operate in its “reverse proxy” functions. This is critical for
the security of the firewall:
ProxyRequests off
We then add a set of directives specific to each location in each virtual host
for the “mod_proxy” module:
ProxyPass /babUb4HAWret http://somehost2.lan.quarium.com
<Location /babUb4HAWret>
SetEnv filter-errordocs
ProxyPassReverse http://somehost2.lan.quarium.com
ProxyPassReverseCookieDomain lan.quarium.com quarium.com
ProxyPassReverseCookiePath / /babUb4HAWret/
</Location>
ProxyPass / http://somehost1.lan.quarium.com/
<Location />
SetEnv filter-errordocs
ProxyPassReverse http://somehost1.lan.quarium.com/
ProxyPassReverseCookieDomain lan.quarium.com quarium.com
</Location>
These directives remap each incoming request HTTP header to the proper
application server and if the application server then responds, for example
with a redirect location HTTP header, it maps those locations in reverse.
Note that the use or omission of trailing “/” in the entire section is critically
important but is definitely not always obvious.
If we try these by themselves, we’ll notice that any links inside the HTML
content (for example in the <a href=””> tags) still point to the application
server and not to the public URL path. To make this work, we’ll use the
“proxy_html” module. We expand the location to:
ProxyPass /babUb4HAWret http://somehost2.lan.quarium.com
<Location /babUb4HAWret>
SetEnv filter-errordocs
ProxyPassReverse http://somehost2.lan.quarium.com
ProxyPassReverseCookieDomain lan.quarium.com quarium.com
ProxyPassReverseCookiePath / /babUb4HAWret/
SetOutputFilter INFLATE;DEFLATE;
ProxyHTMLEnable on
ProxyHTMLURLMap / /babUb4HAWret/ c
ProxyHTMLURLMap http://somehost2.lan.quarium.com /babUb4HAWret c
</Location>
ProxyPass / http://somehost1.lan.quarium.com/
<Location />
SetEnv filter-errordocs
ProxyPassReverse http://somehost1.lan.quarium.com/
ProxyPassReverseCookieDomain lan.quarium.com quarium.com
SetOutputFilter INFLATE;DEFLATE;
ProxyHTMLEnable on
ProxyHTMLURLMap http://somehost1.lan.quarium.com/ / c
ProxyHTMLURLMap http://somehost1.lan.quarium.com / c
</Location>
The output filter allows us to map URLs in compressed content as well. The
“proxy_html” filter is automatically inserted between the “INFLATE” and
“DEFLATE” filters. And again we want to map both URLs that include a
host name and those that do not.
Strangely, for the “/” location in the secure configuration only, the two “ProxyHTMLURLMap” directives must be reversed.
We do not want to expand just any “/” or URL in the HTML content. Apache
will only rewrite paths inside attributes of elements listed in
“ProxyHTMLLinks” directives. Fortunately, a good default set is part of the
standard configuration of the “proxy_html” module and we do not have to
add any in our own configuration files. This is the standard set in “mods-
available/proxy_html.conf” in Ubuntu 20.04 LTS:
ProxyHTMLLinks a href
ProxyHTMLLinks area href
ProxyHTMLLinks link href
ProxyHTMLLinks img src longdesc usemap
ProxyHTMLLinks object classid codebase data usemap
ProxyHTMLLinks q cite
ProxyHTMLLinks blockquote cite
ProxyHTMLLinks ins cite
ProxyHTMLLinks del cite
ProxyHTMLLinks form action
ProxyHTMLLinks input src usemap
ProxyHTMLLinks head profile
ProxyHTMLLinks base href
ProxyHTMLLinks script src for
The only addition we might want to make to our own site configuration file is:
ProxyHTMLLinks button formaction
Now all URLs in HTTP headers and HTML content are mapped correctly.
But this is not the case for any supporting “.js”, “.css” and “.json” files or the
JSON output of an API.
If you then request any of these files, you'll see the message "Non-HTML content; not inserting proxy-html filter". This means that the "proxy_html" filter is only used for HTML content. The documentation is explicit on that point as well: "Note that the proxy_html filter will only act on HTML data (Content-Type text/html or application/xhtml+xml) and when the data are proxied."
The filter does process CSS and JavaScript, but only when they are embedded in the text of the HTML document, and even that can be dangerous. In HTML, the "/" character occurs only inside URLs and, outside element attributes, as plain text content. In CSS and JavaScript, however, comments are delimited by "/*" and "*/", and JavaScript also uses "//". We do not want these "/" characters interpreted as the root URL and expanded into "/babUb4HAWret*". This is the reason for the "c" flag at the end of the line:
ProxyHTMLURLMap / /babUb4HAWret/ c
The flag prevents the rule from being applied in embedded CSS and
JavaScript. This does mean that an embedded “window.location.href = ‘/’;”
may not have the intended effect.
Clearly we’ll need some other way to correctly reverse proxy secondary files
that may contain URLs. Fortunately, the “.js” and “.css” files are always used
in conjunction with HTML and we can employ the best practice of only using
relative URLs in these files.
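If an application server insists on embedding absolute URLs that include its internal host name in separate stylesheets or scripts, one possible fallback is to run the same "mod_substitute" filter that we are about to use for JSON over those content types as well. This is only a sketch, added inside the corresponding <Location> block; the MIME types are assumptions and must match the Content-Type the application server actually sends:
# Assumed MIME types; adjust to the backend's actual Content-Type headers.
AddOutputFilterByType INFLATE;SUBSTITUTE;DEFLATE text/css application/javascript
Substitute "s!http://somehost2.lan.quarium.com!https://www.quarium.com/babUb4HAWret!"
Relative URLs remain the better practice; this substitution only rewrites full URLs and does nothing for root-relative paths.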
A bigger problem exists with “.json” files (or really any other textual data
files) and with JSON API output. For this we’ll need to add a separate output
filter and then we tell it to substitute the external URLs for the internal URLs.
We could complicate things more by trying to map relative URLs correctly as
well, but instead we recommend the best practice of only embedding absolute
URLs in REST responses. The final reverse proxy location now looks like
this:
ProxyPass /babUb4HAWret http://somehost2.lan.quarium.com
<Location /babUb4HAWret>
SetEnv filter-errordocs
ProxyPassReverse http://somehost2.lan.quarium.com
ProxyPassReverseCookieDomain lan.quarium.com quarium.com
ProxyPassReverseCookiePath / /babUb4HAWret/
SetOutputFilter INFLATE;DEFLATE;
ProxyHTMLEnable on
ProxyHTMLURLMap / /babUb4HAWret/ c
ProxyHTMLURLMap http://somehost2.lan.quarium.com /babUb4HAWret c
AddOutputFilterByType INFLATE;SUBSTITUTE;DEFLATE application/hal+json application/json
Substitute "s!http://somehost2.lan.quarium.com!https://www.quarium.com/babUb4HAWret!"
</Location>
ProxyPass / http://somehost1.lan.quarium.com/
<Location />
SetEnv filter-errordocs
ProxyPassReverse http://somehost1.lan.quarium.com/
ProxyPassReverseCookieDomain lan.quarium.com quarium.com
SetOutputFilter INFLATE;DEFLATE;
ProxyHTMLEnable on
ProxyHTMLURLMap http://somehost1.lan.quarium.com/ / c
ProxyHTMLURLMap http://somehost1.lan.quarium.com / c
AddOutputFilterByType INFLATE;SUBSTITUTE;DEFLATE application/hal+json application/json
Substitute "s!http://somehost1.lan.quarium.com!https://www.quarium.com!"
</Location>
This should now work as expected, but there are a surprising number of special cases. We need to verify that each of them behaves correctly, so we create a small test application.
We can test if “.css” (and “.js”) files are processed correctly with a new file
“css/proxytest.css” (and some corresponding “images/proxytest1.png” and
“images/proxytest2.jpg”):
/* proxytest.css */
body {
background-image: url('../images/proxytest2.jpg');
}
We can test if URLs in response headers are also mapped correctly with a file
“proxytest3.php”:
<?php
header("Location: http://somehost.lan.quarium.com/proxytest1.html");
http_response_code(307);
exit;
?>
We can now inspect whether the mappings are correct using, for example:
curl --trace - --location http://www.quarium.com/babUb4HAWret
curl --trace - --location http://www.quarium.com/babUb4HAWret/
curl --trace - --location http://www.quarium.com/babUb4HAWret/proxytest1.html
curl --trace - --location http://www.quarium.com/babUb4HAWret/proxytest1.php
curl --trace - --location http://www.quarium.com/babUb4HAWret/proxytest3.php
curl --trace - --location http://www.quarium.com
curl --trace - --location http://www.quarium.com/
curl --trace - --location http://www.quarium.com/proxytest1.html
curl --trace - --location http://www.quarium.com/proxytest1.php
curl --trace - --location http://www.quarium.com/proxytest3.php
Now that our application servers are behind a reverse proxy, they see all HTTP requests arriving from the reverse proxy's internal IP address. For analytics purposes we would like to log the original external client IP instead. For this we enable an additional Apache module on each application server:
a2enmod remoteip
We must also change the "%h" field to "%a" in the "combined" log format in the "/etc/apache2/apache2.conf" file:
LogFormat "%a %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined
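Enabling the module is not enough by itself: "mod_remoteip" also has to be told which request header carries the original client address and which upstream proxy it may trust to set that header. A minimal sketch for each application server, assuming the reverse proxy forwards the standard "X-Forwarded-For" header (which "mod_proxy_http" adds by default) and reaches the application servers from the hypothetical internal address 10.0.0.2:
# The header added by the reverse proxy.
RemoteIPHeader X-Forwarded-For
# Only trust that header when it arrives from our own reverse proxy;
# 10.0.0.2 is a placeholder for the proxy's real internal address.
RemoteIPInternalProxy 10.0.0.2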
Load Balancing by Geolocation
We are going to assume that when a new user first contacts our online
service, they are doing so from within their home jurisdiction.
When a request from an as-yet unidentified user reaches any of our servers,
that server should determine which <hostname>.quarium.com is
geographically nearest to the IP address from which the request originates. If
the server determines that another server is better situated to serve the
request, it should redirect the user to that server. When such a user then either
identifies themselves, or registers a new account, they will do so on the server
(cluster) that can best serve them.
If the user then identifies themselves with an account that is not stored on the current server, the server should initiate a search for the account on all other
servers it can reach. If the account is found on another server, the user can be
authenticated and then redirected to the server on which their account resides.
There are obviously many failure modes for this entire global load balancing
mechanism. Users must be carefully educated on what could be wrong when
they are denied access to their account.
We’ll discuss the server application code for this in the next book in this
series.
About the Book
An Online Infrastructure with Ubuntu 20.04 LTS LAMP demonstrates the
detailed development and testing of a server configuration for the Online
Service and the Online Client App described in the companion books of the
series. It can also be used for Symfony, WordPress, MediaWiki, Drupal, Git,
Jira or Icinga deployment. The Online Infrastructure runs on private server
farms or cloud-hosted virtual machines at scales ranging from simple local
sites to large-scale world-wide applications.
About the Author
The author helped build some of the first 6809 UCSD Pascal and 68000 UNIX microcomputers (thank you Patrick, Henk, Don and Zion).
Colophon
This book was written using MediaWiki, Microsoft Word and Sigil. Titles are
set in Impact. Headings are set in Adobe Myriad Pro. Text is set in Adobe
Minion Pro. Code examples are set in Ubuntu Mono.