PostgreSQL For Data Architects - Sample Chapter
Jayadevan Maymala
Discover how to design, develop, and maintain your database
application effectively with PostgreSQL
Chapter 6, Client Tools, covers two client tools (pgAdmin: a UI tool and psql:
a command-line tool). Browsing database objects, generating queries, and generating
the execution plan for queries using pgAdmin are covered. Setting up the environment
variables for connecting from psql, viewing history of SQL commands executed, and
meta-commands are also covered in this chapter.
Chapter 7, SQL Tuning, explains query optimization techniques. To set the context,
some patterns about database use and theory on how the PostgreSQL optimizer works
are covered.
Chapter 8, Server Tuning, covers PostgreSQL server settings that have significant
impact on query performance. These include memory settings, cost settings, and so
on. Two object types, partitions and materialized views, are also covered in this chapter.
Chapter 9, Tools to Move Data in and out of PostgreSQL, covers common tools/utilities,
such as pg_dump, pg_bulkload, and copy used to move data in and out of PostgreSQL.
Chapter 10, Scaling, Replication, and Backup and Recovery, covers methods that are
usually used to achieve scalability and availability. A step-by-step method to achieve horizontal scalability
using PostgreSQL's streaming replication and pgpool-II is also presented. Point-in-time
recovery for PostgreSQL is also covered in this chapter.
Chapter 11, PostgreSQL Troubleshooting, explains a few of the most common
problems developers run into when they start off with PostgreSQL and how to
troubleshoot them. Connection issues, privilege issues, and parameter setting
issues are also covered.
Chapter 12, PostgreSQL Extras, covers quite a few topics. Some interesting data
types that every data architect should be aware of, a couple of really useful extensions,
and a tool to analyze PostgreSQL log files are covered. It also covers a few interesting
features available in PostgreSQL 9.4.
Client Tools
In the previous chapter, we looked at a tool that is used to design databases.
Now, let's cover a few tools that are used with PostgreSQL to manipulate data,
create, drop, and alter objects, find out what is happening on the server, and so on.
In this chapter, we will cover one GUI and one command-line tool that are used to
work with PostgreSQL. We will see how database connections are made, how SQL
statements are executed, and how database objects and related metadata can be
viewed. We will also look at a couple of advanced use cases (such as generating
the plan for queries and changing configuration parameters).
When you use the commands provided in the links to install pgAdmin,
pay attention and ensure that you install only the pgAdmin software. If
you just copy/paste the commands, you might end up overwriting your
PostgreSQL installation.
Adding a server
The first thing we want to do once pgAdmin has been installed and started is set up
a connection. For this, we use the Add Server option.
Click on File and choose Add Server. We get a dialog box with various options, as
shown in the following screenshot:
Name: This can be anything that we want to use for reference. It could be
Host: This is the IP address of the machine where our cluster is running.
It can also be the fully qualified hostname.
Port: This is the port where the cluster listens for requests.
Maintenance DB: This is used to specify the initial database that pgAdmin
connects to. It's the database where pgAdmin will check for optional but
useful modules and extensions (such as pgAgent and adminpack).
Username: This is the user that you will be connecting as. We could
choose to save the password. If we do this, the password will be stored in
a password file named .pgpass in the home directory if we are working in
a Unix/Linux environment. In Windows, the password file will be under
%APPDATA%\postgresql\, and the file will be named pgpass.conf. We can
use the option under the File menu to edit the password file.
This covers the basic options. The other tabs have the not-so-basic settings. For
example, we can use the SSH Tunnel option if the server does not allow direct
connections from the client machine. This is useful when we are working for a client
whose network allows access from external networks to only a few machines. The
machines that host databases will usually not be among those exposed to external
networks.
If you are working with the database cluster on a different machine, check whether
you have modified the postgresql.conf (listen_addresses entry) and pg_hba.conf
files on the server so that you can connect from a remote machine. The
documentation links are
http://www.postgresql.org/docs/current/static/runtime-config-connection.html and
http://www.postgresql.org/docs/current/static/auth-pg-hba-conf.html.
In the left-hand side pane, we have the Object Browser pane that shows all objects in
a hierarchical manner. At the top of the hierarchy, we have Server Groups and then
Servers. This is where choosing a good name for the server helps. The rest of the
hierarchy is visible at the levels below.
The top right-hand side pane provides quite a lot of very useful information. The
first tab provides information such as name of the object, object identifier, estimated
rows, and comment. Even more important, in many cases, is the next tab: Statistics.
This provides information regarding reads, sequential scans, index scans, and other
information that will be useful when we are trying to optimize database queries or
having another look at the table design. The data shown in the Statistics tab as well
as the other tabs in the right-hand side pane changes depending on the type of object
we select in the object browser.
The bottom right-hand side pane has the SQL for the object selected. Right-clicking
on a table in the Object Browser pane gives you a few options. One of the options
is the data viewer. In the data viewer menu, we can edit existing data, add new
records, and delete records. The data viewer menu also lets us apply filters and
select specific records.
In the case of a table, the right-click options let us take a backup of the table, import
data, restore from a backup, and so on, as shown in the following screenshot:
The Tools menu provides many useful options as well. The important ones are
listed next. The Server Configuration option lets us see the current settings, edit
them, and reload the parameter values, as shown in the following screenshot. Some
of the settings need a server restart. So, the effect might not be visible immediately.
The user must have appropriate permissions to use this feature.
The Server Status option lets us see what is happening at the server. It lists the
current activity, transactions, locks, and the information being written to the
log file. If you are not comfortable with the command-line tool and the PostgreSQL
views, which store database activity data, this option can be used to retrieve pretty
much the same information that you could get from these views. In addition, we can
also retrieve past log files, as shown in the following screenshot:
In the Query tool, writing and executing a SQL statement is straightforward. It also
has a graphical query builder, which is intuitive for simple queries. As shown in the
following screenshot, we can choose the tables and columns we need, and join tables
by dragging a column from one table to the column it is to be joined with in the
other table:
Another useful feature is the Explain pane. We can select a query and use Shift + F7
to generate a graphical query execution plan. The plan for the preceding query is
displayed as follows:
pgAdmin does provide us with many more features. We can click on File and then
select Options to set which object types should be displayed in the Object Browser
pane. The right-click options in Database or Tables let us carry out maintenance
activities (such as vacuuming and analyzing). Right-clicking on Tables provides a
Script menu that can be used to generate scripts for the CREATE, SELECT, INSERT,
UPDATE, and DELETE statements. We can also generate reports from the database.
It's necessary to install the adminpack extension for some of the features
to work. This can be done using the CREATE EXTENSION command.
This covers the features one will use most frequently in pgAdmin. Next, we will
cover psql, the command-line utility to work with PostgreSQL clusters.
Each of these options can be given in a one-letter short form or in a long form.
For example, we could specify the database to connect to with -d mydatabase or
--dbname=mydatabase, as shown here:
psql --host=localhost --dbname=test
psql (9.3.0)
Type "help" for help.
test=# \q
psql -h localhost -d test
psql (9.3.0)
Type "help" for help.
test=# \q
It makes sense to use the long options initially to get familiar with what is
available. Some short options are easy to confuse: --port is unambiguous, but its
short form is a lower case -p. If we type an upper case -P instead, we get an error,
because -P sets printing options, not the port:
psql -h localhost -d test -U postgres -P 5432
\pset: unknown option: 5432
psql: could not set printing parameter "5432"
While we can definitely execute queries once we are at the psql prompt, it's the rich
set of meta-commands that deserves more attention. To say that we can execute SQL
is to state the obvious.
The power of \d
Any command that starts with a backslash is referred to as a meta-command.
Among the numerous meta-commands, it's likely that you will find \d to be most
useful. \d has a number of options. In its simplest form, without any options, it
lists all user-created objects in the database, as shown here (first, create a database
myobjects and a few objects of different types if you want to follow along, or you
could try it on the database you have created):
myobjects=# \d
            List of relations
 Schema |    Name    |   Type   |  Owner
--------+------------+----------+----------
 public | mysequence | sequence | postgres
 public | mytable    | table    | postgres
 public | myview     | view     | postgres
(3 rows)
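To follow along, objects like those listed above could be created with a short script. The sketch below is only an assumption about what such a setup might look like: the chapter names mysequence, mytable, myview, and myfunc, but the column definitions, the view query, the function body, and the file name setup_myobjects.sql are hypothetical. The block only writes the SQL file; loading it needs a reachable server.

```shell
# Write a hypothetical setup script for the myobjects database.
# Only the object names come from the chapter; the definitions are illustrative.
workdir=$(mktemp -d)
setup_sql="$workdir/setup_myobjects.sql"

# The quoted delimiter ('SQL') stops the shell from expanding $$ and $1.
cat > "$setup_sql" <<'SQL'
CREATE SEQUENCE mysequence;
CREATE TABLE mytable (id integer PRIMARY KEY, name text);
CREATE VIEW myview AS SELECT id, name FROM mytable;
CREATE FUNCTION myfunc(integer, integer) RETURNS integer
    AS $$ SELECT $1 + $2 $$ LANGUAGE sql;
SQL

echo "Wrote $setup_sql"
# To load it (assumes the myobjects database already exists):
#   psql -d myobjects -f "$setup_sql"
```

The table uses a plain integer key rather than a serial column, so no extra sequence shows up in the \d listing.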
If you want to see just the tables, use \dt (t for table):
myobjects=# \dt
         List of relations
 Schema |  Name   | Type  |  Owner
--------+---------+-------+----------
 public | mytable | table | postgres
(1 row)
myobjects=# \ds
             List of relations
 Schema |    Name    |   Type   |  Owner
--------+------------+----------+----------
 public | mysequence | sequence | postgres
(1 row)
myobjects=# \dv
        List of relations
 Schema |  Name  | Type |  Owner
--------+--------+------+----------
 public | myview | view | postgres
(1 row)
myobjects=# \df
                          List of functions
 Schema |  Name  | Result data type | Argument data types |  Type
--------+--------+------------------+---------------------+--------
 public | myfunc | integer          | integer, integer    | normal
(1 row)
We can use s for sequence, v for view, f for function, and so on. For schemas, we
have to use n, which denotes namespace. A + sign at the end will give us some more
information (such as comments for tables and additional information for functions),
as shown in the following screenshot:
The + sign is very useful when we want to see a view or function definition. \d
without any letter after it is interpreted as \dtvsE. In this, E stands for foreign
tables. We have already covered the rest.
By the way, we can also use patterns, as shown here:
We can use psql with the -E option. This will display all the SQL statements used
by PostgreSQL internally when we execute a \d command or other backslash
commands. In the following example, the setting is turned ON from psql:
After the setting is turned ON, we can see the query executed:
psql rocks!
More meta-commands
The \h command provides help. So does \?. The first one provides help for SQL
commands, whereas the second one provides help for psql commands (that is, the
meta-commands).
In previous chapters, we saw that many host commands can be executed from the
psql prompt in the following manner. The example also shows how output can be
redirected to a file. Note that in the first command where we redirect output, there
is no ! after \:
test=# \o out.txt
test=# show data_directory;
test=# \! cat out.txt
 data_directory
----------------
 /pgdata/9.3
(1 row)
test=# \! ls
out.txt
test=# \o
test=# \! rm out.txt
test=# \! ls
The file is gone. Now, we will see how we can execute SQL statements in a file from
psql. We use the \i meta-command for this:
test=# \! echo "SELECT 1; " > a.sql
test=# \i a.sql
 ?column?
----------
        1
(1 row)
test=# \! cat a.sql
SELECT 1;
Another important meta-command is \timing, which toggles the display of query
execution time. How long a query takes is something we want to know when we are
trying to optimize it. To enable this, we can use the following code:
postgres=# \timing
Timing is off.
postgres=# \timing
Timing is on.
postgres=# select now();
              now
-------------------------------
 2014-02-19 03:06:53.785452+00
(1 row)
Time: 0.202 ms
The prompt has changed to include the user (%n), the symbol (@), and database name
(%/). The file was renamed to .psqlrcb just to see what the prompt looked like if
there was no .psqlrc file. We can try renaming the .psqlrcb file to .psqlrc so that
it's used by psql. Add an entry in the file, as shown here:
\set HISTFILE ~/history-:DBNAME
If we connect to a couple of databases and exit, for each database we connected to,
there will be a separate history file.
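The prompt and history settings discussed above might sit together in a startup file like the one below. This is a sketch, not the author's actual file: the PROMPT1 value is reconstructed from the escapes mentioned in the text (%n, @, %/), with %R and %# taken from psql's default prompt. The scratch directory and the use of the PSQLRC environment variable are assumptions made so that the example does not touch a real ~/.psqlrc.

```shell
# Build a sample psql startup file in a scratch directory.
workdir=$(mktemp -d)
PSQLRC="$workdir/psqlrc"

cat > "$PSQLRC" <<'EOF'
-- Show user@database in the prompt; %R and %# come from psql's default prompt.
\set PROMPT1 '%n@%/%R%# '
-- Keep a separate history file per database, as in the entry above.
\set HISTFILE ~/history-:DBNAME
EOF

# psql reads the file named by PSQLRC instead of ~/.psqlrc when it is set.
export PSQLRC
cat "$PSQLRC"
```

Unsetting PSQLRC (or starting psql with -X) returns psql to its default behavior.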
Executing \conninfo at the psql prompt gives us information (such as
user, database connected to, and port).
By now, we have covered most of the commands that can be used to get information
about the environment, switch between databases, read from and write to files,
and so on. For an exhaustive listing of the possibilities, we can get help for
meta-commands at the psql prompt by typing \?. The commands are neatly grouped
together into different sets: general, query buffer, input/output, and so on. Also,
refer to http://www.postgresql.org/docs/current/static/app-psql.html
for more options and explanation.
PGHOST: This is the address of the host where the cluster is running
Now, we can create a few environment files and source them as required. There can
be better options, but this is one. Let's create a file named .mylocalenv. Its entries
are as follows:
$ more .mylocalenv
export PGHOST=localhost
export PGPORT=5432
export PGUSER=postgres
export PGDATABASE=myobjects
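Sourcing such a file could look like the following sketch. To stay self-contained, it writes the same four entries to a scratch file (the path is ours, not the book's) rather than to ~/.mylocalenv, and it only prints the resulting variables. With a running cluster, a plain psql call in the same shell would then pick them up as connection defaults.

```shell
# Recreate the environment file in a scratch directory (hypothetical path).
workdir=$(mktemp -d)
envfile="$workdir/.mylocalenv"

cat > "$envfile" <<'EOF'
export PGHOST=localhost
export PGPORT=5432
export PGUSER=postgres
export PGDATABASE=myobjects
EOF

# Source the file into the current shell; "." is the portable form of "source".
. "$envfile"

# libpq-based clients such as psql read these variables as connection defaults.
echo "host=$PGHOST port=$PGPORT user=$PGUSER dbname=$PGDATABASE"
```

Keeping one such file per environment lets us switch targets by sourcing a different file.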
One item that we left out is the password. It's not a good idea to set this as an
environment variable. For this, we use the .pgpass file in our home directory.
A reference to this file was made in the pgAdmin section.
The .pgpass file contains entries in host:port:database:user:password form;
for example:
more ~/.pgpass
localhost:5432:myobjects:postgres:tcc123
If we try to connect to the default database (myobjects, that is, the database set in
the env file), it does not ask for a password. However, if we try to connect to another
database, the password prompt appears:
$ psql
psql (9.3.0)
Type "help" for help.
myobjects=# \q
postgres@jayadevan-Vostro-2520:~$ psql -d postgres
Password:
The password is stored as plain text in the .pgpass file. This is not
the recommended method to store passwords.
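Since the file holds passwords in plain text, its permissions matter: on Unix-like systems, libpq ignores a password file that is accessible to group or others. A minimal sketch of creating one safely follows. The scratch path is an assumption so that the example does not overwrite a real ~/.pgpass, PGPASSFILE is the libpq variable for pointing at a non-default location, and the entry is the sample value used above.

```shell
# Create a password file in a scratch directory (hypothetical location).
workdir=$(mktemp -d)
PGPASSFILE="$workdir/pgpass"

# Restrict permissions before writing any secret to the file.
( umask 077 && : > "$PGPASSFILE" )
chmod 600 "$PGPASSFILE"

# Format: hostname:port:database:username:password
echo 'localhost:5432:myobjects:postgres:tcc123' >> "$PGPASSFILE"

# Tell libpq clients (psql, pg_dump, and so on) where to find the file.
export PGPASSFILE
ls -l "$PGPASSFILE"
```

A * in any of the first four fields acts as a wildcard, so an entry such as localhost:5432:*:postgres:tcc123 would also cover databases other than myobjects.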
History of commands
The \s command lists all the commands you have executed so far. The data is
stored in ~/.psql_history, that is, .psql_history under the home directory
of the Linux user under which we are connecting to psql.
Summary
In this chapter, we covered two PostgreSQL tools: one GUI tool and one
command-line tool. While the GUI tool might be easier to learn than the other
one, it's better to learn the command-line tool because it provides you with a lot
of flexibility. For example, you could create SQL files, call them from shell scripts
and schedule them in cron. For such automated batch jobs, psql usually proves
to be more powerful than a GUI tool. Also, for efficiency, it's better to work from
the command line. Typing \d and hitting Enter is faster than moving the mouse around,
clicking on an icon, and waiting for the data to be displayed. In some cases, such as the
execution plan for a query, the visual display might be easier to understand than
text. Choosing between a point-and-click option and a command-line option is
definitely a matter of personal preference. You can take your pick.
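As an illustration of the batch-job case, the sketch below writes a SQL file and a small wrapper script that a cron entry could call. The file names, the VACUUM ANALYZE statement, and the schedule in the closing comment are hypothetical; the psql flags used (-X, -q, -v ON_ERROR_STOP=1, -d, -f) are standard ones. The block only generates the files and does not contact a server.

```shell
# Generate a SQL file and a wrapper script in a scratch directory.
workdir=$(mktemp -d)
sql_file="$workdir/nightly.sql"
wrapper="$workdir/run_nightly.sh"

cat > "$sql_file" <<'SQL'
-- Hypothetical maintenance job
VACUUM ANALYZE mytable;
SQL

# Unquoted delimiter: $sql_file is expanded while the wrapper is written.
cat > "$wrapper" <<EOF
#!/bin/sh
# -X skips ~/.psqlrc, -q keeps output quiet, and ON_ERROR_STOP makes psql
# exit with a non-zero status if a statement fails, so cron can report it.
exec psql -X -q -v ON_ERROR_STOP=1 -d myobjects -f "$sql_file"
EOF
chmod +x "$wrapper"

echo "Wrapper written to $wrapper"
# Example crontab line (runs at 02:00 every day):
#   0 2 * * * /path/to/run_nightly.sh >> /tmp/nightly.log 2>&1
```

Connection details and the password would come from the environment file and the .pgpass file described earlier, so the script itself carries no credentials.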
In the next chapter, we will take a look at the PostgreSQL optimizer and see how
queries can be optimized with this information.