Tripleten 5 - Introduction To Table Relationships and Joining Tables
In this chapter, we’ll take our relationship with relational databases to the next level. We’ll finally go beyond merely
working with data in a single table, and we’ll learn how to join tables together in useful ways.
We’ll take a look at entity-relationship (ER) diagrams, which are a useful tool for getting an idea of the
structure of a database and the relationships between its tables.
In addition to primary keys, we’ll expand our repertoire with knowledge of foreign keys (fields that reference
the primary key of another table), which can be used to relate data with other data.
We’ll cover the three types of data relationships: one-to-one, one-to-many, and many-to-many.
We’ll learn to assign aliases to tables and fields with the AS keyword. This is a useful tool for readability and
maintainability, and it will come in handy when our queries become quite large while working with multiple
tables.
To give you an idea of the join options available, you’ll learn to use the operators INNER JOIN, LEFT OUTER
JOIN, RIGHT OUTER JOIN, and FULL OUTER JOIN, as well as UNION and UNION ALL. As you can probably imagine, these tools give
us powerful options for exporting data in specific ways.
These features are incredibly important for understanding and using relational databases and languages like SQL.
So, there will be a ton of tasks — some a bit challenging — to test your confidence with these concepts.
Table Relationships
In previous lessons, we’ve worked with single tables to make data slices and group and sort data. But let’s face it,
databases have more than one table!
You’re already familiar with the invoice table, which contains data about orders. This table also has data about
clients: each user has an ID that is stored in the customer_id field. But this ID doesn’t tell us very much about our
clients, does it?
First and last names are stored in the client table. So, if we export orders from invoice and the user’s name from
client, we’ll have a more informative table.
At the beginning of this course, we talked about how databases are different from simple file systems, mainly
because tables are connected with each other. These connections are created with keys.
You already know about primary keys: a column that contains a unique ID value for each record in the table.
There are also foreign keys — these are fields that reference the primary key of another table.
Let’s look at an example. The invoice_id field is the primary key in the invoice table. It’s unique and cannot be
repeated. The customer_id field stores customer IDs, but that doesn’t tell us much about who these customers
actually are. (More on that in just a second.)
We can now compare these tables. In the client table, for example, we see the customer with ID 8 is called
Marcus Nozadze. If we cross-reference the invoice table and check the customer_id of 8, we’ll see that he placed
an order on January 3, 2009.
Let’s summarize: the customer_id field appears in two tables (invoice and client), which helps us relate them
and combine them into one as needed.
For invoice, the customer_id field is a foreign key referencing the second table (client). A customer’s ID can be
repeated in the invoice table because one customer could hypothetically place several orders. However, in the
client table, values in the customer_id field can’t be repeated. One customer. One ID.
It might seem inconvenient to store data like this. After all, we have to switch between tables and add new fields.
Wouldn’t it be better to just add the clients’ names directly to the invoice table? Then we’d be able to see who
placed the order right away.
But, believe it or not, this isn’t the best solution. Storing the same data in several tables is impractical
because it’s very inconvenient to update: if a customer changed their name, every copy of it would have to be changed too.
Therefore, it’s better to leave the client names in the client table. This is also more flexible: you can always link
the table with customer data to some other separate table for a new feature or purpose.
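To see why keeping names in one place pays off, here’s a minimal sketch that uses Python’s built-in sqlite3 module as a stand-in for PostgreSQL. The client and invoice tables are hypothetical, simplified versions of the course tables. Customer 8 comes from the example above, while the invoice totals and the changed last name are made up. The sketch renames the customer once and shows that every joined order picks up the change. It also shows that the foreign key rejects an order for a customer who doesn’t exist.

```python
import sqlite3

# Hypothetical, simplified client and invoice tables in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on

conn.executescript("""
CREATE TABLE client (
    customer_id INTEGER PRIMARY KEY,  -- primary key: one customer, one ID
    first_name  TEXT,
    last_name   TEXT
);
CREATE TABLE invoice (
    invoice_id  INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES client (customer_id),  -- foreign key
    total       REAL
);
INSERT INTO client VALUES (8, 'Marcus', 'Nozadze');
INSERT INTO invoice VALUES (1, 8, 1.98), (2, 8, 3.96);  -- made-up orders
""")

# The name is stored once, so a single UPDATE is enough...
conn.execute("UPDATE client SET last_name = 'Nozadze-Smith' WHERE customer_id = 8")

# ...and every order sees the new name through the key.
rows = conn.execute("""
    SELECT i.invoice_id, c.last_name
    FROM invoice AS i
    JOIN client AS c ON i.customer_id = c.customer_id
    ORDER BY i.invoice_id
""").fetchall()
print(rows)

# The foreign key also refuses orders for customers that don't exist.
try:
    conn.execute("INSERT INTO invoice VALUES (3, 999, 5.94)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```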
One-to-one relationship
In this type of table relationship, one record in a table is associated with only one record in another table.
For example, let’s say the staff table stores data about company employees.
We could also create several tables displaying employee data from different departments (e.g., one table for
developers, one table for support_specialists, and so on).
The developers table will be linked to the staff table by a one-to-one relationship: one record about an
employee in the development department corresponds to one record about the same employee in staff.
The same is true for the other department tables.
One-to-many relationship
This type of table relationship is the most common. Here, one record in a table is associated with several records in
another table. Our beloved invoice and client tables are linked exactly like this. The client table can only
contain a list of unique customers with unique IDs, but in the invoice table, one customer might appear many
times, depending on the number of orders they’ve placed.
Many-to-many relationship
This type of relationship means that one record in a table is associated with several records in another table, and
vice versa. It might seem very similar to the previous relationship (one-to-many) but there is a difference.
Let’s say one table contains school subjects and another contains teachers. Consider this: one teacher might teach
several subjects and one subject might also be taught by several teachers.
This problem can be solved with a connecting table, also called an associative table, that links teacher IDs with
subject IDs. Here, each pair (Subject ID + Teacher ID) is unique.
Subject ID Teacher ID
1 1
1 13
2 4
3 4
The teacher with the ID 4 teaches two subjects, so the table will have a separate record for each of them.
Note that the teacher table and the associative table have a one-to-many relationship. One teacher is associated
with several records in the associative table. The subject table and the associative table will have the same
relationship.
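Here’s a minimal sketch of the teacher/subject example. It uses Python’s built-in sqlite3 module as a stand-in for PostgreSQL. The table names (teacher, subject, subject_teacher) and the subject titles are hypothetical, but the ID pairs are the ones from the table above. The composite primary key on the associative table is what makes each (Subject ID + Teacher ID) pair unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE teacher (teacher_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE subject (subject_id INTEGER PRIMARY KEY, title TEXT);
-- Associative table: each (subject_id, teacher_id) pair may appear only once
CREATE TABLE subject_teacher (
    subject_id INTEGER REFERENCES subject (subject_id),
    teacher_id INTEGER REFERENCES teacher (teacher_id),
    PRIMARY KEY (subject_id, teacher_id)
);
INSERT INTO teacher VALUES (1, 'Teacher 1'), (4, 'Teacher 4'), (13, 'Teacher 13');
INSERT INTO subject VALUES (1, 'Math'), (2, 'Physics'), (3, 'Chemistry');
INSERT INTO subject_teacher VALUES (1, 1), (1, 13), (2, 4), (3, 4);
""")

# One teacher -> many subjects: go through the associative table.
subjects_of_4 = [title for (title,) in conn.execute("""
    SELECT s.title
    FROM subject_teacher AS st
    JOIN subject AS s ON st.subject_id = s.subject_id
    WHERE st.teacher_id = 4
    ORDER BY s.subject_id
""")]

# One subject -> many teachers: same table, other direction.
teachers_of_1 = [tid for (tid,) in conn.execute(
    "SELECT teacher_id FROM subject_teacher WHERE subject_id = 1 ORDER BY teacher_id"
)]

# Repeating an existing pair violates the composite primary key.
try:
    conn.execute("INSERT INTO subject_teacher VALUES (2, 4)")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False

print(subjects_of_4, teachers_of_1, duplicate_allowed)
```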
You can see the table name at the top, followed by field names in the right column and information denoting
whether each field is a primary key or foreign key in the left column.
The rest of the fields are also listed in the same order as they’re listed in the table itself.
The relationship itself is represented by a line that links the primary key of one table to the foreign key of another
table. Here, the invoice and client tables are linked by the customer_id field. It’s the primary key in the client
table and the foreign key in the invoice table.
It’s probably worth keeping in mind that the linking column doesn’t have to have the same name in both tables.
You’ll likely encounter this sooner or later.
Different table relationships are represented with different symbols. In the ER diagrams you’ll be working with,
they’re represented like this:
ER diagrams and other forms of organization documentation are usually provided by the database administrator or
the data engineer. Think of how messy things could get if you didn’t have a reference like this!
There are no errors in this query but it can still be improved. For instance:
NOTE: Assigning an alias doesn’t rename anything permanently. The alias exists only within the query; in the database itself, the name stays the same.
To assign an alias, we use the AS clause in our SELECT statement after the field we want to use an alias with. In this
case, we’ll have an alias of year_of_purchase:
But in PostgreSQL, you can assign aliases simply by putting the new name after a space. Note the position of
year_of_purchase.
So, now we can put the aliases after GROUP BY and ORDER BY, instead of using the previous bulky structure with
EXTRACT.
...
GROUP BY year_of_purchase
ORDER BY year_of_purchase DESC;
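Here’s a small sketch of aliases in GROUP BY and ORDER BY. It uses Python’s built-in sqlite3 module and a hypothetical three-row invoice table. SQLite has no EXTRACT, so strftime('%Y', ...) stands in for EXTRACT(YEAR FROM ...). One caveat: SQLite is more permissive than PostgreSQL and even accepts aliases in WHERE. Don’t use it to check the rule about WHERE and HAVING discussed later in this lesson.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY, invoice_date TEXT, total REAL);
INSERT INTO invoice VALUES
    (1, '2009-01-01', 1.98),
    (2, '2009-02-11', 3.96),
    (3, '2010-03-05', 5.94);
""")

# strftime('%Y', ...) plays the role of PostgreSQL's EXTRACT(YEAR FROM ...).
rows = conn.execute("""
    SELECT CAST(strftime('%Y', invoice_date) AS INTEGER) AS year_of_purchase,
           COUNT(total) AS total_purchases
    FROM invoice
    GROUP BY year_of_purchase        -- alias instead of repeating the expression
    ORDER BY year_of_purchase DESC
""").fetchall()
print(rows)
```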
Assigning aliases to tables
You can also assign an alias to a table. Just like with fields, there are two ways of doing this. First, the standard
way with the AS keyword:
...
FROM invoice AS i
...
Second, the shorter way: leave out AS and just put the alias right after the table name:
...
FROM invoice i
...
Refactoring our code
Now we can correct the query from the beginning of this lesson and add aliases: year_of_purchase, min_cost,
max_cost, total_revenue, total_purchases, and average_receipt.
Cool! Now the table is easier to understand and the query looks much better. But note that you can’t refer to an
alias from just any place in a query. For instance, this query will produce an error:
You can’t refer to an alias in WHERE or HAVING because the clauses of an SQL query aren’t executed in the order
they’re written. First, data is filtered according to a condition, and only then are aliases assigned.
Many DBMSs don’t allow you to refer to aliases in GROUP BY, but PostgreSQL does. We’ll talk more about the
order of query execution in future lessons.
Theory
Practicing Aliases
We can assign aliases to fields or tables, even though the names will stay the same in the database itself.
To assign an alias to a field, we put SELECT and then the field, followed by the AS clause and the desired
name. In PostgreSQL, we can also simply put the new name after a space.
The syntax for table aliases is essentially the same: we put the table name, followed by AS and the alias. In
PostgreSQL, again, we can just write the new name after a space.
Task:
1. Export several fields from the invoice table in the following order, and use the values in parentheses
as aliases:
A field with the number of purchases (total_purchases)
A field with the total revenue (total_revenue)
A field with the average revenue rounded to two decimal places (average_revenue)
Group data by country of purchase (billing_country). Sort the data in descending order by the
value in the average_revenue field. Limit the output to the first ten records.
SELECT billing_country,
COUNT(total) AS total_purchases,
SUM(total) AS total_revenue,
ROUND(AVG(total), 2) AS average_revenue
FROM invoice
GROUP BY billing_country
ORDER BY average_revenue DESC
LIMIT 10;
Result
billing_country total_purchases total_revenue average_revenue
Chile 7 46.62 6.66
Hungary 7 45.62 6.52
Ireland 7 45.62 6.52
Czech Republic 14 90.24 6.45
Austria 7 42.62 6.09
2. Get movies that have the word Epic in their description from the movie table. Export a table containing
three fields using the following as aliases:
rating_of_epic for the movie’s rating
year_of_epic for the year the movie was released
average_rental for the average rental period
Group the data by rating and release year (put the rating field first).
SELECT rating AS rating_of_epic,
release_year AS year_of_epic,
AVG(rental_duration) AS average_rental
FROM movie
WHERE description LIKE '%Epic%'
GROUP BY rating_of_epic, year_of_epic;
Result
rating_of_epic year_of_epic average_rental
PG-13 2006 7
G 2012 5
PG-13 2007 3.5
PG 2016 7
PG 2011 6
The INNER JOIN operator combines two tables using an “inner” field that’s common to both tables.
Let’s use the tables with the client’s last name and number of purchases as an example. They can be joined
through the customer_ID field.
If we use INNER JOIN, the resulting table will contain only the rows whose customer_ID value appears in both
tables. The graphic below shows this in action:
It just so happens that the number 667 comes up in both tables as a value of a field called
customer_ID. So, by using this value as the common factor, the last name associated with the ID 667
from the first table is combined with the number of purchases associated with this ID from the second
table.
The same will happen with other clients whose customer_ID values come up in both tables. Items that don’t have
an ID match in both tables won’t be included in the resulting table.
The result of LEFT OUTER JOIN is a table with all records of the initial “left” table. Records from the right table are
only kept if the field value matches the corresponding value in the left table.
The table on the left contains information about clients’ last names, so all this data is included in the final table.
Some of the last names don’t have ID matches in the right table — that’s why there’s NULL in the Purchase count
field for these items. There is no data about clients with IDs 111, 221, and 456 in the resulting table because these
values are not present in the left table.
RIGHT OUTER JOIN is similar to the previous operator, but now priority is given to the right table. This operator
includes all records from the right table in the final table. Records from the left table are only kept if the field value
matches the corresponding value in the right table. For IDs that aren’t present in the left table, the last name
is NULL.
FULL OUTER JOIN
The FULL OUTER JOIN operator combines all the data from the left and the right table. If there are no matches, it will
display NULL in place of a value.
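The four behaviors can be sketched in plain Python on lists of tuples. This is a simplified model, not how a DBMS implements joins. The rows are hypothetical: only ID 667 in both tables and IDs 111, 221, and 456 in the right table echo the examples above. None plays the role of NULL. Each result also keeps a single ID column, whereas a real SQL result would show the customer_ID column from each table.

```python
# Hypothetical rows, in the spirit of the graphic:
# left table  = (customer_ID, last_name)
# right table = (customer_ID, purchase_count)
left = [(667, "Smith"), (123, "Jones"), (890, "Brown")]
right = [(667, 5), (111, 2), (221, 7), (456, 1)]


def inner_join(left, right):
    """Keep only IDs found in both tables."""
    return [(lid, name, count)
            for lid, name in left
            for rid, count in right
            if lid == rid]


def left_join(left, right):
    """Keep every left row; use None (SQL NULL) when the right side has no match."""
    result = []
    for lid, name in left:
        counts = [count for rid, count in right if rid == lid]
        if counts:
            result.extend((lid, name, count) for count in counts)
        else:
            result.append((lid, name, None))
    return result


def right_join(left, right):
    """Keep every right row; use None when the left side has no match."""
    result = []
    for rid, count in right:
        names = [name for lid, name in left if lid == rid]
        if names:
            result.extend((rid, name, count) for name in names)
        else:
            result.append((rid, None, count))
    return result


def full_join(left, right):
    """All LEFT JOIN rows, plus the right rows that had no partner on the left."""
    left_ids = {lid for lid, _ in left}
    extra = [(rid, None, count) for rid, count in right if rid not in left_ids]
    return left_join(left, right) + extra


for join_fn in (inner_join, left_join, right_join, full_join):
    print(join_fn.__name__, join_fn(left, right))
```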
In the next lesson, we’ll talk a bit more about joining tables. Then, we’ll finally start practicing.
Tables in a relational database are usually connected, and the links between them are displayed in ER diagrams.
Tables are connected with the help of foreign keys, which can also be used to join tables.
Table aliases let us specify which table a field comes from (this is useful when joining).
Tables can be joined in different ways, and the results of the join depend on the chosen method.
The following operators are used to join tables: INNER JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN, FULL
OUTER JOIN.
If a customer didn’t specify their last name when placing an order, they won’t be included in the table. The opposite
is also true: if a user registered but didn’t place any orders, they should also be excluded from the final table.
💡 A quick reminder: The invoice table contains information about orders, while the client table stores data about
customers (their last and first names and addresses). The client table is connected to the invoice table through
the customer_id field. It serves as a primary key in client and as a foreign key in invoice.
Now we can start working on our task: to get a list with the last names of the customers with the biggest orders. To
do this, we’ll need information from both tables: invoice and client.
Before we join the tables, we need to decide which joining type to use. We don’t know whether or not each record
in client has a match in invoice. From the previous lesson, you know that INNER JOIN will only put data found in
both tables into the final table. So, if we want to exclude the last names of customers without orders and customer
IDs without last names, INNER JOIN is just what we need.
In PostgreSQL, we can write INNER JOIN, but we can also use the short version: JOIN. The result will be the same in
both cases. In other words, a plain JOIN means INNER JOIN by default.
The resulting table should include the first_name and last_name fields from the client table, as well as the total
field from the invoice table. Without the order price field, we won’t know which customers placed the largest
orders.
We can join invoice and client through their common customer_id field:
SELECT c.first_name,
c.last_name,
i.total
FROM invoice AS i
INNER JOIN client AS c ON i.customer_id = c.customer_id
LIMIT 10;
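You don’t have to rename the fields that come after SELECT, because they already have understandable names. But
you should add the table’s alias before each field. Strictly speaking, the prefix is required only when both tables
contain a column with the same name (like customer_id), but without it, it would be hard to tell which table each field comes from.
In the code example above, the invoice table was given the alias i (specified after FROM), and the client table was
given the alias c (specified after INNER JOIN). Note that with INNER JOIN, the order in which you list the tables doesn’t
matter: the join returns the same set of rows either way. (Without ORDER BY, though, the rows may come back in a
different order, so LIMIT may show different ones.)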
You can put the second table after INNER JOIN, adding a condition for joining after the keyword ON. This condition
will define how the two tables are compared. In our query, the foreign key from invoice is compared with the
primary key from client.
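Here’s a minimal check of that claim. It uses Python’s built-in sqlite3 module and hypothetical client and invoice rows. The same INNER JOIN, written with the tables in either order, returns the same rows. ORDER BY is added so the two results can be compared directly. Client 3 has no orders, and invoice 13 points to a customer who doesn’t exist, so both drop out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE client (customer_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
INSERT INTO client VALUES (1, 'Ann', 'Lee'), (2, 'Bo', 'Kim'), (3, 'Cy', 'Roe');
INSERT INTO invoice VALUES (10, 1, 1.98), (11, 1, 8.91), (12, 2, 0.99), (13, 99, 5.94);
""")

# invoice first, client joined second
invoice_first = conn.execute("""
    SELECT c.first_name, c.last_name, i.total
    FROM invoice AS i
    INNER JOIN client AS c ON i.customer_id = c.customer_id
    ORDER BY i.invoice_id
""").fetchall()

# client first, invoice joined second (plain JOIN means INNER JOIN)
client_first = conn.execute("""
    SELECT c.first_name, c.last_name, i.total
    FROM client AS c
    JOIN invoice AS i ON c.customer_id = i.customer_id
    ORDER BY i.invoice_id
""").fetchall()

print(invoice_first == client_first, invoice_first)
```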
Everyone’s joining in
You can join more than two tables, if necessary. Just make sure to put the operator, tables, and the
necessary fields in your query:
...
FROM table_1
INNER JOIN table_2 ON table_1.field = table_2.field -- first join
INNER JOIN table_3 ON table_1.field = table_3.field -- second join
...
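Here’s a sketch of this pattern with three hypothetical tables (track, invoice_line, playlist_track) in Python’s built-in sqlite3 module. Each additional INNER JOIN adds another match that a row has to find. As a result, only the track that was both sold and added to a playlist survives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE track (track_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE invoice_line (invoice_line_id INTEGER PRIMARY KEY, track_id INTEGER, unit_price REAL);
CREATE TABLE playlist_track (playlist_id INTEGER, track_id INTEGER);
INSERT INTO track VALUES (1, 'Track A'), (2, 'Track B'), (3, 'Track C');
INSERT INTO invoice_line VALUES (100, 1, 0.99), (101, 2, 0.99);  -- track 3 never sold
INSERT INTO playlist_track VALUES (1, 1), (8, 1), (1, 3);         -- track 2 in no playlist
""")

rows = conn.execute("""
    SELECT t.name, il.unit_price, pt.playlist_id
    FROM track AS t
    INNER JOIN invoice_line AS il ON t.track_id = il.track_id     -- first join
    INNER JOIN playlist_track AS pt ON t.track_id = pt.track_id   -- second join
    ORDER BY pt.playlist_id
""").fetchall()
print(rows)
```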
There’s the last name, first name, and order price. Perfect! The hardest part is done; now we just need to add
grouping and sorting.
As you might remember, we need to make a list of the customers with the biggest orders. That means we
should find the highest cost order for each customer and use that to sort our results.
To make the table more informative, we’ll also add the lowest and average order price for each customer, as
well as the total number of orders.
SELECT c.first_name,
c.last_name,
MIN(i.total) AS min_cost,
MAX(i.total) AS max_cost,
ROUND(AVG(i.total), 2) AS average_cost,
COUNT(i.total) AS total_purchases
FROM invoice AS i
INNER JOIN client AS c ON i.customer_id = c.customer_id
WHERE i.billing_country = 'USA'
GROUP BY first_name, last_name
ORDER BY max_cost DESC
LIMIT 10;
first_name last_name min_cost max_cost average_cost total_purchases
Note that the data was grouped by two fields at the same time: first_name and last_name. This was
done because several customers might have the same last name. (Two customers with exactly the same first and
last name would still be merged into one group; adding c.customer_id to GROUP BY would prevent that.)
With the condition WHERE i.billing_country = 'USA', we kept only the orders billed to the USA.
Then, we sorted the table by the highest order price (max_cost) in descending order to get the list we needed. Done
and done!
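The same grouping logic can be tried on a tiny hypothetical dataset, again with Python’s built-in sqlite3 module standing in for PostgreSQL. Both customers share the last name Lee, so grouping by last_name alone merges them into one row. The Canada order is removed by WHERE before any aggregation happens.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE client (customer_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY, customer_id INTEGER,
                      billing_country TEXT, total REAL);
INSERT INTO client VALUES (1, 'Ann', 'Lee'), (2, 'Bo', 'Lee');
INSERT INTO invoice VALUES
    (10, 1, 'USA', 2.00), (11, 1, 'USA', 4.00),
    (12, 2, 'USA', 9.00), (13, 2, 'Canada', 20.00);
""")

per_customer = conn.execute("""
    SELECT c.first_name, c.last_name,
           MIN(i.total) AS min_cost,
           MAX(i.total) AS max_cost,
           ROUND(AVG(i.total), 2) AS average_cost,
           COUNT(i.total) AS total_purchases
    FROM invoice AS i
    INNER JOIN client AS c ON i.customer_id = c.customer_id
    WHERE i.billing_country = 'USA'      -- keep only orders billed to the USA
    GROUP BY c.first_name, c.last_name   -- two fields, so the two Lees stay apart
    ORDER BY max_cost DESC
""").fetchall()

# Grouping by last_name alone merges both customers into one row.
merged = conn.execute("""
    SELECT c.last_name, COUNT(i.total) AS total_purchases
    FROM invoice AS i
    INNER JOIN client AS c ON i.customer_id = c.customer_id
    WHERE i.billing_country = 'USA'
    GROUP BY c.last_name
""").fetchall()

print(per_customer)
print(merged)
```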
Theory
Here’s a syntax example: we can join invoice and client through their common customer_id field.
SELECT c.first_name,
c.last_name,
i.total
FROM invoice AS i
INNER JOIN client AS c ON i.customer_id = c.customer_id
LIMIT 10;
Task
1. Combine data from two tables: track and invoice_line. The track table stores information about
music tracks in the store, with track names in the name field. The invoice_line table stores data about
tracks that were purchased, with track price in the unit_price field. Both tables have the track_id field
containing track IDs.
Export a table with the track name and price. Select all unique records. If a certain track wasn’t bought or a
track was bought but doesn’t have a name, don’t include it in the table. Keep only the first twenty records in
the final table.
Result
name unit_price
Funky Piano 0.99
Welcome to the Jungle 0.99
Compadre 0.99
Balls to the Wall 0.99
The Number Of The Beast 0.99
2. Let’s expand the query and add a field with playlist ID (playlist_id). You can get it from the
playlist_track table that stores the IDs of playlists and tracks (the playlist_id and track_id fields). The condition stays
the same: if a track’s ID doesn’t come up in all three tables, that track shouldn’t be included in the resulting
table. Print the first twenty records.
SELECT t.name,
i.unit_price,
pt.playlist_id
FROM track AS t
INNER JOIN invoice_line AS i ON t.track_id=i.track_id
INNER JOIN playlist_track AS pt ON t.track_id=pt.track_id
LIMIT 20;
Result
name unit_price playlist_id
For Those About To Rock (We Salute You) 0.99 1
For Those About To Rock (We Salute You) 0.99 8
For Those About To Rock (We Salute You) 0.99 17
Balls to the Wall 0.99 1
Balls to the Wall 0.99 8
3. Playlist IDs are now in the resulting table but we don’t know what playlists they stand for. This information
can be found in the fourth table — playlist. It contains the playlist_id field, as well as the name field
with playlist names. Add name to the resulting table. The condition is the same: data without matches in all
tables shouldn’t be included. Limit the output to the first twenty records. Be sure to alias the p.name column
as something like playlist_name; otherwise, there would be two name columns in the result table.
SELECT t.name,
i.unit_price,
pt.playlist_id,
p.name AS playlist_name
FROM track AS t
INNER JOIN invoice_line AS i ON t.track_id=i.track_id
INNER JOIN playlist_track AS pt ON t.track_id=pt.track_id
INNER JOIN playlist AS p ON pt.playlist_id = p.playlist_id
LIMIT 20;
Result
name unit_price playlist_id playlist_name
For Those About To Rock (We Salute You) 0.99 1 Music
For Those About To Rock (We Salute You) 0.99 8 Music
For Those About To Rock (We Salute You) 0.99 17 Heavy Metal Classic
Balls to the Wall 0.99 1 Music
Balls to the Wall 0.99 8 Music
The LEFT OUTER JOIN and RIGHT OUTER JOIN Operators
As you might remember from earlier in this chapter, the LEFT OUTER JOIN operator includes all records from the left
table in the resulting table. Records from the right table are only kept if values in the field that’s used to join the
tables match values in the left table.
RIGHT OUTER JOIN works in a similar way, but with the right table having higher priority.
Let’s move on to practice: we’ll combine data from the artist and album tables and print all artists along with
information about their albums.
The artist table stores information about the artists in the artist_id and name fields. The album table stores data
about albums in the album_id, title, and artist_id fields.
(These two tables are linked by the artist_id key. Their relationship is one-to-many because one artist can have
many albums.)
What’s our best course of “joining action”? Since we need to print data about all the artists, we can’t use INNER
JOIN or else artists without any albums won’t be included in the final table. So, we’ll use LEFT OUTER JOIN, and all
artists will be in the output. (By the way, you can also use the shortened name of LEFT OUTER JOIN — LEFT JOIN.)
SELECT art.name,
alb.title
FROM artist AS art
LEFT OUTER JOIN album AS alb ON art.artist_id = alb.artist_id
LIMIT 10;
Now artist and album are joined through artist_id, and thanks to LEFT OUTER JOIN, all the name values from the
artist table are included in the resulting table. If an artist doesn’t have an album, we’ll see NULL in the title field.
name title
We could also get the same table with a different query with RIGHT OUTER JOIN:
SELECT art.name,
alb.title
FROM album AS alb
RIGHT OUTER JOIN artist AS art ON art.artist_id = alb.artist_id
LIMIT 10;
name title
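The result is the same because LEFT OUTER JOIN and RIGHT OUTER JOIN are interchangeable: A LEFT OUTER JOIN B
returns the same rows as B RIGHT OUTER JOIN A. You just need to switch the places of the tables in the query.
(Since there’s no ORDER BY, the rows may come back in a different order, though.)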
Let’s update our table. We’ll count how many albums there are per artist and rename the field with the number of
albums.
SELECT art.name,
COUNT(alb.title) AS total_album
FROM artist AS art
LEFT OUTER JOIN album AS alb ON art.artist_id = alb.artist_id
GROUP BY art.name
ORDER BY total_album DESC
LIMIT 10;
name total_album
Iron Maiden 21
Led Zeppelin 14
Deep Purple 11
Metallica 10
U2 10
Ozzy Osbourne 6
Pearl Jam 5
Faith No More 4
Foo Fighters 4
Various Artists 4
These are the top ten artists by number of albums in the database. It’s not too surprising that Iron Maiden is at the
top of this table. After all, they’ve been playing since the ’80s!
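One detail worth knowing about counting after a LEFT OUTER JOIN: COUNT(alb.title) skips NULLs, so an artist with no albums gets 0. COUNT(*) counts the joined row itself and would report 1 for that same artist. Here’s a minimal sketch with hypothetical artists, using Python’s built-in sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artist (artist_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE album (album_id INTEGER PRIMARY KEY, title TEXT, artist_id INTEGER);
INSERT INTO artist VALUES (1, 'Artist One'), (2, 'Artist Two');
INSERT INTO album VALUES (10, 'First', 1), (11, 'Second', 1);  -- Artist Two has no albums
""")

rows = conn.execute("""
    SELECT art.name,
           COUNT(alb.title) AS total_album,  -- NULL titles are skipped
           COUNT(*)         AS joined_rows   -- every joined row counts, NULL or not
    FROM artist AS art
    LEFT OUTER JOIN album AS alb ON art.artist_id = alb.artist_id
    GROUP BY art.name
    ORDER BY total_album DESC
""").fetchall()
print(rows)
```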
Theory
The “left” table is the table listed first (after FROM), which is artist in this case. Here’s an example of LEFT OUTER JOIN. The
resulting table will join the artist and album tables via their shared artist_id field.
SELECT artist.name,
album.title
FROM artist
LEFT OUTER JOIN album ON artist.artist_id = album.artist_id
LIMIT 5;
To show how similar these joins are, we could rewrite the query with RIGHT OUTER JOIN instead of LEFT OUTER JOIN.
On its own, that change would give priority to the “right” table, album, so to get the same result we’d also have to
swap the order of the artist and album tables.
Let’s practice!
Task
1. Print the titles of all tracks and add the dates when they were purchased. Each track should be
included, even if no one ever bought it. You’ll need to combine three tables because invoice has the
purchase date but no information about the track that was purchased.
First, join track and invoice_line through the track_id key and then add invoice using invoice_id as a
key. Leave two fields in the resulting table: name from the track table and invoice_date from the
invoice table. Make sure that the dates are in the necessary format.
SELECT name,
CAST(i.invoice_date AS date)
FROM track AS t
LEFT JOIN invoice_line AS il ON t.track_id = il.track_id
LEFT JOIN invoice AS i ON il.invoice_id = i.invoice_id;
Result
name invoice_date
Balls to the Wall 2009-01-01
Restless and Wild 2009-01-01
Put The Finger On You 2009-01-02
Inject The Venom 2009-01-02
Evil Walks 2009-01-02
2. Count the number of unique purchased track titles for each year.
Result
year_of_invoice count
2009 442
2010 446
2011 440
2012 439
2013 433
3. Export a table consisting of two fields: one with the employee’s last name and one with the number of
users whose requests were processed by that employee. Name the fields employee_last_name and
all_customers, respectively. Group the records by employee ID. Sort the number of users in
descending order and the last names in lexicographical order. You will need to use two tables to
complete this task.
Result
employee_last_name all_customers
Peacock 21
Park 20
Johnson 18
Adams 0
Callahan 0
4. Print the names of artists with no album in the database.
SELECT ar.name
FROM artist AS ar LEFT JOIN album AS al ON ar.artist_id=al.artist_id
GROUP BY ar.name
HAVING COUNT(al.title) = 0;
Result
name
Ben Harper
Dhani Harrison & Jakob Dylan
Baby Consuelo
Santana Feat. Lauryn Hill & Cee-Lo
Vinícius E Odette Lara
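Another common way to solve this task is an anti-join: keep the LEFT JOIN rows where the album side is NULL, without any grouping. The sketch below compares both approaches using hypothetical data in Python’s built-in sqlite3 module. Filtering on the album’s primary key (album_id) is a deliberate choice, because it’s NULL only when there’s no match at all. By contrast, COUNT(al.title) = 0 would also flag an artist whose albums all had NULL titles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artist (artist_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE album (album_id INTEGER PRIMARY KEY, title TEXT, artist_id INTEGER);
INSERT INTO artist VALUES (1, 'Artist One'), (2, 'Artist Two'), (3, 'Artist Three');
INSERT INTO album VALUES (10, 'First', 1), (11, 'Second', 3);
""")

# Approach from the task: group, then keep the groups with zero albums.
with_having = conn.execute("""
    SELECT ar.name
    FROM artist AS ar
    LEFT JOIN album AS al ON ar.artist_id = al.artist_id
    GROUP BY ar.name
    HAVING COUNT(al.title) = 0
""").fetchall()

# Anti-join: an unmatched row has NULL in every album column, so filter on that.
with_is_null = conn.execute("""
    SELECT ar.name
    FROM artist AS ar
    LEFT JOIN album AS al ON ar.artist_id = al.artist_id
    WHERE al.album_id IS NULL
""").fetchall()

print(with_having, with_is_null)
```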
Let’s say the database of an online store contains two tables: actor and client. The actor table includes
data about all actors featured in movies that can be rented from this particular online store. The client
table stores information about customers who buy movies and music from the store. Although the two tables are
loosely related by theme, the data within them have very little in common. Even so, they can be joined.
Let’s say that each customer of this store receives a “happy birthday” message on their big day. In
addition, if they have the same last name as a famous actor, they’ll be sent a special message. We’ll need
to join the two tables to create a list of customers with these names.
But… let’s also imagine that the resulting table should include customers and their famous namesakes, as
well as actors who don’t have matches among the customers (we might need actors’ birthdates for
newsletters or something).
There’s a problem, though: there is no direct relationship between the actor and the client tables. Both
contain last name fields, but these fields aren’t linked through the database structure: there are no foreign keys to
connect them.
No big deal!
PostgreSQL (like SQL in general) allows us to join tables not just through keys but through any fields whose
values can be compared, such as the last name fields in these two tables. The fields don’t even need to have the same names.
All first and last names from both tables should be included in the resulting table, so we’ll use FULL OUTER JOIN.
It’s used for printing all records from the left and right tables.
The table should contain the actor_id, first_name, and last_name fields for the actors and the first_name
and last_name fields for the customers.
SELECT a.actor_id,
a.first_name,
a.last_name,
c.first_name,
c.last_name
FROM actor AS a
FULL OUTER JOIN client AS c ON a.last_name = c.last_name
LIMIT 10;
actor_id first_name last_name first_name last_name
OK, we have found two matches in the first ten rows! For all actors without a matching namesake, the
columns with the customer’s first and last name contain NULL.
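Not every DBMS supports FULL OUTER JOIN; older versions of SQLite, for example, don’t. In that case, a common substitute is a LEFT JOIN plus a UNION ALL of the right-table rows that found no match. The sketch below uses that technique with Python’s built-in sqlite3 module and hypothetical actor and client rows. In PostgreSQL, you’d simply write FULL OUTER JOIN as above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE client (customer_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
INSERT INTO actor VALUES (1, 'Ann', 'Stone'), (2, 'Ben', 'Hale');
INSERT INTO client VALUES (1, 'Cara', 'Hale'), (2, 'Dan', 'Moss');
""")

rows = conn.execute("""
    -- Part 1: every actor, plus a namesake customer when one exists
    SELECT a.actor_id, a.first_name, a.last_name, c.first_name, c.last_name
    FROM actor AS a
    LEFT JOIN client AS c ON a.last_name = c.last_name
    UNION ALL
    -- Part 2: customers whose last name matches no actor
    SELECT NULL, NULL, NULL, c.first_name, c.last_name
    FROM client AS c
    LEFT JOIN actor AS a ON a.last_name = c.last_name
    WHERE a.actor_id IS NULL
""").fetchall()

for row in rows:
    print(row)
```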
A little warning
We’ll mention up front that FULL OUTER JOIN is used fairly rarely in practice, so it’s not a good idea to
make a habit of using it for every occasion. It’s better suited to smaller tables: when large tables are joined
using FULL OUTER JOIN, the result can become too big to manage reasonably.
Operator Precedence
The order in which we write our queries
You’ve already written a lot of queries. You know which operator to put first and which to put last. But
let’s go over the main rules one more time:
A query usually starts with SELECT, followed by FROM, and then operators like WHERE and GROUP BY.
Some operators (e.g., ORDER BY and LIMIT) are put at the end of the query. If a query has
both, LIMIT should go last.
HAVING is always put after GROUP BY, never before it.
The order in which queries are executed is different:
1. First of all, you need to know where to get data, so the FROM operator is performed first. At the same
stage, tables are joined with the JOIN operators and table aliases are assigned. Don’t forget that joining
precedes filtering and grouping. (This means that joining large tables can take up a big share of the query’s running time.
You’ll find out how to solve this problem in the next chapter.)
2. The next step is choosing the specific data, meaning it’s time for the WHERE clause. Now you’ll only get
data that meets your specified conditions.
3. After slicing comes grouping with the GROUP BY operator, and calculations with aggregate functions. Note
that because WHERE precedes GROUP BY, WHERE can’t filter by the results of aggregate functions: at the moment of
slicing, grouping hasn’t been performed yet.
4. The next step is the HAVING operator, which filters the grouped data.
5. Now the SELECT operator runs: the fields of the resulting table are computed, and their aliases are assigned.
That’s why aliases can’t be used in WHERE and HAVING: they simply haven’t been assigned yet. (Some DBMSs don’t let
you use aliases in GROUP BY either, but PostgreSQL has an extension that solves this problem.)
6. After SELECT, the DISTINCT keyword takes effect, filtering out duplicate rows so that only unique ones remain.
7. With the necessary data selected, it’s time for sorting with the ORDER BY operator (this always comes
second to last in the order of execution).
8. LIMIT will be the closing operator.
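The eight steps above can be modeled in plain Python as a pipeline over a list of hypothetical invoice rows. It’s a conceptual sketch of the logical order only. A real DBMS optimizer may rearrange the physical work, but the result is always as if these steps ran in this order. The SQL it imitates is shown in the comment at the top.

```python
from itertools import groupby

# Conceptually imitates:
#   SELECT customer_id, SUM(total) AS total_revenue
#   FROM invoice
#   WHERE billing_country = 'USA'
#   GROUP BY customer_id
#   HAVING SUM(total) > 1.5
#   ORDER BY total_revenue DESC
#   LIMIT 1;

# Hypothetical invoice rows: (customer_id, billing_country, total)
invoice = [
    (1, "USA", 2.0), (1, "USA", 4.0), (2, "USA", 9.0),
    (2, "Canada", 20.0), (3, "USA", 1.0),
]

# 1. FROM (and JOIN): choose the source rows.
rows = list(invoice)

# 2. WHERE: filter individual rows; no aggregates or aliases exist yet.
rows = [r for r in rows if r[1] == "USA"]

# 3. GROUP BY: collect each customer's totals (groupby needs sorted input).
rows.sort(key=lambda r: r[0])
groups = {cid: [r[2] for r in grp] for cid, grp in groupby(rows, key=lambda r: r[0])}

# 4. HAVING: filter whole groups by an aggregate value.
groups = {cid: totals for cid, totals in groups.items() if sum(totals) > 1.5}

# 5. SELECT: compute output columns and assign aliases.
selected = [{"customer_id": cid, "total_revenue": sum(totals)}
            for cid, totals in groups.items()]

# 6. DISTINCT (when requested): drop duplicate output rows.
distinct = []
for row in selected:
    if row not in distinct:
        distinct.append(row)

# 7. ORDER BY: aliases exist now, so we can sort by total_revenue.
ordered = sorted(distinct, key=lambda r: r["total_revenue"], reverse=True)

# 8. LIMIT: keep the first n rows.
result = ordered[:1]
print(result)
```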
Tricky! Don’t feel too bad if you don’t remember them all by heart. This knowledge will come with
practice. For now, though, let’s have a quick test of what we’ve talked about here.
Sometimes we want to combine rows from different tables that have the same fields (thus, joining them “vertically”).
There are two operators for this: UNION and UNION ALL.
The table above gives us a quick visual comparison of what we’ve learned so far with JOIN and what we’re talking
about now with UNION.
We’ll follow these rules when using UNION and UNION ALL:
Export fields from two tables in the same order. Make sure the number of fields matches too.
Check that the data types within the fields match. For example, you can’t combine an integer field and
a varchar field, but if the data types are integer and real, combining them is possible.
An example of a blissful union
Let’s look at an example. The query below will export the number of orders in 2009 placed in the USA,
Germany, and Brazil.
SELECT i.billing_country,
COUNT(i.total) AS total_purchases
FROM invoice AS i
WHERE i.billing_country IN ('USA',
'Germany',
'Brazil')
AND EXTRACT(YEAR FROM cast(invoice_date AS date)) = 2009
GROUP BY i.billing_country;
billing_country total_purchases
Brazil 7
Germany 9
USA 17
And this one will export the number of orders made in 2013:
SELECT i.billing_country,
COUNT(i.total) AS total_purchases
FROM invoice AS i
WHERE i.billing_country IN ('USA',
'Germany',
'Brazil')
AND EXTRACT(YEAR FROM cast(invoice_date AS date)) = 2013
GROUP BY i.billing_country;
billing_country total_purchases
Brazil 7
Germany 2
USA 16
These two tables are similar: they have the same number of fields, and the data types in the fields match.
This means they can be combined with the UNION operator.
Just put UNION after the first query and add the second query.
SELECT i.billing_country,
COUNT(i.total) AS total_purchases
FROM invoice AS i
WHERE i.billing_country IN ('USA',
'Germany',
'Brazil')
AND EXTRACT(YEAR FROM cast(invoice_date AS date)) = 2009
GROUP BY i.billing_country
UNION
SELECT i.billing_country,
COUNT(i.total) AS total_purchases
FROM invoice AS i
WHERE i.billing_country IN ('USA',
'Germany',
'Brazil')
AND EXTRACT(YEAR FROM cast(invoice_date AS date)) = 2013
GROUP BY i.billing_country;
billing_country total_purchases
Brazil 7
Germany 2
Germany 9
USA 16
USA 17
Notice that the final table includes two records for Germany and the USA but only one record for Brazil. That’s
because the number of orders placed in Brazil in 2009 and in 2013 is identical (seven orders). In that case, the
record from the first table is a complete duplicate of the record from the second table, and UNION doesn’t
include absolute duplicates in the final table.
The UNION ALL operator works differently: it includes all records, even absolute duplicates.
SELECT i.billing_country,
COUNT(i.total) AS total_purchases
FROM invoice AS i
WHERE i.billing_country IN ('USA',
'Germany',
'Brazil')
AND EXTRACT(YEAR FROM cast(invoice_date AS date)) = 2009
GROUP BY i.billing_country
UNION ALL
SELECT i.billing_country,
COUNT(i.total) AS total_purchases
FROM invoice AS i
WHERE i.billing_country IN ('USA',
'Germany',
'Brazil')
AND EXTRACT(YEAR FROM cast(invoice_date AS date)) = 2013
GROUP BY i.billing_country;
billing_country total_purchases
Brazil 7
Germany 9
USA 17
Brazil 7
Germany 2
USA 16
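To see the difference between the two operators in isolation, here’s a sketch that uses Python’s built-in sqlite3 module. It stores the per-country counts from the result tables above in two small tables (orders_2009 and orders_2013 are hypothetical names) and combines them. The UNION query gets an ORDER BY so its output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The per-country counts for 2009 and 2013 from the results above.
conn.executescript("""
CREATE TABLE orders_2009 (billing_country TEXT, total_purchases INTEGER);
CREATE TABLE orders_2013 (billing_country TEXT, total_purchases INTEGER);
INSERT INTO orders_2009 VALUES ('Brazil', 7), ('Germany', 9), ('USA', 17);
INSERT INTO orders_2013 VALUES ('Brazil', 7), ('Germany', 2), ('USA', 16);
""")

# UNION removes the duplicate ('Brazil', 7) row.
union_rows = conn.execute("""
    SELECT billing_country, total_purchases FROM orders_2009
    UNION
    SELECT billing_country, total_purchases FROM orders_2013
    ORDER BY billing_country, total_purchases
""").fetchall()

# UNION ALL keeps every row, duplicates included.
union_all_rows = conn.execute("""
    SELECT billing_country, total_purchases FROM orders_2009
    UNION ALL
    SELECT billing_country, total_purchases FROM orders_2013
""").fetchall()

print(union_rows)
print(union_all_rows)
```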
With UNION and UNION ALL, you can combine the results of as many queries as you need. Just add one of the operators to the
query and then write another query below.
QUERY_1
UNION ALL
QUERY_2
UNION ALL
QUERY_3
UNION ALL
.......;
Final Quiz
Primary keys are columns that contain a unique ID value for each record in the table. Their values must be
unique and can’t be NULL.
Foreign keys are fields that reference the primary key of another table. They are responsible for linking
tables. A table can contain multiple foreign keys.
In a one-to-one relationship, one record in a table is associated with only one record in another table.
In a one-to-many relationship, one record in a table is associated with several records in another table.
In a many-to-many relationship, one record in a table is associated with several records in another table,
and vice versa.
Aliases can be used to rename tables and fields, and improve the readability and maintainability of a query.
For example, to assign an alias for fields, we use the AS clause in our SELECT statement after the field with
which we want to use an alias.
The INNER JOIN operator combines two tables using an “inner” field that’s common to both tables.
The result of LEFT OUTER JOIN is a table with all records of the initial “left” table. Records from the right
table are only kept if the field value matches the corresponding value in the left table.
RIGHT OUTER JOIN is similar to LEFT OUTER JOIN, but priority is given to the “right” table.
The FULL OUTER JOIN operator combines all the data from the left and the right table.
For LEFT, RIGHT, and FULL OUTER joins, if there are no matches, NULL is displayed in place of the missing
values.
We can use shortened versions of many of the join options. For example, instead of INNER JOIN, we can
use JOIN and get the equivalent result.
UNION is used to combine the output of SQL queries. Essentially, one table is “glued” to the other. It removes
duplicate rows.
The UNION ALL operator works differently: it includes all records, even absolute duplicates.
We can join tables through fields that aren’t foreign keys if the data in the fields match.
Now let’s check what we’ve learned. This quiz contains 11 questions and should take around 20-25 minutes to
complete. Good luck!