SQL Essentials: Mark McIlroy
Mark McIlroy
www.blueskytechnology.com.au
www.markmcilroy.com
Readers should have access to a query environment that allows viewing data tables and
running SQL queries against a database or data warehouse.
2. Instructions
A test environment is located at
www.markmcilroy.com/test_env/sql_test.php
You can try out the examples from the book in this environment.
Each of the queries listed in the notes is executable in the test environment.
Databases are typically used to store information such as customer records, transactions,
accounting records and so on.
Data processing is used extensively in the government and corporate business sectors.
3. Joining tables
Processing typically requires that information be retrieved from several different tables.
In the following example, details are extracted from the customer table and the transaction
table.
Transaction Table
transaction_id  trans_date  customer_id  transaction_code  product_code  items  amount
0000001         6/01/2008   0000001      PURCHASE          PCDE12        1      723.12
0000002         12/01/2008  0000001      PURCHASE          PCDE27        1      324.12
0000003         4/01/2008   0000002      PURCHASE          PCDE12        3      623.23
0000004         12/01/2008  0000002      CANCELLATION      PCDE27        1      123.34
0000005         12/01/2008  0000002      PURCHASE          PCDE12        2      423.23
0000006         5/01/2008   0000003      PURCHASE          PCDE12        2      153.24
0000007         12/01/2008  0000004      REFUND            PCDE43        1      233.22
0000008         21/01/2008  0000004      PURCHASE          PCDE43        1      823.11
Customer Table
Extracted Data
Note that data from one table has been duplicated in the columns of the result table.
In this example, the surname and first name of the customer appear beside each transaction
relevant to that customer.
4. SQL
SQL, Structured Query Language, is a database query language that provides functions for
sorting, filtering and totalling information that is stored in relational databases.
SQL is the basis for most query operations against large-scale data storage systems.
5. Single-table operations
In a ‘select’ statement, ‘*’ indicates that all columns should be included in the result table.
select
surname,
first_name,
date_of_birth
from
customer;
Result
Amoah Maulesh 1965-01-20
Lucas Raj 1973-01-01
Crenshaw Emmanuel 1987-04-06
Adesina Jessica 1977-05-03
Amoah Selina 1982-01-26
Karmakar Ademola 1971-03-11
The ‘from’ clause specifies the table that is used as the source of the data.
Column definitions
For example
select
surname || ' ' || first_name as full_name
from
customer

select
amount * items as value
from
transactions
Expressions can include mathematical operators and the concatenation operator ||, which
combines two text values.
Note: the || operator is the standard SQL operator for string concatenation.
Due to the variation of SQL used in the test environment, the function ‘concat()’ must be
used instead.
select
concat(first_name, ' ', surname) as full_name
from
customer
Result
Query Results
full_name
Stephen Adjei
Sammy Adams
Linda Larigue
Sabina Patel
Vangie Robinson
Christiana Majola
Olya Ayinoko
David Kellyn
Olusegun Aby
Maulesh Amoah
Christopher Lucas
Emmanuel Crenshaw
Jessica Adesina
Selina Amoah
Ademola Karmakar
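In environments that do support the standard || operator, the first form of the query works unchanged. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for a query environment; the two customer rows are illustrative only.

```python
import sqlite3

# Illustrative in-memory database; the rows below are assumptions for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("create table customer (first_name text, surname text)")
conn.executemany(
    "insert into customer values (?, ?)",
    [("Maulesh", "Amoah"), ("Christopher", "Lucas")],
)

# SQLite accepts the standard || operator, so concat() is not needed here.
query = """
    select first_name || ' ' || surname as full_name
    from customer
    order by surname
"""
full_names = [row[0] for row in conn.execute(query)]
print(full_names)
```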
5.2 Selecting rows
Generally rows are filtered so that only rows matching certain criteria are included in the
result set or the table calculations.
For example,
select
*
from
transactions
where
trans_date > '2008-01-04' and
product_code = 'PCDE12'
Result
Brackets should be used when the ‘where’ expression includes a combination of ‘and’ and
‘or’ expressions.
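The effect of brackets can be seen by running the same conditions with and without them. This is a sketch using Python's sqlite3 module; the table contents and the helper function `ids` are hypothetical and exist only for this illustration.

```python
import sqlite3

# Illustrative data only; dates and product codes are assumptions for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table transactions (transaction_id integer, trans_date text, product_code text)"
)
conn.executemany(
    "insert into transactions values (?, ?, ?)",
    [
        (1, "2008-01-06", "PCDE12"),
        (2, "2008-01-02", "PCDE43"),
        (3, "2008-01-04", "PCDE12"),
        (4, "2008-01-21", "PCDE43"),
    ],
)

def ids(where_clause):
    """Return the transaction_id values matching the given 'where' expression."""
    sql = (
        "select transaction_id from transactions "
        f"where {where_clause} order by transaction_id"
    )
    return [row[0] for row in conn.execute(sql)]

# Without brackets, 'and' is evaluated before 'or', so every PCDE43 row is kept.
unbracketed = ids(
    "trans_date > '2008-01-04' and product_code = 'PCDE12' or product_code = 'PCDE43'"
)
# With brackets, the date filter applies to both product codes.
bracketed = ids(
    "trans_date > '2008-01-04' and (product_code = 'PCDE12' or product_code = 'PCDE43')"
)
print(unbracketed, bracketed)
```

Transaction 2 falls before the date limit, yet it appears in the unbracketed result because the date test only applies to the PCDE12 condition.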
‘in’ operator
Comparisons can also be done with another table using the ‘in’ operator.
select
*
from
transactions
where
product_code in (select product_code from sample_list)
Result
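A small runnable sketch of the ‘in’ comparison against another table, using Python's sqlite3 module. The contents of `transactions` and `sample_list` here are assumptions chosen to keep the example short.

```python
import sqlite3

# Illustrative tables; only the columns needed for the comparison are created.
conn = sqlite3.connect(":memory:")
conn.execute("create table transactions (transaction_id integer, product_code text)")
conn.execute("create table sample_list (product_code text)")
conn.executemany(
    "insert into transactions values (?, ?)",
    [(1, "PCDE12"), (2, "PCDE27"), (8, "PCDE43")],
)
conn.executemany("insert into sample_list values (?)", [("PCDE12",), ("PCDE43",)])

# Keep only transactions whose product code appears in sample_list.
query = """
    select transaction_id
    from transactions
    where product_code in (select product_code from sample_list)
    order by transaction_id
"""
matching_ids = [row[0] for row in conn.execute(query)]
print(matching_ids)
```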
‘like’ operator
The ‘like’ operator allows for a selection of records matching similar patterns.
For example
select
*
from
customer
where
surname like 'A%'
Result
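The pattern characters can be tried out with a short sketch in Python's sqlite3 module. The ‘%’ character matches any run of characters and ‘_’ matches exactly one character; the surnames below are a small illustrative sample.

```python
import sqlite3

# Illustrative customer table containing surnames only.
conn = sqlite3.connect(":memory:")
conn.execute("create table customer (surname text)")
conn.executemany(
    "insert into customer values (?)",
    [("Amoah",), ("Lucas",), ("Crenshaw",), ("Adesina",)],
)

def surnames(pattern):
    """Return the surnames matching a 'like' pattern, in alphabetical order."""
    sql = "select surname from customer where surname like ? order by surname"
    return [row[0] for row in conn.execute(sql, (pattern,))]

starts_with_a = surnames("A%")    # '%' matches any sequence of characters
one_char_then = surnames("_ucas")  # '_' matches exactly one character
print(starts_with_a, one_char_then)
```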
6. Sorting result tables
Rows returned from an SQL query may be returned in an arbitrary order.
The ordering of the rows can be specified using an ‘order by’ clause.
For example
select
*
from
transactions
order by
customer_id,
trans_date desc;
Rows can be sorted in descending order by adding ‘desc’ after the column name.
Result
transaction_id trans_date customer_id transaction_code product_code items amount
2 2008-01-12 00:00:00 1 PURCHASE PCDE27 1 324.12
1 2008-01-06 00:00:00 1 PURCHASE PCDE12 1 723.12
4 2008-01-12 00:00:00 2 CANCELLATION PCDE27 1 425.54
5 2008-01-12 00:00:00 2 PURCHASE PCDE12 2 423.23
6 2008-01-12 00:00:00 2 PURCHASE PCDE12 2 423.23
3 2008-01-04 00:00:00 2 PURCHASE PCDE12 3 623.23
7 2008-01-05 00:00:00 3 PURCHASE PCDE12 2 153.24
9 2008-01-21 00:00:00 4 PURCHASE PCDE43 1 823.11
8 2008-01-12 00:00:00 4 REFUND PCDE43 1 233.22
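The same ordering can be reproduced on a subset of the rows. This sketch assumes Python's sqlite3 module and stores dates as ISO-format text, which sorts correctly as plain strings.

```python
import sqlite3

# Four of the transaction rows, reduced to the columns used for sorting.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table transactions (transaction_id integer, trans_date text, customer_id integer)"
)
conn.executemany(
    "insert into transactions values (?, ?, ?)",
    [
        (1, "2008-01-06", 1),
        (2, "2008-01-12", 1),
        (3, "2008-01-04", 2),
        (5, "2008-01-12", 2),
    ],
)

# Customers in ascending order; within each customer, newest transaction first.
query = """
    select transaction_id
    from transactions
    order by customer_id, trans_date desc
"""
ordered_ids = [row[0] for row in conn.execute(query)]
print(ordered_ids)
```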
7. Counting rows
count(*) can be used to count the number of records in a table.
For example
select
count(*)
from
transactions
Result
count(*)
9
A ‘where’ clause can be used to count a subset of the rows in the table.
select
count(*)
from
transactions
where
trans_date > '2008-01-06'
Result
count(*)
6
Counting sets of records can be done using the ‘Group By’ clause.
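The three kinds of count can be compared side by side. The sketch below uses Python's sqlite3 module with the dates and transaction codes of the nine rows from the sorted result table; only those two columns are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table transactions (trans_date text, transaction_code text)")
conn.executemany(
    "insert into transactions values (?, ?)",
    [
        ("2008-01-06", "PURCHASE"),
        ("2008-01-12", "PURCHASE"),
        ("2008-01-04", "PURCHASE"),
        ("2008-01-12", "CANCELLATION"),
        ("2008-01-12", "PURCHASE"),
        ("2008-01-12", "PURCHASE"),
        ("2008-01-05", "PURCHASE"),
        ("2008-01-12", "REFUND"),
        ("2008-01-21", "PURCHASE"),
    ],
)

# Count of all rows.
total = conn.execute("select count(*) from transactions").fetchone()[0]

# Count of the rows that pass a 'where' filter.
after_6th = conn.execute(
    "select count(*) from transactions where trans_date > '2008-01-06'"
).fetchone()[0]

# One count per transaction code, using 'group by'.
by_code = dict(
    conn.execute(
        "select transaction_code, count(*) from transactions group by transaction_code"
    )
)
print(total, after_6th, by_code)
```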
8. Summing totals
A ‘group by’ clause can be used to calculate totals, averages and counts of records.
‘Group by’ is a complex use of SQL functionality and is not recommended for initial use.
Example
select
customer_id,
trans_date,
sum(amount)
from
transactions
group by
customer_id,
trans_date;
Result
SQL has an open syntax that will allow many combinations of queries to be written.
The following rules should be used to produce meaningful result sets when ‘group by’ is
used.
One row will be produced in the result set for each combination of the ‘group by’ columns.
For example,
group by customer_id, trans_date
would produce a row for each date on which each customer had a transaction.
Example
select
customer_id
from
customer
group by
customer_id;
For example
select
customer_id,
count(*) as transaction_count,
sum(amount) as transaction_total
from
transactions
group by
customer_id;
Result
This query will return one row for each customer who has transaction records, with the
following columns: customer_id, transaction_count and transaction_total.
When an expression is used in place of a column name, the naming of the result column is
database-dependent.
A ‘having’ clause can be added after a ‘group by’ clause to filter the grouped rows by the value of an aggregate function.
For example
select
customer_id,
count(*) as trans_count,
sum(amount) as trans_total
from
transactions
group by
customer_id
having
sum(amount) > 200
Result
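A runnable sketch of the ‘having’ query, using Python's sqlite3 module and five of the transaction amounts. Customer 3 has a total of 153.24 and is therefore removed by the ‘having’ condition.

```python
import sqlite3

# Illustrative subset of the transaction table: customer_id and amount only.
conn = sqlite3.connect(":memory:")
conn.execute("create table transactions (customer_id integer, amount real)")
conn.executemany(
    "insert into transactions values (?, ?)",
    [(1, 723.12), (1, 324.12), (3, 153.24), (4, 233.22), (4, 823.11)],
)

# 'where' filters rows before grouping; 'having' filters the grouped totals.
query = """
    select
        customer_id,
        count(*) as trans_count,
        sum(amount) as trans_total
    from transactions
    group by customer_id
    having sum(amount) > 200
    order by customer_id
"""
rows = conn.execute(query).fetchall()
for customer_id, trans_count, trans_total in rows:
    print(customer_id, trans_count, round(trans_total, 2))
```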
9. Cartesian Joins
A Cartesian join involves creating a result set containing all combinations of the records
from the input tables.
For example
select
*
from
customer,
transactions
This would not be a meaningful result set, as transaction data would appear beside customer
details of a customer unrelated to the transaction.
The lack of a ‘where’ or ‘join’ clause will result in all combinations of records being
returned.
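The size of a Cartesian join is the product of the input table sizes. A sketch in Python's sqlite3 module, with three illustrative customers and four illustrative transactions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table customer (customer_id integer)")
conn.execute("create table transactions (transaction_id integer, customer_id integer)")
conn.executemany("insert into customer values (?)", [(1,), (2,), (3,)])
conn.executemany(
    "insert into transactions values (?, ?)",
    [(1, 1), (2, 1), (3, 2), (4, 3)],
)

# No join condition: every customer row is paired with every transaction row.
cartesian_rows = conn.execute(
    "select count(*) from customer, transactions"
).fetchone()[0]

# Matching the key columns keeps only the meaningful pairs.
matched_rows = conn.execute(
    """
    select count(*)
    from customer, transactions
    where customer.customer_id = transactions.customer_id
    """
).fetchone()[0]
print(cartesian_rows, matched_rows)
```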
10. Retrieving data from multiple tables
Most queries involve retrieving data from several input tables.
Tables are joined on key fields, which are generally columns such as customer number, product code, transaction date, etc.
Key fields identify a record, rather than being stored data such as amounts, text values, etc.
Join syntax
select
t.customer_id,
t.trans_date,
c.postcode
from
transactions t
inner join
customer c
on
t.customer_id = c.customer_id;
Result
The join types are
Left join: All records are returned from the first table, and matching records from the second table
Right join: All records are returned from the second table, and matching records from the first table
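The difference between an inner join and a left join shows up when a record has no match. The sketch below assumes Python's sqlite3 module and three illustrative customers, one of whom has no transactions. A right join gives the mirror-image result and can be obtained by swapping the order of the tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table customer (customer_id integer, surname text)")
conn.execute("create table transactions (customer_id integer, amount real)")
conn.executemany(
    "insert into customer values (?, ?)",
    [(1, "Amoah"), (2, "Lucas"), (3, "Crenshaw")],
)
conn.executemany("insert into transactions values (?, ?)", [(1, 723.12), (2, 623.23)])

# Inner join: only customers that have a matching transaction.
inner_rows = conn.execute(
    """
    select c.surname, t.amount
    from customer c
    inner join transactions t on t.customer_id = c.customer_id
    order by c.customer_id
    """
).fetchall()

# Left join: every customer; columns from transactions are NULL when unmatched.
left_rows = conn.execute(
    """
    select c.surname, t.amount
    from customer c
    left join transactions t on t.customer_id = c.customer_id
    order by c.customer_id
    """
).fetchall()
print(inner_rows)
print(left_rows)
```

The unmatched customer appears in the left join with a NULL amount, which Python reports as None.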
Alias names
Alias names do not affect the result of a query; however, they can be useful in expressing the
query more simply.
For example
select
t.customer_id,
t.trans_date,
c.postcode
from
transactions as t
inner join
customer as c
on
t.customer_id = c.customer_id;
Alias names are necessary in the rare case in which an input table appears more than once in
a ‘select’ statement.
Also, if a column name appears in more than one input table, then an alias name should be
used to identify the relevant input table.
Columns can also be specified using alias names in the format ‘a.*’.
This indicates that all columns from table ‘a’ should be included in the result set.
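One case in which a table appears more than once is a join of a table to itself. The following sketch uses Python's sqlite3 module; the query, which pairs up transactions belonging to the same customer, is a hypothetical example and not taken from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table transactions (transaction_id integer, customer_id integer)")
conn.executemany("insert into transactions values (?, ?)", [(1, 1), (2, 1), (3, 2)])

# The aliases 'a' and 'b' distinguish the two uses of the same table.
# The '<' comparison lists each pair once and stops a row pairing with itself.
query = """
    select
        a.transaction_id,
        b.transaction_id
    from transactions as a
    inner join transactions as b
        on a.customer_id = b.customer_id
        and a.transaction_id < b.transaction_id
"""
pairs = conn.execute(query).fetchall()
print(pairs)
```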
The following layout is recommended when more than two input tables are included in a join
select
t.customer_id,
t.trans_date,
c.postcode,
p.product_code as product,
cu.description as currency
from
transactions as t
inner join
customer as c
on
t.customer_id = c.customer_id
inner join
product as p
on
t.product_code = p.product_code
left join
currency as cu
on
cu.code = p.currency_code;
Result
In the case of left joins and right joins, the order of tables in the query may affect the result
set. Each table is joined to the result of the previous joins. Field names in ‘on’ expressions
should only refer to tables that are specified earlier in the join list.
Where syntax
Joins can also be specified by listing multiple tables in the ‘from’ clause, and matching the
keys within the ‘where’ clause.
This syntax is equivalent to using ‘inner join’ on all the joined tables.
select
t.customer_id,
t.trans_date,
c.postcode,
p.product_code as product,
cu.description as currency
from
transactions as t,
customer as c,
product as p,
currency as cu
where
t.customer_id = c.customer_id and
cu.code = p.currency_code and
t.product_code = p.product_code
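The equivalence of the two syntaxes can be checked directly. This sketch assumes Python's sqlite3 module; the postcode values and the transaction for customer 5, who has no customer record, are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table customer (customer_id integer, postcode text)")
conn.execute("create table transactions (customer_id integer, trans_date text)")
conn.executemany("insert into customer values (?, ?)", [(1, "3000"), (2, "2000")])
conn.executemany(
    "insert into transactions values (?, ?)",
    [(1, "2008-01-06"), (2, "2008-01-04"), (5, "2008-01-09")],
)

# Join syntax.
join_rows = conn.execute(
    """
    select t.customer_id, t.trans_date, c.postcode
    from transactions as t
    inner join customer as c on t.customer_id = c.customer_id
    order by t.customer_id
    """
).fetchall()

# Where syntax: the same key match, written as a filter.
where_rows = conn.execute(
    """
    select t.customer_id, t.trans_date, c.postcode
    from transactions as t, customer as c
    where t.customer_id = c.customer_id
    order by t.customer_id
    """
).fetchall()
print(join_rows == where_rows, len(join_rows))
```

Both forms drop the transaction for customer 5, because an inner join keeps only records that match in both tables.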
…
on
a.product_class = b.product_class and
a.product_subclass = b.product_subclass
For example
select distinct
customer_id
from
transactions;
This query will return a list of the customer_id values that appear in the transaction table
Result
customer_id
1
2
3
4
The ‘distinct’ keyword can be used with any query, but is most useful when there is a single result
column or a small number of result columns.
If a count of these values is required, the ‘group by’ syntax should be used
select
customer_id,
count(*)
from
transactions
group by
customer_id;
Result
customer_id count(*)
1 2
2 4
3 1
4 2
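Both queries can be run against the same data to compare them. The sketch uses Python's sqlite3 module and loads only the customer_id column of the nine transaction rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table transactions (customer_id integer)")
customer_ids = [1, 1, 2, 2, 2, 2, 3, 4, 4]
conn.executemany("insert into transactions values (?)", [(c,) for c in customer_ids])

# 'distinct' lists each value once.
distinct_ids = [
    row[0]
    for row in conn.execute(
        "select distinct customer_id from transactions order by customer_id"
    )
]

# 'group by' lists each value once and can also count its occurrences.
counts = conn.execute(
    """
    select customer_id, count(*)
    from transactions
    group by customer_id
    order by customer_id
    """
).fetchall()
print(distinct_ids, counts)
```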
12. Union
The ‘union’ statement can be used to combine the results of two queries into a single result
set.
select * from
union all
‘union all’ combines the two result sets and keeps duplicate rows, while ‘union’ returns only the distinct records.
Result
id product_code
1 PCDE43
2 PCDE52
3 PCDE12
1 PCDE27
2 PCDE12
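The difference between the two forms appears only when the inputs share a row. The sketch below uses Python's sqlite3 module; the table names `list_a` and `list_b` and their contents are hypothetical and are not the tables used for the result above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table list_a (id integer, product_code text)")  # hypothetical
conn.execute("create table list_b (id integer, product_code text)")  # hypothetical
conn.executemany("insert into list_a values (?, ?)", [(1, "PCDE43"), (2, "PCDE52")])
conn.executemany("insert into list_b values (?, ?)", [(2, "PCDE52"), (3, "PCDE12")])

# 'union all' keeps every row from both queries, including the repeated one.
all_rows = conn.execute(
    "select id, product_code from list_a "
    "union all "
    "select id, product_code from list_b"
).fetchall()

# 'union' removes the duplicate (2, 'PCDE52') row.
distinct_rows = conn.execute(
    "select id, product_code from list_a "
    "union "
    "select id, product_code from list_b "
    "order by id"
).fetchall()
print(len(all_rows), distinct_rows)
```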
13. Subqueries
An SQL query can be used in place of a table name.
The query should be placed within brackets, and used in place of a table name within another
query.
For example
select
count(*)
from
transactions as t
inner join
(
select distinct
product_code
from
product
) as p
on
t.product_code = p.product_code
In this example a bracketed query has been used in place of a table name.
In this case, a count of records is calculated from transaction records joined to product codes.
The statement within the brackets is equivalent to a table containing the same data.
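One reason to join to a bracketed ‘select distinct’ query is to avoid counting a record twice. The sketch below uses Python's sqlite3 module and assumes, purely for illustration, that the product table lists the code PCDE12 twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table product (product_code text)")
conn.execute("create table transactions (product_code text)")
# Assumption for this sketch: PCDE12 appears twice in the product table.
conn.executemany("insert into product values (?)", [("PCDE12",), ("PCDE12",), ("PCDE27",)])
conn.executemany("insert into transactions values (?)", [("PCDE12",), ("PCDE12",), ("PCDE43",)])

# Joining to the bracketed query: each transaction matches at most one row.
count_with_subquery = conn.execute(
    """
    select count(*)
    from transactions as t
    inner join (select distinct product_code from product) as p
        on t.product_code = p.product_code
    """
).fetchone()[0]

# Joining to the raw table: each PCDE12 transaction matches two product rows.
count_with_table = conn.execute(
    """
    select count(*)
    from transactions as t
    inner join product as p on t.product_code = p.product_code
    """
).fetchone()[0]
print(count_with_subquery, count_with_table)
```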
14. Updating data
In many cases it is not possible to recover data that is accidentally altered or deleted.
Individual rows can be inserted into a table using the following syntax
insert into
currexchange (name, amount, exchdate)
values
('name1', 12.52, '2003-02-01')
Importing large quantities of records is dependent on the functions provided by the database
environment.
update
currexchange
set
amount = 32.23
where
exchdate = '2003-02-01'
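Where the environment supports transactions, one precaution is to check how many rows an update touched before committing it. The following is a sketch under that assumption, using Python's sqlite3 module; the second row and the expected row count of 1 are illustrative choices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table currexchange (name text, amount real, exchdate text)")
conn.executemany(
    "insert into currexchange (name, amount, exchdate) values (?, ?, ?)",
    [("name1", 12.52, "2003-02-01"), ("name2", 8.10, "2003-02-02")],
)
conn.commit()

# A mistaken update with no 'where' clause touches every row in the table.
cursor = conn.execute("update currexchange set amount = 32.23")
rows_touched = cursor.rowcount
if rows_touched != 1:
    conn.rollback()  # undo the uncommitted change

amounts_after_rollback = [
    row[0] for row in conn.execute("select amount from currexchange order by name")
]

# The intended update, restricted to one date, is then committed.
conn.execute(
    "update currexchange set amount = ? where exchdate = ?",
    (32.23, "2003-02-01"),
)
conn.commit()
final_amounts = [
    row[0] for row in conn.execute("select amount from currexchange order by name")
]
print(rows_touched, amounts_after_rollback, final_amounts)
```

Once a change has been committed, a rollback no longer applies, so the check has to happen before the commit.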
Implementations vary in their ability to perform updates on views created by joining several
tables.
For example
15. NULL values
NULL values represent missing data.
This may indicate that a data item is not known, or is not relevant in that particular case.
Visual tools may display this result in several formats including NULL, (null), a blank field
etc.
select
customer_id,
amount
from
transactions
where
amount is not NULL
Result
customer_id amount
1 723.12
1 324.12
2 623.23
2 425.54
2 423.23
2 423.23
3 153.24
4 233.22
4 823.11
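NULL values are tested with ‘is NULL’ and ‘is not NULL’ because an ‘=’ comparison with NULL never matches. A sketch in Python's sqlite3 module, with one illustrative row whose amount is missing:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table transactions (customer_id integer, amount real)")
conn.executemany(
    "insert into transactions values (?, ?)",
    [(1, 723.12), (2, None), (3, 153.24)],  # None is stored as NULL
)

def count_where(condition):
    """Count the rows in transactions that satisfy the given condition."""
    sql = f"select count(*) from transactions where {condition}"
    return conn.execute(sql).fetchone()[0]

equals_null = count_where("amount = NULL")    # the comparison is never true
is_null = count_where("amount is NULL")
is_not_null = count_where("amount is not NULL")

# count(*) counts rows; count(amount) skips rows where amount is NULL.
all_rows, rows_with_amount = conn.execute(
    "select count(*), count(amount) from transactions"
).fetchone()
print(equals_null, is_null, is_not_null, all_rows, rows_with_amount)
```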
16. About the Author
Mark McIlroy has an undergraduate degree in Computer Science and Applied Mathematics
from Monash University.
He has extensive experience consulting in the banking and government sectors in Australia
in large SQL data warehouse environments.
17. Appendix A – Implementation variations
The SQL statements described here should be executable in most SQL environments.
Some differences may occur with issues such as specifying date constants.
For example
'1990-04-12'
to_date('01JUL2009')
etc.
Major implementations frequently have added syntax which is not compatible across
alternative implementations.
Examples include variations on join types such as OUTER JOIN, CROSS JOIN etc.
18. Appendix B – Summary of operators
Operators
Mathematical
* Multiplication
/ Division
+ Addition
- Subtraction
Relational
String
|| Concatenate
Aggregate
sum Sum
avg Average
count Count
min Minimum value
max Maximum value
stdev Standard Deviation
19. Appendix C – Other statements
Other Issues
These statements are not widely used as these functions are more easily performed using
database administration tools.
Administration statements
Descriptive statements
Statements for returning the information about the database, such as the list of tables.
SHOW TABLES
20. Appendix D – Test environment
www.markmcilroy.com/test_env/sql_test.php
You can try out the examples from the book in this environment.
Tables