Aggregate Queries
Stéphane Bressan
The Case Aggregate Functions Defining Groups HAVING Clause
Requirements
We want to develop an application for managing the data of our online app store. We
would like to store several items of information about our customers such as their first
name, last name, date of birth, e-mail, date and country of registration to our online
sales service and the customer identifier that they have chosen. We also want to
manage the list of our products, games, their name, their version and their price. The
price is fixed for each version of each game. Finally, our customers buy and download
games. So we must remember which version of which game each customer has
downloaded. It is not important to keep the download date for this application.
Entity-relationship Diagram
SQLite
sqlite> .open myfile.db
sqlite> .mode column
sqlite> .headers on
sqlite> PRAGMA foreign_keys = ON;
sqlite> .read AppStoreSchema.sql
sqlite> .read AppStoreCustomers.sql
sqlite> .read AppStoreGames.sql
sqlite> .read AppStoreDownloads.sql
...
sqlite> .quit
PostgreSQL
% psql -h localhost -U postgres
Password for user postgres:
psql (9.6.3, server 9.6.4)
Type "help" for help.
postgres=# CREATE DATABASE dedomenology;
CREATE DATABASE
postgres=# \c dedomenology;
psql (9.6.3, server 9.6.4)
You are now connected to database "dedomenology" as user "postgres".
postgres=# \i AppStoreSchema.sql
CREATE TABLE
CREATE TABLE
CREATE TABLE
postgres=# \i AppStoreCustomers.sql
INSERT 0 1
...
postgres=# \i AppStoreGames.sql
INSERT 0 1
...
postgres=# \i AppStoreDownloads.sql
INSERT 0 1
...
postgres=# \q
Aggregate Functions
SELECT COUNT(*)
FROM customers;
count
1000
The above query prints the number of rows in the table customers.
SELECT COUNT(c.customerid)
FROM customers c;
count
1000
The above query counts the rows of the table customers in which customerid is not
null. Since customerid is the primary key, this is the number of rows in the table.
SELECT COUNT(c.country)
FROM customers c;
count
1000
The above query also prints the number of rows in the table customers, because no
customer has a null country.
SELECT COUNT(DISTINCT c.country)
FROM customers c;
count
5
We need to add the keyword DISTINCT inside the COUNT() aggregate function if we
want to count the number of different countries in the column country of the table
customers.
DISTINCT can be used in other aggregate functions similarly.
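The three forms of COUNT() seen so far can be compared side by side. The following is a minimal sketch using Python's sqlite3 module on a made-up four-row table. The rows, and the fact that this toy table permits a null country, are assumptions for illustration and are not taken from the course data.

```python
import sqlite3

# Hypothetical miniature of the customers table; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customerid TEXT PRIMARY KEY, country TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("al1", "Singapore"), ("bo2", "Singapore"), ("cy3", "Vietnam"), ("di4", None)],
)

count_rows, count_country, count_distinct = conn.execute(
    """
    SELECT COUNT(*),                   -- every row
           COUNT(c.country),           -- rows whose country is not null
           COUNT(DISTINCT c.country)   -- different non-null countries
    FROM customers c
    """
).fetchone()

print(count_rows, count_country, count_distinct)  # 4 3 2
```

The three counts differ only because one row has a null country and two rows share the same country.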
The following query finds the maximum, the minimum, the average and the standard
deviation of the prices of our games.
It uses the arithmetic function TRUNC() to display two decimal places for the average
and the standard deviation.
SELECT MAX(g.price),
       MIN(g.price),
       TRUNC(AVG(g.price), 2) AS ave,
       TRUNC(STDDEV(g.price), 2) AS std
FROM games g;
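The query above relies on STDDEV() and the two-argument TRUNC(), which PostgreSQL provides. SQLite has no built-in STDDEV(), so the sketch below (an assumption about how one might reproduce the numbers there, not the course's method) computes MAX, MIN and AVG in SQL and the sample standard deviation in Python. The game names and prices are made up.

```python
import sqlite3
import statistics

# Hypothetical miniature of the games table; names and prices are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE games (name TEXT, version TEXT, price REAL, "
    "PRIMARY KEY (name, version))"
)
conn.executemany(
    "INSERT INTO games VALUES (?, ?, ?)",
    [("Aerified", "1.0", 2.0), ("Aerified", "2.0", 4.0), ("Bitwolf", "1.0", 6.0)],
)

# MAX, MIN and AVG are standard aggregate functions available in SQLite.
highest, lowest, average = conn.execute(
    "SELECT MAX(g.price), MIN(g.price), AVG(g.price) FROM games g"
).fetchone()

# Sample standard deviation, computed outside SQL.
prices = [row[0] for row in conn.execute("SELECT g.price FROM games g")]
std = statistics.stdev(prices)

print(highest, lowest, average, std)
```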
Defining Groups
The GROUP BY clause creates groups of records that have the same values for the
specified fields before computing the aggregate functions.
GROUP BY c.country;
SELECT c.country, COUNT(*)
FROM customers c
GROUP BY c.country;
country      count
"Vietnam"    98
"Singapore"  391
"Thailand"   100
"Indonesia"  243
"Malaysia"   168
SELECT COUNT(*)
FROM customers c;
If no GROUP BY clause is specified, a single group containing all the rows is formed
as soon as an aggregate function is used.
Groups are formed after the rows have been filtered by the WHERE clause.
SELECT c.country, COUNT(*)
FROM customers c
WHERE c.dob >= '2000-01-01'
GROUP BY c.country;
country      count
"Vietnam"    4
"Singapore"  25
"Thailand"   5
"Indonesia"  15
"Malaysia"   12
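The effect of filtering before grouping can be checked on a small example. This sketch runs the same grouping with and without the WHERE clause on five made-up customers. The rows are assumptions, and the ORDER BY clause is added only to make the output order predictable.

```python
import sqlite3

# Hypothetical miniature of the customers table; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customerid TEXT PRIMARY KEY, country TEXT, dob TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("al1", "Singapore", "2001-05-01"),
        ("bo2", "Singapore", "1990-01-01"),
        ("cy3", "Vietnam", "2003-07-09"),
        ("di4", "Vietnam", "2000-01-01"),
        ("ed5", "Thailand", "1985-03-03"),
    ],
)

# No filter: every customer falls into the group of its country.
unfiltered = conn.execute(
    "SELECT c.country, COUNT(*) FROM customers c "
    "GROUP BY c.country ORDER BY c.country"
).fetchall()

# WHERE runs first, so only customers born in 2000 or later are grouped.
filtered = conn.execute(
    "SELECT c.country, COUNT(*) FROM customers c "
    "WHERE c.dob >= '2000-01-01' "
    "GROUP BY c.country ORDER BY c.country"
).fetchall()

print(unfiltered)  # [('Singapore', 2), ('Thailand', 1), ('Vietnam', 2)]
print(filtered)    # [('Singapore', 1), ('Vietnam', 2)]
```

Thailand disappears from the filtered result because its only customer is removed before any group is formed.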
The following query finds the total spending for each customer.
SELECT c.customerid, c.first_name, c.last_name, SUM(g.price)
FROM customers c, downloads d, games g
WHERE c.customerid = d.customerid
AND d.name = g.name AND d.version = g.version
GROUP BY c.customerid, c.first_name, c.last_name;
Note that we include the columns first_name and last_name in the GROUP BY
clause because we want to print them.
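As a small end-to-end check of the total-spending query, the sketch below builds three toy tables with Python's sqlite3 module and runs it. The table layouts are simplified guesses at the schema (only the columns the query needs), and all rows are made up. The ORDER BY clause is added for a predictable output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical versions of the three tables.
conn.executescript(
    """
    CREATE TABLE customers (customerid TEXT PRIMARY KEY,
                            first_name TEXT, last_name TEXT);
    CREATE TABLE games (name TEXT, version TEXT, price REAL,
                        PRIMARY KEY (name, version));
    CREATE TABLE downloads (customerid TEXT, name TEXT, version TEXT,
                            PRIMARY KEY (customerid, name, version));
    INSERT INTO customers VALUES ('al1', 'Al', 'Tan'), ('bo2', 'Bo', 'Lim');
    INSERT INTO games VALUES ('Aerified', '1.0', 2.0),
                             ('Aerified', '2.0', 4.0),
                             ('Bitwolf',  '1.0', 6.0);
    INSERT INTO downloads VALUES ('al1', 'Aerified', '1.0'),
                                 ('al1', 'Bitwolf',  '1.0'),
                                 ('bo2', 'Aerified', '2.0');
    """
)

# One group per customer; SUM adds the prices of that customer's downloads.
totals = conn.execute(
    """
    SELECT c.customerid, c.first_name, c.last_name, SUM(g.price)
    FROM customers c, downloads d, games g
    WHERE c.customerid = d.customerid
      AND d.name = g.name AND d.version = g.version
    GROUP BY c.customerid, c.first_name, c.last_name
    ORDER BY c.customerid
    """
).fetchall()

print(totals)  # [('al1', 'Al', 'Tan', 8.0), ('bo2', 'Bo', 'Lim', 4.0)]
```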
SELECT c.first_name, c.last_name, SUM(g.price)
FROM customers c, downloads d, games g
WHERE c.customerid = d.customerid
AND d.name = g.name AND d.version = g.version
GROUP BY c.customerid;
The above query works only because each customer identifier (which is the primary
key of the table customers) determines exactly one first name and one last name.
For the sake of readability and portability, do not write such queries.
We should write the query as follows, making sure that every column mentioned in the
SELECT clause is mentioned in the GROUP BY unless it is used in an aggregate
function.
SELECT c.first_name, c.last_name, SUM(g.price)
FROM customers c, downloads d, games g
WHERE c.customerid = d.customerid
AND d.name = g.name AND d.version = g.version
GROUP BY c.customerid, c.first_name, c.last_name;
The query below does not work in PostgreSQL because c.customerid appears in the
SELECT clause but is neither listed in the GROUP BY clause nor used in an aggregate
function. It does work in SQLite, but such queries could give wrong results.
SELECT c.customerid, c.first_name, c.last_name, SUM(g.price)
FROM customers c, downloads d, games g
WHERE c.customerid = d.customerid
AND d.name = g.name AND d.version = g.version
GROUP BY c.first_name, c.last_name;
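One way such a query goes wrong in SQLite is when two customers share the same first and last name. The sketch below, on made-up rows and a simplified guess at the schema, shows that grouping by name alone merges them into one group and reports a single, arbitrarily chosen customerid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: two different customers who are both called Al Tan.
conn.executescript(
    """
    CREATE TABLE customers (customerid TEXT PRIMARY KEY,
                            first_name TEXT, last_name TEXT);
    CREATE TABLE games (name TEXT, version TEXT, price REAL,
                        PRIMARY KEY (name, version));
    CREATE TABLE downloads (customerid TEXT, name TEXT, version TEXT,
                            PRIMARY KEY (customerid, name, version));
    INSERT INTO customers VALUES ('al1', 'Al', 'Tan'), ('al9', 'Al', 'Tan');
    INSERT INTO games VALUES ('Aerified', '1.0', 2.0), ('Bitwolf', '1.0', 6.0);
    INSERT INTO downloads VALUES ('al1', 'Aerified', '1.0'),
                                 ('al9', 'Bitwolf',  '1.0');
    """
)

base = """
    SELECT c.customerid, c.first_name, c.last_name, SUM(g.price)
    FROM customers c, downloads d, games g
    WHERE c.customerid = d.customerid
      AND d.name = g.name AND d.version = g.version
"""

# Grouping by name only: SQLite accepts it and merges the two customers.
by_name = conn.execute(base + " GROUP BY c.first_name, c.last_name").fetchall()

# Grouping by every non-aggregated column keeps the customers apart.
by_id = conn.execute(
    base + " GROUP BY c.customerid, c.first_name, c.last_name"
).fetchall()

print(len(by_name), by_name[0][3])  # 1 8.0
print(len(by_id))                   # 2
```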
To display the number of downloads by country of registration and year of birth of the
customers, we group by the country and by the year extracted from the date of birth.
EXTRACT() is a PostgreSQL function. STRFTIME() is a SQLite function.
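A possible SQLite version of such a query is sketched below with Python's sqlite3 module. It is a reconstruction under assumptions, not the original listing: the table layouts are simplified, the rows are made up, and STRFTIME('%Y', ...) is used to obtain the year of birth as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified tables and rows.
conn.executescript(
    """
    CREATE TABLE customers (customerid TEXT PRIMARY KEY,
                            country TEXT, dob TEXT);
    CREATE TABLE downloads (customerid TEXT, name TEXT, version TEXT,
                            PRIMARY KEY (customerid, name, version));
    INSERT INTO customers VALUES ('al1', 'Singapore', '2001-05-01'),
                                 ('bo2', 'Singapore', '2001-11-30'),
                                 ('cy3', 'Vietnam',   '1990-02-02');
    INSERT INTO downloads VALUES ('al1', 'Aerified', '1.0'),
                                 ('al1', 'Bitwolf',  '1.0'),
                                 ('bo2', 'Aerified', '1.0'),
                                 ('cy3', 'Bitwolf',  '1.0');
    """
)

# One group per (country, year of birth); COUNT(*) counts the downloads.
per_country_year = conn.execute(
    """
    SELECT c.country, STRFTIME('%Y', c.dob), COUNT(*)
    FROM customers c, downloads d
    WHERE c.customerid = d.customerid
    GROUP BY c.country, STRFTIME('%Y', c.dob)
    ORDER BY 1, 2
    """
).fetchall()

print(per_country_year)  # [('Singapore', '2001', 3), ('Vietnam', '1990', 1)]
```

In PostgreSQL the expression would instead be EXTRACT(YEAR FROM c.dob), assuming dob is stored as a date.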
The order of columns in the GROUP BY clause does not change the meaning of the
query.
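This claim is easy to verify on a toy table. The sketch below, on made-up rows, runs the same grouping with the two columns listed in either order and compares the results as sorted lists, since the order in which rows are returned is not guaranteed.

```python
import sqlite3

# Hypothetical miniature of the customers table; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customerid TEXT PRIMARY KEY, country TEXT, dob TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("al1", "Singapore", "2001-05-01"),
        ("bo2", "Singapore", "2001-11-30"),
        ("cy3", "Vietnam", "1990-02-02"),
    ],
)

select = "SELECT c.country, STRFTIME('%Y', c.dob), COUNT(*) FROM customers c "

country_first = conn.execute(
    select + "GROUP BY c.country, STRFTIME('%Y', c.dob)"
).fetchall()
year_first = conn.execute(
    select + "GROUP BY STRFTIME('%Y', c.dob), c.country"
).fetchall()

# Same groups and same counts, whatever the order of the grouping columns.
same_groups = sorted(country_first) == sorted(year_first)
print(same_groups)  # True
```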
HAVING Clause
SELECT c.country
FROM customers c
WHERE COUNT(*) >= 100
GROUP BY c.country;
ERROR: aggregate functions are not allowed in WHERE
LINE 4: WHERE COUNT(*) >= 100
              ^
SQL state: 42803
Character: 42
Aggregate functions are not allowed in the WHERE clause. This is because they can
only be evaluated after the groups are formed, and the groups are formed after the
rows are filtered by the WHERE clause.
Instead, we use a new clause, the HAVING clause, to add conditions that are checked
after the groups are formed by the GROUP BY clause.
The HAVING clause can only involve aggregate functions, columns listed in the
GROUP BY clause and subqueries.
The following query finds the countries in which there are at least 100 customers.
SELECT c.country
FROM customers c
GROUP BY c.country
HAVING COUNT(*) >= 100;
country
"Singapore"
"Thailand"
"Indonesia"
"Malaysia"
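The contrast between WHERE and HAVING can be reproduced in SQLite as well. The sketch below uses six made-up customers and a threshold of 2 instead of 100; both are assumptions chosen to keep the example small. It first tries the aggregate in the WHERE clause, which SQLite rejects too, then moves the condition to HAVING.

```python
import sqlite3

# Hypothetical miniature of the customers table; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customerid TEXT PRIMARY KEY, country TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [
        ("al1", "Singapore"), ("bo2", "Singapore"), ("cy3", "Singapore"),
        ("di4", "Malaysia"), ("ed5", "Malaysia"),
        ("fa6", "Vietnam"),
    ],
)

# An aggregate function in WHERE is rejected before any row is read.
where_error = None
try:
    conn.execute(
        "SELECT c.country FROM customers c "
        "WHERE COUNT(*) >= 2 GROUP BY c.country"
    )
except sqlite3.Error as exc:
    where_error = str(exc)

# HAVING is checked once per group, after the groups are formed.
big_countries = conn.execute(
    "SELECT c.country FROM customers c "
    "GROUP BY c.country HAVING COUNT(*) >= 2 "
    "ORDER BY c.country"
).fetchall()

print(where_error is not None)  # True
print(big_countries)            # [('Malaysia',), ('Singapore',)]
```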
Copyright 2022 Stéphane Bressan. All rights reserved.