Aggregation

The document discusses aggregate functions in SQL, which are used to perform calculations on multiple rows and return a single result, often in conjunction with the GROUP BY clause. It explains various aggregate functions like SUM, COUNT, AVG, MIN, MAX, and how to concatenate values using GROUP_CONCAT, LISTAGG, and STRING_AGG. Additionally, it covers the importance of the HAVING clause for filtering results after aggregation, and the necessity of using aggregate functions or including columns in the GROUP BY clause when performing queries.

Uploaded by

jmrobison5

Aggregating and Grouping

I. Aggregate Functions
Most DBMSs provide functions to aggregate numerical data, such as summing or averaging a numeric
column. Aggregate functions can also count the occurrences of a given column, or the rows in a
table.

Aggregate functions work on multiple rows and return a single row based on the aggregation
performed on those rows. You often see aggregate functions used in conjunction with the GROUP BY
clause (discussed later), although this is not a requirement.

Unlike the numeric and string functions, aggregate functions are mostly standard.

Function            Description
SUM(column)         Total the values of a numeric column across all rows.
COUNT(column)       Count the non-NULL occurrences of a numeric or alphabetic column.
COUNT(*)            Count all rows.
AVG(column)         Compute the average of all instances of a numeric column.
MIN(column)         Return the smallest value of all instances for a numeric or alphabetic column.
MAX(column)         Return the largest value of all instances for a numeric or alphabetic column.
MEDIAN(column)      Return the median (sorted mid-point value) of a column.
                    - If the number of rows is odd, you get the value of the middle row.
                    - If the number of rows is even, you get the average of the 2 middle rows.
STDDEV(column)      Return the standard deviation of a column.
GROUP_CONCAT(…)     (MySql and SQLite)
LISTAGG(…)          (Oracle)
STRING_AGG(…)       (PostgreSQL)
                    Concatenate all the values of a single (or multiple) columns.

Examples:

SELECT SUM(price) FROM course

SELECT COUNT(*) FROM student                   /* count all rows */

SELECT COUNT(fname), COUNT(DISTINCT fname)     /* count first names, */
FROM student                                   /* count distinct first names */

SELECT SUM(price), COUNT(price), AVG(price), MIN(price), MAX(price)
FROM course
WHERE price > 1000

Property of: Sam Sultan © Page 1 of 35



Be Aware…
Question: I want to display information about the cheapest course in the course table:

SELECT * FROM course /* Take a close look at the data */

SELECT MIN(price) FROM course /* The lowest priced course */

SELECT MIN(price), description        /* You cannot do this. */
FROM course                           /* All columns must be aggregated */
                                      /* or included in GROUP BY */
                                      /* MySql: wrong results. Oracle? */

SELECT MIN(price), MIN(description)   /* This is now technically OK, */
FROM course                           /* but still not what I am looking for */

SELECT price, description FROM course /* To get the lowest priced course */
ORDER BY price /* sort it by price */
/* look at the first row */

(Best Approach – use a subquery)

SELECT price, description             /* use a sub-query */
FROM course
WHERE price = (SELECT MIN(price) FROM course)

*** Once a single column uses an aggregate function, all columns must use aggregate functions,
unless those columns are included in a GROUP BY clause. (covered later)

Treatment of NULL in Aggregate functions

NULL values are not included for aggregation purposes. If a column value is NULL it is not
considered for the purpose of counting, summing, or averaging. NULL values are also not considered
for either the MIN or MAX functions.

- If a numeric column contains a zero, then the value will of course be included for summing,
  counting, averaging, and min/max functions like any other number.
- If a text column contains an empty string '', then the value will be included for counting and
  min/max functions like any other text value.
- Some databases (for example Oracle) treat an empty string '' as a NULL. Other databases (including
  MySql) do not.
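
The NULL rules above can be checked with a small experiment. The following is only a sketch: it assumes Python's built-in sqlite3 module and a made-up table called sample (not part of the demo database), with one zero, one empty string, and one row that is entirely NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, used only for this illustration.
conn.execute("CREATE TABLE sample (num REAL, txt TEXT)")
conn.executemany(
    "INSERT INTO sample VALUES (?, ?)",
    [(10, "a"), (0, ""), (None, None), (20, "b")],
)

result = conn.execute(
    """
    SELECT COUNT(*),     -- counts every row, including the all-NULL one
           COUNT(num),   -- skips the NULL, keeps the zero
           SUM(num), AVG(num), MIN(num), MAX(num),
           COUNT(txt),   -- skips the NULL, keeps the empty string
           MIN(txt)
    FROM sample
    """
).fetchone()
print(result)
```

Notice that AVG divides by 3 (the non-NULL values), not by 4 (the rows). SQLite, like MySql, keeps the empty string distinct from NULL, so this sketch does not reproduce the Oracle behavior.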


Aggregating (Concatenating) ALL the Values of a Column

The MySql GROUP_CONCAT(…), Oracle LISTAGG(…), and PostgreSQL STRING_AGG(…)
aggregate functions allow you to concatenate all the values of a single column or multiple columns
across multiple rows into a single value. Like all aggregate functions, they return a single row.

MySql:
GROUP_CONCAT( [DISTINCT] columns [ORDER BY columns] [SEPARATOR 'string'] )
[ ] denotes optional
(SQLite also provides GROUP_CONCAT, but it takes the separator as a second argument,
GROUP_CONCAT(column, 'string'), rather than through the SEPARATOR keyword.)
Examples:
SELECT GROUP_CONCAT(lname) FROM student; --separated with commas
SELECT GROUP_CONCAT( DISTINCT lname) FROM student;
SELECT GROUP_CONCAT(lname SEPARATOR ', ' )
FROM student;
SELECT GROUP_CONCAT(lname ORDER BY lname SEPARATOR ', ' )
FROM student;
SELECT GROUP_CONCAT(fname, lname ORDER BY lname DESC, fname SEPARATOR ', '),
COUNT(lname) as total
FROM student;

GROUP_CONCAT(fname, lname ORDER BY lname DESC, fname SEPARATOR ', ') Total
EileenWillis, LillianVasquez, KathyTok, WayneTobias, EugeneThomas, EdwardTeagan, 25
SamSultan, PatrickStack, JohnSoley, DavidSmith, NatashaRyan, JosephRace, CynthiaOwens,
ColetteNelson, RickMyers, janetmiller, AnyaMilgrom, PhyllisJames, RobertGrace, DuncanDavidson,
DavidChan, VincentCambria, BarbaraBurns, AngelBrinson, MariaAustin
1 rows returned

Oracle:
LISTAGG( column [, 'separator'] ) [WITHIN GROUP (ORDER BY columns)]
[ ] denotes optional
Examples:
SELECT LISTAGG(lname, ', ') -- separated using ', '
FROM student;
SELECT LISTAGG(lname, ', ') WITHIN GROUP (ORDER BY lname)
FROM student;
SELECT LISTAGG(fname || lname, ', ') WITHIN GROUP (ORDER BY lname DESC, fname),
COUNT(lname) as total
FROM student --same output as above


II. Grouping Data


The SQL SELECT statement includes various clauses within it. These include the FROM clause, the
WHERE clause, the GROUP BY clause, the HAVING clause, and the ORDER BY clause.
Each of these clauses serves a particular purpose.

The GROUP BY clause allows you to create one or more groupings of the selected data. The
GROUP BY collapses multiple individual rows sharing the same value(s) into a single grouped row.

The GROUP BY clause instructs the DBMS to collect the selected rows by the value of the column
specified in the GROUP BY (internally this is typically done by sorting or hashing), and to return a
single row for each unique value of that column (or combination of columns) being grouped.

Using the following payment table:

Vendor                Amount     Description        Pay_date
Con Edison            125.15     Electric           2017-01-31
Verizon               54.79      Home Phone         2017-02-28
Allstate              1240.50    Home Insurance     2017-03-01
Verizon               49.95      Cell Phone         2017-03-02
Verizon               39.95      Internet Service   2017-03-03
Con Edison            59.63      Electric           2017-03-04
Department of Water   155.43     Water & Sewers     2017-03-05
Allstate              1550.25    Car Insurance      2017-03-06

I can issue the following:


-- Can I do…
SELECT vendor FROM payment -- SELECT * FROM payment
GROUP BY vendor -- GROUP BY vendor ???
-- Try it in all databases.
Vendor
Allstate
Con Edison
Department of Water
Verizon

As you can see, the SELECT statement automatically returned a single row for each unique vendor
found in the table.
SELECT vendor, MAX(pay_date) FROM payment -- Why do I need MIN( ) or MAX( )
GROUP BY vendor -- for pay_date? Try it without it.
-- Does it work in MySql? Oracle?
Vendor MAX(pay_date)
Allstate 2017-03-06
Con Edison 2017-03-04
Department of Water 2017-03-05
Verizon 2017-03-03


Aggregating and Grouping


You often use the various aggregate functions such as SUM( ), COUNT( ), AVG( ), MIN( ), MAX( )
in conjunction with the GROUP BY clause

** Most analytic queries perform the above by aggregating one or more numeric columns while
doing a GROUP BY on one or more descriptive columns. This is very useful.

Using the payment table from the previous page:

SELECT vendor, SUM(amount)
FROM payment
GROUP BY vendor
Vendor Sum(amount)
Allstate 2790.75
Con Edison 184.78
Department of Water 155.43
Verizon 144.69

SELECT vendor, SUM(amount) paid, COUNT(amount) tally, AVG(amount) "Avg Pay"
FROM payment
GROUP BY vendor
Vendor Paid Tally Avg Pay
Allstate 2790.75 2 1395.375
Con Edison 184.78 2 92.39
Department of Water 155.43 1 155.43
Verizon 144.69 3 48.23

GROUP BY multiple columns.

SELECT vendor, description, SUM(amount) paid, COUNT(amount) tally
FROM payment
GROUP BY vendor, description
Vendor Description Paid Tally
Allstate Car Insurance 1550.25 1
Allstate Home Insurance 1240.50 1
Con Edison Electric 184.78 2
Department of Water Water & Sewers 155.43 1
Verizon Cell Phone 49.95 1
Verizon Internet Service 39.95 1
Verizon Home Phone 54.79 1

The database server returned a single row for every unique combination of vendor and description.
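
To try the grouping queries above without a database server, here is a minimal sketch using Python's built-in sqlite3 module. The payment rows are the eight rows from the table shown earlier; the in-memory database, the column types, and the ROUND(…, 2) call (added only to hide floating-point noise) are my assumptions, not part of the original examples.

```python
import sqlite3

# The eight payment rows from the payment table.
PAYMENTS = [
    ("Con Edison", 125.15, "Electric", "2017-01-31"),
    ("Verizon", 54.79, "Home Phone", "2017-02-28"),
    ("Allstate", 1240.50, "Home Insurance", "2017-03-01"),
    ("Verizon", 49.95, "Cell Phone", "2017-03-02"),
    ("Verizon", 39.95, "Internet Service", "2017-03-03"),
    ("Con Edison", 59.63, "Electric", "2017-03-04"),
    ("Department of Water", 155.43, "Water & Sewers", "2017-03-05"),
    ("Allstate", 1550.25, "Car Insurance", "2017-03-06"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payment (vendor TEXT, amount REAL, description TEXT, pay_date TEXT)"
)
conn.executemany("INSERT INTO payment VALUES (?, ?, ?, ?)", PAYMENTS)

# One row per vendor.
by_vendor = conn.execute(
    """
    SELECT vendor, ROUND(SUM(amount), 2) AS paid, COUNT(amount) AS tally
    FROM payment
    GROUP BY vendor
    ORDER BY vendor
    """
).fetchall()

# One row per unique (vendor, description) combination.
by_vendor_desc = conn.execute(
    """
    SELECT vendor, description, ROUND(SUM(amount), 2) AS paid, COUNT(amount) AS tally
    FROM payment
    GROUP BY vendor, description
    ORDER BY vendor, description
    """
).fetchall()

for row in by_vendor:
    print(row)
print(len(by_vendor_desc), "vendor/description groups")
```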


Mixing of Aggregated and Non-Aggregated Columns NO


The Golden Rule:

*** When using aggregate functions such as SUM( ), COUNT( ), AVG( ), MIN( ) and MAX( ),
or using the GROUP BY clause,
All columns must either use aggregate functions, or must be added to the GROUP BY.

Examples:

SELECT vendor, description, SUM(amount)   /* vendor, description are not aggregated */
FROM payment                              /* amount is aggregated */

Which vendor or description would I get? Does the query even make any sense?

(or)
SELECT vendor, description, SUM(amount)   /* amount is aggregated */
FROM payment                              /* description is not aggregated */
GROUP BY vendor                           /* vendor is grouped by */

Which description would I get? Does the query even make any sense?

- The above will not work in Oracle or PostgreSQL database servers. This is the standard behavior.
- The above will work in SQLite, and in MySQL when the ONLY_FULL_GROUP_BY mode is turned off (it is
  on by default since MySQL 5.7.5), but it is not standard, and does it really make sense? Try it.

Change this to either:

SELECT MAX(vendor), MAX(description), SUM(amount)   /* all columns aggregated */
FROM payment                                        /* perhaps not what you want */
(or)
SELECT vendor, MAX(description), SUM(amount)
FROM payment
GROUP BY vendor /* add a GROUP BY */
(or)
SELECT vendor, description, SUM(amount) /* most likely this is what */
FROM payment /* you are looking for */
GROUP BY vendor, description
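
To see the Golden Rule in action, here is a small sketch using Python's built-in sqlite3 module (my choice for illustration; the rows are the payment table from earlier). SQLite accepts the non-standard query, but the description it returns for each vendor is simply whichever row it happened to pick, so the sketch only checks that the value belongs to that vendor.

```python
import sqlite3

PAYMENTS = [
    ("Con Edison", 125.15, "Electric", "2017-01-31"),
    ("Verizon", 54.79, "Home Phone", "2017-02-28"),
    ("Allstate", 1240.50, "Home Insurance", "2017-03-01"),
    ("Verizon", 49.95, "Cell Phone", "2017-03-02"),
    ("Verizon", 39.95, "Internet Service", "2017-03-03"),
    ("Con Edison", 59.63, "Electric", "2017-03-04"),
    ("Department of Water", 155.43, "Water & Sewers", "2017-03-05"),
    ("Allstate", 1550.25, "Car Insurance", "2017-03-06"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payment (vendor TEXT, amount REAL, description TEXT, pay_date TEXT)"
)
conn.executemany("INSERT INTO payment VALUES (?, ?, ?, ?)", PAYMENTS)

# Non-standard: description is neither aggregated nor in the GROUP BY.
nonstandard = conn.execute(
    "SELECT vendor, description, SUM(amount) FROM payment GROUP BY vendor"
).fetchall()

# Standard: every selected column is aggregated or grouped.
standard = conn.execute(
    "SELECT vendor, description, SUM(amount) FROM payment "
    "GROUP BY vendor, description"
).fetchall()

# All descriptions that actually exist for each vendor.
valid = {}
for vendor, _, description, _ in PAYMENTS:
    valid.setdefault(vendor, set()).add(description)

for vendor, description, total in nonstandard:
    print(vendor, "->", description, "(one of", sorted(valid[vendor]), ")")
print(len(nonstandard), "rows vs", len(standard), "rows")
```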


Typical Analytic Queries:

Most analytic queries rely on the use of AGGREGATE functions in combination with GROUP BY.
You will most often aggregate one or more numerical values, while at the same time group by one or
more descriptive attributes.

Examples from demo database…

1. select lname, fname, count(course_id) as num_courses   -- How many courses
   from student join class on ssn=stu_ssn                 -- each student is taking
   group by lname, fname
   order by 3 desc;

2. select lname, fname, count(course_id), sum(price)      -- Count courses and sum
   from student join class on ssn=stu_ssn                 -- their prices, per student
        join course using(course_id)
   group by lname, fname
   order by 3 desc;

3. select sex, count(course_id), avg(price)               -- Which sex takes more
   from student join class on ssn=stu_ssn                 -- classes, and more
        join course using(course_id)                      -- expensive classes
   group by sex
   order by 2 desc

Drill Downs

In addition, another common analytical query against a database is the concept of a “drill down”.
In a drill down, I have a previous query that gave me a higher-level perspective analysis of the data, and
now I want to focus on a subset of that original query, and drill down to more detail.

Imagine the original query was delivered to you as a pie chart, and now you want to drill down (perhaps
by double clicking) to a lower level of detail for a particular slice of the pie.

A drill down for example #3 above by course description, but for female students only:
- I add course description to the select line, and to the group by
- I add a filter for female students only

select sex, description, count(course_id), avg(price)   -- Drill down by course description,
from student join class on ssn=stu_ssn                  -- and filter female students only
     join course using(course_id)                       -- What is the avg GPA
where sex = 'F'
group by sex, description
order by 3 desc


Aggregating & Grouping Descriptive Columns

As mentioned previously, you can use the MySql GROUP_CONCAT( ) or Oracle LISTAGG( ) or
PostgreSQL STRING_AGG( ) as aggregate functions for descriptive data. As you recall those functions
concatenate the values of all rows for a particular column. This is a great way to display all descriptive
values for all the rows that you are grouping by.

Examples…

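MySql (SQLite supports the first form; for a custom separator it takes a second argument,
e.g. GROUP_CONCAT(description, ' - '), instead of the SEPARATOR keyword):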


SELECT vendor, GROUP_CONCAT(description), SUM(amount)
FROM payment
GROUP BY vendor
(or)
SELECT vendor, GROUP_CONCAT(description ORDER BY description), SUM(amount)
FROM payment
GROUP BY vendor
(or)
SELECT vendor, GROUP_CONCAT(description ORDER BY description SEPARATOR ' - ')
AS "Multi-Row Descriptions",
SUM(amount) as "total"
FROM payment
GROUP BY vendor

Oracle:
SELECT vendor, LISTAGG(description, ' - ') WITHIN GROUP (ORDER BY description),
SUM(amount) as "total"
FROM payment
GROUP BY vendor

PostgreSQL:
SELECT vendor, STRING_AGG(description, ' - ' ORDER BY description) AS "multi-row",
SUM(amount) as "total"
FROM payment
GROUP BY vendor

The versions that sort by description and use ' - ' as the separator will produce:

Vendor                Multi-Row Descriptions                       Total
Allstate              Car Insurance - Home Insurance               2790.75
Con Edison            Electric - Electric                          184.78
Department of Water   Water & Sewers                               155.43
Verizon               Cell Phone - Home Phone - Internet Service   144.69
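
For SQLite, a comparable result can be sketched with Python's built-in sqlite3 module. This is an illustration under my own assumptions: the separator is passed as the second argument of GROUP_CONCAT, and because older SQLite versions give no guarantee about the order in which the values are concatenated, the sketch sorts the pieces in Python rather than inside the query.

```python
import sqlite3

PAYMENTS = [
    ("Con Edison", 125.15, "Electric", "2017-01-31"),
    ("Verizon", 54.79, "Home Phone", "2017-02-28"),
    ("Allstate", 1240.50, "Home Insurance", "2017-03-01"),
    ("Verizon", 49.95, "Cell Phone", "2017-03-02"),
    ("Verizon", 39.95, "Internet Service", "2017-03-03"),
    ("Con Edison", 59.63, "Electric", "2017-03-04"),
    ("Department of Water", 155.43, "Water & Sewers", "2017-03-05"),
    ("Allstate", 1550.25, "Car Insurance", "2017-03-06"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payment (vendor TEXT, amount REAL, description TEXT, pay_date TEXT)"
)
conn.executemany("INSERT INTO payment VALUES (?, ?, ?, ?)", PAYMENTS)

rows = conn.execute(
    """
    SELECT vendor,
           GROUP_CONCAT(description, ' - ') AS descriptions,  -- separator is the 2nd argument
           ROUND(SUM(amount), 2) AS total
    FROM payment
    GROUP BY vendor
    ORDER BY vendor
    """
).fetchall()

# Sort the concatenated pieces in Python so the display order is predictable.
descriptions = {vendor: sorted(text.split(" - ")) for vendor, text, _ in rows}
for vendor, pieces in descriptions.items():
    print(vendor, "|", " - ".join(pieces))
```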


The HAVING Clause

What if you want to filter the final result after the aggregation/grouping was performed?

In addition to allowing you to filter the rows that will be used as a basis for the SELECT process,
SQL also allows you to filter the data after the aggregation/grouping process has completed. This
allows you to filter out data that did not meet thresholds on your aggregated values.

The WHERE clause that we have already seen does not work for filtering aggregated/grouped rows.
The WHERE clause filters rows that would be selected before the aggregation/grouping is performed.

So what do we use instead of WHERE? SQL provides yet another clause for the purpose of filtering
the resulting data after the aggregation/grouping is performed. The name of this new filtering clause is
the HAVING clause. The HAVING clause is very similar to the WHERE clause. In fact, all types of
WHERE expressions you learned about thus far can also be used with HAVING.

The only difference between the WHERE and the HAVING is:

- The WHERE clause filters the raw data before the aggregation/grouping process.
- The HAVING clause filters rows after the aggregation/grouping process has completed.

This is an important distinction between the two clauses. Rows that are eliminated by the WHERE
clause will not be included in the aggregation or grouping process. This could change the values
of the data being aggregated, which in turn could affect which rows meet or do not meet the
criteria for the HAVING clause.

HAVING supports all the WHERE comparison operators (=, >, <, BETWEEN, LIKE, etc.) and logical
operators (AND, OR, etc.) we have seen. The syntax is identical except that the keyword is different.

Please Note:

- Efficiency consideration:
  If possible, any data value known ahead of time should be filtered using the WHERE clause,
  since it is performed prior to the burden of aggregation and grouping. HAVING should only be
  used for filtering aggregated data.
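
The effect of WHERE on a later HAVING can be sketched as follows. This is only an illustration using Python's built-in sqlite3 module and the payment rows from earlier; the thresholds of 100 and 150 are my choice. The same HAVING condition is run twice, once on all rows and once after a WHERE clause has removed the small payments.

```python
import sqlite3

PAYMENTS = [
    ("Con Edison", 125.15, "Electric", "2017-01-31"),
    ("Verizon", 54.79, "Home Phone", "2017-02-28"),
    ("Allstate", 1240.50, "Home Insurance", "2017-03-01"),
    ("Verizon", 49.95, "Cell Phone", "2017-03-02"),
    ("Verizon", 39.95, "Internet Service", "2017-03-03"),
    ("Con Edison", 59.63, "Electric", "2017-03-04"),
    ("Department of Water", 155.43, "Water & Sewers", "2017-03-05"),
    ("Allstate", 1550.25, "Car Insurance", "2017-03-06"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payment (vendor TEXT, amount REAL, description TEXT, pay_date TEXT)"
)
conn.executemany("INSERT INTO payment VALUES (?, ?, ?, ?)", PAYMENTS)

# HAVING alone: every payment contributes to each vendor's total.
having_only = [
    row[0]
    for row in conn.execute(
        "SELECT vendor FROM payment "
        "GROUP BY vendor HAVING SUM(amount) > 150 ORDER BY vendor"
    )
]

# WHERE first: payments of 100 or less never reach the aggregation,
# so Con Edison's total drops from 184.78 to 125.15 and fails the HAVING.
where_then_having = [
    row[0]
    for row in conn.execute(
        "SELECT vendor FROM payment WHERE amount > 100 "
        "GROUP BY vendor HAVING SUM(amount) > 150 ORDER BY vendor"
    )
]

print(having_only)
print(where_then_having)
```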


Example from before:

SELECT vendor, SUM(amount), COUNT(amount)
FROM payment
GROUP BY vendor

Vendor                Sum(amount)   Count(amount)
Allstate              2790.75       2
Con Edison            184.78        2
Department of Water   155.43        1
Verizon               144.69        3

Adding a HAVING clause:

SELECT vendor, SUM(amount), COUNT(amount)
FROM payment
GROUP BY vendor
HAVING SUM(amount) > 150

Vendor                Sum(amount)   Count(amount)
Allstate              2790.75       2
Con Edison            184.78        2
Department of Water   155.43        1

Here you see that the last row (vendor “Verizon”) has been eliminated from the final result set.

As you recall from previous discussion, the name of the columns returned by this query are “vendor”,
“sum(amount)”, and “count(amount)”. We are asking SQL to filter using the second column – that is
the column called “sum(amount)” – and only return rows where that column value is more than $150.

Using Column Aliasing - Same as above: (not supported by Oracle or PostgreSQL)

SELECT vendor, SUM(amount) AS total, COUNT(amount) tally
FROM payment
GROUP BY vendor
HAVING total > 150          /* using a column alias */

Vendor Total Tally

Allstate 2790.75 2
Con Edison 184.78 2
Department of Water 155.43 1


So is there ever a need to use both the WHERE clause and the HAVING clause in the same SQL
statement? The answer is actually yes.

Suppose that you want to list the vendors that received 2 or more payments from you this month.
First, you must keep only the payments that you made this month; this is a job for the WHERE
clause. Once those payments have been filtered, you use GROUP BY (as discussed above) to count the
payments you made to each vendor. As a final step, you filter the groups with the HAVING clause,
eliminating any vendor where the number of payments is less than 2.

SELECT vendor, count(vendor)
FROM payment
WHERE MONTH(pay_date) = '03'    /* for March. Or simply 3. */
                                /* For Oracle use: to_char(pay_date, 'mm') = 3 */
GROUP BY vendor
HAVING count(vendor) >= 2

Vendor     Count(vendor)
Allstate   2
Verizon    2
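
The March query can be checked against the sample rows with the sketch below. It assumes Python's built-in sqlite3 module; since SQLite has no MONTH( ) function, the sketch uses strftime('%m', pay_date) instead, which is my substitution and not part of the original query.

```python
import sqlite3

PAYMENTS = [
    ("Con Edison", 125.15, "Electric", "2017-01-31"),
    ("Verizon", 54.79, "Home Phone", "2017-02-28"),
    ("Allstate", 1240.50, "Home Insurance", "2017-03-01"),
    ("Verizon", 49.95, "Cell Phone", "2017-03-02"),
    ("Verizon", 39.95, "Internet Service", "2017-03-03"),
    ("Con Edison", 59.63, "Electric", "2017-03-04"),
    ("Department of Water", 155.43, "Water & Sewers", "2017-03-05"),
    ("Allstate", 1550.25, "Car Insurance", "2017-03-06"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payment (vendor TEXT, amount REAL, description TEXT, pay_date TEXT)"
)
conn.executemany("INSERT INTO payment VALUES (?, ?, ?, ?)", PAYMENTS)

march_counts = conn.execute(
    """
    SELECT vendor, COUNT(vendor)
    FROM payment
    WHERE strftime('%m', pay_date) = '03'   -- keep only the March payments
    GROUP BY vendor
    HAVING COUNT(vendor) >= 2               -- keep vendors paid at least twice
    ORDER BY vendor
    """
).fetchall()
print(march_counts)
```

Only the payments dated in March reach the GROUP BY, so a vendor's January or February payments do not count toward its tally.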

SELECT concat(vendor, description), sum(amount) total
FROM payment
WHERE description != 'Home Insurance'
GROUP BY concat(vendor, description)   /* Notice grouping using function */
HAVING sum(amount) > 150               /* or derived column */

Vendor                              Total
AllstateCar Insurance               1550.25
Con EdisonElectric                  184.78
Department of WaterWater & Sewers   155.43

Here you see that only 1 payment to “Allstate” was included in the grouping; the other payment
was eliminated by the WHERE clause. In addition, all the “Verizon” groups were eliminated by the
HAVING clause, since none of them met the HAVING condition.

SELECT vendor, sum(amount) total, count(vendor) tally   -- Using WHERE
FROM payment                                            -- and HAVING
WHERE amount > 50                                       -- on the same column.
GROUP BY vendor                                         -- also
HAVING sum(amount) > 250 OR count(vendor) >= 2          -- Adding OR

Vendor Total Tally


Allstate 2790.75 2
Con Edison 184.78 2


GROUP BY with Rollup (Sub-Totals)

We often use the GROUP BY clause in conjunction with the SUM( ), AVG( ) and COUNT( ) functions
to obtain totals, averages and counts of rows in the database table.

SELECT vendor, description, SUM(amount) paid, COUNT(amount) tally, AVG(amount) avg
FROM payment
GROUP BY vendor, description

Vendor Description Paid Tally Avg


Allstate Car Insurance 1550.25 1 1550.25
Allstate Home Insurance 1240.50 1 1240.50
Con Edison Electric 184.78 2 92.39
… … … … …
Verizon Home Phone 54.79 1 54.79

What if I want to obtain a grand total for the entire query? Or what if I want to obtain intermediate
totals, for example the total for the Allstate payments above? The answer is to use the ROLLUP option
of the GROUP BY clause. The syntax of the ROLLUP option differs between databases.

- For MySql (and SQL Server):

SELECT vendor, description, SUM(amount) paid, COUNT(amount) count, AVG(amount) avg
FROM payment
GROUP BY vendor, description WITH ROLLUP

- For Oracle or PostgreSQL:

SELECT vendor, description, SUM(amount) paid, COUNT(amount) count, AVG(amount) avg
FROM payment
GROUP BY ROLLUP(vendor, description)       -- or GROUP BY CUBE(…)

Vendor Description Paid Count Avg


Allstate Car Insurance 1550.25 1 1550.25
Allstate Home Insurance 1240.50 1 1240.50
Allstate 2790.75 2 1395.38
Con Edison Electric 184.78 2 92.39
Con Edison 184.78 2 92.39
… … … 1 …
Verizon Home Phone 54.79 1 54.79
Verizon 144.69 3 48.23
3275.65 8 409.46

For Oracle, the above can also be written using GROUPING SETS
GROUP BY GROUPING SETS( (vendor, description), (vendor), ( ) )
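
SQLite, to my knowledge, offers none of the ROLLUP syntaxes shown above. As a rough sketch (assuming Python's built-in sqlite3 module and the payment rows from earlier), the same three levels of totals can be emulated by running one query per grouping level and combining them with UNION ALL. The NULL placeholders play the role of the empty cells in the ROLLUP output.

```python
import sqlite3

PAYMENTS = [
    ("Con Edison", 125.15, "Electric", "2017-01-31"),
    ("Verizon", 54.79, "Home Phone", "2017-02-28"),
    ("Allstate", 1240.50, "Home Insurance", "2017-03-01"),
    ("Verizon", 49.95, "Cell Phone", "2017-03-02"),
    ("Verizon", 39.95, "Internet Service", "2017-03-03"),
    ("Con Edison", 59.63, "Electric", "2017-03-04"),
    ("Department of Water", 155.43, "Water & Sewers", "2017-03-05"),
    ("Allstate", 1550.25, "Car Insurance", "2017-03-06"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payment (vendor TEXT, amount REAL, description TEXT, pay_date TEXT)"
)
conn.executemany("INSERT INTO payment VALUES (?, ?, ?, ?)", PAYMENTS)

rollup_rows = conn.execute(
    """
    SELECT vendor, description, ROUND(SUM(amount), 2), COUNT(amount)
    FROM payment
    GROUP BY vendor, description            -- detail level: (vendor, description)
    UNION ALL
    SELECT vendor, NULL, ROUND(SUM(amount), 2), COUNT(amount)
    FROM payment
    GROUP BY vendor                         -- sub-total level: (vendor)
    UNION ALL
    SELECT NULL, NULL, ROUND(SUM(amount), 2), COUNT(amount)
    FROM payment                            -- grand total: ( )
    """
).fetchall()

subtotals = [row for row in rollup_rows if row[0] is not None and row[1] is None]
grand_total = [row for row in rollup_rows if row[0] is None]
print(len(rollup_rows), "rows,", len(subtotals), "sub-totals")
print("grand total:", grand_total[0][2], grand_total[0][3])
```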


The ORDER BY Clause

What order do the returned rows from a SELECT query appear in? The answer is they are
returned in no particular order. In reality however, they are returned in the most optimal way that
the database can return them without processing a sort.

Most optimal way? Sometimes this means in the same order as the rows were first placed in the
table. Sometimes, however, this may not be the case, as various inserts and deletes may have
fragmented the table, and the original order is no longer the most efficient retrieval path.

The bottom line is that you cannot and should not rely on the default order if you do not explicitly
request a sort. Relational database design theory states that the sequence of retrieved data cannot be
assumed to have significance if ordering was not explicitly specified.

The ORDER BY clause is used to explicitly sort the data retrieved using the SELECT statement. The
ORDER BY takes the name of one or more columns by which to sort the returned rows. The order
could be specified as either ascending, which is the default, or descending.

The ORDER BY clause sorts the final resulting rows, so if the SQL statement has a GROUP BY
clause the ORDER BY sorts the grouped rows, not the original rows selected before the grouping
process.

SELECT course_id, description, price
FROM course
ORDER BY price

Course_id Description Price


X52-9272 SQL Programming Language 540
X52-9267 Object Oriented Analysis & Design 995
X52-9238 Introduction to Java 1095
… … …
X52-9759 XML for Web Development 1095
X52-9742 Intensive Web Development 3995

You can specify multiple columns for the ORDER BY clause. When multiple columns are specified,
the sort is performed on the first column. If multiple rows have equal values in the first column,
those rows are sorted by the value of the second column, and so forth (if a third, etc. column is
specified).

SELECT course_id, description, price
FROM course
ORDER BY price, description


Course_id Description Price


X52-9272 SQL Programming Language 540
X52-9267 Object Oriented Analysis & Design 995
X52-9238 Introduction to Java 1095
X52-9562 Java web Services 1095
X52-9755 JavaScript 1095
X52-9740 Web Page Development with HTML 1095
X52-9759 XML for Web Development 1095
X52-9742 Intensive Web Development 3995

If not specified, the sort is performed in ascending order. You can however change that default
by specifying that you want the sort to be in descending order. To do so, simply add the
keyword DESC after the column name. You can also use the keyword ASC in the same way.
However, since ASC is the default, there is no need to include it in the ORDER BY clause.

SELECT course_id, description, price
FROM course
ORDER BY price ASC, description DESC     /* ASC is optional */

The ORDER BY clause allows you to specify the sort columns by column name, or by the column
number in the SELECT statement, or if a SELECT * is requested, then the order the columns appear
in the underlying table.

SELECT description, price, course_id
FROM course
ORDER BY 2, 1          /* sort by price, then description */

SELECT *
FROM student, class
WHERE ssn = stu_ssn
ORDER BY 2, 7 DESC /* sort by student.lastname, class.course_id */

Is the Sort Case Sensitive?

Remember, the ORDER BY clause is case sensitive by standard; in practice this depends on the
database and on its collation settings.

- MySql is, by default, not case sensitive. To sort case sensitive, use:
  ORDER BY BINARY lname        -- case sensitive

- Oracle is, by default, case sensitive. To sort not case sensitive, use:
  ORDER BY LOWER( lname )      -- not case sensitive
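
The difference between the two sorts can be seen with a few last names. This is a minimal sketch using Python's built-in sqlite3 module; SQLite's default text comparison is case sensitive, so it behaves like the Oracle case here. The one-column student table and the four names (with "miller" deliberately in lower case) are assumptions for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical student table holding only last names.
conn.execute("CREATE TABLE student (lname TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?)",
    [("Myers",), ("miller",), ("Milgrom",), ("Nelson",)],
)

# Case sensitive: all upper-case initials sort before any lower-case one.
case_sensitive = [
    row[0] for row in conn.execute("SELECT lname FROM student ORDER BY lname")
]

# Not case sensitive: compare the lower-cased names instead.
case_insensitive = [
    row[0] for row in conn.execute("SELECT lname FROM student ORDER BY LOWER(lname)")
]

print(case_sensitive)
print(case_insensitive)
```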


Putting it all together

SELECT vendor, SUM(amount) paid, COUNT(amount) tally
FROM payment
GROUP BY vendor
ORDER BY COUNT(amount) DESC      /* using COUNT(amount) */
                                 /* as a column name */
Vendor Paid Tally

Verizon 144.69 3
Allstate 2790.75 2
Con Edison 184.78 2
Department of Water 155.43 1

SELECT vendor, SUM(amount) paid, COUNT(amount) tally
FROM payment
WHERE amount > 100
GROUP BY vendor
ORDER BY paid DESC, tally DESC   /* using column alias */

Vendor Paid Tally

Allstate 2790.75 2
Department of Water 155.43 1
Con Edison 125.15 1

Please note: Even if the database sorted the data in vendor order internally to perform the
GROUP BY, your ORDER BY request re-sorts the final result in the order that you desire.

SELECT vendor, SUM(amount) paid, COUNT(amount) tally
FROM payment                     /* the FROM clause */
WHERE amount > 100               /* the WHERE clause */
GROUP BY vendor                  /* the GROUP BY clause */
HAVING SUM(amount) > 150         /* the HAVING clause */
ORDER BY 2 DESC, 3               /* the ORDER BY clause */

Vendor Paid Tally

Allstate 2790.75 2
Department of Water 155.43 1

Please note:

The clauses of a SELECT statement must always appear in the sequence shown in the final example
above: FROM, WHERE, GROUP BY, HAVING, ORDER BY.

Display both Detail and Summary Rows

As you have seen thus far, using aggregation with the GROUP BY clause collapses the raw data,
and only gives you the summary rows. This is also true even if you use the ROLLUP feature.

What if on the other hand you would like to see the detail rows, as well as the summary rows?
The answer to that is that you need to perform multiple queries, and UNION the multiple results.

Example, I want to see the raw data from my payment table along with multiple level of summaries.

SELECT vendor, description, amount, null as "count" -- retrieve all detail rows
FROM payment
UNION ALL
SELECT vendor, concat(description,' Total >>'), SUM(amount), COUNT(*)
FROM payment
GROUP BY vendor, description -- summarize by vendor & desc
UNION ALL
SELECT concat(vendor,' Total >>'), null, SUM(amount), count(*)
FROM payment
GROUP BY vendor -- summarize by vendor
UNION ALL
SELECT '_Grand Total >>>', null, SUM(amount), COUNT(*) -- grand total
FROM payment
ORDER BY 1,2

vendor                         description                amount    count
Allstate                       Car Insurance              1550.25
Allstate                       Car Insurance Total >>     1550.25   1
Allstate                       Home Insurance             1240.50
Allstate                       Home Insurance Total >>    1240.50   1
Allstate Total >>                                         2790.75   2
Con Edison                     Electric                   125.15
Con Edison                     Electric                   59.63
Con Edison                     Electric Total >>          184.78    2
Con Edison Total >>                                       184.78    2
Department of Water            Water & Sewers             155.43
Department of Water            Water & Sewers Total >>    155.43    1
Department of Water Total >>                              155.43    1
Verizon                        Cell Phone                 49.95
Verizon                        Cell Phone Total >>        49.95     1
Verizon                        Home Phone                 54.79
Verizon                        Home Phone Total >>        54.79     1
Verizon                        Internet Service           39.95
Verizon                        Internet Service Total >>  39.95     1
Verizon Total >>                                          144.69    3
_Grand Total >>>                                          3275.65   8


Using Count( ) & Group By to Find Duplicate Records

I often need to inspect whether my table has duplicate records. If you have a unique id (such as a
primary key), the key will ensure that each record has a unique identifier. However, the key does not
ensure that your data values (or even your entire row) are not the same. So if I am not diligent about
ensuring that I do not add a student unless I first check if the student already exists in my table, then I
could have duplicate records.

So how do I find those duplicate records in my table? The easiest way is to select all of the columns
that you want to check possible duplicate values on, then count them and group them, and then filter
only the ones having a count greater than 1.

Example 1:

Supposing that I believe that I paid a particular vendor twice. For this situation, I believe that if the
vendor is the same and the description is the same, this could potentially be a duplicate payment.
To find out if I have such a condition, I would select vendor and description, I count, and group by
vendor and description, as follows:

SELECT vendor, description, COUNT(*)
FROM payment
GROUP BY vendor, description

vendor                description        count(*)
Allstate              Car Insurance      1
Allstate              Home Insurance     1
Con Edison            Electric           2
Department of Water   Water & Sewers     1
Verizon               Cell Phone         1
Verizon               Internet Service   1
Verizon               Home Phone         1

And by adding a HAVING clause, I get the possible duplicate:

SELECT vendor, description, COUNT(*)
FROM payment
GROUP BY vendor, description
HAVING count(*) > 1

vendor description count(*)


Con Edison Electric 2


Example 2 – Improving on the above

However, perhaps I should have also added the amount paid and the transaction pay_date, to truly
confirm that I have paid my vendor twice.

SELECT vendor, description, amount, pay_date, COUNT(*)
FROM payment
GROUP BY vendor, description, amount, pay_date   /* or whatever columns are necessary */
HAVING count(*) > 1                              /* to establish uniqueness */

vendor description amount pay_date count(*)


0 rows returned

Example 3 – I believe that if the student fname is the same and sex is the same, it is a duplicate record.
I need to find out whether such a combination exists…

SELECT fname, sex, COUNT(fname)
FROM student
GROUP BY fname, sex
HAVING count(fname) > 1

fname sex count(fname)


David M 2

(Advanced Topics)

Who are those two students (from above)? If I need to know the rest of the data, I will need to
enclose the above query as a subquery (covered in a future session) as follows:

SELECT *                                  /* or whatever columns you need */
FROM student st JOIN
     ( SELECT fname, sex, COUNT(fname)    /* inline view subquery */
       FROM student
       GROUP BY fname, sex
       HAVING count(fname) > 1 ) subq
  ON st.fname = subq.fname                /* joining to the sub-query */
 AND st.sex = subq.sex

student_id lname fname sex ssn


4 Smith David M 000-01-0004
23 Chan David M 000-01-0023
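
The join-back technique can be tried on a handful of rows. This is a sketch only: it uses Python's built-in sqlite3 module and a five-student subset of the demo data (names and sex as they appear in the sample outputs of these notes), with the student table reduced to three columns for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified student table: only the columns needed for this illustration.
conn.execute("CREATE TABLE student (lname TEXT, fname TEXT, sex TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [
        ("Smith", "David", "M"),
        ("Chan", "David", "M"),
        ("Burns", "Barbara", "F"),
        ("Cambria", "Vincent", "M"),
        ("Owens", "Cynthia", "F"),
    ],
)

duplicates = conn.execute(
    """
    SELECT st.lname, st.fname, st.sex
    FROM student st JOIN
         ( SELECT fname, sex              -- combinations that occur more than once
           FROM student
           GROUP BY fname, sex
           HAVING COUNT(fname) > 1 ) subq
      ON st.fname = subq.fname            -- join each student to its duplicate group
     AND st.sex = subq.sex
    ORDER BY st.lname
    """
).fetchall()
print(duplicates)
```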


The CASE Expression

The CASE expression is often overlooked, but it can be extremely useful for turning very complex
query requirements into simpler, and sometimes more efficient, SQL statements.

The CASE expression enables many forms of conditional processing to be placed into a SQL statement.
By using CASE, more logic can be placed into SQL statements instead of being expressed in a host
language such as Java, Python, PHP, etc. In addition, the CASE expression can be used for data mining
classification (or segmentation) use cases.

The CASE expression can be syntactically written in one of two styles:

STYLE 1: Simple Case - CASE <column or expression> WHEN …

SELECT fname, lname,
       CASE sex
         WHEN 'M' THEN 'Male'
         WHEN 'F' THEN 'Female'
         ELSE 'Unknown'            -- optional ELSE
       END as gender
FROM student

fname lname gender


Barbara Burns Female
Vincent Cambria Male
Duncan Davidson Male
… … …

STYLE 2: Searched Case - CASE WHEN <condition> …

SELECT fname, lname, description,


CASE
WHEN sex='M' and description like '%SQL%' THEN 'SQL/Male'
WHEN sex='F' and description like '%SQL%' THEN 'SQL/Female'
ELSE '--'
END as "skills/gender"
FROM student s, class c, course co
WHERE ssn=stu_ssn
AND c.course_id= co.course_id

fname lname description skills/gender


Vincent Cambria Web Page Development with HTML --
Vincent Cambria SQL Programming Language SQL/Male
Cynthia Owens JavaScript --
Cynthia Owens SQL Programming Language SQL/Female

Another CASE Example

SELECT c.course_id ,
CASE s.sex
WHEN 'M' THEN concat(s.fname, s.lname)
ELSE null -- ELSE null is not needed
END as "Male" , -- but trying to make a point
CASE s.sex
WHEN 'F' THEN concat(s.fname, s.lname)
ELSE null
END as "Female" ,
sex
FROM class c, student s
WHERE c.stu_ssn = s.ssn

Course_id Male Female Sex


X52-9759 BarbaraBurns F
X52-9740 BarbaraBurns F
X52-9272 VincentCambria M
X52-9759 EugeneThomas M
X52-9272 CynthiaOwens F

The same technique could be used, for example, to separate the debits from the credits into two columns of a financial report.

Nested CASE Example

SELECT course_id, description, price,


CASE
WHEN price < 1000 THEN
case
when description like '%Dev%' then 'cheap development course'
when description like '%Java%' then 'cheap Java course'
else 'cheap course'
end
WHEN price < 2000 THEN
case
when description like '%Dev%' then 'moderately priced development course'
when description like '%Java%' then 'moderately priced Java course'
else 'moderately priced course'
end
ELSE 'expensive'
END as "case results"
FROM course
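To see what the nested CASE returns, here is a small runnable sketch. It assumes SQLite through Python's sqlite3 module (not MySQL or Oracle) and loads only four of the course rows that appear in the ranking examples of this document; the variable name labels is just for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (course_id TEXT, description TEXT, price INTEGER)")
conn.executemany("INSERT INTO course VALUES (?, ?, ?)", [
    ("X52-9272", "SQL Programming Language", 540),
    ("X52-9755", "JavaScript", 1095),
    ("X52-9759", "XML for Web Development", 1095),
    ("X52-9742", "Intensive Web Development", 3995),
])

# The outer CASE picks a price band, the inner CASE refines it by description.
# Note: LIKE is case-insensitive for ASCII text in SQLite by default.
labels = dict(conn.execute("""
    SELECT course_id,
           CASE
             WHEN price < 1000 THEN
               CASE WHEN description LIKE '%Dev%'  THEN 'cheap development course'
                    WHEN description LIKE '%Java%' THEN 'cheap Java course'
                    ELSE 'cheap course' END
             WHEN price < 2000 THEN
               CASE WHEN description LIKE '%Dev%'  THEN 'moderately priced development course'
                    WHEN description LIKE '%Java%' THEN 'moderately priced Java course'
                    ELSE 'moderately priced course' END
             ELSE 'expensive'
           END AS case_results
    FROM course
""").fetchall())

for course_id, label in sorted(labels.items()):
    print(course_id, label)
```

The WHEN branches are tested in order, so a course priced under 1000 never reaches the "price < 2000" branch.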

Using Aggregate Function & Group By to Pivot Rows into Columns

I am often asked how I can take values from multiple rows for a single entity, and pivot them out
into a single row with multiple columns.

The answer is that there is no easy way to do this in plain SQL. You can use procedural SQL such as
PL/SQL or Transact-SQL, or a programming language such as Python, Java, etc. to accomplish this task.
In addition, you can also use a BI tool such as Business Objects, Crystal Reports, Tableau, etc.
All those tools offer the ability to pivot your data, thereby turning multiple rows into multiple columns.

Solution:

If however, your data has a finite number of distinct values, and each value is known ahead of time,
then there are a few techniques that can be used to accomplish this pivoting.

One technique is to…

1. Use the CASE expression (or MySql IF, or Oracle DECODE ) to spread a single column into
multiple columns output.
2. Use the GROUP BY clause, to collapse the main column(s) values into a single row.
3. Use an aggregate function to collapse the remaining columns into a single row.

Let’s start by getting all students and their classes

SELECT fname, lname, course_id
FROM student join class
ON ssn=stu_ssn

fname    lname     course_id
Barbara  Burns     X52-9759
Barbara  Burns     X52-9740
Vincent  Cambria   X52-9272
Vincent  Cambria   X52-9740
Duncan   Davidson  X52-9759
Duncan   Davidson  X52-9740
Duncan   Davidson  X52-9755
David    Smith     X52-9272
David    Smith     X52-9740
Eugene   Thomas    X52-9759
Eugene   Thomas    X52-9740
Cynthia  Owens     X52-9272
Cynthia  Owens     X52-9740
Cynthia  Owens     X52-9755
…        …         …
50 rows returned

I want to get a single line of output for each student, with that student's many courses laid out
horizontally across the same line.


1. Now let’s use the CASE expression, or (MySql) IF or (Oracle) DECODE to spread the course_id
into multiple columns
SELECT fname, lname, CASE course_id when 'X52-9759' then course_id END as "XML",
CASE course_id when 'X52-9740' then course_id END as "HTML",
CASE course_id when 'X52-9272' then course_id END as "SQL" -- etc.
FROM student join class
ON ssn=stu_ssn
Or for Mysql:
SELECT fname, lname, IF(course_id='X52-9759', course_id, null) as "XML", -- etc.
Or for Oracle:
SELECT fname, lname, DECODE(course_id, 'X52-9759', course_id) as "XML", -- etc.

fname    lname     XML       HTML      SQL
Barbara  Burns     X52-9759
Barbara  Burns               X52-9740
Vincent  Cambria                       X52-9272
Vincent  Cambria             X52-9740
Duncan   Davidson  X52-9759
Duncan   Davidson            X52-9740
… etc    …         …         …         …
50 rows returned

2. Now let’s do a GROUP BY on fname and lname to collapse the student name into a single row,
3. and at the same time we have to aggregate all remaining columns as required.
SELECT fname, lname, MAX( CASE course_id when 'X52-9759' then 'X' END) as "XML",
MAX( CASE course_id when 'X52-9740' then 'X' END) as "HTML",
MAX( CASE course_id when 'X52-9272' then 'X' END) as "SQL" -- etc.
FROM student join class
ON ssn=stu_ssn
GROUP BY fname, lname
Or for Mysql:
SELECT fname, lname, MAX( IF(course_id='X52-9759', 'X', null) ) as "XML", -- etc.
Or for Oracle:
SELECT fname, lname, MAX( DECODE(course_id, 'X52-9759', 'X') ) as "XML", -- etc.

fname    lname     XML  HTML  SQL
Barbara  Burns     X    X
Vincent  Cambria        X     X
Duncan   Davidson  X    X
… etc    …         …    …     …
22 rows returned
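The three steps (CASE to spread, GROUP BY to collapse, MAX to aggregate) can be run as one query. The sketch below is an illustration only: it assumes SQLite through Python's sqlite3 module, uses three of the students listed above, and the ssn values are hypothetical keys invented for the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (ssn TEXT, fname TEXT, lname TEXT);
    CREATE TABLE class   (stu_ssn TEXT, course_id TEXT);
    -- ssn values are hypothetical; names and course ids follow the listing above
    INSERT INTO student VALUES ('001', 'Barbara', 'Burns'),
                               ('002', 'Vincent', 'Cambria'),
                               ('003', 'Duncan',  'Davidson');
    INSERT INTO class VALUES ('001', 'X52-9759'), ('001', 'X52-9740'),
                             ('002', 'X52-9272'), ('002', 'X52-9740'),
                             ('003', 'X52-9759'), ('003', 'X52-9740'), ('003', 'X52-9755');
""")

# CASE spreads course_id into columns, GROUP BY collapses each student
# to one row, and MAX keeps the 'X' (MAX ignores the NULLs).
pivot = conn.execute("""
    SELECT fname, lname,
           MAX(CASE course_id WHEN 'X52-9759' THEN 'X' END) AS "XML",
           MAX(CASE course_id WHEN 'X52-9740' THEN 'X' END) AS "HTML",
           MAX(CASE course_id WHEN 'X52-9272' THEN 'X' END) AS "SQL"
    FROM student JOIN class ON ssn = stu_ssn
    GROUP BY fname, lname
    ORDER BY lname
""").fetchall()

for row in pivot:
    print(row)
```

Courses that are not listed in a CASE (X52-9755 here) simply do not get a column, which is why the distinct values must be known ahead of time.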


Please Note: In the final query, I also optionally replaced the course_id with a simple checkmark ‘X’.

Another example…

Show the total amount paid to each vendor horizontally by month rather than vertically.

SELECT vendor, pay_date, MONTH(pay_date) as mth, amount -- for Oracle, use: to_char(pay_date,'mm')
FROM payment;
vendor pay_date mth amount
Con Edison 2017-01-31 1 125.15
Verizon 2017-02-28 2 54.79
Allstate 2017-03-01 3 1240.50
Verizon 2017-03-02 3 49.95
Verizon 2017-03-03 3 39.95
Con Edison 2017-03-04 3 59.63
Department of Water 2017-03-05 3 155.43
Allstate 2017-03-06 3 1550.25

SELECT vendor,
CASE month(pay_date) when '01' then amount END as "January",
CASE month(pay_date) when '02' then amount END as "February",
CASE month(pay_date) when '03' then amount END as "March"
FROM payment;
vendor               January  February  March
Con Edison           125.15
Verizon                       54.79
Allstate                                1240.50
Verizon                                 49.95
Verizon                                 39.95
Con Edison                              59.63
Department of Water                     155.43
Allstate                                1550.25

SELECT vendor,
SUM( CASE month(pay_date) when '01' then amount END) as "January",
SUM( CASE month(pay_date) when '02' then amount END) as "February",
SUM( CASE month(pay_date) when '03' then amount END) as "March"
from payment
GROUP BY vendor;
vendor               January  February  March
Allstate                                2790.75
Con Edison           125.15             59.63
Department of Water                     155.43
Verizon                       54.79     89.90
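The monthly pivot can be checked with the payment rows shown above. This is a sketch under assumptions: it runs on SQLite through Python's sqlite3 module, where strftime('%m', pay_date) stands in for MySQL's MONTH( ) and Oracle's to_char(pay_date,'mm'), and the table is reduced to the three columns the query needs.

```python
import sqlite3

# (vendor, amount, pay_date) taken from the payment listing above
PAYMENTS = [
    ("Con Edison", 125.15, "2017-01-31"), ("Verizon", 54.79, "2017-02-28"),
    ("Allstate", 1240.50, "2017-03-01"), ("Verizon", 49.95, "2017-03-02"),
    ("Verizon", 39.95, "2017-03-03"), ("Con Edison", 59.63, "2017-03-04"),
    ("Department of Water", 155.43, "2017-03-05"), ("Allstate", 1550.25, "2017-03-06"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payment (vendor TEXT, amount REAL, pay_date TEXT)")
conn.executemany("INSERT INTO payment VALUES (?, ?, ?)", PAYMENTS)

# One SUM(CASE ...) per month; months with no payment stay NULL (None in Python).
by_month = conn.execute("""
    SELECT vendor,
           ROUND(SUM(CASE strftime('%m', pay_date) WHEN '01' THEN amount END), 2) AS january,
           ROUND(SUM(CASE strftime('%m', pay_date) WHEN '02' THEN amount END), 2) AS february,
           ROUND(SUM(CASE strftime('%m', pay_date) WHEN '03' THEN amount END), 2) AS march
    FROM payment
    GROUP BY vendor
    ORDER BY vendor
""").fetchall()

for row in by_month:
    print(row)
```

Adding ELSE 0 inside each CASE would show 0 instead of NULL for the empty months, if that reads better in a report.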


PS. The above is a great approach to classify data for analytics and data mining.

The Pivot Command (Advanced Topic, Oracle only, Self-Study)

Oracle introduced the PIVOT command in version 11g. The PIVOT command allows you to easily pivot
a typical row-oriented output into a column orientation. You can use it instead of the CASE and
GROUP BY technique shown above.

Redoing the above query…


SELECT *
FROM
(
SELECT fname, lname, course_id -- the original query
FROM student JOIN class ON ssn=stu_ssn -- becomes a sub-query
)
PIVOT
(
MAX(course_id) -- must be aggregate function
FOR course_id IN ( 'X52-9759', 'X52-9740', 'X52-9272')
)
ORDER BY lname

fname    lname     'X52-9759'  'X52-9740'  'X52-9272'
Barbara  Burns     X52-9759    X52-9740
Vincent  Cambria               X52-9740    X52-9272
Duncan   Davidson  X52-9759    X52-9740
David    Smith                 X52-9740    X52-9272
…        …         …           …
22 rows returned

Substituting with X….


SELECT *
FROM
(
SELECT fname, lname, course_id -- the original query
FROM student JOIN class ON ssn=stu_ssn -- becomes a sub-query
)
PIVOT
(
MAX( 'X' )
FOR course_id IN ( 'X52-9759' as XML, 'X52-9740' as HTML, 'X52-9272' as SQL)
)
ORDER BY lname
fname    lname     XML  HTML  SQL
Barbara  Burns     X    X
Vincent  Cambria        X     X
Duncan   Davidson  X    X
David    Smith          X     X

… … … …
22 rows returned

Windowing Functions (Advanced Topic)

Windowing functions allow you to perform certain aggregation and grouping “OVER( )” your data
without collapsing the more granular base rows.

As you know, when you use aggregate functions or the GROUP BY clause, the resulting rows are
always collapsed to either one row, or one row for each unique value of the “group by” expression.

SELECT vendor, COUNT(*), SUM(amount) as tot_by_vendor
FROM payment
GROUP BY vendor with rollup -- Oracle: GROUP BY rollup(vendor)

vendor               COUNT(*)  tot_by_vendor
Allstate             2         2790.75
Con Edison           2         184.78
Department of Water  1         155.43
Verizon              3         144.69
                     8         3275.65
5 rows returned (0.323 millisec)

Multiple rows have been collapsed.

With windowing functions the number of resulting rows is not collapsed. You will get a
resulting row for every row in your original source data. Windowing functions are executed after
the query and aggregation/grouping is completed, but before the final ORDER BY (if requested).

The syntax for windowing functions is:


FuncName(argument) OVER( [PARTITION BY part_clause] [ORDER BY order_clause] )

Where:
argument – The name of a column(s) or other argument(s) that the function takes
part_clause – The name of a column(s) or clause to partition the rows by (i.e. group by )
If a partition clause is not provided, all rows are treated as a single group.
order_clause - The name of a column(s) to order the rows (or siblings within a partition) by.
You can even specify a starting and ending range of rows

SELECT vendor, amount, SUM(amount) OVER ( ) as "Total"
FROM payment

vendor               amount   Total
Allstate             1240.5   3275.65
Allstate             1550.25  3275.65
Con Edison           59.63    3275.65
Con Edison           125.15   3275.65
Department of Water  155.43   3275.65
Verizon              39.95    3275.65
Verizon              54.79    3275.65
Verizon              49.95    3275.65
8 rows returned

Notice… the number of rows was not collapsed.


In the above, I am now using the SUM( ) function with the OVER( ) windowing clause.
The OVER( ) clause defines the “window” of rows that the function operates on. In the above case the
“window” was all retrieved rows, since I did not specify any partitioning inside OVER( ).

Partitioning/Grouping:

To perform “windowing” functions over a group, we must use the PARTITION BY clause.
The PARTITION BY clause acts like the traditional GROUP BY clause, but without collapsing the rows.

SELECT vendor, amount, SUM(amount) OVER (PARTITION BY vendor) as "tot_by_vendor"
FROM payment

VENDOR               AMOUNT   tot_by_vendor
Allstate             1240.5   2790.75
Allstate             1550.25  2790.75
Con Edison           59.63    184.78
Con Edison           125.15   184.78
Department of Water  155.43   155.43
Verizon              39.95    144.69
Verizon              54.79    144.69
Verizon              49.95    144.69
8 rows returned (2.526 millisec)

Notice… SUM( ) is partitioned by vendor.
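The difference between an empty OVER( ) and OVER(PARTITION BY …) is easy to see when both are placed side by side. The following is a sketch, not the course's own setup: it assumes SQLite 3.25 or later (the first release with window functions) through Python's sqlite3 module, and it keeps only three columns of the payment table.

```python
import sqlite3

# (payment_num, vendor, amount) taken from the payment data used in this document
PAYMENTS = [
    (1, "Con Edison", 125.15), (2, "Verizon", 54.79), (3, "Allstate", 1240.50),
    (4, "Verizon", 49.95), (5, "Verizon", 39.95), (6, "Con Edison", 59.63),
    (7, "Department of Water", 155.43), (8, "Allstate", 1550.25),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payment (payment_num INTEGER, vendor TEXT, amount REAL)")
conn.executemany("INSERT INTO payment VALUES (?, ?, ?)", PAYMENTS)

# Empty OVER(): one window holding all rows.
# PARTITION BY vendor: one window per vendor. No rows are collapsed in either case.
rows = conn.execute("""
    SELECT vendor, amount,
           ROUND(SUM(amount) OVER (), 2)                    AS total,
           ROUND(SUM(amount) OVER (PARTITION BY vendor), 2) AS tot_by_vendor
    FROM payment
    ORDER BY payment_num
""").fetchall()

for row in rows:
    print(row)
```

Because every source row survives, a per-row share such as amount / tot_by_vendor can be computed in the same SELECT.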

Partitioning/grouping by vendor and description:

SELECT vendor, description, amount,
SUM(amount) OVER (PARTITION BY vendor, description) as "tot_by_descption",
SUM(amount) OVER (PARTITION BY vendor) as "tot_by_vendor",
COUNT(amount) OVER (PARTITION BY vendor) as "count_by_vendor"
FROM payment

VENDOR               DESCRIPTION       AMOUNT   tot_by_descption  tot_by_vendor  count_by_vendor
Allstate             Car Insurance     1550.25  1550.25           2790.75        2
Allstate             Home Insurance    1240.5   1240.5            2790.75        2
Con Edison           Electric          125.15   184.78            184.78         2
Con Edison           Electric          59.63    184.78            184.78         2
Department of Water  Water and Sewers  155.43   155.43            155.43         1
Verizon              Cell Phone        49.95    49.95             144.69         3
Verizon              Internet Service  39.95    39.95             144.69         3
Verizon              Home Phone        54.79    54.79             144.69         3
8 rows returned (2.745 millisec)

* Please Note: When using windowing functions, you are no longer restricted by the “Golden Rule”.
This is because windowing functions do not collapse the source rows.


Windowing Analytic Functions (Advanced Topic)

In addition to the aggregate functions (SUM, COUNT, AVG, MIN, MAX) that can now be used
with or without windowing, below is a list of analytic functions that also use windowing…

• MEDIAN( ) – Obtains the median of a set of numbers (example below)
• ROW_NUMBER( ) – Assigns a row number to each row (see below)
• RANK( ) – Assigns a rank value within a set of rows (see below)
• DENSE_RANK( ) – Assigns a dense rank value within a set of rows (see below)
• PERCENT_RANK( ) – Assigns a percentage rank within a set of rows (see below)
• LAG(col) – Provides access to data from the previous row
• LEAD(col) – Provides access to data from the next row
• FIRST_VALUE(col) – Returns the first value within a set of rows
• LAST_VALUE(col) – Returns the last value within a set of rows
• NTH_VALUE(col, n) – Returns the nth value within a set of rows
• NTILE(n) – Divides the ordered rows into n roughly equal groups, and assigns each row its group number

In the following sections we will cover median( ), row_number( ), the ranking functions, and lag( ).
Try practicing with the other analytic functions on your own.

The MEDIAN( ) Function

The MEDIAN( ) function will sort the data based on the column requested in the median function,
and will return the value of the middle row if the number of rows is odd, or the average of the 2
middle rows if the number of rows is even.

SELECT MEDIAN(amount) OVER ( ) AS median /* MariaDB; MySQL itself has no MEDIAN( ) */
FROM payment
LIMIT 1;

SELECT MEDIAN(amount) AS median /* Oracle */
FROM payment;

median
92.39
1 rows returned (0.333 millisec)

In the above case, there were 8 total rows in the payment table, and the amounts of the 2 middle
rows (after sorting) were 59.63 and 125.15. The median is their average: (59.63 + 125.15) / 2 = 92.39.
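Not every database has a MEDIAN( ) function. As a sketch of one possible workaround (my own illustration, not part of the course queries), the median can be built from ROW_NUMBER( ) and COUNT( ) used as windowing functions. The example assumes SQLite 3.25+ through Python's sqlite3 module, which has no MEDIAN( ), and cross-checks the result against Python's statistics.median.

```python
import sqlite3
import statistics

# The 8 payment amounts used throughout this document
AMOUNTS = [125.15, 54.79, 1240.50, 49.95, 39.95, 59.63, 155.43, 1550.25]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payment (amount REAL)")
conn.executemany("INSERT INTO payment VALUES (?)", [(a,) for a in AMOUNTS])

# Number the rows in sorted order, then average the middle row(s).
# With integer division, (cnt+1)/2 and (cnt+2)/2 are the same row when cnt is odd
# and the two middle rows when cnt is even.
sql_median = conn.execute("""
    SELECT AVG(amount)
    FROM (SELECT amount,
                 ROW_NUMBER() OVER (ORDER BY amount) AS rn,
                 COUNT(*)     OVER ()                AS cnt
          FROM payment)
    WHERE rn IN ((cnt + 1) / 2, (cnt + 2) / 2)
""").fetchone()[0]

py_median = statistics.median(AMOUNTS)
print(round(sql_median, 2), round(py_median, 2))
```

Both values come out at about 92.39, matching the MEDIAN( ) result shown above.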


The ROW_NUMBER( ) Function

The row_number( ) function assigns a sequential row number to each row of your query.

SELECT ROW_NUMBER( ) OVER( ) as "row_num", payment.*
FROM payment

row_num  payment_num  vendor               amount   description       pay_date
1        3            Allstate             1240.50  Home Insurance    2017-03-01
2        8            Allstate             1550.25  Car Insurance     2017-03-06
3        1            Con Edison           125.15   Electric          2017-01-31
4        6            Con Edison           59.63    Electric          2017-03-04
5        4            Verizon              49.95    Cell Phone        2017-03-02
6        2            Verizon              54.79    Home Phone        2017-02-28
7        7            Department of Water  155.43   Water and Sewers  2017-03-05
8        5            Verizon              39.95    Internet Service  2017-03-03
8 rows returned (0.352 millisec)

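But what order of rows did this row_number( ) function use? With an empty OVER( ), the order is not defined. It is often the order in which the rows happen to be read, but you cannot rely on it.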

In the below case, I am first sorting by vendor, and then assigning the row number.

SELECT ROW_NUMBER( ) OVER(ORDER BY vendor) as "row_num", payment.*
FROM payment

row_num  vendor               amount   description       pay_date
1        Allstate             1240.50  Home Insurance    2017-03-01
2        Allstate             1550.25  Car Insurance     2017-03-06
3        Con Edison           125.15   Electric          2017-01-31
4        Con Edison           59.63    Electric          2017-03-04
5        Department of Water  155.43   Water and Sewers  2017-03-05
6        Verizon              49.95    Cell Phone        2017-03-02
7        Verizon              54.79    Home Phone        2017-02-28
8        Verizon              39.95    Internet Service  2017-03-03

In the below case, I am first aggregating/grouping, and then assigning the row number.

SELECT ROW_NUMBER( ) OVER(order by vendor) as "row_num", vendor, SUM(amount)
FROM payment
GROUP BY vendor

row_num  vendor               sum(amount)
1        Allstate             2790.75
2        Con Edison           184.78
3        Department of Water  155.43
4        Verizon              144.69


Ranking Functions

The ranking analytic functions allow you to rank the resulting rows by a certain criterion or value.
When using ranking functions, rows with equal values will receive the same rank.

The following functions are available:

• RANK( ) OVER( [PARTITION BY columns] ORDER BY columns )
• DENSE_RANK( ) OVER( [PARTITION BY columns] ORDER BY columns )
• PERCENT_RANK( ) OVER( [PARTITION BY columns] ORDER BY columns )

Explanation: RANK( ) skips numbers when the rank value is the same for multiple rows.
DENSE_RANK( ) does not skip numbers.
PERCENT_RANK( ) assigns a rank based on a percent in the range of 0 to 1

Example 1:
SELECT course.*, RANK( ) OVER( ORDER BY price) as "rank"
FROM course
course_id  description                          price  rank
X52-9272   SQL Programming Language             540    1
X52-9267   Object Oriented Analysis and Design  995    2
X52-9755   JavaScript                           1095   3
X52-9562   Java Web Services                    1095   3
X52-9759   XML for Web Development              1095   3
X52-9238   Introduction to Java                 1095   3
X52-9740   Web Page Development with HTML       1095   3
X52-9742   Intensive Web Development            3995   8

Notice the same ranking (3) for the five courses priced at 1095, and the skipped values before rank 8.

Example 2:
SELECT course.*,
RANK( ) OVER(order by price desc) as "rank",
DENSE_RANK( ) OVER(order by price desc) as "dense_rank",
PERCENT_RANK( ) OVER(order by price desc) as "pct_rank"
FROM course
ORDER BY rank, dense_rank;
course_id description price rank dense_rank pct_rank
X52-9742 Intensive Web Development 3995 1 1 0.0000000000
X52-9755 JavaScript 1095 2 2 0.1428571429
X52-9562 Java Web Services 1095 2 2 0.1428571429
X52-9238 Introduction to Java 1095 2 2 0.1428571429
X52-9759 XML for Web Development 1095 2 2 0.1428571429
X52-9740 Web Page Development with HTML 1095 2 2 0.1428571429
X52-9267 Object Oriented Analysis and Design 995 7 3 0.8571428571
X52-9272 SQL Programming Language 540 8 4 1.0000000000
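The three ranking functions can be compared on the same course prices with the sketch below. It assumes SQLite 3.25+ through Python's sqlite3 module, drops the description column for brevity, and uses the aliases rnk, dense_rnk and pct_rnk (my own names) to stay clear of function names.

```python
import sqlite3

# (course_id, price) from the course table shown above
COURSES = [
    ("X52-9272", 540), ("X52-9267", 995), ("X52-9755", 1095), ("X52-9562", 1095),
    ("X52-9759", 1095), ("X52-9238", 1095), ("X52-9740", 1095), ("X52-9742", 3995),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (course_id TEXT, price INTEGER)")
conn.executemany("INSERT INTO course VALUES (?, ?)", COURSES)

# RANK leaves gaps after ties, DENSE_RANK does not,
# and PERCENT_RANK is (rank - 1) / (number of rows - 1).
ranked = conn.execute("""
    SELECT course_id, price,
           RANK()         OVER (ORDER BY price DESC) AS rnk,
           DENSE_RANK()   OVER (ORDER BY price DESC) AS dense_rnk,
           PERCENT_RANK() OVER (ORDER BY price DESC) AS pct_rnk
    FROM course
    ORDER BY rnk, course_id
""").fetchall()

for row in ranked:
    print(row)
```

With 8 rows, the five tied courses at rank 2 get a percent rank of 1/7, and the course at rank 7 gets 6/7, which matches the pct_rank column above.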


Example 3:

Find the latest payment for each vendor.


Here we will use ranking functions to accomplish the task.
We will also need to nest the select query (see future discussion on nested queries).

Step 1. Partition the data by vendor, and rank the rows within each partition by date descending.

SELECT vendor, pay_date, description, amount,


RANK( ) OVER (partition by vendor order by pay_date desc) as latest_pay
FROM payment

vendor pay_date description amount latest_pay


Allstate 2017-03-06 Car Insurance 1550.25 1
Allstate 2017-03-01 Home Insurance 1240.5 2
Con Edison 2017-03-04 Electric 59.63 1
Con Edison 2017-01-31 Electric 125.15 2
Department of Water 2017-03-05 Water and Sewers 155.43 1
Verizon 2017-03-03 Internet Service 39.95 1
Verizon 2017-03-02 Cell Phone 49.95 2
Verizon 2017-02-28 Home Phone 54.79 3
8 rows returned (0.446 millisec)

Step 2. And now we will select only the latest payments for each vendor.
In this case, we need to select all the rows with latest_pay = 1

SELECT *
FROM
( SELECT vendor, pay_date, description, amount,
RANK( ) OVER (partition by vendor order by pay_date desc) as latest_pay
FROM payment ) latest
WHERE latest_pay = 1

vendor pay_date description amount latest_pay


Allstate 2017-03-06 Car Insurance 1550.25 1
Con Edison 2017-03-04 Electric 59.63 1
Department of Water 2017-03-05 Water and Sewers 155.43 1
Verizon 2017-03-03 Internet Service 39.95 1
4 rows returned (0.477 millisec)
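Both steps can be run together as follows. This is a sketch that assumes SQLite 3.25+ through Python's sqlite3 module; the dates are stored as ISO text (YYYY-MM-DD) so that they sort correctly as strings.

```python
import sqlite3

# (vendor, pay_date, description, amount) from the payment table above
PAYMENTS = [
    ("Con Edison", "2017-01-31", "Electric", 125.15),
    ("Verizon", "2017-02-28", "Home Phone", 54.79),
    ("Allstate", "2017-03-01", "Home Insurance", 1240.50),
    ("Verizon", "2017-03-02", "Cell Phone", 49.95),
    ("Verizon", "2017-03-03", "Internet Service", 39.95),
    ("Con Edison", "2017-03-04", "Electric", 59.63),
    ("Department of Water", "2017-03-05", "Water and Sewers", 155.43),
    ("Allstate", "2017-03-06", "Car Insurance", 1550.25),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payment (vendor TEXT, pay_date TEXT, description TEXT, amount REAL)")
conn.executemany("INSERT INTO payment VALUES (?, ?, ?, ?)", PAYMENTS)

# The inner query ranks each vendor's payments, newest first.
# The outer query keeps rank 1 only; a window function cannot be filtered
# in the WHERE clause of the same query level, hence the subquery.
latest = conn.execute("""
    SELECT vendor, pay_date, amount
    FROM (SELECT vendor, pay_date, amount,
                 RANK() OVER (PARTITION BY vendor ORDER BY pay_date DESC) AS latest_pay
          FROM payment) ranked
    WHERE latest_pay = 1
    ORDER BY vendor
""").fetchall()

for row in latest:
    print(row)
```

If two payments to the same vendor shared the latest date, RANK( ) would return both; ROW_NUMBER( ) would return exactly one of them.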


LAG( ) and LEAD( ) Functions

Sometimes we might be interested in accessing the data of the previous row when we are already
positioned at the next row. For that we will use the LAG( ) function. We can also do the same for the
subsequent row using the LEAD( ) function.

Syntax:
LAG(column, n, default_value) //same for LEAD( )
Where:
column - is the column that you want to access from the previous row
n - is the number of rows to go back. Default = 1 (the previous row)
default_value - the value to return when there is no such row (otherwise null is returned)
Examples:
SELECT pay.*, LAG(pay_date) OVER( ORDER BY pay_date) AS "prev_pay"
FROM payment pay;

PAYMENT_NUM VENDOR AMOUNT DESCRIPTION PAY_DATE PREV_PAY


1 Con Edison 125.15 Electric 31-JAN-17
2 Verizon 54.79 Home Phone 28-FEB-17 31-JAN-17
3 Allstate 1240.5 Home Insurance 01-MAR-17 28-FEB-17
4 Verizon 49.95 Cell Phone 02-MAR-17 01-MAR-17
5 Verizon 39.95 Internet Service 03-MAR-17 02-MAR-17
6 Con Edison 59.63 Electric 04-MAR-17 03-MAR-17
7 Department of Water 155.43 Water and Sewers 05-MAR-17 04-MAR-17
8 Allstate 1550.25 Car Insurance 06-MAR-17 05-MAR-17
8 rows returned (0.71 millisec)

Notice that row number 2 was able to access the column value from row number 1

Adding the PARTITION BY clause to get the previous payment date within each vendor.

SELECT pay.*,
LAG(pay_date) OVER( PARTITION BY vendor ORDER BY pay_date) "prev_pay"
FROM payment pay;

PAYMENT_NUM VENDOR AMOUNT DESCRIPTION PAY_DATE PREV_PAY


3 Allstate 1240.5 Home Insurance 01-MAR-17
8 Allstate 1550.25 Car Insurance 06-MAR-17 01-MAR-17
1 Con Edison 125.15 Electric 31-JAN-17
6 Con Edison 59.63 Electric 04-MAR-17 31-JAN-17
7 Department of Water 155.43 Water and Sewers 05-MAR-17
2 Verizon 54.79 Home Phone 28-FEB-17
4 Verizon 49.95 Cell Phone 02-MAR-17 28-FEB-17
5 Verizon 39.95 Internet Service 03-MAR-17 02-MAR-17

8 rows returned (0.538 millisec)
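The sketch below runs LAG( ) and LEAD( ) side by side, partitioned by vendor. It assumes SQLite 3.25+ through Python's sqlite3 module; the 'none' default passed to LEAD( ) is an arbitrary placeholder chosen for this illustration to show the third argument at work.

```python
import sqlite3

# (payment_num, vendor, pay_date) from the payment table above
PAYMENTS = [
    (1, "Con Edison", "2017-01-31"), (2, "Verizon", "2017-02-28"),
    (3, "Allstate", "2017-03-01"), (4, "Verizon", "2017-03-02"),
    (5, "Verizon", "2017-03-03"), (6, "Con Edison", "2017-03-04"),
    (7, "Department of Water", "2017-03-05"), (8, "Allstate", "2017-03-06"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payment (payment_num INTEGER, vendor TEXT, pay_date TEXT)")
conn.executemany("INSERT INTO payment VALUES (?, ?, ?)", PAYMENTS)

# LAG without a default returns NULL for the first row of each vendor.
# LEAD with a default returns 'none' for the last row of each vendor.
rows = conn.execute("""
    SELECT payment_num, vendor, pay_date,
           LAG(pay_date)             OVER (PARTITION BY vendor ORDER BY pay_date) AS prev_pay,
           LEAD(pay_date, 1, 'none') OVER (PARTITION BY vendor ORDER BY pay_date) AS next_pay
    FROM payment
    ORDER BY vendor, pay_date
""").fetchall()

for row in rows:
    print(row)
```

Having prev_pay on the same row as pay_date makes it straightforward to compute, for example, the number of days between consecutive payments to a vendor.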


Limiting the Number of Rows in the Result (Seen previously)

Sometimes you want to limit the output to a certain number of rows. Perhaps you want to select the
top row of the query results, or the first 10 rows only, or perhaps only rows number 10 to 20, to create
a sample dataset for analysis.

Both MySql and Oracle provide a way to limit the number of returned rows. However, the syntax is not
the same: MySql's LIMIT is not part of the SQL standard, while Oracle's FETCH FIRST follows the standard.

MySql & SQLite:


The syntax is: LIMIT rows # Limit the results to first number of rows
LIMIT skip, rows # Skip and limit the results to next number of rows
Examples:
SELECT student_id, fname, lname, ssn
FROM student
WHERE sex = 'F'
LIMIT 5 -- MySQL - first 5 rows only

SELECT student_id, fname, lname, ssn
FROM student
ORDER BY student_id
LIMIT 10, 3 -- MySQL - start at 11 for 3 rows

Oracle 12+:
The syntax is: FETCH FIRST x ROWS ONLY
OFFSET n ROWS FETCH NEXT x ROWS ONLY
Examples:
SELECT student_id, fname, lname, ssn
FROM student
WHERE sex = 'F'
FETCH FIRST 5 ROWS ONLY -- Oracle - first 5 rows only

SELECT student_id, fname, lname, ssn
FROM student
ORDER BY student_id
OFFSET 10 ROWS FETCH NEXT 3 ROWS ONLY -- Oracle - start at 11 for 3 rows

STUDENT_ID FNAME LNAME SSN


11 Wayne Tobias 000-01-0011
12 Joseph Race 000-01-0012
13 Colette Nelson 000-01-0013
3 rows returned (5.839 millisec)
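The LIMIT forms can be tried in SQLite, which accepts both the MySql-style "LIMIT skip, rows" and the "LIMIT rows OFFSET skip" spelling. The sketch below assumes Python's sqlite3 module and a hypothetical student table of 20 generated rows (the names are placeholders, not the course data).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INTEGER, fname TEXT)")
# Hypothetical data: 20 students with generated placeholder names
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [(i, f"student{i}") for i in range(1, 21)])

# First 5 rows only
first_five = [r[0] for r in conn.execute(
    "SELECT student_id FROM student ORDER BY student_id LIMIT 5")]

# Skip 10 rows, then return the next 3 (MySql-style: LIMIT skip, rows)
mysql_style = [r[0] for r in conn.execute(
    "SELECT student_id FROM student ORDER BY student_id LIMIT 10, 3")]

# The same request spelled with OFFSET
offset_style = [r[0] for r in conn.execute(
    "SELECT student_id FROM student ORDER BY student_id LIMIT 3 OFFSET 10")]

print(first_five, mysql_style, offset_style)
```

Without an ORDER BY, which rows count as "first" is not guaranteed, so a paging query should normally include one.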

Oracle Prior to v12: (Advanced Class Only)

Prior to Oracle version 12c, there was not an easy way to limit the number of returned rows. However…

1) You can use the ROWNUM keyword as an additional filter in the WHERE clause.
To obtain the first x number of rows, simply use ROWNUM <= x. (Do not use = Except for row 1)

First x Rows:

SELECT student_id, fname, lname, class_id, ROWNUM
FROM student join class ON ssn = stu_ssn
WHERE sex = 'F'
AND rownum <= 10 -- top 10 rows

Try adding: ORDER BY fname. Does it work?

Unfortunately ROWNUM assigns row numbers prior to performing any GROUP BY and/or
ORDER BY. As such, you cannot rely on ROWNUM if you are planning on grouping the data or
sorting the data prior to selecting the top rows.
_________________________________________________________________

2) A better technique is to use the analytic function ROW_NUMBER( ). (See the ROW_NUMBER( ) discussion earlier.)
This function assigns a number to each row after completion of the query, but before the final sort.
Most often we do not use a final order by, since there is an ORDER BY clause in analytic functions.

Syntax: ROW_NUMBER( ) OVER( ORDER BY <column_name> [DESC] )

SELECT vendor, SUM(amount), ROW_NUMBER( ) OVER(order by SUM(amount)) AS num
FROM payment
GROUP BY vendor

VENDOR               SUM(amount)  Num
Verizon              144.69       1
Department of Water  155.43       2
Con Edison           184.78       3
Allstate             2790.75      4

Note: the ORDER BY inside OVER( ) must repeat the expression itself. A column position
(such as order by 2) is not interpreted as a column reference there.

And now to select the top row (or top few rows), or to select a range of rows, we need a subquery
(also called nested query).

Range of Rows:

SELECT * -- Must use a subquery


FROM
( SELECT vendor, SUM(amount), ROW_NUMBER( ) OVER(order by SUM(amount)) AS num
FROM payment
GROUP BY vendor )
WHERE num >= 2 and num <= 3 -- select from row 2 to 3
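The range-of-rows technique can be sketched as below. It assumes SQLite 3.25+ through Python's sqlite3 module, and for clarity it performs the aggregation in its own innermost query before numbering the rows, which is a small restructuring of the query above rather than the same statement.

```python
import sqlite3

# (vendor, amount) from the payment table used in this document
PAYMENTS = [
    ("Con Edison", 125.15), ("Verizon", 54.79), ("Allstate", 1240.50),
    ("Verizon", 49.95), ("Verizon", 39.95), ("Con Edison", 59.63),
    ("Department of Water", 155.43), ("Allstate", 1550.25),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payment (vendor TEXT, amount REAL)")
conn.executemany("INSERT INTO payment VALUES (?, ?)", PAYMENTS)

# Innermost query: total per vendor.
# Middle query: number the vendors by total, smallest first.
# Outer query: keep only rows 2 to 3.
row_range = conn.execute("""
    SELECT vendor, total, num
    FROM (SELECT vendor, total,
                 ROW_NUMBER() OVER (ORDER BY total) AS num
          FROM (SELECT vendor, ROUND(SUM(amount), 2) AS total
                FROM payment
                GROUP BY vendor))
    WHERE num >= 2 AND num <= 3
    ORDER BY num
""").fetchall()

for row in row_range:
    print(row)
```

Changing the filter to num = 1 selects the top row, and ordering by total DESC inside OVER( ) would number the largest total first instead.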
