Aggregation
I. Aggregate Functions
Most DBMSs provide functions to aggregate numerical data, such as summing a numeric column or
averaging a numeric column. In addition, aggregate functions include counting the number of
occurrences of a given column, or the number of rows in a table.
Aggregate functions work on multiple rows and return a single row based on the aggregation
performed on those rows. You often see aggregate functions used in conjunction with the GROUP BY
clause (discussed later), although this is not a requirement.
Unlike the numeric and string functions, aggregate functions are mostly standard.
Function            Description
SUM(column)         Total the values for all the instances of a numeric column.
COUNT(column)       Count the number of occurrences for all instances of a numeric or
                    alphabetic column.
COUNT(*)            Count all instances of rows.
AVG(column)         Compute the average of all instances of a numeric column.
MIN(column)         Return the smallest value of all instances of a numeric or
                    alphabetic column.
MAX(column)         Return the largest value of all instances of a numeric or
                    alphabetic column.
MEDIAN(column)      Return the median (sorted mid-point value) of a column.
                    - If the number of rows is odd, you get the value of the middle row.
                    - If the number of rows is even, you get the average of the 2 middle rows.
STDDEV(column)      Return the standard deviation of a column.
GROUP_CONCAT(…)     MySQL and SQLite
LISTAGG(…)          Oracle
STRING_AGG(…)       PostgreSQL
                    Concatenate all the values of a single (or multiple) columns.
Be Aware…
Question: I want to display information about the cheapest course in the course table:
SELECT price, description FROM course /* To get the lowest priced course */
ORDER BY price /* sort it by price */
/* look at the first row */
*** Once a single column uses an aggregate function, all columns must use aggregate functions,
unless those columns are included in a GROUP BY clause. (covered later)
NULL values are not included for aggregation purposes. If a column value is NULL it is not
considered for the purpose of counting, summing, or averaging. NULL values are also not considered
for either the MIN or MAX functions.
If a numeric column contains a zero, then the column will of course be included for summing,
counting, averaging, and min/max functions as any other number.
If a text column contains an empty string '', then the column will be included for counting, and
min/max functions as any other text values.
Some databases (example Oracle) treat an empty string '' as a NULL. Other databases (including
MySql) do not.
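The NULL, zero, and empty-string rules above can be checked with a small experiment. This is only a sketch: it uses Python's built-in sqlite3 module with an in-memory database, and the table "sample" and its columns are hypothetical names made up for the illustration. SQLite behaves like MySQL here (an empty string is not NULL); Oracle would treat the empty string as NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row holds NULLs, another holds a zero and an empty string.
conn.execute("CREATE TABLE sample (num INTEGER, txt TEXT)")
conn.executemany("INSERT INTO sample VALUES (?, ?)",
                 [(10, "a"), (0, ""), (None, None), (20, "b")])

row = conn.execute("""
    SELECT COUNT(*),     -- counts every row, NULL or not
           COUNT(num),   -- skips the NULL, keeps the zero
           SUM(num),
           AVG(num),     -- 30 / 3, the NULL row is not part of the average
           MIN(num),     -- the zero is a real value
           COUNT(txt),   -- skips the NULL, keeps the empty string
           MIN(txt)      -- the empty string sorts before 'a'
    FROM sample
""").fetchone()
print(row)  # (4, 3, 30, 10.0, 0, 3, '')
```

Notice that AVG divides by 3, not 4, because the NULL row is ignored entirely.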
Column 1: GROUP_CONCAT(fname, lname ORDER BY lname DESC, fname SEPARATOR ', ')
EileenWillis, LillianVasquez, KathyTok, WayneTobias, EugeneThomas, EdwardTeagan,
SamSultan, PatrickStack, JohnSoley, DavidSmith, NatashaRyan, JosephRace, CynthiaOwens,
ColetteNelson, RickMyers, janetmiller, AnyaMilgrom, PhyllisJames, RobertGrace, DuncanDavidson,
DavidChan, VincentCambria, BarbaraBurns, AngelBrinson, MariaAustin
Column 2: Total
25
1 row returned
Oracle:
LISTAGG( column [, 'separator'] ) [WITHIN GROUP (ORDER BY columns)]
[ ] denotes optional
Examples:
SELECT LISTAGG(lname, ', ') -- separated using ', '
FROM student;
SELECT LISTAGG(lname, ', ') WITHIN GROUP (ORDER BY lname)
FROM student;
SELECT LISTAGG(fname || lname, ', ') WITHIN GROUP (ORDER BY lname DESC, fname),
COUNT(lname) as total
FROM student --same output as above
The GROUP BY clause allows you to create one or more groupings of the selected data. The
GROUP BY collapses multiple individual rows sharing the same value(s) into a single grouped row.
The GROUP BY clause instructs the DBMS to sort the selected rows based on the value of the column
specified in the GROUP BY, and to group that data and return a single row for each unique value of
that column (or combination of columns) being grouped.
As you can see, the SELECT statement automatically returned a single row for each unique vendor
found in the table.
SELECT vendor, MAX(pay_date) FROM payment -- Why do I need MIN( ) or MAX( )
GROUP BY vendor -- for pay_date? Try it without it.
-- Does it work in MySql? Oracle?
Vendor MAX(pay_date)
Allstate 2017-03-06
Con Edison 2017-03-04
Department of Water 2017-03-05
Verizon 2017-03-03
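If you do not have a database server handy, the vendor query above can be tried with Python's built-in sqlite3 module. This is a sketch under assumptions: the eight rows are the sample payment rows used in these notes, but the column types and the in-memory SQLite database are my own choices, and the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are assumed; the rows are the sample payment data from these notes.
conn.execute("""CREATE TABLE payment (
    payment_num INTEGER PRIMARY KEY, vendor TEXT, amount REAL,
    description TEXT, pay_date TEXT)""")
conn.executemany("INSERT INTO payment VALUES (?, ?, ?, ?, ?)", [
    (1, "Con Edison", 125.15, "Electric", "2017-01-31"),
    (2, "Verizon", 54.79, "Home Phone", "2017-02-28"),
    (3, "Allstate", 1240.50, "Home Insurance", "2017-03-01"),
    (4, "Verizon", 49.95, "Cell Phone", "2017-03-02"),
    (5, "Verizon", 39.95, "Internet Service", "2017-03-03"),
    (6, "Con Edison", 59.63, "Electric", "2017-03-04"),
    (7, "Department of Water", 155.43, "Water and Sewers", "2017-03-05"),
    (8, "Allstate", 1550.25, "Car Insurance", "2017-03-06"),
])

# Eight detail rows collapse into one row per vendor.
rows = conn.execute("""
    SELECT vendor, MAX(pay_date)
    FROM payment
    GROUP BY vendor
    ORDER BY vendor
""").fetchall()
for vendor, last_paid in rows:
    print(vendor, last_paid)
```

Because the dates are stored as ISO text (YYYY-MM-DD), MAX( ) on the text gives the latest date.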
** Most analytic queries perform the above by aggregating one or more numeric columns while
doing a GROUP BY on one or more descriptive columns. This is very useful.
The database server returned a single row for every unique combination of vendor and description.
*** When using aggregate functions such as SUM( ), COUNT( ), AVG( ), MIN( ) and MAX( ),
or using the GROUP BY clause,
All columns must either use aggregate functions, or must be added to the GROUP BY.
Examples:
Which vendor or description would I get? Does the query even make any sense?
Which description would I get? Does the query even make any sense?
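The above will not work in Oracle or PostgreSQL database servers. Rejecting it is the standard behavior.
The above will work in SQLite, and in MySQL when the ONLY_FULL_GROUP_BY SQL mode is turned off (it is on
by default since MySQL 5.7), but it is not standard, and does it really make sense? Try it.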
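To "try it" without a server, here is a sketch using Python's built-in sqlite3 module. The query is a hypothetical example of breaking the rule (description is neither aggregated nor in the GROUP BY); the column types are assumed, and the rows are the sample payment data from these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE payment (
    payment_num INTEGER PRIMARY KEY, vendor TEXT, amount REAL,
    description TEXT, pay_date TEXT)""")
conn.executemany("INSERT INTO payment VALUES (?, ?, ?, ?, ?)", [
    (1, "Con Edison", 125.15, "Electric", "2017-01-31"),
    (2, "Verizon", 54.79, "Home Phone", "2017-02-28"),
    (3, "Allstate", 1240.50, "Home Insurance", "2017-03-01"),
    (4, "Verizon", 49.95, "Cell Phone", "2017-03-02"),
    (5, "Verizon", 39.95, "Internet Service", "2017-03-03"),
    (6, "Con Edison", 59.63, "Electric", "2017-03-04"),
    (7, "Department of Water", 155.43, "Water and Sewers", "2017-03-05"),
    (8, "Allstate", 1550.25, "Car Insurance", "2017-03-06"),
])

# Non-standard: description is a "bare" column. SQLite accepts the query and
# picks the description from some row of each group, so do not rely on it.
rows = conn.execute("""
    SELECT vendor, description, COUNT(*)
    FROM payment
    GROUP BY vendor
    ORDER BY vendor
""").fetchall()
for r in rows:
    print(r)
```

Verizon has three different descriptions, yet only one of them is shown, which is exactly why the query makes little sense.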
Most analytic queries rely on the use of AGGREGATE functions in combination with GROUP BY.
You will most often aggregate one or more numerical values, while at the same time group by one or
more descriptive attributes.
Drill Downs
In addition, another common analytical query against a database is the concept of a “drill down”.
In a drill down, I have a previous query that gave me a higher-level perspective analysis of the data, and
now I want to focus on a subset of that original query, and drill down to more detail.
Imagine if the original query was delivered to you as a pie chart, and now you want to drill down (perhaps
by double clicking) to a lower level detail for a particular slice of the pie.
A drill down for example #3 above by course description, but for female students only.
I add course description to the select line, and to the group by
I add a filter for female students only
As mentioned previously, you can use the MySql GROUP_CONCAT( ) or Oracle LISTAGG( ) or
PostgreSQL STRING_AGG( ) as aggregate functions for descriptive data. As you recall those functions
concatenate the values of all rows for a particular column. This is a great way to display all descriptive
values for all the rows that you are grouping by.
Examples…
Oracle:
SELECT vendor, LISTAGG(description, ' - ') WITHIN GROUP (ORDER BY description),
SUM(amount) as "total"
FROM payment
GROUP BY vendor
PostgreSQL:
SELECT vendor, STRING_AGG(description, ' - ' ORDER BY description) AS "multi-row",
SUM(amount) as "total"
FROM payment
GROUP BY vendor
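A comparable sketch for SQLite, run through Python's built-in sqlite3 module. The column types are assumed, and the rows are the sample payment data from these notes. I deliberately leave out any ordering inside GROUP_CONCAT( ), since older SQLite versions do not accept it, so the order of the concatenated descriptions is not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE payment (
    payment_num INTEGER PRIMARY KEY, vendor TEXT, amount REAL,
    description TEXT, pay_date TEXT)""")
conn.executemany("INSERT INTO payment VALUES (?, ?, ?, ?, ?)", [
    (1, "Con Edison", 125.15, "Electric", "2017-01-31"),
    (2, "Verizon", 54.79, "Home Phone", "2017-02-28"),
    (3, "Allstate", 1240.50, "Home Insurance", "2017-03-01"),
    (4, "Verizon", 49.95, "Cell Phone", "2017-03-02"),
    (5, "Verizon", 39.95, "Internet Service", "2017-03-03"),
    (6, "Con Edison", 59.63, "Electric", "2017-03-04"),
    (7, "Department of Water", 155.43, "Water and Sewers", "2017-03-05"),
    (8, "Allstate", 1550.25, "Car Insurance", "2017-03-06"),
])

rows = conn.execute("""
    SELECT vendor,
           GROUP_CONCAT(description, ' - ') AS "multi-row",
           ROUND(SUM(amount), 2)            AS "total"
    FROM payment
    GROUP BY vendor
    ORDER BY vendor
""").fetchall()
for r in rows:
    print(r)

# Split the concatenated text back apart so it can be inspected per vendor.
by_vendor = {v: (sorted(d.split(" - ")), t) for v, d, t in rows}
```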
What if you want to filter the final result after the aggregation/grouping was performed?
In addition to allowing you to filter the rows that will be used as a basis for the SELECT process,
SQL also allows you to filter the data after the aggregation/grouping process has completed. This
allows you to filter out data that does not meet the thresholds of your aggregated values.
The WHERE clause that we have already seen does not work for filtering aggregated/grouped rows.
The WHERE clause filters rows that would be selected before the aggregation/grouping is performed.
So what do we use instead of WHERE? SQL provides yet another clause for the purpose of filtering
the resulting data after the aggregation/grouping is performed. The name of this new filtering clause is
the HAVING clause. The HAVING clause is very similar to the WHERE clause. In fact, all types of
WHERE expressions you learned about thus far can also be used with HAVING.
The only difference between the WHERE and the HAVING is:
The WHERE clause filters the raw data before the aggregation/grouping process.
The HAVING clause filters rows after the aggregation/grouping process has completed.
The above statements are important distinctions between the two clauses. That is, rows that are
eliminated by the WHERE clause will not be included in the aggregation or grouping process. This
could change the values of the data being aggregated, which in turn could affect which rows meet or
do not meet the criteria for the HAVING clause.
HAVING supports all the WHERE comparison and logical operators we have seen. All the
techniques that you were able to use with the WHERE clause, including comparison operations
(=, >, <, BETWEEN, LIKE, etc.), and logical operations (AND, OR, etc.) can also be used in the
HAVING clause. The syntax is identical except that the keyword is different.
Please Note:
Efficiency consideration:
If possible, any data value known ahead of time, should be filtered using the WHERE clause,
since it is performed prior to the burden of aggregation and grouping. HAVING should only be
used for filtering aggregated data.
Here you see that the last row (vendor “Verizon”) has been eliminated from the final result set.
As you recall from previous discussion, the name of the columns returned by this query are “vendor”,
“sum(amount)”, and “count(amount)”. We are asking SQL to filter using the second column – that is
the column called “sum(amount)” – and only return rows where that column value is more than $150.
Allstate 2790.75 2
Con Edison 184.78 2
Department of Water 155.43 1
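The HAVING filter described above can be sketched with Python's built-in sqlite3 module. The column types are assumed, and the rows are the sample payment data from these notes. The second statement is my own addition, to show what happens if you try to put the aggregate in the WHERE clause instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE payment (
    payment_num INTEGER PRIMARY KEY, vendor TEXT, amount REAL,
    description TEXT, pay_date TEXT)""")
conn.executemany("INSERT INTO payment VALUES (?, ?, ?, ?, ?)", [
    (1, "Con Edison", 125.15, "Electric", "2017-01-31"),
    (2, "Verizon", 54.79, "Home Phone", "2017-02-28"),
    (3, "Allstate", 1240.50, "Home Insurance", "2017-03-01"),
    (4, "Verizon", 49.95, "Cell Phone", "2017-03-02"),
    (5, "Verizon", 39.95, "Internet Service", "2017-03-03"),
    (6, "Con Edison", 59.63, "Electric", "2017-03-04"),
    (7, "Department of Water", 155.43, "Water and Sewers", "2017-03-05"),
    (8, "Allstate", 1550.25, "Car Insurance", "2017-03-06"),
])

# HAVING runs after the grouping, so it can test the aggregated total.
rows = conn.execute("""
    SELECT vendor, ROUND(SUM(amount), 2), COUNT(amount)
    FROM payment
    GROUP BY vendor
    HAVING SUM(amount) > 150
    ORDER BY vendor
""").fetchall()
for r in rows:
    print(r)

# WHERE runs before the grouping, so an aggregate there is rejected.
try:
    conn.execute("SELECT vendor FROM payment WHERE SUM(amount) > 150 GROUP BY vendor")
    where_rejected = False
except sqlite3.OperationalError as err:
    where_rejected = True
    print("WHERE with an aggregate failed:", err)
```

Verizon's three payments total 144.69, so that vendor is the one removed by the HAVING condition.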
So is there ever a need to use both the WHERE clause and the HAVING clause in the same SQL
statement? The answer is actually yes.
Suppose that you want to filter all the payments you made in this month to any vendor that received 2 or
more payments from you. To do this, you must first filter all the payments that you made in this month.
To do this you must use the WHERE clause. Once those payments have been filtered, you use a
GROUP BY mechanism (as discussed above) to group the number of payments you made to each
vendor. And as a final step, you must filter the groups by using the HAVING clause, and eliminate any
vendor where number of payments is less than 2.
Vendor Count(vendor)
Allstate 2
Verizon 3
Vendor Total
AllstateCar Insurance 1550.25
Con EdisonElectric 184.78
Department of WaterWater and Sewers 155.43
Here you see that only 1 payment to “Allstate” was included in the grouping mechanism; the other payment
was eliminated by the WHERE clause. In addition, all payments for “Verizon” were eliminated by the
HAVING clause since they did not meet the HAVING conditions.
We often use the GROUP BY clause in conjunction with the SUM( ), AVG( ) and COUNT( ) functions
to obtain totals, averages and counts of rows in the database table.
What if I want to obtain a grand total for the entire query? Or, what if I want to obtain intermediate
totals, for example totals for the Allstate payments above? The answer is I use the ROLLUP option of the
GROUP BY clause. The ROLLUP syntax is not the same in every database: MySQL writes
GROUP BY ... WITH ROLLUP, while Oracle and PostgreSQL write GROUP BY ROLLUP( ... ).
For Oracle, the above can also be written using GROUPING SETS
GROUP BY GROUPING SETS( (vendor, description), (vendor), ( ) )
What order do the returned rows from a SELECT query appear in? The answer is they are
returned in no particular order. In reality however, they are returned in the most optimal way that
the database can return them without processing a sort.
Most optimal way? Sometimes this means in the same order as the rows were first placed in the
table. Sometimes however this may not even be the case, as various inserts and deletes may have
fragmented the table, and the original order is no longer the most optimal retrieval path.
The bottom line is that you cannot and should not rely on the default order if you do not explicitly
request a sort. Relational database design theory states that the sequence of retrieved data cannot be
assumed to have significance if ordering was not explicitly specified.
The ORDER BY clause is used to explicitly sort the data retrieved using the SELECT statement. The
ORDER BY takes the name of one or more columns by which to sort the returned rows. The order
could be specified as either ascending, which is the default, or descending.
The ORDER BY clause sorts the final resulting rows, so if the SQL statement has a GROUP BY
clause the ORDER BY sorts the grouped rows, not the original rows selected before the grouping
process.
You can specify multiple columns for the ORDER BY clause. When multiple columns are specified,
the sort is performed on the first column. If multiple rows have equal values in the first column,
those rows are sorted by the value of the second column, and so forth (if a third, etc.
column is specified).
If not specified, the sort is performed in ascending order. You can however change that default
order by specifying that you want the sort to be in descending order. To do so, simply add the
keyword DESC after the column name. You can also use the keyword ASC in the same way.
However, since ASC is the default, there is no need to include it in the ORDER BY clause.
The ORDER BY clause allows you to specify the sort columns by column name, or by the column
number in the SELECT statement, or if a SELECT * is requested, then the order the columns appear
in the underlying table.
SELECT *
FROM student, class
WHERE ssn = stu_ssn
ORDER BY 2, 7 DESC /* sort by student.lastname, class.course_id */
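A smaller sketch of the same ideas, using Python's built-in sqlite3 module. The column types are assumed, and the rows are the sample payment data from these notes. It sorts by two columns with a different direction for each, once by column name and once by column position.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE payment (
    payment_num INTEGER PRIMARY KEY, vendor TEXT, amount REAL,
    description TEXT, pay_date TEXT)""")
conn.executemany("INSERT INTO payment VALUES (?, ?, ?, ?, ?)", [
    (1, "Con Edison", 125.15, "Electric", "2017-01-31"),
    (2, "Verizon", 54.79, "Home Phone", "2017-02-28"),
    (3, "Allstate", 1240.50, "Home Insurance", "2017-03-01"),
    (4, "Verizon", 49.95, "Cell Phone", "2017-03-02"),
    (5, "Verizon", 39.95, "Internet Service", "2017-03-03"),
    (6, "Con Edison", 59.63, "Electric", "2017-03-04"),
    (7, "Department of Water", 155.43, "Water and Sewers", "2017-03-05"),
    (8, "Allstate", 1550.25, "Car Insurance", "2017-03-06"),
])

# Vendor ascending (the default), then amount descending within each vendor.
rows_by_name = conn.execute(
    "SELECT vendor, amount FROM payment ORDER BY vendor, amount DESC").fetchall()

# Same sort, written with the column positions of the SELECT list.
rows_by_pos = conn.execute(
    "SELECT vendor, amount FROM payment ORDER BY 1, 2 DESC").fetchall()

for r in rows_by_name:
    print(r)
```

Note that DESC applies only to the column it follows; vendor is still sorted ascending.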
Verizon 144.69 3
Allstate 2790.75 2
Con Edison 184.78 2
Department of Water 155.43 1
Allstate 2790.75 2
Department of Water 155.43 1
Con Edison 125.15 1
Please note: Even though the query had to sort the data in vendor order to perform the GROUP BY,
your request to ORDER BY will re-sort the final result in the order that you desire.
Allstate 2790.75 2
Department of Water 155.43 1
Please note:
The clauses of a SELECT statement must always appear in the sequence of the final example above
(SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY).
As you have seen thus far, using aggregation with the GROUP BY clause collapses the raw data,
and only gives you the summary rows. This is also true even if you use the ROLLUP feature.
What if on the other hand you would like to see the detail rows, as well as the summary rows?
The answer to that is that you need to perform multiple queries, and UNION the multiple results.
For example, I want to see the raw data from my payment table along with multiple levels of summaries.
SELECT vendor, description, amount, null as "count" -- retrieve all detail rows
FROM payment
UNION ALL
SELECT vendor, concat(description,' Total >>'), SUM(amount), COUNT(*)
FROM payment
GROUP BY vendor, description -- summarize by vendor & desc
UNION ALL
SELECT concat(vendor,' Total >>'), null, SUM(amount), count(*)
FROM payment
GROUP BY vendor -- summarize by vendor
UNION ALL
SELECT '_Grand Total >>>', null, SUM(amount), COUNT(*) -- grand total
FROM payment
ORDER BY 1,2
I often need to inspect whether my table has duplicate records. If you have a unique id (such as a
primary key), the key will ensure that each record has a unique identifier. However, the key does not
ensure that your data values (or even your entire row) are not the same. So if I am not diligent about
ensuring that I do not add a student unless I first check if the student already exists in my table, then I
could have duplicate records.
So how do I find those duplicate records in my table? The easiest way is to select all of the columns
that you want to check possible duplicate values on, then count them and group them, and then filter
only the ones having a count greater than 1.
Example 1:
Suppose that I believe that I paid a particular vendor twice. For this situation, I believe that if the
vendor is the same and the description is the same, this could potentially be a duplicate payment.
To find out if I have such a condition, I would select vendor and description, count the rows, and
group by vendor and description, as follows:
However, perhaps I should have also added the amount paid and the transaction pay_date, to truly
confirm that I have paid my vendor twice.
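The duplicate check described in Example 1 can be sketched with Python's built-in sqlite3 module. The column types are assumed, and the rows are the sample payment data from these notes. The second query is the stricter check that also compares the amount and the pay_date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE payment (
    payment_num INTEGER PRIMARY KEY, vendor TEXT, amount REAL,
    description TEXT, pay_date TEXT)""")
conn.executemany("INSERT INTO payment VALUES (?, ?, ?, ?, ?)", [
    (1, "Con Edison", 125.15, "Electric", "2017-01-31"),
    (2, "Verizon", 54.79, "Home Phone", "2017-02-28"),
    (3, "Allstate", 1240.50, "Home Insurance", "2017-03-01"),
    (4, "Verizon", 49.95, "Cell Phone", "2017-03-02"),
    (5, "Verizon", 39.95, "Internet Service", "2017-03-03"),
    (6, "Con Edison", 59.63, "Electric", "2017-03-04"),
    (7, "Department of Water", 155.43, "Water and Sewers", "2017-03-05"),
    (8, "Allstate", 1550.25, "Car Insurance", "2017-03-06"),
])

# Loose check: same vendor and same description counts as a possible duplicate.
dupes = conn.execute("""
    SELECT vendor, description, COUNT(*)
    FROM payment
    GROUP BY vendor, description
    HAVING COUNT(*) > 1
""").fetchall()
print(dupes)   # [('Con Edison', 'Electric', 2)]

# Strict check: the amount and the pay_date must match as well.
strict = conn.execute("""
    SELECT vendor, description, amount, pay_date, COUNT(*)
    FROM payment
    GROUP BY vendor, description, amount, pay_date
    HAVING COUNT(*) > 1
""").fetchall()
print(strict)  # []
```

The two Con Edison payments have different amounts and dates, so the strict check reports no duplicates.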
Example 3 – I believe that if the student fname is the same and sex is the same, it is a duplicate record.
I need to find out whether such a combination exists…
(Advanced Topics)
Who are those two students (from above)? If I need to know the rest of the data, I will need to
enclose the above query as a subquery (covered in a future session) as follows:
The CASE expression is often overlooked, but it can be extremely useful for turning very complex query
requirements into simpler, and sometimes more efficient, SQL statements.
The CASE expression enables many forms of conditional processing to be placed into a SQL statement.
By using CASE, more logic can be placed into SQL statements instead of being expressed in a host
language such as Java, Python, PHP, etc. In addition, the CASE expression can be used for data mining
classification (or segmentation) use cases.
SELECT c.course_id ,
CASE s.sex
WHEN 'M' THEN concat(s.fname, s.lname)
ELSE null -- ELSE null is not needed
END as "Male" , -- but trying to make a point
CASE s.sex
WHEN 'F' THEN concat(s.fname, s.lname)
ELSE null
END as "Female" ,
sex
FROM class c, student s
WHERE c.stu_ssn = s.ssn
I am often asked how I can take values from multiple rows for a single entity, and pivot them out
into a single row with multiple columns.
The answer is there is no easy way to do this in plain SQL. You can use procedural SQL such as
PL/SQL or Transact-SQL, or a programming language such as Python, Java, etc. to accomplish this task.
In addition, you can also use a BI tool such as BusinessObjects, Crystal Reports, Tableau, etc.
All those tools offer the ability to pivot your data, thereby turning multiple rows into multiple columns.
Solution:
If however, your data has a finite number of distinct values, and each value is known ahead of time,
then there are a few techniques that can be used to accomplish this pivoting.
1. Use the CASE expression (or MySql IF, or Oracle DECODE ) to spread a single column into
multiple columns output.
2. Use the GROUP BY clause, to collapse the main column(s) values into a single row.
3. Use an aggregate function to collapse the remaining columns into a single row.
1. Now let’s use the CASE expression, or (MySql) IF or (Oracle) DECODE to spread the course_id
into multiple columns
SELECT fname, lname, CASE course_id when 'X52-9759' then course_id END as "XML",
CASE course_id when 'X52-9740' then course_id END as "HTML",
CASE course_id when 'X52-9272' then course_id END as "SQL" -- etc.
FROM student join class
ON ssn=stu_ssn
Or for Mysql:
SELECT fname, lname, IF(course_id='X52-9759', course_id, null) as "XML", -- etc.
Or for Oracle:
SELECT fname, lname, DECODE(course_id, 'X52-9759', course_id) as "XML", -- etc.
2. Now let’s do a GROUP BY on fname and lname to collapse the student name into a single row,
3. and at the same time we have to aggregate all remaining columns as required.
SELECT fname, lname, MAX( CASE course_id when 'X52-9759' then 'X' END) as "XML",
MAX( CASE course_id when 'X52-9740' then 'X' END) as "HTML",
MAX( CASE course_id when 'X52-9272' then 'X' END) as "SQL" -- etc.
FROM student join class
ON ssn=stu_ssn
GROUP BY fname, lname
Or for Mysql:
SELECT fname, lname, MAX( IF(course_id='X52-9759', 'X', null) ) as "XML", -- etc.
Or for Oracle:
SELECT fname, lname, MAX( DECODE(course_id, 'X52-9759', 'X') ) as "XML", -- etc.
Please Note: In the final query, I also optionally replaced the course_id with a simple checkmark ‘X’.
Another example…
Total amount paid for each vendor horizontally by month rather than vertically.
SELECT vendor,
CASE month(pay_date) when '01' then amount END as "January",
CASE month(pay_date) when '02' then amount END as "February",
CASE month(pay_date) when '03' then amount END as "March"
FROM payment;
vendor                January    February    March
Con Edison             125.15
Verizon                             54.79
Allstate                                        1240.50
Verizon                                           49.95
Verizon                                           39.95
Con Edison                                        59.63
Department of Water                              155.43
Allstate                                        1550.25
SELECT vendor,
SUM( CASE month(pay_date) when '01' then amount END) as "January",
SUM( CASE month(pay_date) when '02' then amount END) as "February",
SUM( CASE month(pay_date) when '03' then amount END) as "March"
from payment
GROUP BY vendor;
vendor                January    February    March
Allstate                                        2790.75
Con Edison             125.15                     59.63
Department of Water                              155.43
Verizon                             54.79         89.90
PS. The above is a great approach to classify data for analytics and data mining.
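The month pivot above can also be tried in SQLite through Python's built-in sqlite3 module. This is a sketch under assumptions: the column types are my own, the rows are the sample payment data from these notes, and since SQLite has no month( ) function I swapped in strftime('%m', pay_date), which returns the month as a two-digit string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE payment (
    payment_num INTEGER PRIMARY KEY, vendor TEXT, amount REAL,
    description TEXT, pay_date TEXT)""")
conn.executemany("INSERT INTO payment VALUES (?, ?, ?, ?, ?)", [
    (1, "Con Edison", 125.15, "Electric", "2017-01-31"),
    (2, "Verizon", 54.79, "Home Phone", "2017-02-28"),
    (3, "Allstate", 1240.50, "Home Insurance", "2017-03-01"),
    (4, "Verizon", 49.95, "Cell Phone", "2017-03-02"),
    (5, "Verizon", 39.95, "Internet Service", "2017-03-03"),
    (6, "Con Edison", 59.63, "Electric", "2017-03-04"),
    (7, "Department of Water", 155.43, "Water and Sewers", "2017-03-05"),
    (8, "Allstate", 1550.25, "Car Insurance", "2017-03-06"),
])

# CASE spreads the amount into one column per month, SUM collapses each
# vendor to a single row. A month with no payments stays NULL (None in Python).
rows = conn.execute("""
    SELECT vendor,
           ROUND(SUM(CASE strftime('%m', pay_date) WHEN '01' THEN amount END), 2) AS "January",
           ROUND(SUM(CASE strftime('%m', pay_date) WHEN '02' THEN amount END), 2) AS "February",
           ROUND(SUM(CASE strftime('%m', pay_date) WHEN '03' THEN amount END), 2) AS "March"
    FROM payment
    GROUP BY vendor
    ORDER BY vendor
""").fetchall()
for r in rows:
    print(r)
```

The ROUND( ) is there only to tidy the floating-point sums for display.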
The Pivot Command (Advanced Topic, Oracle only, self-Study)
Oracle introduced the PIVOT command in version 11g. The pivot command allows you to easily pivot
a typical row oriented output into a column orientation. You can use this instead of the CASE technique shown above.
… … … …
22 rows returned
Windowing functions allow you to perform certain aggregation and grouping “OVER( )” your data
without collapsing the more granular base rows.
As you know, when you use aggregate functions or use the group by, the resulting number of rows
are always collapsed to either one row or one row for each unique value of the “group by” expression.
With windowing functions the number of resulting rows is not collapsed. You will get a
resulting row for every row in your original source data. Windowing functions are executed after
the query and aggregation/grouping is completed, but before the final ORDER BY (if requested).
Where:
argument – The name of a column(s) or other argument(s) that the function takes
part_clause – The name of a column(s) or clause to partition the rows by (i.e. group by )
If a partition clause is not provided, all rows are treated as a single group.
order_clause - The name of a column(s) to order the rows (or siblings within a partition) by.
You can even specify a starting and ending range of rows
In the above, I am now using the SUM( ) function with the OVER( ) windowing command.
The OVER( ) is a windowing function over a “window” of rows. In the above case the “window”
was over all retrieved rows since I did not specify any partitioning with the OVER( ) function.
Partitioning/Grouping:
To perform “windowing” functions over a group, we must use the PARTITION BY clause.
The PARTITION BY clause acts like the traditional GROUP BY clause, except that the rows are not collapsed.
VENDOR               DESCRIPTION        AMOUNT    tot_by_descption  tot_by_vendor  count_by_vendor
Allstate             Car Insurance      1550.25   1550.25           2790.75        2
Allstate             Home Insurance     1240.5    1240.5            2790.75        2
Con Edison           Electric           125.15    184.78            184.78         2
Con Edison           Electric           59.63     184.78            184.78         2
Department of Water  Water and Sewers   155.43    155.43            155.43         1
Verizon              Cell Phone         49.95     49.95             144.69         3
Verizon              Internet Service   39.95     39.95             144.69         3
Verizon              Home Phone         54.79     54.79             144.69         3
8 rows returned (2.745 millisec)
* Please Note: When using windowing functions, you are no longer restricted by the “Golden Rule”.
This is because windowing functions do not collapse the source rows.
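Here is a sketch of windowing with and without PARTITION BY, using Python's built-in sqlite3 module. It assumes the SQLite library bundled with your Python is version 3.25 or newer (older versions have no window functions). The column types are assumed, the rows are the sample payment data from these notes, and the grand_total column is my own addition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE payment (
    payment_num INTEGER PRIMARY KEY, vendor TEXT, amount REAL,
    description TEXT, pay_date TEXT)""")
conn.executemany("INSERT INTO payment VALUES (?, ?, ?, ?, ?)", [
    (1, "Con Edison", 125.15, "Electric", "2017-01-31"),
    (2, "Verizon", 54.79, "Home Phone", "2017-02-28"),
    (3, "Allstate", 1240.50, "Home Insurance", "2017-03-01"),
    (4, "Verizon", 49.95, "Cell Phone", "2017-03-02"),
    (5, "Verizon", 39.95, "Internet Service", "2017-03-03"),
    (6, "Con Edison", 59.63, "Electric", "2017-03-04"),
    (7, "Department of Water", 155.43, "Water and Sewers", "2017-03-05"),
    (8, "Allstate", 1550.25, "Car Insurance", "2017-03-06"),
])

# Detail columns and aggregates side by side: no GROUP BY, so all 8 rows survive.
rows = conn.execute("""
    SELECT vendor, description, amount,
           ROUND(SUM(amount) OVER (PARTITION BY vendor), 2) AS tot_by_vendor,
           COUNT(*)          OVER (PARTITION BY vendor)     AS count_by_vendor,
           ROUND(SUM(amount) OVER (), 2)                    AS grand_total
    FROM payment
    ORDER BY vendor, description
""").fetchall()
for r in rows:
    print(r)
```

The empty OVER ( ) treats all rows as one window, so every row carries the same grand total.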
In addition to the aggregate functions (SUM, COUNT, AVG, MIN, MAX) that can now be used
with or without windowing functions, below is a list of analytic functions that also use windowing …
Next we will cover the median( ), row_number( ), the ranking functions, and lag( ).
Try practicing with the other analytics functions on your own.
The MEDIAN( ) function will sort the data based on the column requested in the median function,
and will return the value of the middle row if the number of rows is odd, or the average of the 2
middle rows if the number of rows is even.
median
92.39
1 rows returned (0.333 millisec)
In the above case, there were 8 total rows in the payment table, the amounts of the 2 middle
rows (after sorting) were 59.63 and 125.15, and the median is their average: (59.63 + 125.15) / 2 = 92.39.
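The odd/even rule can be worked through by hand. The sketch below does it in plain Python rather than SQL (not every database has a MEDIAN( ) function), using the eight payment amounts from these notes, and cross-checks the result against Python's statistics module.

```python
import statistics

# The eight payment amounts from the sample payment table.
amounts = [125.15, 54.79, 1240.50, 49.95, 39.95, 59.63, 155.43, 1550.25]

ordered = sorted(amounts)   # the median is defined on the sorted values
n = len(ordered)
mid = n // 2

if n % 2 == 1:
    # Odd number of rows: take the single middle value.
    median = ordered[mid]
else:
    # Even number of rows: average the two middle values.
    median = (ordered[mid - 1] + ordered[mid]) / 2

print(ordered[mid - 1], ordered[mid])   # 59.63 125.15
print(round(median, 2))                 # 92.39
check = statistics.median(amounts)      # the library applies the same rule
```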
The row_number( ) function assigns a sequential row number to each row of your query.
row_num  payment_num  vendor               amount    description        pay_date
1        3            Allstate             1240.50   Home Insurance     2017-03-01
2        8            Allstate             1550.25   Car Insurance      2017-03-06
3        1            Con Edison           125.15    Electric           2017-01-31
4        6            Con Edison           59.63     Electric           2017-03-04
5        4            Verizon              49.95     Cell Phone         2017-03-02
6        2            Verizon              54.79     Home Phone         2017-02-28
7        7            Department of Water  155.43    Water and Sewers   2017-03-05
8        5            Verizon              39.95     Internet Service   2017-03-03
8 rows returned (0.352 millisec)
But what order of rows did this row_number( ) function use? Unknown, perhaps FIFO.
In the below case, I am first sorting by vendor, and then assigning the row number.
In the below case, I am first aggregating/grouping, and then assigning the row number.
Ranking Functions
The ranking analytic functions allow you to rank the resulting rows by certain criteria or values.
When using ranking functions, rows with equal values will receive the same rank.
Explanation: RANK( ) skips numbers when the rank value is the same for multiple rows.
DENSE_RANK( ) does not skip numbers.
PERCENT_RANK( ) assigns a rank based on a percent in the range of 0 to 1
Example 1:
SELECT course.*, RANK( ) OVER( ORDER BY price) as "rank"
FROM course
course_id  description                          price  rank
X52-9272   SQL Programming Language               540     1
X52-9267   Object Oriented Analysis and Design    995     2
X52-9755   JavaScript                            1095     3
X52-9562   Java Web Services                     1095     3
X52-9759   XML for Web Development               1095     3
X52-9238   Introduction to Java                  1095     3
X52-9740   Web Page Development with HTML        1095     3
X52-9742   Intensive Web Development             3995     8
The five courses priced at 1095 receive the same ranking (3), and the next rank skips values (4 through 7) to 8.
Example 2:
SELECT course.*,
RANK( ) OVER(order by price desc) as "rank",
DENSE_RANK( ) OVER(order by price desc) as "dense_rank",
PERCENT_RANK( ) OVER(order by price desc) as "pct_rank"
FROM course
ORDER BY rank, dense_rank;
course_id description price rank dense_rank pct_rank
X52-9742 Intensive Web Development 3995 1 1 0.0000000000
X52-9755 JavaScript 1095 2 2 0.1428571429
X52-9562 Java Web Services 1095 2 2 0.1428571429
X52-9238 Introduction to Java 1095 2 2 0.1428571429
X52-9759 XML for Web Development 1095 2 2 0.1428571429
X52-9740 Web Page Development with HTML 1095 2 2 0.1428571429
X52-9267 Object Oriented Analysis and Design 995 7 3 0.8571428571
X52-9272 SQL Programming Language 540 8 4 1.0000000000
Example 3:
Step 1. Rank the data by group (i.e. partition) by vendor, and sort by date descending.
Step 2. And now we will select only the latest payments for each vendor.
In this case, we need to select all the rows with latest_pay = 1
SELECT *
FROM
( SELECT vendor, pay_date, description, amount,
RANK( ) OVER (partition by vendor order by pay_date desc) as latest_pay
FROM payment ) latest
WHERE latest_pay = 1
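To try Example 3 without a database server, here is a sketch using Python's built-in sqlite3 module. It assumes the bundled SQLite is version 3.25 or newer (needed for window functions). The column types are assumed, the rows are the sample payment data from these notes, and the outer ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE payment (
    payment_num INTEGER PRIMARY KEY, vendor TEXT, amount REAL,
    description TEXT, pay_date TEXT)""")
conn.executemany("INSERT INTO payment VALUES (?, ?, ?, ?, ?)", [
    (1, "Con Edison", 125.15, "Electric", "2017-01-31"),
    (2, "Verizon", 54.79, "Home Phone", "2017-02-28"),
    (3, "Allstate", 1240.50, "Home Insurance", "2017-03-01"),
    (4, "Verizon", 49.95, "Cell Phone", "2017-03-02"),
    (5, "Verizon", 39.95, "Internet Service", "2017-03-03"),
    (6, "Con Edison", 59.63, "Electric", "2017-03-04"),
    (7, "Department of Water", 155.43, "Water and Sewers", "2017-03-05"),
    (8, "Allstate", 1550.25, "Car Insurance", "2017-03-06"),
])

# Step 1 (inner query): rank each vendor's payments, newest first.
# Step 2 (outer query): keep only the rows ranked 1.
rows = conn.execute("""
    SELECT *
    FROM ( SELECT vendor, pay_date, description, amount,
                  RANK() OVER (PARTITION BY vendor ORDER BY pay_date DESC) AS latest_pay
           FROM payment ) latest
    WHERE latest_pay = 1
    ORDER BY vendor
""").fetchall()
for r in rows:
    print(r)
```

The subquery is needed because the rank is computed after the WHERE clause of its own query, so it can only be filtered one level up.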
Sometimes we might be interested in accessing the data of the previous row when we are already
positioned at the next row. For that we will use the LAG( ) function. We can also do the same for the
subsequent row using the LEAD( ) function.
Syntax:
LAG(column, n, default_value) //same for LEAD( )
Where:
column - is the column that you want to access from the previous row
n - is the number of rows to look back. Default = 1 previous row
default_value - the value to return when there is no such previous row (otherwise you get null)
Examples:
SELECT pay.*, LAG(pay_date) OVER( ORDER BY pay_date) AS "prev_pay"
FROM payment pay;
Notice that row number 2 was able to access the column value from row number 1
Adding the PARTITION BY clause so that each row only gets the previous payment for the same vendor.
SELECT pay.*,
LAG(pay_date) OVER( PARTITION BY vendor ORDER BY pay_date) "prev_pay"
FROM payment pay;
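A runnable sketch of the partitioned LAG( ), using Python's built-in sqlite3 module and assuming the bundled SQLite is version 3.25 or newer. The column types are assumed, and the rows are the sample payment data from these notes. The extra prev_or_default column and its 'none' text are my own additions, to show the third argument at work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE payment (
    payment_num INTEGER PRIMARY KEY, vendor TEXT, amount REAL,
    description TEXT, pay_date TEXT)""")
conn.executemany("INSERT INTO payment VALUES (?, ?, ?, ?, ?)", [
    (1, "Con Edison", 125.15, "Electric", "2017-01-31"),
    (2, "Verizon", 54.79, "Home Phone", "2017-02-28"),
    (3, "Allstate", 1240.50, "Home Insurance", "2017-03-01"),
    (4, "Verizon", 49.95, "Cell Phone", "2017-03-02"),
    (5, "Verizon", 39.95, "Internet Service", "2017-03-03"),
    (6, "Con Edison", 59.63, "Electric", "2017-03-04"),
    (7, "Department of Water", 155.43, "Water and Sewers", "2017-03-05"),
    (8, "Allstate", 1550.25, "Car Insurance", "2017-03-06"),
])

rows = conn.execute("""
    SELECT vendor, pay_date,
           LAG(pay_date) OVER (PARTITION BY vendor ORDER BY pay_date) AS prev_pay,
           LAG(pay_date, 1, 'none')
                         OVER (PARTITION BY vendor ORDER BY pay_date) AS prev_or_default
    FROM payment
    ORDER BY vendor, pay_date
""").fetchall()
for r in rows:
    print(r)
```

The first payment of each vendor has no previous row inside its partition, so prev_pay is NULL there and prev_or_default shows the default value instead.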
Sometimes you want to limit the output to a certain number of rows. Perhaps you want to select the
top row of the query results, or the first 10 rows only, or perhaps only rows number 10 to 20, to create
a sample dataset for analysis.
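Both MySql and Oracle provide a way to limit the number of returned rows. However the syntax is not
the same: MySql uses the LIMIT clause, which is not SQL standard, while Oracle 12c and later use
FETCH FIRST, which was added to the SQL standard in SQL:2008.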
Oracle 12+:
The syntax is: FETCH FIRST x ROWS ONLY
OFFSET n ROWS FETCH NEXT x ROWS ONLY
Examples:
SELECT student_id, fname, lname, ssn
FROM student
WHERE sex = 'F'
FETCH FIRST 5 ROWS ONLY -- Oracle - first 5 rows only
Prior to Oracle version 12c, there was not an easy way to limit the number of returned rows. However…
1) You can use the ROWNUM keyword as an additional filter in the WHERE clause.
To obtain the first x number of rows, simply use ROWNUM <= x. (Do not use = Except for row 1)
First x Rows:
Unfortunately ROWNUM assigns row numbers prior to performing any GROUP BY and/or
ORDER BY. As such, you cannot rely on ROWNUM if you are planning on grouping the data or
sorting the data prior to selecting the top rows.
_________________________________________________________________
2) A better technique is to use the analytics function ROW_NUMBER( ). (See discussion later).
This function assigns a number to each row after completion of the query, but before the final sort.
Most often we do not use a final order by, since there is an ORDER BY clause in analytic functions.
And now to select the top row (or top few rows), or to select a range of rows, we need a subquery
(also called nested query).
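The ROW_NUMBER( ) plus subquery technique can be sketched with Python's built-in sqlite3 module, assuming the bundled SQLite is version 3.25 or newer. The column types are assumed, the rows are the sample payment data from these notes, and the choice of rows 3 to 5 by descending amount is just an illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE payment (
    payment_num INTEGER PRIMARY KEY, vendor TEXT, amount REAL,
    description TEXT, pay_date TEXT)""")
conn.executemany("INSERT INTO payment VALUES (?, ?, ?, ?, ?)", [
    (1, "Con Edison", 125.15, "Electric", "2017-01-31"),
    (2, "Verizon", 54.79, "Home Phone", "2017-02-28"),
    (3, "Allstate", 1240.50, "Home Insurance", "2017-03-01"),
    (4, "Verizon", 49.95, "Cell Phone", "2017-03-02"),
    (5, "Verizon", 39.95, "Internet Service", "2017-03-03"),
    (6, "Con Edison", 59.63, "Electric", "2017-03-04"),
    (7, "Department of Water", 155.43, "Water and Sewers", "2017-03-05"),
    (8, "Allstate", 1550.25, "Car Insurance", "2017-03-06"),
])

# Inner query: number the rows from the largest amount down.
# Outer query: keep a range of those row numbers (use rn <= x for the top x rows).
rows = conn.execute("""
    SELECT vendor, amount, rn
    FROM ( SELECT vendor, amount,
                  ROW_NUMBER() OVER (ORDER BY amount DESC) AS rn
           FROM payment ) numbered
    WHERE rn BETWEEN 3 AND 5
    ORDER BY rn
""").fetchall()
for r in rows:
    print(r)
```

Unlike ROWNUM, the numbering here follows the ORDER BY inside the OVER( ), so the range is taken from the sorted data.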
Range of Rows: