10 Sorting

This document summarizes sorting and aggregation algorithms used in database management systems. It discusses external merge sort, which divides data into runs that fit in memory and writes them to disk, then merges the runs. It also covers aggregation, which can be done by sorting on the group by keys or using hashing to group tuples. Hashing may spill partitions to disk if they do not fit in memory, then rehash each partition to compute the aggregation.

Lecture #10: Sorting & Aggregation Algorithms

15-445/645 Database Systems (Fall 2019)

https://15445.courses.cs.cmu.edu/fall2019/
Carnegie Mellon University
Prof. Andy Pavlo

1 Sorting
We need sorting because in the relational model, tuples in a table have no specific order. Sorting is (potentially)
used in ORDER BY, GROUP BY, JOIN, and DISTINCT operators.
We can accelerate sorting using a clustered B+tree by scanning the leaf nodes from left to right. Using an
unclustered B+tree to sort is a bad idea, however, because it causes a lot of I/O reads (random
access through pointer chasing).
If the data that we need to sort fits in memory, then the DBMS can use a standard sorting algorithm (e.g.,
quicksort). If the data does not fit, then the DBMS needs to use external sorting that is able to spill to disk
as needed and prefers sequential over random I/O.

2 External Merge Sort

External merge sort is a divide-and-conquer sorting algorithm that splits the data set into separate runs and
then sorts them individually. It can spill runs to disk as needed and then read them back in one at a time.
Phase #1 – Sorting: Sort small chunks of data that fit in main memory, and then write back to disk.
Phase #2 – Merge: Combine sorted sub-files into a larger single file.
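The two phases can be sketched in Python as follows. This is a minimal illustration, not how a DBMS implements it: the names external_sort and write_run, the one-integer-per-line run files, and the chunk_size parameter (standing in for the amount of data that fits in memory) are all assumptions made for the example. For brevity it merges every run in a single pass with heapq.merge instead of merging a limited number of runs per pass.

```python
import heapq
import os
import tempfile
from contextlib import ExitStack


def write_run(sorted_chunk, tmp_dir, run_id):
    """Write one sorted chunk to disk as a run (one integer per line)."""
    path = os.path.join(tmp_dir, f"run_{run_id}.txt")
    with open(path, "w") as f:
        f.writelines(f"{v}\n" for v in sorted_chunk)
    return path


def external_sort(values, chunk_size):
    """Sort integers while holding at most chunk_size of them during Phase #1."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        # Phase #1 (Sorting): sort chunks that "fit in memory" and spill each as a run.
        run_paths = []
        chunk = []
        for v in values:
            chunk.append(v)
            if len(chunk) == chunk_size:
                run_paths.append(write_run(sorted(chunk), tmp_dir, len(run_paths)))
                chunk = []
        if chunk:
            run_paths.append(write_run(sorted(chunk), tmp_dir, len(run_paths)))

        # Phase #2 (Merge): stream the runs back and combine them into one sorted output.
        with ExitStack() as stack:
            files = [stack.enter_context(open(p)) for p in run_paths]
            readers = [(int(line) for line in f) for f in files]
            # The result is materialized as a list only to keep the example short.
            return list(heapq.merge(*readers))
```

Calling external_sort([5, 3, 9, 1, 7, 2, 8], 3) writes three runs and returns the values in ascending order.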

Two-way Merge Sort

1. Pass #0: Reads every B pages of the table into memory, sorts them, and writes them back to disk.
Each sorted set of pages is called a run.
2. Pass #1,2,3...: Recursively merges pairs of runs into runs twice as long.
Number of Passes: $1 + \lceil \log_2 N \rceil$, where N is the total number of data pages
Total I/O Cost: 2N × (# of passes), since each pass reads and writes every page once
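As a quick check of the two formulas above, the following sketch evaluates them for a given page count. The function name two_way_cost is hypothetical, and the ceiling of the base-2 logarithm is computed with integer arithmetic to avoid floating-point rounding.

```python
def two_way_cost(n_pages):
    """Return (number of passes, total I/O cost) for two-way merge sort."""
    if n_pages < 1:
        raise ValueError("n_pages must be at least 1")
    # (n - 1).bit_length() equals ceil(log2(n)) for every integer n >= 1.
    passes = 1 + (n_pages - 1).bit_length()
    # Each pass reads and writes every page once.
    return passes, 2 * n_pages * passes
```

For a table of 8 pages this gives 4 passes and 64 I/Os; for 1000 pages it gives 11 passes and 22000 I/Os.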

General (K-way) Merge Sort

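1. Pass #0: Use B buffer pages, produce $\lceil N/B \rceil$ sorted runs of size B.
2. Pass #1,2,3...: Recursively merge B − 1 runs at a time, using the remaining buffer page for output.
Number of Passes: $1 + \lceil \log_{B-1} \lceil N/B \rceil \rceil$
Total I/O Cost: 2N × (# of passes)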
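A small calculator makes the effect of the buffer count easier to see. This is a sketch under the formulas above; the names ceil_log and kway_cost and the sample values N = 108, B = 5 are chosen only for illustration.

```python
def ceil_log(x, base):
    """Smallest integer k such that base**k >= x, computed without floats."""
    k, reach = 0, 1
    while reach < x:
        reach *= base
        k += 1
    return k


def kway_cost(n_pages, n_buffers):
    """Return (initial runs, number of passes, total I/O cost) for K-way merge sort."""
    if n_pages < 1 or n_buffers < 3:
        raise ValueError("need at least 1 page and at least 3 buffer pages")
    initial_runs = -(-n_pages // n_buffers)            # ceil(N / B)
    passes = 1 + ceil_log(initial_runs, n_buffers - 1)  # Pass #0 plus the merge passes
    return initial_runs, passes, 2 * n_pages * passes
```

With N = 108 pages and B = 5 buffer pages, Pass #0 produces 22 runs, and three merge passes (22 → 6 → 2 → 1 runs) finish the sort, for 4 passes and 864 I/Os in total.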

Double Buffering Optimization

Prefetch the next run in the background and store it in a second buffer while the system is processing the
current run. This reduces the wait time for I/O requests at each step by continuously utilizing the disk.
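The idea can be illustrated with a background thread and a one-slot queue acting as the second buffer. This is only a sketch: a real DBMS would issue asynchronous I/O against its buffer pool, and the names process_with_prefetch, loader, and DONE are hypothetical.

```python
import queue
import threading


def process_with_prefetch(pages, process):
    """Apply process() to each page while a background thread loads the next one."""
    second_buffer = queue.Queue(maxsize=1)  # holds the page fetched ahead of time
    DONE = object()                         # sentinel marking the end of input

    def loader():
        for page in pages:
            # Blocks while the second buffer is still occupied.
            second_buffer.put(page)
        second_buffer.put(DONE)

    worker = threading.Thread(target=loader, daemon=True)
    worker.start()

    results = []
    while True:
        page = second_buffer.get()  # take the prefetched page as the current one
        if page is DONE:
            break
        # While this runs, the loader is already fetching the next page.
        results.append(process(page))
    worker.join()
    return results
```

For example, process_with_prefetch([[3, 1], [2]], sorted) sorts each page in order while the following page is being loaded.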

3 Aggregations
An aggregation operator in a query plan collapses the values of one or more tuples into a single scalar value.
There are two approaches for implementing an aggregation: (1) sorting and (2) hashing.

Sorting
The DBMS first sorts the tuples on the GROUP BY key(s). It can use either an in-memory sorting algorithm if
everything fits in the buffer pool (e.g., quicksort) or the external merge sort algorithm if the size of the data
exceeds memory.
The DBMS then performs a sequential scan over the sorted data to compute the aggregation. The output of
the operator will be sorted on the keys.
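A minimal sketch of sort-based aggregation over in-memory (key, value) pairs is shown below. The function name sort_aggregate and the tuple layout are assumptions for the example; sorted() stands in for whichever sorting algorithm the DBMS picks.

```python
from itertools import groupby
from operator import itemgetter


def sort_aggregate(rows, agg):
    """Group (key, value) rows by key via sorting, then apply agg to each group's values."""
    # Step 1: sort on the GROUP BY key.
    ordered = sorted(rows, key=itemgetter(0))
    # Step 2: one sequential scan; equal keys are now adjacent, so each group is contiguous.
    return [
        (key, agg(value for _, value in group))
        for key, group in groupby(ordered, key=itemgetter(0))
    ]
```

For rows [("b", 2), ("a", 1), ("b", 5), ("a", 4)] and agg=sum, the result is [("a", 5), ("b", 7)], which is ordered by key as a side effect of the sort.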

Hashing
Hashing can be computationally cheaper than sorting for computing aggregations. The DBMS populates an
ephemeral hash table as it scans the table. For each record, it checks whether there is already an entry in the
hash table and performs the appropriate modification.
If the size of the hash table is too large to fit in memory, then the DBMS has to spill it to disk:
• Phase #1 – Partition: Use a hash function h1 to split tuples into partitions on disk based on target
hash key. This will put all tuples that match into the same partition. The DBMS spills partitions to
disk via output buffers.
• Phase #2 – ReHash: For each partition on disk, read its pages into memory and build an in-memory
hash table based on a second hash function h2 (where h1 ≠ h2). Then go through each bucket of this
hash table to bring together matching tuples to compute the aggregation. Note that this assumes that
each partition fits in memory.
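The two phases can be simulated in memory as follows. This is a sketch under several assumptions: Python lists stand in for the on-disk partitions, a CRC32 checksum plays the role of h1, Python's built-in dict hashing plays the role of h2, and the aggregate computed is a SUM. The names h1, partition, and rehash_sum are hypothetical.

```python
import zlib
from collections import defaultdict


def h1(key, n_partitions):
    """Partitioning hash: deterministic, and independent of dict's internal hash."""
    return zlib.crc32(str(key).encode()) % n_partitions


def partition(rows, n_partitions):
    """Phase #1: route every (key, value) row to a partition chosen by h1."""
    parts = defaultdict(list)  # each list stands in for a partition spilled to disk
    for key, value in rows:
        parts[h1(key, n_partitions)].append((key, value))
    return parts


def rehash_sum(parts):
    """Phase #2: build one in-memory hash table per partition and sum per key."""
    result = {}
    for part in parts.values():
        table = {}  # the dict's own hashing acts as h2
        for key, value in part:
            table[key] = table.get(key, 0) + value
        # A key never appears in two partitions, so the tables do not collide.
        result.update(table)
    return result
```

Because all rows with the same key land in the same partition, each partition can be aggregated on its own and the per-partition results simply concatenated.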
During the ReHash phase, the DBMS can store pairs of the form (GroupByKey→RunningValue) to compute
the aggregation. The contents of RunningValue depend on the aggregation function. To insert a new tuple
into the hash table:
• If it finds a matching GroupByKey, then update the RunningValue appropriately.
• Else insert a new (GroupByKey→RunningValue) pair.
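As an illustration of the insert rule, the sketch below computes AVG by keeping a (count, sum) pair as the RunningValue. The choice of AVG, the pair layout, and the names update and avg_by_key are assumptions made for the example; other aggregates would store a different RunningValue (for instance, a single number for MIN or COUNT).

```python
def update(table, key, value):
    """Insert one tuple into the (GroupByKey -> RunningValue) hash table."""
    if key in table:
        # Matching GroupByKey found: update the RunningValue.
        count, total = table[key]
        table[key] = (count + 1, total + value)
    else:
        # No match: insert a new (GroupByKey -> RunningValue) pair.
        table[key] = (1, value)


def avg_by_key(rows):
    """Compute AVG(value) GROUP BY key from (key, value) rows."""
    table = {}
    for key, value in rows:
        update(table, key, value)
    # Finalize: turn each (count, sum) RunningValue into the average.
    return {key: total / count for key, (count, total) in table.items()}
```

For rows [("a", 2), ("b", 10), ("a", 4)], the table ends as {"a": (2, 6), "b": (1, 10)} and the averages are 3.0 and 10.0.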
