SQL Antipatterns - Bill Karwin
www.percona.com
Problem
• Store & query hierarchical data
- Categories/subcategories
- Bill of materials
- Threaded discussions
Example: Bug Report Comments
[Figure: a thread of comments on a bug report, from (1) Fran: “What’s the cause of this bug?” to (7) Kukla: “That fixed it.”]
Solutions
• Adjacency list
• Path enumeration
• Nested sets
• Closure table
Adjacency List
• Naive solution nearly everyone uses
• Each entry knows its immediate parent
Insert a New Node
INSERT INTO Comments (parent_id, author, comment)
VALUES (5, 'Fran', 'I agree!');
Move a Node or Subtree
UPDATE Comments SET parent_id = 3
WHERE comment_id = 6;
Query Immediate Child/Parent
• Query a node’s children:
SELECT * FROM Comments c1
LEFT JOIN Comments c2
ON (c2.parent_id = c1.comment_id);
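The adjacency-list operations above can be tried end to end with a small sketch. It assumes Python's built-in sqlite3 module and a cut-down Comments table; the column types and the helper name children are illustrative choices, not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Comments (
        comment_id INTEGER PRIMARY KEY,
        parent_id  INTEGER REFERENCES Comments(comment_id),
        author     TEXT NOT NULL,
        comment    TEXT NOT NULL
    )""")
conn.executemany("INSERT INTO Comments VALUES (?, ?, ?, ?)", [
    (1, None, "Fran",  "What's the cause of this bug?"),
    (2, 1,    "Ollie", "I think it's a null pointer."),
    (3, 2,    "Fran",  "No, I checked for that."),
    (4, 1,    "Kukla", "We need to check valid input."),
    (5, 4,    "Ollie", "Yes, that's a bug."),
    (6, 4,    "Fran",  "Yes, please add a check."),
    (7, 6,    "Kukla", "That fixed it."),
])

# Insert a new node: it only needs to know its immediate parent.
conn.execute(
    "INSERT INTO Comments (parent_id, author, comment) VALUES (5, 'Fran', 'I agree!')")

# Move a subtree: repointing #6 also carries #7, which still hangs off #6.
conn.execute("UPDATE Comments SET parent_id = 3 WHERE comment_id = 6")

def children(node_id):
    """Return the ids of the immediate children of node_id (hypothetical helper)."""
    sql = "SELECT comment_id FROM Comments WHERE parent_id = ? ORDER BY comment_id"
    return [row[0] for row in conn.execute(sql, (node_id,))]

print(children(4), children(3), children(5))
```

Each of these operations touches a single row, which is why this design is attractive until whole subtrees have to be queried.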
Can’t Handle Deep Trees
SELECT * FROM Comments c1
LEFT JOIN Comments c2 ON (c2.parent_id = c1.comment_id)
LEFT JOIN Comments c3 ON (c3.parent_id = c2.comment_id)
LEFT JOIN Comments c4 ON (c4.parent_id = c3.comment_id)
LEFT JOIN Comments c5 ON (c5.parent_id = c4.comment_id)
LEFT JOIN Comments c6 ON (c6.parent_id = c5.comment_id)
LEFT JOIN Comments c7 ON (c7.parent_id = c6.comment_id)
LEFT JOIN Comments c8 ON (c8.parent_id = c7.comment_id)
LEFT JOIN Comments c9 ON (c9.parent_id = c8.comment_id)
LEFT JOIN Comments c10 ON (c10.parent_id = c9.comment_id)
...
• It still doesn’t support unlimited depth!
SQL-99 recursive syntax
WITH [RECURSIVE] CommentTree
(comment_id, bug_id, parent_id, author, comment, depth)
AS (
SELECT *, 0 AS depth FROM Comments
WHERE parent_id IS NULL
UNION ALL
SELECT c.*, ct.depth+1 AS depth FROM CommentTree ct
JOIN Comments c ON (ct.comment_id = c.parent_id)
)
SELECT * FROM CommentTree WHERE bug_id = 1234;
✓ Supported: PostgreSQL, Oracle 11g, IBM DB2, Microsoft SQL Server, Apache Derby
✗ Not supported at the time of this talk: MySQL, SQLite, Informix, Firebird, etc.
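A minimal sketch of the recursive query in action, assuming Python's built-in sqlite3 module (SQLite has accepted WITH RECURSIVE since version 3.8.3). The table layout follows the column list in the query above; the sample rows and the empty comment text are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Comments (
        comment_id INTEGER PRIMARY KEY,
        bug_id     INTEGER NOT NULL,
        parent_id  INTEGER REFERENCES Comments(comment_id),
        author     TEXT,
        comment    TEXT
    )""")
# (comment_id, parent_id, author) for the example thread, all on bug 1234.
thread = [(1, None, "Fran"), (2, 1, "Ollie"), (3, 2, "Fran"), (4, 1, "Kukla"),
          (5, 4, "Ollie"), (6, 4, "Fran"), (7, 6, "Kukla")]
conn.executemany(
    "INSERT INTO Comments (comment_id, bug_id, parent_id, author, comment) "
    "VALUES (?, 1234, ?, ?, '')", thread)

sql = """
WITH RECURSIVE CommentTree
    (comment_id, bug_id, parent_id, author, comment, depth)
AS (
    SELECT *, 0 AS depth FROM Comments
    WHERE parent_id IS NULL
  UNION ALL
    SELECT c.*, ct.depth + 1 AS depth FROM CommentTree ct
    JOIN Comments c ON (ct.comment_id = c.parent_id)
)
SELECT comment_id, depth FROM CommentTree
WHERE bug_id = 1234
ORDER BY comment_id
"""
# Map each comment to its depth below the root of the thread.
depths = dict(conn.execute(sql).fetchall())
print(depths)
```

The recursion follows parent_id links for as many levels as the data contains, so no fixed number of joins is needed.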
Path Enumeration
• Store chain of ancestors in each node
• Good for breadcrumbs
comment_id  path       author  comment
1           1/         Fran    What’s the cause of this bug?
2           1/2/       Ollie   I think it’s a null pointer.
3           1/2/3/     Fran    No, I checked for that.
4           1/4/       Kukla   We need to check valid input.
5           1/4/5/     Ollie   Yes, that’s a bug.
6           1/4/6/     Fran    Yes, please add a check.
7           1/4/6/7/   Kukla   That fixed it.
Query Ancestors and Subtrees
• Query ancestors of comment #7:
SELECT * FROM Comments
WHERE '1/4/6/7/' LIKE path || '%';
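Both directions of the path comparison can be sketched as follows, assuming Python's built-in sqlite3 module. The ancestor query is the one above; the subtree query (path LIKE '1/4/' || '%') is the mirror image and is an assumption here, since only the ancestor query survives on the slide. The helper name ids is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Comments (comment_id INTEGER PRIMARY KEY, path TEXT, author TEXT)")
conn.executemany("INSERT INTO Comments VALUES (?, ?, ?)", [
    (1, "1/", "Fran"), (2, "1/2/", "Ollie"), (3, "1/2/3/", "Fran"),
    (4, "1/4/", "Kukla"), (5, "1/4/5/", "Ollie"), (6, "1/4/6/", "Fran"),
    (7, "1/4/6/7/", "Kukla"),
])

def ids(sql, *params):
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql, params)]

# Ancestors of #7 (including #7): rows whose path is a prefix of 1/4/6/7/.
ancestors = ids(
    "SELECT comment_id FROM Comments WHERE ? LIKE path || '%' ORDER BY path",
    "1/4/6/7/")

# Subtree under #4 (including #4): rows whose path starts with 1/4/.
subtree = ids(
    "SELECT comment_id FROM Comments WHERE path LIKE ? || '%' ORDER BY path",
    "1/4/")

print(ancestors, subtree)
```

Ordering by path returns ancestors from the root downward, which is the order a breadcrumb trail needs.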
Add a New Child of #7
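INSERT INTO Comments (author, comment)
VALUES ('Ollie', 'Good job!');
-- Fetch the parent's path; the application keeps it as $parent_path.
SELECT path FROM Comments
WHERE comment_id = 7;
-- Append the new row's own ID to the parent's path.
-- (In MySQL, || concatenates only in PIPES_AS_CONCAT mode; otherwise use CONCAT().)
UPDATE Comments
SET path = $parent_path || LAST_INSERT_ID() || '/'
WHERE comment_id = LAST_INSERT_ID();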
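The three statements above can be wrapped in one transaction. This is a sketch assuming Python's built-in sqlite3 module, where cursor.lastrowid plays the role of LAST_INSERT_ID(); the function name add_child is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Comments (
        comment_id INTEGER PRIMARY KEY, path TEXT, author TEXT, comment TEXT)""")
conn.execute("INSERT INTO Comments VALUES (7, '1/4/6/7/', 'Kukla', 'That fixed it.')")

def add_child(parent_id, author, comment):
    """Insert a reply under parent_id and return the new row's (id, path)."""
    with conn:  # commit all three statements together, or roll them all back
        cur = conn.execute(
            "INSERT INTO Comments (author, comment) VALUES (?, ?)", (author, comment))
        new_id = cur.lastrowid  # generated key of the row just inserted
        (parent_path,) = conn.execute(
            "SELECT path FROM Comments WHERE comment_id = ?", (parent_id,)).fetchone()
        new_path = f"{parent_path}{new_id}/"
        conn.execute(
            "UPDATE Comments SET path = ? WHERE comment_id = ?", (new_path, new_id))
    return new_id, new_path

result = add_child(7, "Ollie", "Good job!")
print(result)
```

The path cannot be written in the INSERT itself because it contains the generated key, hence the follow-up UPDATE.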
Nested Sets
• Each comment encodes its descendants
using two numbers:
- A comment’s left number is less than all numbers
used by the comment’s descendants.
- A comment’s right number is greater than all
numbers used by the comment’s descendants.
- A comment’s numbers are between all
numbers used by the comment’s ancestors.
What Does This Look Like?
comment_id  nsleft  nsright  author  comment
1           1       14       Fran    What’s the cause of this bug?
2           2       5        Ollie   I think it’s a null pointer.
3           3       4        Fran    No, I checked for that.
4           6       13       Kukla   We need to check valid input.
5           7       8        Ollie   Yes, that’s a bug.
6           9       12       Fran    Yes, please add a check.
7           10      11       Kukla   That fixed it.
Query Ancestors of #7
Query Subtree Under #4
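Both the ancestor query for #7 and the subtree query for #4 come down to range comparisons on the two numbers. The sketch below assumes Python's built-in sqlite3 module. The values for #1, #2, #4 and #7 are the ones visible in the figures; those for #3, #5 and #6 are inferred from the tree shape, and the queries are one plausible formulation rather than the slides' own text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Comments (comment_id INTEGER PRIMARY KEY, nsleft INT, nsright INT)")
conn.executemany("INSERT INTO Comments VALUES (?, ?, ?)", [
    (1, 1, 14), (2, 2, 5), (3, 3, 4), (4, 6, 13),
    (5, 7, 8), (6, 9, 12), (7, 10, 11),
])

# Ancestors of #7 (including #7): rows whose range encloses #7's nsleft.
ancestors = [row[0] for row in conn.execute("""
    SELECT a.comment_id
    FROM Comments AS c
    JOIN Comments AS a ON c.nsleft BETWEEN a.nsleft AND a.nsright
    WHERE c.comment_id = 7
    ORDER BY a.nsleft""")]

# Subtree under #4 (including #4): rows whose nsleft falls inside #4's range.
subtree = [row[0] for row in conn.execute("""
    SELECT d.comment_id
    FROM Comments AS c
    JOIN Comments AS d ON d.nsleft BETWEEN c.nsleft AND c.nsright
    WHERE c.comment_id = 4
    ORDER BY d.nsleft""")]

print(ancestors, subtree)
```

Ordering by nsleft lists the rows in depth-first order, so ancestors come out root first.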
Insert New Child of #5
UPDATE Comments
SET nsleft = CASE WHEN nsleft >= 8 THEN nsleft+2
ELSE nsleft END,
nsright = nsright+2
WHERE nsright >= 7;
INSERT INTO Comments (nsleft, nsright, author, comment)
VALUES (8, 9, 'Fran', 'I agree!');
• Recalculate left values for all nodes to the right of
the new child. Recalculate right values for all
nodes above and to the right.
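The renumbering can be checked on the example tree. This sketch assumes Python's built-in sqlite3 module and the starting numbers 1/14, 2/5, 3/4, 6/13, 7/8, 9/12, 10/11 for comments #1 to #7 (the values for #3, #5 and #6 are inferred from the tree shape); author and comment are left nullable to keep the setup short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Comments (
        comment_id INTEGER PRIMARY KEY,
        nsleft INT, nsright INT, author TEXT, comment TEXT)""")
conn.executemany(
    "INSERT INTO Comments (comment_id, nsleft, nsright) VALUES (?, ?, ?)",
    [(1, 1, 14), (2, 2, 5), (3, 3, 4), (4, 6, 13),
     (5, 7, 8), (6, 9, 12), (7, 10, 11)])

with conn:  # renumber and insert in one transaction
    # Open a gap of two numbers at position 8, just inside #5's right edge.
    conn.execute("""
        UPDATE Comments
        SET nsleft  = CASE WHEN nsleft >= 8 THEN nsleft + 2 ELSE nsleft END,
            nsright = nsright + 2
        WHERE nsright >= 7""")
    # The new leaf takes the freed pair (8, 9).
    conn.execute("""
        INSERT INTO Comments (nsleft, nsright, author, comment)
        VALUES (8, 9, 'Fran', 'I agree!')""")

layout = {cid: (left, right) for cid, left, right in
          conn.execute("SELECT comment_id, nsleft, nsright FROM Comments")}
print(layout)
```

Five of the seven existing rows change for a single insert, which is the main cost of nested sets.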
Query Immediate Parent of #6
• Parent of #6 is an ancestor who has no
descendant who is also an ancestor of #6.
-- Strict comparisons: with inclusive BETWEEN every candidate parent would
-- match itself as in_between, and the query would return no rows.
SELECT parent.* FROM Comments AS c
JOIN Comments AS parent
ON (c.nsleft > parent.nsleft AND c.nsleft < parent.nsright)
LEFT OUTER JOIN Comments AS in_between
ON (c.nsleft > in_between.nsleft AND c.nsleft < in_between.nsright
AND in_between.nsleft > parent.nsleft AND in_between.nsleft < parent.nsright)
WHERE c.comment_id = 6 AND in_between.comment_id IS NULL;
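A sketch of the immediate-parent lookup, assuming Python's built-in sqlite3 module and the example numbering (values for #3, #5 and #6 inferred from the tree shape). It uses strict comparisons so that no node counts as its own ancestor or as its own "in between" node; the function name immediate_parent is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Comments (comment_id INTEGER PRIMARY KEY, nsleft INT, nsright INT)")
conn.executemany("INSERT INTO Comments VALUES (?, ?, ?)", [
    (1, 1, 14), (2, 2, 5), (3, 3, 4), (4, 6, 13),
    (5, 7, 8), (6, 9, 12), (7, 10, 11),
])

PARENT_SQL = """
SELECT parent.comment_id
FROM Comments AS c
JOIN Comments AS parent
  ON (c.nsleft > parent.nsleft AND c.nsleft < parent.nsright)
LEFT OUTER JOIN Comments AS in_between
  ON (c.nsleft > in_between.nsleft AND c.nsleft < in_between.nsright
      AND in_between.nsleft > parent.nsleft
      AND in_between.nsleft < parent.nsright)
WHERE c.comment_id = ? AND in_between.comment_id IS NULL
"""

def immediate_parent(node_id):
    """Return the id of the closest ancestor of node_id, or None for the root."""
    row = conn.execute(PARENT_SQL, (node_id,)).fetchone()
    return row[0] if row else None

print(immediate_parent(6), immediate_parent(7), immediate_parent(1))
```

A three-way self-join for what adjacency list answers with one column is the trade-off this design makes.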
Closure Table
CREATE TABLE TreePaths (
ancestor INT NOT NULL,
descendant INT NOT NULL,
PRIMARY KEY (ancestor, descendant),
FOREIGN KEY(ancestor)
REFERENCES Comments(comment_id),
FOREIGN KEY(descendant)
REFERENCES Comments(comment_id)
);
• Many-to-many table
• Stores every path from each node
to each of its descendants
• A node even connects to itself
What Does This Look Like?
Comments:
comment_id  author  comment
1           Fran    What’s the cause of this bug?
2           Ollie   I think it’s a null pointer.
3           Fran    No, I checked for that.
4           Kukla   We need to check valid input.
5           Ollie   Yes, that’s a bug.
6           Fran    Yes, please add a check.
7           Kukla   That fixed it.
TreePaths:
ancestor  descendant
1         1
1         2
1         3
1         4
1         5
1         6
1         7
2         2
2         3
3         3
4         4
4         5
4         6
4         7
5         5
6         6
6         7
7         7
Paths Starting from #4
Query Ancestors of #6
Paths Terminating at #6
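"Paths starting from #4" and "paths terminating at #6" each map to a single-column filter on TreePaths. The sketch below assumes Python's built-in sqlite3 module and the TreePaths rows of the example tree; it reads ids straight from TreePaths, whereas a real query would likely join Comments to fetch the comment rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE TreePaths (
        ancestor   INT NOT NULL,
        descendant INT NOT NULL,
        PRIMARY KEY (ancestor, descendant))""")
conn.executemany("INSERT INTO TreePaths VALUES (?, ?)", [
    (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (1, 7),
    (2, 2), (2, 3), (3, 3),
    (4, 4), (4, 5), (4, 6), (4, 7),
    (5, 5), (6, 6), (6, 7), (7, 7),
])

# Descendants of #4: every path that starts at #4 (including #4 itself).
descendants_of_4 = [row[0] for row in conn.execute(
    "SELECT descendant FROM TreePaths WHERE ancestor = 4 ORDER BY descendant")]

# Ancestors of #6: every path that terminates at #6 (including #6 itself).
ancestors_of_6 = [row[0] for row in conn.execute(
    "SELECT ancestor FROM TreePaths WHERE descendant = 6 ORDER BY ancestor")]

print(descendants_of_4, ancestors_of_6)
```

Neither query depends on how deep the tree is.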
Insert New Child of #5
Copy Paths from Parent
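Copying paths from the parent can be sketched as one INSERT ... SELECT: every ancestor of the parent also becomes an ancestor of the new node, plus the node's reference to itself. This assumes Python's built-in sqlite3 module; the new id 8 and the function name add_child are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE TreePaths (
        ancestor   INT NOT NULL,
        descendant INT NOT NULL,
        PRIMARY KEY (ancestor, descendant))""")
conn.executemany("INSERT INTO TreePaths VALUES (?, ?)", [
    (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (1, 7),
    (2, 2), (2, 3), (3, 3),
    (4, 4), (4, 5), (4, 6), (4, 7),
    (5, 5), (6, 6), (6, 7), (7, 7),
])

def add_child(parent_id, child_id):
    """Register child_id as a new leaf under parent_id."""
    conn.execute("""
        INSERT INTO TreePaths (ancestor, descendant)
        SELECT ancestor, ? FROM TreePaths WHERE descendant = ?
        UNION ALL
        SELECT ?, ?""", (child_id, parent_id, child_id, child_id))

add_child(5, 8)  # new comment #8 replies to #5

ancestors_of_8 = [row[0] for row in conn.execute(
    "SELECT ancestor FROM TreePaths WHERE descendant = 8 ORDER BY ancestor")]
print(ancestors_of_8)
```

Only new rows are added; no existing path has to be renumbered.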
Delete Child #7
Delete Paths Terminating at #7
Delete Subtree Under #4
Delete Any Paths Under #4
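Both deletions can be sketched as follows, assuming Python's built-in sqlite3 module. Deleting leaf #7 removes the paths that terminate at it; deleting the subtree under #4 removes the paths that terminate at any descendant of #4. The subquery form runs as written in SQLite; other databases may need it rewritten (MySQL, for one, restricts subqueries on the table being deleted from).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE TreePaths (
        ancestor   INT NOT NULL,
        descendant INT NOT NULL,
        PRIMARY KEY (ancestor, descendant))""")
conn.executemany("INSERT INTO TreePaths VALUES (?, ?)", [
    (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (1, 7),
    (2, 2), (2, 3), (3, 3),
    (4, 4), (4, 5), (4, 6), (4, 7),
    (5, 5), (6, 6), (6, 7), (7, 7),
])

# Delete leaf #7: drop every path that terminates at it.
conn.execute("DELETE FROM TreePaths WHERE descendant = 7")
after_leaf = [row[0] for row in conn.execute(
    "SELECT descendant FROM TreePaths WHERE ancestor = 4 ORDER BY descendant")]

# Delete the subtree under #4: drop paths that end at any descendant of #4.
conn.execute("""
    DELETE FROM TreePaths
    WHERE descendant IN (SELECT descendant FROM TreePaths WHERE ancestor = 4)""")
remaining = conn.execute(
    "SELECT ancestor, descendant FROM TreePaths ORDER BY ancestor, descendant"
).fetchall()

print(after_leaf, remaining)
```

Only rows in TreePaths are removed here; the rows in Comments are untouched and can be deleted or re-attached separately.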
Path Length
• Add a length column
SELECT c.*
FROM Comments c
JOIN TreePaths t
ON (c.comment_id = t.descendant)
WHERE t.ancestor = 4
AND t.length = 1;
ancestor  descendant  length
...
2         3           1
3         3           0
4         4           0
4         5           1
4         6           1
4         7           2
5         5           0
6         6           0
6         7           1
7         7           0
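With a length column, immediate children are the paths of length 1. The sketch below assumes Python's built-in sqlite3 module and derives the TreePaths rows from a parent map of the example thread; that derivation is an illustration, not something shown on the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Comments (comment_id INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE TreePaths (
        ancestor   INT NOT NULL,
        descendant INT NOT NULL,
        length     INT NOT NULL,
        PRIMARY KEY (ancestor, descendant))""")

# Parent of each comment in the example thread (None marks the root).
parent = {1: None, 2: 1, 3: 2, 4: 1, 5: 4, 6: 4, 7: 6}
conn.executemany("INSERT INTO Comments VALUES (?)", [(n,) for n in parent])

# Walk up from every node: one row per ancestor, length = number of steps.
rows = []
for node in parent:
    ancestor, length = node, 0
    while ancestor is not None:
        rows.append((ancestor, node, length))
        ancestor, length = parent[ancestor], length + 1
conn.executemany("INSERT INTO TreePaths VALUES (?, ?, ?)", rows)

# Immediate children of #4: paths from #4 of length exactly 1.
children_of_4 = [row[0] for row in conn.execute("""
    SELECT c.comment_id
    FROM Comments c
    JOIN TreePaths t ON (c.comment_id = t.descendant)
    WHERE t.ancestor = 4 AND t.length = 1
    ORDER BY c.comment_id""")]

# The longest path gives the depth of the tree.
depth = conn.execute("SELECT MAX(length) FROM TreePaths").fetchone()[0]
print(children_of_4, depth)
```

Filtering on t.descendant with length = 1 instead gives the immediate parent in the same way.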
Choosing the Right Design
PHP Demo of Closure Table
Hierarchical Test Data
• Integrated Taxonomic Information System
- http://itis.gov/
- Free authoritative taxonomic information on plants,
animals, fungi, microbes
- 518,756 scientific names (as of Feb 2011)
California Poppy
Kingdom: Plantae
Division: Tracheobionta
Class: Magnoliophyta
Order: Magnoliopsida
unranked: Magnoliidae
unranked: Papaverales
Family: Papaveraceae
Genus: Eschscholzia
Species: Eschscholzia californica
id=18956
California Poppy: ITIS Entry
hierarchy_string
202422-564824-18061-18063-18064-18879-18880-18954-18956
Hierarchical Data Classes
abstract class ZendX_Db_Table_TreeTable
extends Zend_Db_Table_Abstract
{
// Signatures only; method bodies are omitted here.
public function fetchTreeByRoot($rootId, $expand) { /* ... */ }
public function fetchBreadcrumbs($leafId) { /* ... */ }
}
class ZendX_Db_Table_Row_TreeRow
extends Zend_Db_Table_Row_Abstract
{
// Signatures only; method bodies are omitted here.
public function addChildRow($childRow) { /* ... */ }
public function getChildren() { /* ... */ }
}
class ZendX_Db_Table_Rowset_TreeRowset
extends Zend_Db_Table_Rowset_Abstract
{
public function append($row) { /* ... */ }
}
Using TreeTable
class ItisTable extends ZendX_Db_Table_TreeTable
{
protected $_name = "longnames";
protected $_closureName = "treepaths";
}
$itis = new ItisTable();
Breadcrumbs
$breadcrumbs = $itis->fetchBreadcrumbs(18956);
foreach ($breadcrumbs as $crumb) {
print $crumb->completename . " > ";
}
Breadcrumbs SQL
How Does it Perform?
• Query profile = 0.0006 sec
• MySQL EXPLAIN:
table type key ref rows extra
Dump Tree
$tree = $itis->fetchTreeByRoot(18880); // Papaveraceae
print_tree($tree);
Dump Tree Result
Papaveraceae
  Platystigma
    Platystigma linearis
  Glaucium
    Glaucium corniculatum
    Glaucium flavum
  Chelidonium
    Chelidonium majus
  Bocconia
    Bocconia frutescens
  Stylophorum
    Stylophorum diphyllum
  Stylomecon
    Stylomecon heterophylla
  Canbya
    Canbya aurea
    Canbya candida
  Chlidonium
    Chlidonium majus
  Romneya
    Romneya coulteri
    Romneya trichocalyx
  Dendromecon
    Dendromecon harfordii
    Dendromecon rigida
  Eschscholzia
    Eschscholzia californica
    Eschscholzia glyptosperma
    Eschscholzia hypecoides
    Eschscholzia lemmonii
    Eschscholzia lobbii
    Eschscholzia minutiflora
    Eschscholzia parishii
    Eschscholzia ramosa
    Eschscholzia rhombipetala
    Eschscholzia caespitosa
  etc...
Dump Tree SQL
SELECT d.*, p.a AS _parent
FROM treepaths AS c
INNER JOIN longnames AS d ON c.d = d.tsn
LEFT JOIN treepaths AS p ON p.d = d.tsn
AND p.a IN (202422, 564824, 18053, 18020) -- show children of these nodes
AND p.l = 1
WHERE (c.a = 202422)
AND (p.a IS NOT NULL OR d.tsn = 202422)
ORDER BY c.l, d.completename;
How Does it Perform?
• Query profile = 0.20 sec on MacBook Pro
• MySQL EXPLAIN:
table type key ref rows extra
SHOW CREATE TABLE
CREATE TABLE `treepaths` (
`a` int(11) NOT NULL DEFAULT '0',
`d` int(11) NOT NULL DEFAULT '0',
`l` tinyint(3) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`a`,`d`),
KEY `tree_adl` (`a`,`d`,`l`),
KEY `tree_dl` (`d`,`l`),
CONSTRAINT FOREIGN KEY (`a`)
REFERENCES `longnames` (`tsn`),
CONSTRAINT FOREIGN KEY (`d`)
REFERENCES `longnames` (`tsn`)
) ENGINE=InnoDB
SHOW TABLE STATUS
Name: treepaths
Engine: InnoDB
Version: 10
Row_format: Compact
Rows: 4600439
Avg_row_length: 62
Data_length: 288276480
Max_data_length: 0
Index_length: 273137664
Data_free: 7340032
Demo Time!
SQL Antipatterns
http://www.pragprog.com/titles/bksqla/
License and Copyright
Copyright 2010-2013 Bill Karwin
www.slideshare.net/billkarwin