PPC2009 MySQL Pagination
Yahoo Inc
1. Overview
– Common pagination UI pattern
– Sample table and typical solution using OFFSET
– Techniques to avoid large OFFSET
– Performance comparison
– Concerns
Common Patterns
Basics
First step toward efficient pagination over a large data set
Step zero
– http://dev.mysql.com/doc/refman/5.1/en/mysql-indexes.html
– http://dev.mysql.com/doc/refman/5.1/en/order-by-optimization.html
– http://dev.mysql.com/doc/refman/5.1/en/limit-optimization.html
Using Index
• http://domain.com/message?page=1
• ORDER BY id DESC LIMIT 0, 20
• http://domain.com/message?page=2
• ORDER BY id DESC LIMIT 20, 20
• http://domain.com/message?page=3
• ORDER BY id DESC LIMIT 40, 20
Note: id is auto_increment, so id order is the same as create_time order. There is no need to create an index on create_time, which saves space.
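The page-to-OFFSET mapping above can be sketched as follows. This is a minimal illustration, not the deck's own code: it uses Python's sqlite3 module as a stand-in for MySQL, and the names fetch_page, demo_connection and PAGE_SIZE are hypothetical.

```python
import sqlite3

PAGE_SIZE = 20  # rows per page, matching the LIMIT n, 20 examples


def demo_connection(n_rows=100):
    """Build a throwaway in-memory table with ids 1..n_rows (made-up data).

    Assumption: SQLite stands in for MySQL here so the sketch is runnable.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE message (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO message (id, body) VALUES (?, ?)",
        [(i, f"msg {i}") for i in range(1, n_rows + 1)],
    )
    return conn


def fetch_page(conn, page, page_size=PAGE_SIZE):
    """Return the ids shown on a 1-based page number using LIMIT/OFFSET."""
    # page=1 -> offset 0, page=2 -> offset 20, page=3 -> offset 40
    offset = (page - 1) * page_size
    cur = conn.execute(
        "SELECT id FROM message ORDER BY id DESC LIMIT ? OFFSET ?",
        (page_size, offset),
    )
    return [row[0] for row in cur]
```

With 100 rows, fetch_page(conn, 3) skips the 40 newest ids and returns ids 60 down to 41. The database still has to walk past the skipped rows, which is the cost discussed in the following slides.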
Explain
Performance Implications
– A larger OFFSET increases the active data set: MySQL has to bring data into memory that is never returned to the caller.
– The performance issue is more visible when the database can't fit in main memory.
– A small percentage of requests with a large OFFSET is enough to hit disk I/O and create a disk I/O bottleneck.
– In order to display "21 to 40 of 1,000,000", someone has to count 1,000,000 rows.
Simple Solution
Avoid Count(*)
1. Never display the total number of messages; let the user see more messages by clicking 'next'.
2. Do not count on every request. Cache the count and display a stale value; users do not care about 324533 vs. 324633.
3. Display "41 to 80 of Thousands".
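Options 2 and 3 could look roughly like this. It is a sketch under assumptions: SQLite stands in for MySQL, the 60-second TTL is arbitrary, and CachedCount and approximate_label are hypothetical names, not part of any library.

```python
import sqlite3
import time


class CachedCount:
    """Cache COUNT(*) for ttl seconds so most requests skip the full count."""

    def __init__(self, conn, ttl=60.0, clock=time.monotonic):
        self.conn = conn
        self.ttl = ttl          # assumption: a minute of staleness is acceptable
        self.clock = clock      # injectable so the expiry logic can be tested
        self._value = None
        self._expires = 0.0
        self.queries = 0        # how many times COUNT(*) actually ran

    def get(self):
        now = self.clock()
        if self._value is None or now >= self._expires:
            row = self.conn.execute("SELECT COUNT(*) FROM message").fetchone()
            self._value = row[0]
            self._expires = now + self.ttl
            self.queries += 1
        return self._value


def approximate_label(first, last, total):
    """Render e.g. '41 to 80 of Thousands' instead of an exact total."""
    if total >= 1_000_000:
        scale = "Millions"
    elif total >= 1_000:
        scale = "Thousands"
    elif total >= 100:
        scale = "Hundreds"
    else:
        scale = str(total)
    return f"{first} to {last} of {scale}"
```

Between refreshes every request reuses the cached number, so the table is counted at most once per TTL window regardless of traffic.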
Solution to avoid offset
Find the clue
150
111
102 Page One
101
100 <a href="/page=2;last_seen=100;dir=next">Next</a>
98 <a href="/page=1;last_seen=98;dir=prev">Prev</a>
97
96 Page Two
95
94 <a href="/page=3;last_seen=94;dir=next">Next</a>
93 <a href="/page=2;last_seen=93;dir=prev">Prev</a>
92
91 Page Three
90
89 <a href="/page=4;last_seen=89;dir=next">Next</a>
Solution using clue
Next Page:
http://domain.com/forum?page=2&last_seen=100&dir=next
Prev Page:
http://domain.com/forum?page=1&last_seen=98&dir=prev
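One way to turn the last_seen and dir parameters into queries is sketched below. This is an illustration under assumptions, not the deck's original query: SQLite stands in for MySQL, the page size of 5 matches the id listing on the "Find the clue" slide, and fetch_page and make_demo_table are hypothetical names.

```python
import sqlite3


def make_demo_table(ids):
    """Create an in-memory message table holding the given ids (demo data only)."""
    conn = sqlite3.connect(":memory:")  # assumption: SQLite instead of MySQL
    conn.execute("CREATE TABLE message (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO message (id) VALUES (?)", [(i,) for i in ids])
    return conn


def fetch_page(conn, last_seen=None, direction="next", page_size=5):
    """Fetch one page in id DESC order by seeking from last_seen, with no OFFSET."""
    if last_seen is None:
        # First page: nothing seen yet.
        sql = "SELECT id FROM message ORDER BY id DESC LIMIT ?"
        params = (page_size,)
    elif direction == "next":
        # Older rows: everything below the last id on the current page.
        sql = "SELECT id FROM message WHERE id < ? ORDER BY id DESC LIMIT ?"
        params = (last_seen, page_size)
    elif direction == "prev":
        # Newer rows: read ascending from the first id on the current page.
        sql = "SELECT id FROM message WHERE id > ? ORDER BY id ASC LIMIT ?"
        params = (last_seen, page_size)
    else:
        raise ValueError("direction must be 'next' or 'prev'")
    ids = [row[0] for row in conn.execute(sql, params)]
    if last_seen is not None and direction == "prev":
        ids.reverse()  # flip back to newest-first for display
    return ids
```

With the ids from the listing, fetch_page(conn, 100, "next") returns 98, 97, 96, 95, 94, and fetch_page(conn, 98, "prev") returns page one again. The "prev" branch reads in ascending order so that LIMIT keeps the rows closest to the current page.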
Explain
mysql> explain
SELECT * FROM message
WHERE id < '49999961'
ORDER BY id DESC LIMIT 20 \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: message
type: range
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: NULL
Rows: 25000020 /* ignore this */
Extra: Using where
1 row in set (0.00 sec)
What about ordering by non-unique values?
99
99
98 Page One
98
98
98
98
97 Page Two
97
10
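We can't do:
WHERE thumbs_up < 98
ORDER BY thumbs_up DESC /* It will skip the unseen rows with thumbs_up = 98 */
Nor:
WHERE thumbs_up <= 98
ORDER BY thumbs_up DESC /* It will return a few rows already seen */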
Add more condition
Solution
First Page
SELECT thumbs_up, id
FROM message
ORDER BY thumbs_up DESC, id DESC
LIMIT $page_size
+-----------+----+
| thumbs_up | id |
+-----------+----+
|        99 | 14 |
|        99 |  2 |
|        98 | 18 |
|        98 | 15 |
|        98 | 13 |
+-----------+----+
Next Page
SELECT thumbs_up, id
FROM message
WHERE thumbs_up <= 98 AND (id < 13 OR thumbs_up < 98)
ORDER BY thumbs_up DESC, id DESC
LIMIT $page_size
+-----------+----+
| thumbs_up | id |
+-----------+----+
|        98 | 10 |
|        98 |  6 |
|        97 | 17 |
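The two queries above can be exercised end to end with a small sketch. Assumptions: SQLite stands in for MySQL, the first eight rows of ROWS mirror the output shown above while the last two are made up, and fetch_page and make_demo_table are hypothetical names.

```python
import sqlite3

# (id, thumbs_up). The first eight rows mirror the slide's output;
# the last two are invented to fill out the second page.
ROWS = [
    (14, 99), (2, 99), (18, 98), (15, 98), (13, 98),
    (10, 98), (6, 98), (17, 97), (9, 97), (4, 10),
]


def make_demo_table(rows=ROWS):
    conn = sqlite3.connect(":memory:")  # assumption: SQLite instead of MySQL
    conn.execute(
        "CREATE TABLE message (id INTEGER PRIMARY KEY, thumbs_up INTEGER)"
    )
    conn.execute("CREATE INDEX idx_thumbs ON message (thumbs_up, id)")
    conn.executemany("INSERT INTO message (id, thumbs_up) VALUES (?, ?)", rows)
    return conn


def fetch_page(conn, last=None, page_size=5):
    """Return (thumbs_up, id) rows; `last` is the final row of the previous page."""
    if last is None:
        sql = ("SELECT thumbs_up, id FROM message "
               "ORDER BY thumbs_up DESC, id DESC LIMIT ?")
        params = (page_size,)
    else:
        last_thumbs, last_id = last
        # Same shape as the slide: ties on thumbs_up are broken by the unique id.
        sql = ("SELECT thumbs_up, id FROM message "
               "WHERE thumbs_up <= ? AND (id < ? OR thumbs_up < ?) "
               "ORDER BY thumbs_up DESC, id DESC LIMIT ?")
        params = (last_thumbs, last_id, last_thumbs, page_size)
    return conn.execute(sql, params).fetchall()
```

Passing the last row of page one, (98, 13), yields (98, 10), (98, 6), (97, 17) and onward. Row 17 qualifies even though its id is greater than 13, because its thumbs_up is strictly lower.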
Make it better..
Query:
Explain
Performance Gain (Primary Key Order)
Performance Gain (Secondary Key Order)
Throughput Gain
Bonus Point
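While a user is reading a page, new records may be added to earlier pages. With LIMIT offset paging, rows then shift between pages and the user may see a row twice; paging by last_seen does not have this problem.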
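The effect can be reproduced with a small sketch that compares both approaches when a row is inserted between two page views. Assumptions: SQLite stands in for MySQL, the table holds 50 made-up ids, and offset_page, keyset_page and make_demo_table are hypothetical names.

```python
import sqlite3

PAGE_SIZE = 10


def make_demo_table(n_rows=50):
    conn = sqlite3.connect(":memory:")  # assumption: SQLite instead of MySQL
    conn.execute("CREATE TABLE message (id INTEGER PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO message (id) VALUES (?)",
        [(i,) for i in range(1, n_rows + 1)],
    )
    return conn


def offset_page(conn, page):
    """Classic paging: skip (page - 1) * PAGE_SIZE rows, newest first."""
    cur = conn.execute(
        "SELECT id FROM message ORDER BY id DESC LIMIT ? OFFSET ?",
        (PAGE_SIZE, (page - 1) * PAGE_SIZE),
    )
    return [row[0] for row in cur]


def keyset_page(conn, last_seen):
    """Seek paging: continue strictly below the last id the user saw."""
    cur = conn.execute(
        "SELECT id FROM message WHERE id < ? ORDER BY id DESC LIMIT ?",
        (last_seen, PAGE_SIZE),
    )
    return [row[0] for row in cur]
```

If the user reads page one (ids 50 to 41) and a new message with id 51 arrives, offset_page(conn, 2) starts at 41 again, while keyset_page(conn, 41) starts at 40.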
Drawback
Two Solutions:
• Read extra rows
– Read extra rows in advance and construct links for a few previous and next pages
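The "read extra rows" idea might be sketched like this for the forward links only; previous-page links would mirror it with the comparison and sort order reversed. Assumptions: SQLite stands in for MySQL, a lookahead of three pages is arbitrary, and next_page_links and make_demo_table are hypothetical names.

```python
import sqlite3


def make_demo_table(ids):
    conn = sqlite3.connect(":memory:")  # assumption: SQLite instead of MySQL
    conn.execute("CREATE TABLE message (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO message (id) VALUES (?)", [(i,) for i in ids])
    return conn


def next_page_links(conn, current_page, last_seen, page_size=5, lookahead=3):
    """Return [(page_number, last_seen_anchor), ...] for up to `lookahead` pages.

    Reads page_size * lookahead rows past the current page in one query and
    uses the last id of each chunk as the anchor for the page that follows it.
    """
    rows = [
        row[0]
        for row in conn.execute(
            "SELECT id FROM message WHERE id < ? ORDER BY id DESC LIMIT ?",
            (last_seen, page_size * lookahead),
        )
    ]
    links = []
    anchor = last_seen  # the next page starts right after the current last row
    for k in range(lookahead):
        chunk = rows[k * page_size:(k + 1) * page_size]
        if not chunk:
            break  # no rows left, so no link for this page
        links.append((current_page + k + 1, anchor))
        anchor = chunk[-1]
    return links
```

For the id listing used earlier, calling next_page_links(conn, 1, 100) gives links for page 2 (last_seen=100) and page 3 (last_seen=94); no link is produced for page 4 because no rows remain. The extra read costs page_size * lookahead rows, so the lookahead should stay small.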