
0 To 60 in 3.1: Tyler Carlton Cory Sessions

The document discusses upgrading a legacy database system to MySQL Cluster to handle a large demographics data mining project with over 1.7 million users. It describes the issues with the old system and reasons for choosing MySQL Cluster, including its scalability, distributed processing, and high reliability. It then details how the MySQL Cluster was implemented and configured, and discusses some performance and scaling issues encountered with large datasets.

Uploaded by Oleksiy Kovyrin
© Attribution Non-Commercial (BY-NC)

0 to 60 in 3.1

Presented by
MySQL AB® & O’Reilly Media, Inc.

Tyler Carlton
Cory Sessions
<Insert funny joke here>
The Project
 Medium-sized demographics data-mining project
 1,700,000+ user base
 Hundreds of data points per user
“Legacy” System – Why Upgrade?
 Main DB (external users) + offline backup (internal users)
 Weekly manual copy backups
 Max of 3 simultaneous data pulls
 8hr+ pull times for complex data pulls
 Random index corruption
Notes: Smaller is Better
 On average, CPU usage with MySQL was 20% lower than with our old database solution.
Why We Chose MySQL Cluster
 Scalable
 Distributed processing
 Five-nines (99.999%) reliability
 Instant data availability between internal & external users
What We Built – NDB Data Nodes
 8-node NDB cluster
 Dual Core 2 Quad, 1.8 GHz
 16 GB RAM (data memory)
 6x RAID 10 SAS 15k RPM drives
What We Built – API & MGMT Nodes
 3 API nodes + 1 management node
 Dual Core 2 Quad, 1.8 GHz
 8 GB RAM
 300 GB 7200 RPM (RAID 0)
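As a sketch, a node layout like the one above would map onto a MySQL Cluster config.ini roughly as follows. The hostnames and the exact memory split are our own illustrative values, not taken from the talk:

```ini
[ndbd default]
NoOfReplicas=2        # two copies of each fragment, as in the 2x4 setup
DataMemory=14G        # in-memory table data; leave headroom under 16 GB of RAM
IndexMemory=2G        # hash index storage

[ndb_mgmd]
hostname=mgm1.example.com    # the single management node

[ndbd]
hostname=ndb1.example.com    # one [ndbd] section per data node, 8 in total

[mysqld]
hostname=api1.example.com    # one [mysqld] section per API node, 3 in total
```

Each data node and API node gets its own section; the management node reads this file and hands the configuration to the rest of the cluster.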
NDB Issues with a Large Data Set
 NDB load times
Loading from backup: ~1 hour
Restarting NDB nodes: ~1 hour
 Note: load times differ depending on your data size
NDB Issues with a Large Data Set
 Indexing issues
Force index (NDB picks the wrong one)
Index creation/modification order matters (seriously!)
 Local checkpoint tuning
TimeBetweenLocalCheckpoints = 20 means 4 MB (4 × 2^20 bytes) of write operations between checkpoints
NoOfFragmentLogFiles – number of 4 × 16 MB redo log files
None are deleted until 3 local checkpoints have completed
On startup, local checkpoint buffers would overrun
RTFM (two, maybe three times)
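In config.ini terms, the local-checkpoint tuning above lives in the [ndbd default] section. The values below are illustrative, not the presenters' production settings:

```ini
[ndbd default]
# 20 => 4 * 2^20 bytes (4 MB) of writes between local checkpoints;
# the threshold doubles for each increment of this value
TimeBetweenLocalCheckpoints=20
# number of 4 x 16 MB redo log files; none are reclaimed until
# three local checkpoints have completed, so size this generously
NoOfFragmentLogFiles=300
```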
NDB Network Issues
 Network transport packet size
Buffer would fill and overrun
This caused nodes to miss their heartbeats and drop
 This would happen when:
A backup was running
A local checkpoint was running at the same time
 Solved by: increasing the network packet buffer
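In NDB, the per-connection transport buffers are set in the TCP section of config.ini; a fix along the lines the slide describes might look like this (the 8M figure is our assumption, not the value from the talk):

```ini
[tcp default]
# raise the send/receive buffers from their small defaults so backup
# plus local-checkpoint traffic cannot overrun them and cause nodes
# to miss heartbeats
SendBufferMemory=8M
ReceiveBufferMemory=8M
```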


Issues – IN Statements
 IN statements die with engine_condition_pushdown=ON once the set reaches approx. 10,000 items (we hit this with zip codes)
 We really need engine_condition_pushdown=ON, but this broke it for us, so… we had to disable it
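One workaround we can sketch (not from the talk — the presenters simply disabled the pushdown) is to keep engine_condition_pushdown=ON but split an oversized IN list into batches well below the ~10,000-item threshold. The table and column names below are hypothetical:

```python
def chunked(seq, size):
    """Yield successive fixed-size slices of seq."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def in_clause_queries(table, column, values, batch=5000):
    """Build one parameterized SELECT per batch of IN values,
    keeping each IN list under the size that broke pushdown."""
    queries = []
    for batch_vals in chunked(values, batch):
        placeholders = ", ".join(["%s"] * len(batch_vals))
        sql = f"SELECT * FROM {table} WHERE {column} IN ({placeholders})"
        queries.append((sql, list(batch_vals)))
    return queries
```

Each (sql, params) pair is executed separately and the result sets are concatenated client-side.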
Structuring Apps: Redundancy
 Redundant power supplies + dual power sources
 Port trunking w/ redundant Gig-E switches
 NDB replicas: 2 (2×4 setup), 64 GB max data size
 MySQL (API node) heads: load balanced with automatic failover
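The 64 GB figure follows directly from the hardware: total data memory divided by the replica count. A tiny helper makes the arithmetic explicit:

```python
def ndb_max_data_size(num_data_nodes, data_memory_gb, no_of_replicas):
    """Usable in-memory capacity of an NDB cluster: every row is stored
    no_of_replicas times, so total DataMemory is divided by that count."""
    return num_data_nodes * data_memory_gb / no_of_replicas

# 8 data nodes x 16 GB DataMemory, 2 replicas -> 64 GB usable
```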
Structuring Apps: Internal Apps
 Ultimate goal: offload the data-intensive processing to the MySQL nodes
The Good Stuff: Stats!
 Queries per second (over 20 days)
 Average 1,100–1,500 queries/sec during our peak times
 Average 250 queries/sec overall
 Website traffic: stats for March 2008
Net Usage: NDB Node
 All NDB data nodes have nearly identical network bandwidth usage
 MySQL (API) nodes use about 9 MB/s max under our current structure
 Totaling 75 MB/s during peak (600 Mb/s)
Monitoring & Maintenance
 SNMP monitoring: CPU, network, memory, load, disk
 Cron scripts:
Node status & node-down notification
Backups
Database maintenance routines
 The MySQL Clustering book provided the base for the scripts
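A node-status cron check like the one described could be built by parsing the output of `ndb_mgm -e show`. The sample output and parsing logic below are our own sketch of that approach, not the presenters' actual script:

```python
import re

# Abbreviated sample of `ndb_mgm -e show` output (format assumed
# from typical 5.1-era clusters; node 3 is down here).
SAMPLE = """\
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=2    @10.0.0.2  (Version: 5.1.23, Nodegroup: 0, Master)
id=3 (not connected, accepting connect from 10.0.0.3)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @10.0.0.1  (Version: 5.1.23)
"""

def down_nodes(show_output):
    """Return the ids of nodes that `ndb_mgm -e show` reports
    as not connected."""
    down = []
    for line in show_output.splitlines():
        m = re.match(r"id=(\d+)\s+\(not connected", line.strip())
        if m:
            down.append(int(m.group(1)))
    return down
```

A cron job would run ndb_mgm, feed its stdout to down_nodes(), and send a notification whenever the returned list is non-empty.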


Net Usage: Next Steps…
 Dolphin NIC testing
4-node test cluster
4× overall performance
Brand-new patch to handle automatic Ethernet failover / Dolphin failover (beta as of March 28)
Questions?
Contact Information

 Tyler Carlton
www.qdial.com
[email protected]

 Cory Sessions
CorySessions.com
OrangeSoda.com
