Ivan Zoratti
Big Data with MySQL
Percona Live Santa Clara 2013
V1304.01
Friday, 3 May 13
Who is Ivan?
SkySQL
•Leading provider of open source databases, services and solutions
•Home for the founders and the original developers of the core of MySQL
•The creators of MariaDB, the drop-in, innovative replacement for MySQL
What is Big Data?
http://marketingblogged.marketingmagazine.co.uk/files/Big-Data-3.jpg
Big Data!
Big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
http://readwrite.com/files/styles/800_450sc/public/files/fields/shutterstock_bigdata.jpg
Big Data By Structure
Unstructured
•Store everything you have/you find
•In any format and shape
•You do not know how to use it, but it may come in handy
•Storing unstructured data is usually cheaper than storing it in a more structured datastore
•Does not fit well in a relational database
•Examples:
  •Text: Plain text, documents, web content, messages
  •Bitmap: Image, audio, video
•Typical approach:
  •Mining, pattern recognition, tagging
  •Usually batch analysis
Structured
•Store only what you need
•In a good format, ready to be used
•You should already know how to use it, or at least what it means
•Storing structured data is quite expensive
  •Raw data, indexing, denormalisation, aggregation
•A relational database is still the best choice
•Examples:
  •Machine-Generated Data (MGD)
  •Tags, counters, sales
•Typical approach:
  •BI tools, reporting
  •Real-time analysis, change data capture
How “Big” is Big Data?
•Data Factors
•Size
•Speed to collect/generate
•Variety
•Resources
•Administrators
•Developers
•Infrastructure
•Growth
•Collection
•Processing
•Availability
•To whom?
•For how long?
•In which format?
•Aggregated
•Detailed
How to manage Big Data
•Collection - Storage - Archive
•Load - Transform - Analyze
•Access - Explore - Utilize
http://www.futuresmag.com/2012/07/01/big-data-manage-it-dont-drown-in-it
Big Data with MySQL
http://news.mydosti.com/newsphotos/tech/BigDataV1Dec22012.jpg
Technologies to Use / Consider / Watch
•MyISAM and MyISAM compression
•InnoDB compression
•MySQL 5.6 Partitioning
•MariaDB Optimizer
•MariaDB Virtual & Dynamic Columns
•Cassandra Storage Engine
•Connect Storage Engine
•Columnar Databases
  •InfiniDB
  •Infobright
•TokuDB Storage Engine
Columnar Databases
•Automatic compression
•Automatic column storage
•Data distribution
•Map/Reduce approach
•MPP / Parallel loading
•No indexes
•On public clouds, HW or SW appliances
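To make "column storage" and "compression" concrete, here is a minimal Python sketch. It pivots a handful of rows into per-column lists and run-length encodes each column. The sample rows and the choice of run-length encoding are assumptions made for illustration; the slides do not say which schemes InfiniDB or Infobright actually use.

```python
from itertools import groupby

# Hypothetical sample rows (country, status, amount); not taken from the slides.
rows = [
    ("UK", "paid", 10),
    ("UK", "paid", 12),
    ("UK", "open", 7),
    ("IT", "open", 7),
    ("IT", "open", 9),
]

def to_columns(rows):
    """Pivot row tuples into one list per column (column-oriented layout)."""
    return [list(col) for col in zip(*rows)]

def rle_encode(values):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]

columns = to_columns(rows)
encoded = [rle_encode(col) for col in columns]
```

Low-cardinality columns such as the country collapse to a couple of pairs, while the mostly distinct amounts barely shrink. That asymmetry is one reason storing values column by column tends to compress well.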
TokuDB
•Increased Performance
•Increased Compression
•Online administration
•No Index rebuild
MyISAM
•Static, dynamic and compressed format
•Multiple key caches, CACHE INDEX and LOAD INDEX
•Compressed tables
•Horizontal partitioning (manual)
•External locking
InnoDB/XtraDB
•Data Load
  •Pre-order data
  •Split data into chunks
  •unique_checks = 0;
  •foreign_key_checks = 0;
  •sql_log_bin = 0;
  •innodb_autoinc_lock_mode = 2;
•Compression and block size
•Persistent optimizer stats
  •innodb_stats_persistent
  •innodb_stats_auto_recalc
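The "pre-order data" and "split data into chunks" steps can be sketched outside the database. The Python below is a minimal illustration, assuming the primary key is the first field and that the values contain no tabs or newlines. The function names, sample rows and chunk size are hypothetical. InnoDB keeps rows in primary-key order, so feeding it sorted input in moderate batches is the usual motivation for both steps.

```python
def presort_and_chunk(rows, key_index=0, chunk_size=3):
    """Sort rows by the primary-key column, then yield fixed-size chunks."""
    ordered = sorted(rows, key=lambda row: row[key_index])
    for start in range(0, len(ordered), chunk_size):
        yield ordered[start:start + chunk_size]

def to_tsv(chunk):
    """Render a chunk as tab-separated lines (LOAD DATA's default layout).

    Assumes no value contains a tab or newline; real data would need escaping.
    """
    return "".join("\t".join(str(v) for v in row) + "\n" for row in chunk)

# Hypothetical unsorted input: (c1, c2) pairs matching the slide's t1 columns.
rows = [(5, "e"), (1, "a"), (4, "d"), (2, "b"), (3, "c"), (7, "g"), (6, "f")]
chunks = list(presort_and_chunk(rows))
files = [to_tsv(chunk) for chunk in chunks]
```

Each element of `files` would be written to its own file and loaded with a separate LOAD DATA statement, so a failure costs one chunk rather than the whole load.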
SET GLOBAL innodb_file_per_table = 1;
SET GLOBAL innodb_file_format = Barracuda;
CREATE TABLE t1
  ( c1 INT PRIMARY KEY,
    c2 VARCHAR(255) )
  ROW_FORMAT = COMPRESSED
  KEY_BLOCK_SIZE = 8;
LOAD DATA LOCAL INFILE '/usr2/t1_01_simple' INTO TABLE t1;
Query OK, 134217728 rows affected (1 hour 34 min 7.49 sec)
Records: 134217728  Deleted: 0  Skipped: 0  Warnings: 0
LOAD DATA LOCAL INFILE '/usr2/t1_01_simple' INTO TABLE t2;
Query OK, 134217728 rows affected (25 min 20.75 sec)
Records: 134217728  Deleted: 0  Skipped: 0  Warnings: 0
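The two timings above are easier to compare as load rates. The short Python calculation below uses only the row count and elapsed times printed in the output. The slide does not show how t2 is defined, so the ratio says nothing about the cause of the difference.

```python
ROWS = 134_217_728  # rows reported by both LOAD DATA statements

def seconds(hours=0, minutes=0, secs=0.0):
    """Convert an elapsed time as printed by the mysql client into seconds."""
    return hours * 3600 + minutes * 60 + secs

t1_seconds = seconds(hours=1, minutes=34, secs=7.49)  # load into t1 (compressed)
t2_seconds = seconds(minutes=25, secs=20.75)          # load into t2

t1_rate = ROWS / t1_seconds   # rows per second into t1
t2_rate = ROWS / t2_seconds   # rows per second into t2
speedup = t1_seconds / t2_seconds

print(f"t1: {t1_rate:,.0f} rows/s, t2: {t2_rate:,.0f} rows/s, ratio {speedup:.2f}x")
```

On these figures the load into t2 ran roughly 3.7 times faster than the load into the compressed t1.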
Partitioning (MySQL 5.6)
•Partitioning Types
  •RANGE, LIST, RANGE COLUMNS, HASH, LINEAR HASH, KEY, LINEAR KEY, sub-partitions
•Partition and lock pruning
•Use of INDEX and DATA DIRECTORY
•PARTITION ADD, DROP, REORGANIZE, COALESCE, TRUNCATE, EXCHANGE, REBUILD, OPTIMIZE, CHECK, ANALYZE, REPAIR
CREATE TABLE t1 ( c1 INT, c2 DATE )
  PARTITION BY RANGE( YEAR( c2 ) )
  SUBPARTITION BY HASH ( TO_DAYS( c2 ) )
  ( PARTITION p0 VALUES LESS THAN (1990) (
      SUBPARTITION s0
        DATA DIRECTORY = '/disk0/data'
        INDEX DIRECTORY = '/disk0/idx',
      SUBPARTITION s1
        DATA DIRECTORY = '/disk1/data'
        INDEX DIRECTORY = '/disk1/idx' ),...
ALTER TABLE t1
  EXCHANGE PARTITION p3 WITH TABLE t2;
-- Range and List partitions
ALTER TABLE t1 REORGANIZE PARTITION
  p0,p1,p2,p3 INTO (
    PARTITION m0 VALUES LESS THAN (1980),
    PARTITION m1 VALUES LESS THAN (2000));
-- Hash and Key partitions
ALTER TABLE t1 COALESCE PARTITION 10;
ALTER TABLE t1 ADD PARTITION PARTITIONS 5;
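To show how a row is routed by the RANGE( YEAR( c2 ) ) / HASH( TO_DAYS( c2 ) ) scheme in the CREATE TABLE above, here is a small Python model. Only the first bound (1990) and the two subpartitions come from the slide. The p1 and p2 bounds are hypothetical stand-ins for the elided definitions, and the function names are illustrative. HASH placement is modelled as the expression value modulo the number of subpartitions.

```python
from datetime import date

# Upper bounds on YEAR(c2). Only p0's bound (1990) appears in the slide;
# p1 and p2 are hypothetical placeholders for the elided partitions.
RANGE_BOUNDS = [("p0", 1990), ("p1", 2000), ("p2", 2010)]
SUBPARTITIONS = 2  # two subpartitions per partition, as shown for p0

def to_days(d):
    """Model of MySQL TO_DAYS(): day number counted from year 0."""
    return d.toordinal() + 365

def locate(d):
    """Return (partition name, subpartition index) for a c2 value."""
    for name, upper in RANGE_BOUNDS:
        if d.year < upper:  # VALUES LESS THAN (upper)
            return name, to_days(d) % SUBPARTITIONS
    # No MAXVALUE partition in this model, so later years are rejected.
    raise ValueError("no partition accepts year %d" % d.year)

print(locate(date(1985, 6, 15)))
print(locate(date(2007, 10, 7)))
```

A query that filters on a range of c2 only needs the partitions whose bounds overlap that range, which is the idea behind the partition pruning bullet above.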
MariaDB Optimizer
•Multi-Range Read (MRR)*
•Index Merge / Sort intersection
•Batch Key Access*
•Block hash join
•Cost-based choice of range vs. index_merge
•ORDER BY ... LIMIT <limit>*
•MariaDB 10
•Subqueries
•Semi-join*
•Materialization*
•subquery cache
•LIMIT ... ROWS EXAMINED <limit>
(*) - Available in MySQL 5.6
Virtual & Dynamic Columns
VIRTUAL COLUMNS
•For InnoDB, MyISAM and Aria
•PERSISTENT (stored) or VIRTUAL (generated)
CREATE TABLE t1 (
  c1 INT NOT NULL,
  c2 VARCHAR(32),
  c3 INT AS
    ( c1 MOD 10 ) VIRTUAL,
  c4 VARCHAR(5) AS
    ( LEFT(c2,5) ) PERSISTENT);
DYNAMIC COLUMNS
•Implement a schemaless document store
•COLUMN_CREATE, COLUMN_ADD, COLUMN_GET, COLUMN_LIST, COLUMN_JSON, COLUMN_EXISTS, COLUMN_CHECK, COLUMN_DELETE
•Nested columns are allowed
•Main datatypes are allowed
•Max 1GB documents
CREATE TABLE assets (
  item_name VARCHAR(32) PRIMARY KEY,
  dynamic_cols BLOB );
INSERT INTO assets VALUES (
  'MariaDB T-shirt',
  COLUMN_CREATE( 'color', 'blue',
                 'size', 'XL' ) );
INSERT INTO assets VALUES (
  'Thinkpad Laptop',
  COLUMN_CREATE( 'color', 'black',
                 'price', 500 ) );
Cassandra Storage Engine
•Column Family == Table
•Rowkey, static and dynamic columns allowed
•Batch key access support
SET cassandra_default_thrift_host = '192.168.0.10';
CREATE TABLE cassandra_tbl (
  rowkey INT PRIMARY KEY,
  col1 VARCHAR(25),
  col2 BIGINT,
  dyn_cols BLOB DYNAMIC_COLUMN_STORAGE = yes )
  ENGINE = cassandra
  KEYSPACE = 'cassandra_key_space'
  COLUMN_FAMILY = 'column_family_name';
Connect Storage Engine
•Any file format as MySQL TABLE:
  •ODBC
  •Text, XML, *ML
  •Excel, Access etc.
•MariaDB CREATE TABLE options
•Multi-file table
•Table Autocreation
•Condition push down
•Read/Write and Multi Storage Engine Join
•CREATE INDEX
CREATE TABLE handout
ENGINE = CONNECT
TABLE_TYPE = XML
FILE_NAME = 'handout.htm'
HEADER = yes OPTION_LIST =
'name = TABLE,
coltype = HTML,
attribute =
(border=1;cellpadding=5)';
Starting Your Big Data Project
Why would you use MySQL?
• Time
• Knowledge
• Infrastructure
• Costs
• Simplified Integration
• Not so “big” data
Apache Hadoop & Friends
HDFS
MapReduce
PIG HIVE
HCatalog
HBASE
ZooKeeper
•Mahout
•Ambari, Ganglia, Nagios
•Sqoop
•Cascading
•Oozie
•Flume
•Protobuf, Avro, Thrift
•Fuse-DFS
•Chukwa
•Cassandra
MySQL & Friends
MySQL/MariaDB/Storage Engines
SQL Optimizer
Scripts
Stored Procedures DML
DB Schema / DDL
MySQL/MariaDB
SkySQLDS
•Mahout
•SDS, Ganglia, Nagios
•mysqlimport
•Cascading
•Talend, Pentaho
•Connect
Join us at the Solutions Day
•Cassandra and Connect Storage Engine
•Map/Reduce approach - Proxy optimisation
•Multiple protocols and more
Thank You!
ivan@skysql.com
izoratti.blogspot.com
www.slideshare.net/izoratti
www.skysql.com
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
Salesforce AI Associate 2 of 2 Certification.docx
Salesforce AI Associate 2 of 2 Certification.docxSalesforce AI Associate 2 of 2 Certification.docx
Salesforce AI Associate 2 of 2 Certification.docx
José Enrique López Rivera
 
AI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global TrendsAI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global Trends
InData Labs
 
Network Security. Different aspects of Network Security.
Network Security. Different aspects of Network Security.Network Security. Different aspects of Network Security.
Network Security. Different aspects of Network Security.
gregtap1
 
Buckeye Dreamin 2024: Assessing and Resolving Technical Debt
Buckeye Dreamin 2024: Assessing and Resolving Technical DebtBuckeye Dreamin 2024: Assessing and Resolving Technical Debt
Buckeye Dreamin 2024: Assessing and Resolving Technical Debt
Lynda Kane
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
Semantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AISemantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AI
artmondano
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
Asthma presentación en inglés abril 2025 pdf
Asthma presentación en inglés abril 2025 pdfAsthma presentación en inglés abril 2025 pdf
Asthma presentación en inglés abril 2025 pdf
VanessaRaudez
 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
 
What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018
#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018
#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018
Lynda Kane
 

Big Data with MySQL

  • 1. Ivan Zoratti Big Data with MySQL Percona Live Santa Clara 2013 V1304.01 Friday, 3 May 13
  • 3. SkySQL •Leading provider of open source databases, services and solutions •Home for the founders and the original developers of the core of MySQL •The creators of MariaDB, the innovative drop-in replacement for MySQL
  • 4. What is Big Data? http://marketingblogged.marketingmagazine.co.uk/files/Big-Data-3.jpg
  • 5. Big Data! Big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. http://readwrite.com/files/styles/800_450sc/public/files/fields/shutterstock_bigdata.jpg
  • 6. Big Data By Structure
      Unstructured •Store everything you have/you find •In any format and shape •You do not know how to use it, but it may come in handy •Storing unstructured data is usually cheaper than storing it in a more structured datastore •Does not fit well in a relational database •Examples: •Text: Plain text, documents, web content, messages •Bitmap: Image, audio, video •Typical approach: •Mining, pattern recognition, tagging •Usually batch analysis
      Structured •Store only what you need •In a good format, ready to be used •You should already know how to use it, or at least what it means •Storing structured data is quite expensive •Raw data, indexing, denormalisation, aggregation •A relational database is still the best choice •Examples: •Machine-Generated Data (MGD) •Tags, counters, sales •Typical approach: •BI tools, reporting •Real-time analysis, change data capture
  • 8. How “Big” is Big Data? •Data Factors •Size •Speed to collect/generate •Variety •Resources •Administrators •Developers •Infrastructure •Growth •Collection •Processing •Availability •To whom? •For how long? •In which format? •Aggregated •Detailed
  • 9. How to manage Big Data •Collection - Storage - Archive •Load - Transform - Analyze •Access - Explore - Utilize http://www.futuresmag.com/2012/07/01/big-data-manage-it-dont-drown-in-it
  • 10. Big Data with MySQL http://news.mydosti.com/newsphotos/tech/BigDataV1Dec22012.jpg
  • 11. Technologies to Use / Consider / Watch •MyISAM and MyISAM compression •InnoDB compression •MySQL 5.6 Partitioning •MariaDB Optimizer •MariaDB Virtual & Dynamic Columns •Cassandra Storage Engine •Connect Storage Engine •Columnar Databases •InfiniDB •Infobright •TokuDB Storage Engine
  • 12. Columnar Databases •Automatic compression •Automatic column storage •Data distribution •Map/Reduce approach •MPP / Parallel loading •No indexes •On public clouds, HW or SW appliances
  • 13. TokuDB •Increased Performance •Increased Compression •Online administration •No Index rebuild
  • 14. MyISAM •Static, dynamic and compressed formats •Multiple key caches, CACHE INDEX and LOAD INDEX •Compressed tables •Horizontal partitioning (manual) •External locking
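The key cache bullet on the MyISAM slide can be sketched as follows. This is a minimal illustration, not taken from the deck: the cache name `hot_cache`, its 128 MB size and the table name `t1` are assumptions chosen for the example.

```sql
-- Create a dedicated key cache (name and size are illustrative assumptions).
SET GLOBAL hot_cache.key_buffer_size = 128 * 1024 * 1024;

-- Assign all indexes of the MyISAM table t1 to that cache.
CACHE INDEX t1 IN hot_cache;

-- Preload the index blocks so the first queries avoid index reads from disk.
LOAD INDEX INTO CACHE t1;
```

A separate cache for a heavily read table keeps its index blocks from being evicted by activity on other tables that share the default key cache.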
  • 15. InnoDB/XtraDB •Data Load •Pre-order data •Split data into chunks •unique_checks = 0; •foreign_key_checks = 0; •sql_log_bin = 0; •innodb_autoinc_lock_mode = 2; •Compression and block size •Persistent optimizer stats •innodb_stats_persistent •innodb_stats_auto_recalc
      SET GLOBAL innodb_file_per_table = 1;
      SET GLOBAL innodb_file_format = Barracuda;
      CREATE TABLE t1 (
        c1 INT PRIMARY KEY,
        c2 VARCHAR(255)
      ) ROW_FORMAT = COMPRESSED KEY_BLOCK_SIZE = 8;

      LOAD DATA LOCAL INFILE '/usr2/t1_01_simple' INTO TABLE t1;
      Query OK, 134217728 rows affected (1 hour 34 min 7.49 sec)
      Records: 134217728  Deleted: 0  Skipped: 0  Warnings: 0

      LOAD DATA LOCAL INFILE '/usr2/t1_01_simple' INTO TABLE t2;
      Query OK, 134217728 rows affected (25 min 20.75 sec)
      Records: 134217728  Deleted: 0  Skipped: 0  Warnings: 0
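The data load settings listed on the InnoDB/XtraDB slide could be applied around a load roughly as below. This is a sketch under assumptions: it reuses the slide's table `t1` and file path, and it restores the defaults afterwards, which the slide does not show. `innodb_autoinc_lock_mode` is not included because it cannot be changed at runtime; it would have to be set in the server configuration file before startup.

```sql
-- Session-level switches for a bulk load (sql_log_bin needs the SUPER privilege).
SET unique_checks = 0;
SET foreign_key_checks = 0;
SET sql_log_bin = 0;

LOAD DATA LOCAL INFILE '/usr2/t1_01_simple' INTO TABLE t1;

-- Restore the defaults once the load has finished.
SET unique_checks = 1;
SET foreign_key_checks = 1;
SET sql_log_bin = 1;

-- Persistent optimizer statistics, per table (MySQL 5.6 syntax).
ALTER TABLE t1 STATS_PERSISTENT = 1, STATS_AUTO_RECALC = 1;
ANALYZE TABLE t1;
```

Disabling the binary log for the session means the load is not replicated, so this only suits cases where the replicas are loaded separately.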
  • 16. Partitioning (MySQL 5.6) •Partitioning Types •RANGE, LIST, RANGE COLUMNS, HASH, LINEAR HASH, KEY, LINEAR KEY, sub-partitions •Partition and lock pruning •Use of INDEX and DATA DIRECTORY •PARTITION ADD, DROP, REORGANIZE, COALESCE, TRUNCATE, EXCHANGE, REBUILD, OPTIMIZE, CHECK, ANALYZE, REPAIR
      CREATE TABLE t1 ( c1 INT, c2 DATE )
      PARTITION BY RANGE( YEAR( c2 ) )
      SUBPARTITION BY HASH ( TO_DAYS( c2 ) ) (
        PARTITION p0 VALUES LESS THAN (1990) (
          SUBPARTITION s0 DATA DIRECTORY = '/disk0/data' INDEX DIRECTORY = '/disk0/idx',
          SUBPARTITION s1 DATA DIRECTORY = '/disk1/data' INDEX DIRECTORY = '/disk1/idx' ),...

      ALTER TABLE t1 EXCHANGE PARTITION p3 WITH TABLE t2;

      -- Range and List partitions
      ALTER TABLE t1 REORGANIZE PARTITION p0,p1,p2,p3 INTO (
        PARTITION m0 VALUES LESS THAN (1980),
        PARTITION m1 VALUES LESS THAN (2000));

      -- Hash and Key partitions
      ALTER TABLE t1 COALESCE PARTITION 10;
      ALTER TABLE t1 ADD PARTITION PARTITIONS 5;
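Partition pruning, mentioned on the partitioning slide, can be observed with a small experiment. The table `sales`, its columns and its partition names are hypothetical and only serve the illustration; the statements use MySQL 5.6 syntax.

```sql
-- Hypothetical table, range-partitioned by year.
CREATE TABLE sales (
  id   INT,
  sold DATE
)
PARTITION BY RANGE ( YEAR(sold) ) (
  PARTITION p2011 VALUES LESS THAN (2012),
  PARTITION p2012 VALUES LESS THAN (2013),
  PARTITION pmax  VALUES LESS THAN MAXVALUE
);

-- The "partitions" column of the plan lists the partitions the query will read;
-- with pruning it should be a subset of the three defined above.
EXPLAIN PARTITIONS
SELECT COUNT(*) FROM sales
WHERE sold BETWEEN '2012-01-01' AND '2012-12-31';

-- MySQL 5.6 also accepts explicit partition selection.
SELECT COUNT(*) FROM sales PARTITION (p2012);
```

Pruning depends on the WHERE clause constraining the partitioning column directly, so wrapping `sold` in other functions in the predicate would likely defeat it.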
  • 17. MariaDB Optimizer •Multi-Range Read (MRR)* •Index Merge / Sort intersection •Batch Key Access* •Block hash join •Cost-based choice of range vs. index_merge •ORDER BY ... LIMIT <limit>* •MariaDB 10 •Subqueries •Semi-join* •Materialization* •subquery cache •LIMIT ... ROWS EXAMINED <limit> (*) - Available in MySQL 5.6
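Some of the optimizer features on this slide are switched on per session. The sketch below assumes a MariaDB server and two hypothetical tables `t1` and `t2` that share a column `c1`; the chosen values are examples, not recommendations from the deck.

```sql
-- Enable Multi-Range Read for this session.
SET optimizer_switch = 'mrr=on';

-- Join cache levels 5 and above allow the Batch Key Access join algorithms.
SET join_cache_level = 6;

-- Return at most 10 rows and stop once about 10000 rows have been examined.
SELECT *
FROM t1 JOIN t2 ON t1.c1 = t2.c1
LIMIT 10 ROWS EXAMINED 10000;
```

When the ROWS EXAMINED limit is reached, the query ends early with a warning and a possibly partial result, which makes it a guard against runaway queries and not a way to obtain complete answers.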
  • 18. Virtual & Dynamic Columns
      VIRTUAL COLUMNS •For InnoDB, MyISAM and Aria •PERSISTENT (stored) or VIRTUAL (generated)
      CREATE TABLE t1 (
        c1 INT NOT NULL,
        c2 VARCHAR(32),
        c3 INT AS ( c1 MOD 10 ) VIRTUAL,
        c4 VARCHAR(5) AS ( LEFT(c2,5) ) PERSISTENT);

      DYNAMIC COLUMNS •Implement a schemaless document store •COLUMN_CREATE, COLUMN_ADD, COLUMN_GET, COLUMN_LIST, COLUMN_JSON, COLUMN_EXISTS, COLUMN_CHECK, COLUMN_DELETE •Nested columns are allowed •Main datatypes are allowed •Max 1GB documents
      CREATE TABLE assets (
        item_name VARCHAR(32) PRIMARY KEY,
        dynamic_cols BLOB );
      INSERT INTO assets VALUES (
        'MariaDB T-shirt',
        COLUMN_CREATE( 'color', 'blue', 'size', 'XL' ) );
      INSERT INTO assets VALUES (
        'Thinkpad Laptop',
        COLUMN_CREATE( 'color', 'black', 'price', 500 ) );
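Reading the dynamic columns back uses the companion functions named on the slide. The statements below are a sketch against the slide's `assets` table; the `warranty` attribute and its value are made up for the example, and named dynamic columns assume MariaDB 10.

```sql
-- Extract one attribute; COLUMN_GET needs the target type spelled out.
SELECT item_name, COLUMN_GET(dynamic_cols, 'color' AS CHAR) AS color
FROM assets;

-- Show the whole document as JSON.
SELECT item_name, COLUMN_JSON(dynamic_cols) AS doc
FROM assets;

-- Add an attribute to one row only (hypothetical attribute).
UPDATE assets
SET dynamic_cols = COLUMN_ADD(dynamic_cols, 'warranty', '3 years')
WHERE item_name = 'Thinkpad Laptop';

-- Find the rows that carry a given attribute.
SELECT item_name
FROM assets
WHERE COLUMN_EXISTS(dynamic_cols, 'price');
```

Because the attributes live inside a BLOB, filters such as the last query cannot use an ordinary index on the attribute and will scan the table.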
  • 19. Cassandra Storage Engine •Column Family == Table •Rowkey, static and dynamic columns allowed •Batch key access support
      SET GLOBAL cassandra_default_thrift_host = '192.168.0.10';
      CREATE TABLE cassandra_tbl (
        rowkey INT PRIMARY KEY,
        col1 VARCHAR(25),
        col2 BIGINT,
        dyn_cols BLOB DYNAMIC_COLUMN_STORAGE = yes )
      ENGINE = cassandra
      KEYSPACE = 'cassandra_key_space'
      COLUMN_FAMILY = 'column_family_name';
  • 20. Connect Storage Engine •Any file format as MySQL TABLE: •ODBC •Text, XML, *ML •Excel, Access etc. •MariaDB CREATE TABLE options •Multi-file table •Table Autocreation •Condition push down •Read/Write and Multi Storage Engine Join •CREATE INDEX
      CREATE TABLE handout
      ENGINE = CONNECT
      TABLE_TYPE = XML
      FILE_NAME = 'handout.htm'
      HEADER = yes
      OPTION_LIST = 'name = TABLE, coltype = HTML, attribute = (border=1;cellpadding=5)';
  • 21. Starting Your Big Data Project
  • 22. Why would you use MySQL? • Time • Knowledge • Infrastructure • Costs • Simplified Integration • Not so “big” data
  • 23. Apache Hadoop & Friends HDFS MapReduce PIG HIVE HCatalog HBASE ZooKeeper •Mahout •Ambari, Ganglia, Nagios •Sqoop •Cascading •Oozie •Flume •Protobuf, Avro, Thrift •Fuse-DFS •Chukwa •Cassandra
  • 24. MySQL & Friends MySQL/MariaDB/Storage Engines SQL Optimizer Scripts Stored Procedures DML DB Schema / DDL MySQL/MariaDB SkySQLDS •Mahout •SDS, Ganglia, Nagios •mysqlimport •Cascading •Talend, Pentaho •Connect
  • 25. Join us at the Solutions Day •Cassandra and Connect Storage Engine •Map/Reduce approach - Proxy optimisation •Multiple protocols and more