Databases
Kodok Márton
■ simb.ro
■ kodokmarton.eu
■ twitter.com/martonkodok
■ facebook.com/marton.kodok
■ stackoverflow.com/users/243782/pentium10
23 May, 2013 @Sapientia
Relational Databases
● A relational database is essentially a group of tables (entities).
● Tables are made up of columns and rows.
● Those tables have constraints, and relationships are defined between them.
● Relational databases are queried using SQL
● Multiple tables accessed in a single query are "joined" together, typically on
criteria defined over the columns that relate the tables.
● Normalization is a data-structuring technique used with relational databases that
reduces data duplication and so helps keep data consistent.
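The points above can be tried out in a few lines. This is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for any relational engine; the departments/employees tables and their rows are hypothetical and not part of the slides.

```python
import sqlite3

# In-memory SQLite database, used here only as a convenient relational engine.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request

# Normalized design: the department name is stored once and referenced by id.
conn.executescript("""
CREATE TABLE departments (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE employees (
    id            INTEGER PRIMARY KEY,
    last_name     TEXT NOT NULL,
    department_id INTEGER NOT NULL REFERENCES departments(id)
);
INSERT INTO departments (id, name) VALUES (1, 'Engineering'), (2, 'Sales');
INSERT INTO employees (id, last_name, department_id)
VALUES (1, 'Kovacs', 1), (2, 'Nagy', 1), (3, 'Szabo', 2);
""")

# The two tables are joined on the relationship column.
rows = conn.execute("""
    SELECT e.last_name, d.name
    FROM employees e
    JOIN departments d ON d.id = e.department_id
    ORDER BY e.id
""").fetchall()
print(rows)

# The foreign key constraint rejects a row that points at a missing department.
try:
    conn.execute(
        "INSERT INTO employees (id, last_name, department_id) VALUES (4, 'Toth', 99)"
    )
    constraint_enforced = False
except sqlite3.IntegrityError:
    constraint_enforced = True
print(constraint_enforced)
```

Renaming a department here touches one row in departments, which is the practical payoff of removing duplication.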
Non-Relational Databases (NoSQL)
● Key-value stores are the simplest NoSQL
databases. Every single item in the database is
stored as an attribute name, or key, together with its
value. Examples of key-value stores are Riak and
Membase. Some key-value stores, such as Redis,
allow each value to have a type, such as "integer",
which adds functionality.
● Document databases, such as MongoDB and CouchDB, pair
each key with a document. A document can contain many
different key-value pairs, or key-array pairs, or even
nested documents.
● Graph stores are used to store information about
networks, such as social connections
● Wide-column stores such as Cassandra and
HBase are optimized for queries over large datasets,
and store columns of data together, instead of rows.
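To make the shapes above concrete, here is a rough sketch in plain Python data structures. It is not how any of these engines store data internally; the keys and values are made up for illustration.

```python
# Key-value store: values addressed by key.
kv = {"uid:1000:username": "marton", "page:home:hits": 10}
kv["page:home:hits"] += 1  # a typed (integer) value allows in-place increments

# Document: key-value pairs, key-array pairs and nested documents in one record.
doc = {
    "user_id": "abc123",
    "tags": ["mysql", "redis"],      # key-array pair
    "address": {"city": "Cluj"},     # nested document
}

# The same three records laid out by row and by column.
rows = [("ann", 31), ("bob", 45), ("cid", 27)]
columns = {
    "name": [r[0] for r in rows],
    "age": [r[1] for r in rows],
}

# An aggregate over one attribute only has to touch that column.
average_age = sum(columns["age"]) / len(columns["age"])
print(kv["page:home:hits"], doc["address"]["city"], round(average_age, 2))
```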
SQL vs NoSQL
This presentation is NOT about
SQL vs NoSQL.
Let’s be blunt: neither is difficult, and we need both of them.
We evolve.
Spreadsheets/Frontends
Excel and Access are not databases.
Let’s be blunt: Excel does not need more than 256 columns and 65 536 rows.
DDL (Data Definition Language)
CREATE TABLE employees (
    id INTEGER(11) PRIMARY KEY,
    first_name VARCHAR(50) NULL,
    last_name VARCHAR(75) NOT NULL,
    dateofbirth DATE NULL
);
ALTER TABLE sink ADD bubbles INTEGER;
ALTER TABLE sink DROP COLUMN bubbles;
DROP TABLE employees;
RENAME TABLE My_table TO Tmp_table;
TRUNCATE TABLE My_table;
CREATE, ALTER, DROP, RENAME, TRUNCATE
SQL (Structured Query Language)
INSERT INTO My_table
(field1, field2, field3)
VALUES
('test', 'N', NULL);
SELECT Book.title AS Title,
COUNT(*) AS Authors
FROM Book JOIN Book_author
ON Book.isbn = Book_author.isbn
GROUP BY Book.title;
UPDATE My_table
SET field1 = 'updated value'
WHERE field2 = 'N';
DELETE FROM My_table
WHERE field2 = 'N';
CRUD (CREATE, READ, UPDATE, DELETE)
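The statements above can be run end to end with a small script. This sketch uses Python's sqlite3 module instead of MySQL, and the Book/Book_author rows are invented sample data; only the SELECT is taken from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Book (isbn TEXT PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE Book_author (isbn TEXT NOT NULL, author TEXT NOT NULL);
INSERT INTO Book VALUES ('111', 'SQL Basics'), ('222', 'NoSQL Basics');
INSERT INTO Book_author VALUES ('111', 'Ann'), ('111', 'Bob'), ('222', 'Cid');
""")

# READ: number of authors per book (ORDER BY added to make the output stable).
counts = conn.execute("""
    SELECT Book.title AS Title, COUNT(*) AS Authors
    FROM Book JOIN Book_author ON Book.isbn = Book_author.isbn
    GROUP BY Book.title
    ORDER BY Book.title
""").fetchall()
print(counts)

# UPDATE and DELETE report how many rows they touched through cursor.rowcount.
updated = conn.execute(
    "UPDATE Book SET title = 'SQL Fundamentals' WHERE isbn = '111'"
).rowcount
deleted = conn.execute(
    "DELETE FROM Book_author WHERE author = 'Cid'"
).rowcount
print(updated, deleted)
```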
Indexes (fast lookup + constraints)
Constraint:
a. PRIMARY
b. UNIQUE
c. FOREIGN KEY
● Indexes reduce the amount of data the server has to examine
● They can speed up reads but can slow down inserts and updates
● They are used to enforce constraints
Type:
1. BTREE
- can be used for look-ups and sorting
- can match the full value
- can match a leftmost prefix ( LIKE 'ma%' )
2. HASH
- only supports equality comparisons: =, IN()
- can't be used for sorting
3. FULLTEXT
- MyISAM tables only before MySQL 5.6 (InnoDB supports FULLTEXT from 5.6)
- compares words or phrases
- returns a relevance value
Some Queries That Can Use BTREE Index
● point look-up
SELECT * FROM students WHERE grade = 100;
● open range
SELECT * FROM students WHERE grade > 75;
● closed range
SELECT * FROM students WHERE grade > 70 AND grade < 80;
● special range
SELECT * FROM students WHERE name LIKE 'ma%';
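Why these four query shapes suit a BTREE index can be sketched with a sorted Python list and binary search, which stand in for the ordered keys of the tree. The sample grades, names and helper functions are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Sorted keys play the role of the ordered entries in a BTREE index.
grades = sorted([55, 68, 71, 75, 75, 79, 80, 92, 100])
names = sorted(["anna", "mark", "marton", "mary", "mihai", "zoe"])

def point(keys, value):
    """grade = value: one binary search for each end of the run of equal keys."""
    return keys[bisect_left(keys, value):bisect_right(keys, value)]

def open_range(keys, low):
    """grade > low: find the start, then read to the end."""
    return keys[bisect_right(keys, low):]

def closed_range(keys, low, high):
    """grade > low AND grade < high: two binary searches bound the slice."""
    return keys[bisect_right(keys, low):bisect_left(keys, high)]

def prefix(keys, p):
    """name LIKE 'p%': every match sorts between p and p plus a maximal character."""
    return keys[bisect_left(keys, p):bisect_left(keys, p + "\uffff")]

print(point(grades, 100))
print(open_range(grades, 75))
print(closed_range(grades, 70, 80))
print(prefix(names, "ma"))
```

A pattern such as LIKE '%ma' has no fixed prefix, so the matching keys are not contiguous and this trick does not apply.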
Multi-Column Indexes: useful for sorting/WHERE
CREATE INDEX `salary_name_idx` ON emp(salary, name);
SELECT salary, name FROM emp ORDER BY salary, name;
(5000, 'john') < (5000, 'michael')
(9000, 'philip') < (9999, 'steve')
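The ordering shown above is the same lexicographic ordering Python uses for tuples, so a sorted list of (salary, name) tuples is a reasonable mental model of the index. A minimal sketch, reusing the four example rows:

```python
from bisect import bisect_left

emp = [(9999, "steve"), (5000, "michael"), (9000, "philip"), (5000, "john")]

# Entries of an index on (salary, name) are kept in this order.
index_order = sorted(emp)
print(index_order)

# A condition on the leading column selects one contiguous slice of the index.
start = bisect_left(index_order, (5000,))
stop = bisect_left(index_order, (5001,))
salary_5000 = index_order[start:stop]
print(salary_5000)

# A condition on name alone gets no help from the ordering: every entry is checked.
named_philip = [entry for entry in index_order if entry[1] == "philip"]
print(named_philip)
```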
Indexing InnoDB Tables
● data is clustered by primary key
● primary key is implicitly appended to all indexes
CREATE INDEX fname_idx ON emp(firstname);
actually creates KEY(firstname, id) internally
Avoid long primary keys!
TS-09061982110055-12345
349950002348857737488334
supercalifragilisticexpialidocious
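A toy model of the two points above, written in plain Python rather than taken from InnoDB itself: the clustered index maps the primary key to the row, and every secondary index entry carries a copy of the primary key. The rows and the pk_overhead_bytes helper are hypothetical.

```python
# Clustered index: primary key -> full row.
clustered = {
    1: {"firstname": "john", "salary": 5000},
    2: {"firstname": "anna", "salary": 9000},
    3: {"firstname": "john", "salary": 7000},
}

# Secondary index on firstname: each entry is (firstname, primary key).
fname_idx = sorted((row["firstname"], pk) for pk, row in clustered.items())

def lookup(firstname):
    """Two steps: find primary keys in the secondary index, then fetch the rows."""
    return [clustered[pk] for name, pk in fname_idx if name == firstname]

def pk_overhead_bytes(pk_example, n_rows, n_secondary_indexes):
    """Rough extra bytes spent repeating the primary key in every secondary index."""
    return len(pk_example.encode("utf-8")) * n_rows * n_secondary_indexes

print(fname_idx)
print(lookup("john"))
# One million rows, three secondary indexes: long string key vs. a 4-byte integer.
print(pk_overhead_bytes("supercalifragilisticexpialidocious", 1_000_000, 3))
print(4 * 1_000_000 * 3)
```

The byte counts ignore page and record overhead; they are only meant to show how the cost of a long key is multiplied by every index on the table.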
How MySQL Uses Indexes
● looking up data
● joining tables
● sorting
● avoiding reading data
MySQL usually uses only ONE index per table in a query (index_merge is the exception).
DON'Ts
● don't follow optimization rules blindly
● don't create an index for every column in your table
thinking that it will make things faster
● don't create duplicate indexes
ex.
BAD:
create index firstname_ix on Employee(firstname);
create index lastname_ix on Employee(lastname);
GOOD:
create index first_last_ix on Employee(firstname, lastname);
create index id_ix on Employee(id);
DOs
● use index for optimizing look-ups, sorting and retrieval of
data
● use short primary keys if possible when using the
InnoDB storage engine
● extend index if you can, instead of creating new indexes
● validate performance impact as you're doing changes
● remove unused indexes
Speeding it up
● proper table design (3NF)
● understand the query cache (internal to MySQL)
● EXPLAIN syntax
● proper indexes
● MySQL server daemon optimization
● Slow Query Logs
● Stored Procedures
● Profiling
● Redundancy -> Master : Slave
● Sharding
● mix database servers based on business logic
-> Memcache, Redis, MongoDB
EXPLAIN
EXPLAIN SELECT * FROM attendees
WHERE conference_id = 123 AND registration_status > 0
table possible_keys key rows
attendees NULL NULL 14052
The three most important columns returned by EXPLAIN
1) Possible keys
● All the possible indexes which MySQL could have used
● Based on a series of very quick lookups and calculations
2) Chosen key
3) Rows scanned
● Indication of effort required to identify your result set
-> Interpreting the results
Interpreting the results
EXPLAIN SELECT * FROM attendees
WHERE conference_id = 123 AND registration_status > 0
table possible_keys key rows
attendees NULL NULL 14052
● No suitable indexes for this query
● MySQL had to do a full table scan
● Full table scans are almost always the slowest way to answer a query
● Full table scans, while not always bad, are usually an indication that an
index is required
-> Adding indexes
Adding indexes
ALTER TABLE attendees ADD INDEX conf (conference_id);
ALTER TABLE attendees ADD INDEX reg (registration_status);
EXPLAIN SELECT * FROM attendees
WHERE conference_id = 123 AND registration_status > 0
table possible_keys key rows
attendees conf, reg conf 331
● MySQL had two indexes to choose from, but discarded “reg”
● “reg” isn't sufficiently unique
● The spread of values can also be a factor (e.g when 99% of rows contain
the same value)
● Index “uniqueness” is called cardinality
● There is scope for some performance increase... Lower server load,
quicker response
-> Choosing a better index
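The before/after experiment can be reproduced without a MySQL server. The sketch below uses Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN, which is a different planner with a different output format than MySQL's EXPLAIN, so treat it as an analogy. The table contents are generated filler data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE attendees (
        id INTEGER PRIMARY KEY,
        conference_id INTEGER,
        registration_status INTEGER,
        surname TEXT
    )
""")
conn.executemany(
    "INSERT INTO attendees (conference_id, registration_status, surname) VALUES (?, ?, ?)",
    [(i % 50, i % 3, f"name{i}") for i in range(5000)],
)

QUERY = "SELECT * FROM attendees WHERE conference_id = 123 AND registration_status > 0"

def plan(sql):
    """Return the human-readable detail column of each EXPLAIN QUERY PLAN row."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

before = plan(QUERY)  # no index yet: the whole table is scanned

conn.execute("CREATE INDEX conf ON attendees (conference_id)")
conn.execute("CREATE INDEX reg ON attendees (registration_status)")
after = plan(QUERY)   # the planner now picks one of the two indexes

# Cardinality: the number of distinct values each indexed column holds.
cardinality = conn.execute(
    "SELECT COUNT(DISTINCT conference_id), COUNT(DISTINCT registration_status) FROM attendees"
).fetchone()

print(before)
print(after)
print(cardinality)
```

With 50 distinct conference ids against 3 distinct statuses, an equality match on conference_id narrows the search far more than a range on registration_status, which is the same reasoning the slide gives for discarding “reg”.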
Choosing a better index
ALTER TABLE attendees ADD INDEX reg_conf_index (registration_status, conference_id);
EXPLAIN SELECT * FROM attendees
WHERE registration_status > 0 AND conference_id = 123
table      possible_keys              key             rows
attendees  reg, conf, reg_conf_index  reg_conf_index  204
● reg_conf_index is a much better choice
● Note that the other two keys are still available, just not as effective
● Our query is now served well by the new index
-> Using it wrong
Watch for WHERE column order
ALTER TABLE attendees DROP INDEX conf, DROP INDEX reg;
EXPLAIN SELECT * FROM attendees WHERE conference_id = 123
table possible_keys key rows
attendees NULL NULL 14052
● Without the “conf” index, we're back to square one
● The order in which fields were defined in a composite index affects whether it is available for
use in a query
● Remember, we defined our index : (registration_status, conference_id)
Potential workaround:
EXPLAIN SELECT * FROM attendees WHERE registration_status >= -1 AND
conference_id = 123
table possible_keys key rows
attendees reg_conf_index reg_conf_index 204
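The effect of column order in a composite index can be observed with SQLite as well. This is a sketch under the assumption that SQLite's planner is an acceptable stand-in for MySQL's here; both follow the leftmost-prefix rule, but their plans and cost estimates differ. The table is left empty because only the plan is inspected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE attendees (
        id INTEGER PRIMARY KEY,
        conference_id INTEGER,
        registration_status INTEGER,
        surname TEXT
    )
""")
# Same column order as on the slide: registration_status first.
conn.execute(
    "CREATE INDEX reg_conf_index ON attendees (registration_status, conference_id)"
)

def plan(sql):
    """Return the detail column of each EXPLAIN QUERY PLAN row."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Only the second index column is constrained: the index cannot be searched.
only_conference = plan("SELECT * FROM attendees WHERE conference_id = 123")

# The leading column is constrained too: the index becomes usable.
leading_and_second = plan(
    "SELECT * FROM attendees WHERE registration_status = 1 AND conference_id = 123"
)

# The slide's workaround: an always-true range on the leading column.
workaround = plan(
    "SELECT * FROM attendees WHERE registration_status >= -1 AND conference_id = 123"
)

print(only_conference)
print(leading_and_second)
print(workaround)
```

If queries filter on conference_id alone most of the time, an index that leads with conference_id is usually a cleaner fix than padding the WHERE clause.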
JOINs
● JOINing together large data sets (>= 10,000 rows) is really where EXPLAIN
becomes useful
● Each JOIN in a query gets its own row in EXPLAIN
● Make sure each JOIN condition is FAST
● Make sure each joined table is getting to its result set as quickly as possible
● The benefits compound if each join requires less effort
Simple JOIN example
EXPLAIN SELECT * FROM conferences c
JOIN attendees a ON c.conference_id = a.conference_id
WHERE c.location_id = 2 AND c.topic_id IN (4,6,1) AND
a.registration_status > 1
table type possible_keys key rows
conferences ref conference_topic conference_topic 15
attendees ALL NULL NULL 14052
● Looks like I need an index on attendees.conference_id
● The “type” column is another indication of effort, aside from rows scanned
● Here, “ALL” is bad – we should be aiming for “ref”
● There are 13 different values for “type”
● Common values are:
const, eq_ref, ref, fulltext, index_merge, range, all
http://dev.mysql.com/doc/refman/5.0/en/using-explain.html
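Why “ALL” on the joined table hurts can be shown with a simplified nested-loop model in Python. It counts how many attendee rows are examined with and without an index on attendees.conference_id. The row counts 15 and 14052 come from the EXPLAIN output above; the data itself is generated, and real MySQL join execution is more sophisticated than this sketch.

```python
from collections import defaultdict

conferences = [{"conference_id": c} for c in range(15)]
attendees = [{"attendee_id": i, "conference_id": i % 100} for i in range(14052)]

def join_full_scan(conferences, attendees):
    """type = ALL: every attendee row is read once per conference row."""
    examined = matches = 0
    for c in conferences:
        for a in attendees:
            examined += 1
            if a["conference_id"] == c["conference_id"]:
                matches += 1
    return examined, matches

def join_with_index(conferences, attendees):
    """type = ref: a lookup structure on conference_id returns only matching rows."""
    index = defaultdict(list)  # stands in for an index on attendees.conference_id
    for a in attendees:
        index[a["conference_id"]].append(a)
    examined = matches = 0
    for c in conferences:
        for a in index.get(c["conference_id"], []):
            examined += 1
            matches += 1
    return examined, matches

print(join_full_scan(conferences, attendees))
print(join_with_index(conferences, attendees))
```

Both functions find the same matches; the difference is the number of rows touched to get there, which is what the rows column multiplies out to across the lines of a join's EXPLAIN output.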
The "extra" column
With every EXPLAIN, you get an “extra” column, which shows additional
operations invoked to get your result set.
Some example “extra” values:
● Using index
● Using where
● Using temporary
● Using filesort
There are many more “extra” values which are discussed in the MySQL manual:
Distinct, Full scan on NULL key, Impossible HAVING, Impossible WHERE, Not exists
http://dev.mysql.com/doc/refman/5.0/en/explain-output.html#explain-join-types
table      type  possible_keys  key   rows  extra
attendees  ref   conf           conf  331   Using where; Using filesort
Using filesort
Avoid, because:
● Doesn't use an index
● Involves a full scan of your result set
● Employs a generic (i.e. one size fits all) algorithm
● Creates temporary tables
● Uses the filesystem (seek)
● Will get slower with more data
It's not all bad...
● Perfectly acceptable provided you get to your
result set as quickly as possible, and keep it predictably small
● Sometimes unavoidable - ORDER BY RAND()
● ORDER BY operations can use indexes to do the sorting!
Using filesort
EXPLAIN SELECT * FROM attendees
WHERE conference_id = 123 ORDER BY surname
table      possible_keys  key            rows  extra
attendees  conference_id  conference_id  331   Using filesort
MySQL is using an index, but it's sorting the results slowly
ALTER TABLE attendees ADD INDEX conf_surname (conference_id, surname);
table      possible_keys                key           rows  extra
attendees  conference_id, conf_surname  conf_surname  331
We've avoided a filesort!
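SQLite has a close analogue of “Using filesort”: its query plan reports a temporary B-tree for ORDER BY when no index delivers the rows in the requested order. The sketch below, using Python's sqlite3 module, assumes that analogy is close enough to illustrate the idea; the output wording is SQLite's, not MySQL's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attendees (id INTEGER PRIMARY KEY, conference_id INTEGER, surname TEXT)"
)

QUERY = "SELECT * FROM attendees WHERE conference_id = 123 ORDER BY surname"

def plan(sql):
    """Return the detail column of each EXPLAIN QUERY PLAN row."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

before = plan(QUERY)

# Equality column first, sort column second: matching rows come out already sorted.
conn.execute("CREATE INDEX conf_surname ON attendees (conference_id, surname)")
after = plan(QUERY)

sorts_before = any("TEMP B-TREE" in step for step in before)
sorts_after = any("TEMP B-TREE" in step for step in after)

print(before, sorts_before)
print(after, sorts_after)
```

The column order matters: with (surname, conference_id) the rows for one conference would not sit next to each other in the index.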
NoSQL engines
● Redis
● MongoDB
● Cassandra
● CouchDB
● DynamoDB
● Riak
● Membase
● HBase
Data is created, updated, deleted, retrieved using API calls.
All application and data integrity logic is contained in the application code.
Redis
● Written In: C
● Main point: Blazing fast
● License: BSD
● Protocol: Telnet-like
● Disk-backed in-memory database
● Master-slave replication
● Simple values or hash tables by keys, but also complex operations like ZREVRANGEBYSCORE
● INCR & co (good for rate limiting or statistics)
● Has sets (also union/diff/inter)
● Has lists (also a queue; blocking pop)
● Has hashes (objects of multiple fields)
● Sorted sets (high score table, good for range queries)
● Redis has transactions (!)
● Values can be set to expire (as in a cache)
● Pub/Sub lets one implement messaging (!)
Best used: For rapidly changing data with a foreseeable database size (should fit mostly in memory).
For example: Stock prices. Analytics. Real-time data collection. Real-time communication.
SET uid:1000:username antirezr
uid:1000:followers
uid:1000:following
SET foo bar
GET foo => bar
SET counter 10
INCR counter => 11
LPUSH mylist a (now mylist holds 'a')
LPUSH mylist b (now mylist holds 'b,a')
LPUSH mylist c (now mylist holds 'c,b,a')
SADD myset a
SADD myset b
SADD myset foo
SADD myset bar
SCARD myset => 4
SMEMBERS myset => bar,a,foo,b
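For readers without a Redis server at hand, the semantics of the commands above can be imitated with ordinary Python containers. This is only an in-process imitation, not a Redis client; the function names are hypothetical and merely mirror the command names.

```python
from collections import deque

store = {}  # one flat keyspace, as in Redis

def set_(key, value):
    store[key] = value

def get(key):
    return store.get(key)

def incr(key):
    """Treat the value as an integer and add one; a missing key starts at 0."""
    store[key] = int(store.get(key, 0)) + 1
    return store[key]

def lpush(key, value):
    """Push onto the head of the list and return the new length."""
    store.setdefault(key, deque()).appendleft(value)
    return len(store[key])

def sadd(key, member):
    """Add to a set; return 1 if the member was new, 0 otherwise."""
    members = store.setdefault(key, set())
    added = member not in members
    members.add(member)
    return int(added)

def scard(key):
    return len(store.get(key, ()))

set_("foo", "bar")
set_("counter", "10")
counter = incr("counter")
for item in ("a", "b", "c"):
    lpush("mylist", item)
for member in ("a", "b", "foo", "bar"):
    sadd("myset", member)

print(get("foo"), counter, list(store["mylist"]), scard("myset"))
```

As with SMEMBERS, iterating over the Python set gives no guaranteed order.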
MongoDB
● Written In: C++
● Main point: Retains some friendly properties of SQL. (Query, index)
● License: AGPL (Drivers: Apache)
● Protocol: Custom, binary (BSON)
● Master/slave replication (auto failover with replica sets)
● Sharding built-in
● Queries are javascript expressions / Run arbitrary javascript functions server-side
● Uses memory mapped files for data storage
● Performance over features
● Journaling (with --journal) is best turned on
● On 32-bit systems, limited to ~2.5 GB
● An empty database takes up 192 MB
● Has geospatial indexing
Best used: If you need dynamic queries. If you prefer to define indexes, not map/reduce functions. If
you need good performance on a big DB.
For example: For most things that you would do with MySQL or PostgreSQL, but having predefined
columns really holds you back.
MongoDB -> JSON
The MongoDB examples assume a collection named users that contains
documents of the following prototype:
{
_id: ObjectId("509a8fb2f3f4948bd2f983a0"),
user_id: "abc123",
age: 55,
status: 'A'
}
MongoDB -> insert
SQL:
CREATE TABLE users (
    id MEDIUMINT NOT NULL AUTO_INCREMENT,
    user_id Varchar(30),
    age Number,
    status char(1),
    PRIMARY KEY (id)
)
MongoDB:
db.users.insert( {
    user_id: "abc123",
    age: 55,
    status: "A"
} )
The collection is implicitly created on the first insert operation.
The primary key _id is automatically added if the _id field is not specified.
MongoDB -> Alter, Index, Select
SQL:
ALTER TABLE users
ADD join_date DATETIME
MongoDB:
db.users.update(
    { },
    { $set: { join_date: new Date() } },
    { multi: true }
)
SQL:
CREATE INDEX idx_user_id_asc
ON users(user_id)
MongoDB:
db.users.ensureIndex( { user_id: 1 } )
SQL:
SELECT user_id, status
FROM users
WHERE status = "A"
MongoDB:
db.users.find(
    { status: "A" },
    { user_id: 1, status: 1, _id: 0 }
)
SQL:
SELECT *
FROM users
WHERE status = "A"
OR age = 50
MongoDB:
db.users.find(
    { $or: [ { status: "A" } ,
             { age: 50 } ] }
)
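What the two find() calls mean can be sketched with a tiny matcher over a list of Python dicts. This is not the MongoDB driver and covers only equality, $or and an include-style projection; the sample documents beyond the first are invented. One simplification to keep in mind: real MongoDB returns _id unless it is explicitly excluded, while this sketch returns only the fields marked with 1.

```python
users = [
    {"_id": 1, "user_id": "abc123", "age": 55, "status": "A"},
    {"_id": 2, "user_id": "bcd001", "age": 50, "status": "D"},
    {"_id": 3, "user_id": "xyz789", "age": 30, "status": "B"},
]

def matches(doc, query):
    """True if doc satisfies every condition in query (equality and $or only)."""
    for key, cond in query.items():
        if key == "$or":
            if not any(matches(doc, sub) for sub in cond):
                return False
        elif doc.get(key) != cond:
            return False
    return True

def find(collection, query, projection=None):
    """Return matching documents, optionally reduced to the projected fields."""
    result = []
    for doc in collection:
        if not matches(doc, query):
            continue
        if projection is None:
            result.append(dict(doc))
        else:
            result.append({k: doc[k] for k, keep in projection.items() if keep and k in doc})
    return result

# SELECT user_id, status FROM users WHERE status = "A"
active = find(users, {"status": "A"}, {"user_id": 1, "status": 1, "_id": 0})

# SELECT * FROM users WHERE status = "A" OR age = 50
either = find(users, {"$or": [{"status": "A"}, {"age": 50}]})

print(active)
print([doc["_id"] for doc in either])
```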
Job Trends from Indeed.com
Thank you.
Questions?

Introduction to Databases - query optimizations for MySQL

  • 1. Databases Kodok Márton ■ simb.ro ■ kodokmarton.eu ■ twitter.com/martonkodok ■ facebook.com/marton.kodok ■ stackoverflow.com/users/243782/pentium10 23 May, 2013 @Sapientia
  • 2. Relational Databases ● A relational database is essentially a group of tables (entities). ● Tables are made up of columns and rows. ● Those tables have constraints, and relationships are defined between them. ● Relational databases are queried using SQL ● Multiple tables being accessed in a single query are "joined" together, typically by a criteria defined in the table relationship columns. ● Normalization is a data-structuring model used with relational databases that ensures data consistency and removes data duplication.
  • 3. Non-Relational Databases (NoSQL) ● Key-value stores are the simplest NoSQL databases. Every single item in the database is stored as an attribute name, or key, together with its value. Examples of key-value stores are Riak and MongoDB. Some key-value stores, such as Redis, allow each value to have a type, such as "integer", which adds functionality. ● Document databases can contain many different key-value pairs, or key-array pairs, or even nested documents ● Graph stores are used to store information about networks, such as social connections ● Wide-column stores such as Cassandra and HBase are optimized for queries over large datasets, and store columns of data together, instead of rows.
  • 4. SQL vs NoSQL The purpose of this presentation is NOT about SQL vs NoSQL. Let’s be blunt: none of them are difficult, we need both of them. We evolve.
  • 5. Spreadsheets/Frontends Excel and Access are not a database. Let’s be blunt: Excel does not need more than 256 columns and 65 536 rows.
  • 6. DDL (Data Definition Language) CREATE TABLE employees ( id INTEGER(11) PRIMARY KEY, first_name VARCHAR(50) NULL, last_name VARCHAR(75) NOT NULL, dateofbirth DATE NULL ); ALTER TABLE sink ADD bubbles INTEGER; ALTER TABLE sink DROP COLUMN bubbles; DROP TABLE employees; RENAME TABLE My_table TO Tmp_table; TRUNCATE TABLE My_table; CREATE, ALTER, DROP, RENAME, TRUNCATE
  • 7. SQL (Structured Query Language) INSERT INTO My_table (field1, field2, field3) VALUES ('test', 'N', NULL); SELECT Book.title AS Title, COUNT(*) AS Authors FROM Book JOIN Book_author ON Book.isbn = Book_author.isbn GROUP BY Book.title; UPDATE My_table SET field1 = 'updated value' WHERE field2 = 'N'; DELETE FROM My_table WHERE field2 = 'N'; CRUD (CREATE, READ, UPDATE, DELETE)
  • 8. Indexes (fast lookup + constraints) Constraint: a. PRIMARY b. UNIQUE c. FOREIGN KEY ● Index reduce the amount of data the server has to examine ● can speed up reads but can slow down inserts and updates ● is used to enforce constraints Type: 1. BTREE - can be used for look-ups and sorting - can match the full value - can match a leftmost prefix ( LIKE 'ma%' ) 2. HASH - only supports equality comparisons: =, IN() - can't be used for sorting 3. FULLTEXT - only MyISAM tables - compares words or phrases - returns a relevance value
  • 9. Some Queries That Can Use BTREE Index ● point look-up SELECT * FROM students WHERE grade = 100; ● open range SELECT * FROM students WHERE grade > 75; ● closed range SELECT * FROM students WHERE 70 < grade < 80; ● special range SELECT * FROM students WHERE name LIKE 'ma%'; Multi Column Indexes Useful for sorting/where CREATE INDEX `salary_name_idx` ON emp(salary, name); SELECT salary, name FROM emp ORDER BY salary, name; (5000, 'john') < (5000, 'michael') (9000, 'philip') < (9999, 'steve')
  • 10. Indexing InnoDB Tables ● data is clustered by primary key ● primary key is implicitly appended to all indexes CREATE INDEX fname_idx ON emp(firstname); actually creates KEY(firstname, id) internally Avoid long primary keys! TS-09061982110055-12345 349950002348857737488334 supercalifragilisticexpialidocious
  • 11. How MySQL Uses Indexes ● looking up data ● joining tables ● sorting ● avoiding reading data MySQL chooses only ONE index per table.
  • 12. DON'Ts ● don't follow optimization rules blindly ● don't create an index for every column in your table thinking that it will make things faster ● don't create duplicate indexes ex. BAD: create index firstname_ix on Employee(firstname); create index lastname_ix on Employee(lastname); GOOD: create index first_last_ix on Employee(firstname, lastname); create index id_ix on Employee(id);
  • 13. DOs ● use index for optimizing look-ups, sorting and retrieval of data ● use short primary keys if possible when using the InnoDB storage engine ● extend index if you can, instead of creating new indexes ● validate performance impact as you're doing changes ● remove unused indexes
  • 14. Speeding it up ● proper table design (3nf) ● understand query cache (internal of MySQL) ● EXPLAIN syntax ● proper indexes ● MySQL server daemon optimization ● Slow Query Logs ● Stored Procedures ● Profiling ● Redundancy -> Master : Slave ● Sharding ● mix database servers based on business logic -> Memcache, Redis, MongoDB
  • 15. EXPLAIN EXPLAIN SELECT * FROM attendees WHERE conference_id = 123 AND registration_status > 0 table possible_keys key rows attendees NULL NULL 14052 The three most important columns returned by EXPLAIN 1) Possible keys ● All the possible indexes which MySQL could have used ● Based on a series of very quick lookups and calculations 2) Chosen key 3) Rows scanned ● Indication of effort required to identify your result set -> Interpreting the results
  • 16. Interpreting the results EXPLAIN SELECT * FROM attendees WHERE conference_id = 123 AND registration_status > 0 table possible_keys key rows attendees NULL NULL 14052 ● No suitable indexes for this query ● MySQL had to do a full table scan ● Full table scans are almost always the slowest query ● Full table scans, while not always bad, are usually an indication that an index is required -> Adding indexes
  • 17. Adding indexes ALTER TABLE ADD INDEX conf (conference_id); ALTER TABLE ADD INDEX reg (registration_status); EXPLAIN SELECT * FROM attendees WHERE conference_id = 123 AND registration_status > 0 table possible_keys key rows attendees conf, reg conf 331 ● MySQL had two indexes to choose from, but discarded “reg” ● “reg” isn't sufficiently unique ● The spread of values can also be a factor (e.g when 99% of rows contain the same value) ● Index “uniqueness” is called cardinality ● There is scope for some performance increase... Lower server load, quicker response -> Choosing a better index
  • 18. Choosing a better index ALTER TABLE ADD INDEX reg_conf_index (registration_status, conference_id); EXPLAIN SELECT * FROM attendees WHERE registration_status > 0 AND conference_id = 123 table possible_keys key rows attendees reg, conf, reg_conf_index reg_conf_index 204 ● reg_conf_index is a much better choice ● Note that the other two keys are still available, just not as effective ● Our query is now served well by the new index -> Using it wrong
  • 19. Watch for WHERE column order DELETE INDEX conf; DELETE INDEX reg; EXPLAIN SELECT * FROM attendees WHERE conference_id = 123 table possible_keys key rows attendees NULL NULL 14052 ● Without the “conf” index, we're back to square one ● The order in which fields were defined in a composite index affects whether it is available for use in a query ● Remember, we defined our index : (registration_status, conference_id) Potential workaround: EXPLAIN SELECT * FROM attendees WHERE registration_status >= -1 AND conference_id = 123 table possible_keys key rows attendees reg_conf_index reg_conf_index 204
  • 20. JOINs ● JOINing together large data sets (>= 10,000) is really where EXPLAIN becomes useful ● Each JOIN in a query gets its own row in EXPLAIN ● Make sure each JOIN condition is FAST ● Make sure each joined table is getting to its result set as quickly as possible ● The benefits compound if each join requires less effort
  • 21. Simple JOIN example EXPLAIN SELECT * FROM conferences c JOIN attendees a ON c.conference_id = a.conference_id WHERE conferences.location_id = 2 AND conferences.topic_id IN (4,6,1) AND attendees.registration_status > 1 table type possible_keys key rows conferences ref conference_topic conference_topic 15 attendees ALL NULL NULL 14052 ● Looks like I need an index on attendees.conference_id ● Another indication of effort, aside from rows scanned ● Here, “ALL” is bad – we should be aiming for “ref” ● There are 13 different values for “type” ● Common values are: const, eq_ref, ref, fulltext, index_merge, range, all http://dev.mysql.com/doc/refman/5.0/en/using-explain.html
  • 22. The "extra" column With every EXPLAIN, you get an “extra” column, which shows additional operations invoked to get your result set. Some example “extra” values: ● Using index ● Using where ● Using temporary table ● Using filesort There are many more “extra” values which are discussed in the MySQL manual: Distinct, Full scan, Impossible HAVING, Impossible WHERE, Not exists http://dev.mysql.com/doc/refman/5.0/en/explain-output.html#explain-join-types table type possible_keys key rows extra attendees ref conf conf 331 Using where Using filesort
  • 23. Using filesort
    Avoid, because:
    ● Doesn't use an index
    ● Involves a full scan of your result set
    ● Employs a generic (i.e. one size fits all) algorithm
    ● Creates temporary tables
    ● Uses the filesystem (seek)
    ● Will get slower with more data
    It's not all bad...
    ● Perfectly acceptable provided you get to your result set as quickly as possible, and keep it predictably small
    ● Sometimes unavoidable - ORDER BY RAND()
    ● ORDER BY operations can use indexes to do the sorting!
  • 24. Using filesort
    EXPLAIN SELECT * FROM attendees WHERE conference_id = 123 ORDER BY surname
    table      possible_keys  key            rows  extra
    attendees  conference_id  conference_id  331   Using filesort
    MySQL is using an index, but it's sorting the results slowly
    ALTER TABLE attendees ADD INDEX conf_surname (conference_id, surname);
    table      possible_keys                key           rows  extra
    attendees  conference_id, conf_surname  conf_surname  331
    We've avoided a filesort!
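A rough way to see the same effect locally, assuming SQLite as a stand-in for MySQL: SQLite has no “Using filesort”, but its EXPLAIN QUERY PLAN reports “USE TEMP B-TREE FOR ORDER BY” when it has to sort separately. The table layout, the index name `demo_index` and the helper `plan_with_index` are hypothetical; the query is the one from the slide.

```python
import sqlite3

QUERY = "SELECT * FROM attendees WHERE conference_id = 123 ORDER BY surname"

def plan_with_index(index_columns):
    """Build an empty attendees table with one index and return the query plan."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE attendees ("
        " attendee_id INTEGER PRIMARY KEY,"
        " conference_id INTEGER,"
        " surname TEXT)"
    )
    conn.execute(f"CREATE INDEX demo_index ON attendees ({index_columns})")
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    conn.close()
    # The fourth column holds the human-readable plan step.
    return [row[3] for row in rows]

# Index on the filter column only: rows are found via the index, then sorted.
single = plan_with_index("conference_id")
# Composite index (filter column, sort column): rows come back already ordered.
composite = plan_with_index("conference_id, surname")

print(single)
print(composite)
```

Only the first plan should contain a separate sorting step, which is the SQLite counterpart of the filesort disappearing once conf_surname exists.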
  • 25. NoSQL engines
    ● Redis
    ● MongoDB
    ● Cassandra
    ● CouchDB
    ● DynamoDB
    ● Riak
    ● Membase
    ● HBase
    Data is created, updated, deleted and retrieved using API calls.
    All application and data integrity logic is contained in the application code.
  • 26. Redis
    ● Written In: C
    ● Main point: Blazing fast
    ● License: BSD
    ● Protocol: Telnet-like
    ● Disk-backed in-memory database
    ● Master-slave replication
    ● Simple values or hash tables by keys, but complex operations like ZREVRANGEBYSCORE
    ● INCR & co (good for rate limiting or statistics)
    ● Has sets (also union/diff/inter)
    ● Has lists (also a queue; blocking pop)
    ● Has hashes (objects of multiple fields)
    ● Sorted sets (high score table, good for range queries)
    ● Redis has transactions (!)
    ● Values can be set to expire (as in a cache)
    ● Pub/Sub lets one implement messaging (!)
    Best used: For rapidly changing data with a foreseeable database size (should fit mostly in memory).
    For example: Stock prices. Analytics. Real-time data collection. Real-time communication.
    SET uid:1000:username antirezr
    uid:1000:followers
    uid:1000:following
    GET foo => bar
    INCR foo => 11
    LPUSH mylist a (now mylist holds the one-element list 'a')
    LPUSH mylist b (now mylist holds 'b,a')
    LPUSH mylist c (now mylist holds 'c,b,a')
    SADD myset a
    SADD myset b
    SADD myset foo
    SADD myset bar
    SCARD myset => 4
    SMEMBERS myset => bar,a,foo,b
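To make the command semantics above concrete without a running server, here is a toy in-memory model in Python. It is not a Redis client and not how Redis is implemented; the class `TinyStore` and its methods are hypothetical names that only imitate what INCR, LPUSH, SADD and SCARD return in the examples on the slide.

```python
from collections import deque

class TinyStore:
    """Toy in-memory model of a few Redis commands (illustration only)."""

    def __init__(self):
        self.data = {}

    def set(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)

    def incr(self, key):
        # INCR treats a missing key as 0 and stores the incremented integer.
        value = int(self.data.get(key, 0)) + 1
        self.data[key] = value
        return value

    def lpush(self, key, item):
        # LPUSH inserts at the head, so the newest item comes first.
        items = self.data.setdefault(key, deque())
        items.appendleft(item)
        return len(items)

    def sadd(self, key, member):
        # SADD ignores members that are already present.
        members = self.data.setdefault(key, set())
        if member in members:
            return 0
        members.add(member)
        return 1

    def scard(self, key):
        return len(self.data.get(key, set()))

store = TinyStore()
store.set("foo", 10)
after_incr = store.incr("foo")

for item in ("a", "b", "c"):
    store.lpush("mylist", item)

# The repeated "a" is ignored, so the set ends up with four members.
for member in ("a", "b", "foo", "bar", "a"):
    store.sadd("myset", member)

print(after_incr, list(store.data["mylist"]), store.scard("myset"))
```

The list ends up as c, b, a and the set holds four members, matching the LPUSH and SCARD lines on the slide.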
  • 27. MongoDB
    ● Written In: C++
    ● Main point: Retains some friendly properties of SQL. (Query, index)
    ● License: AGPL (Drivers: Apache)
    ● Protocol: Custom, binary (BSON)
    ● Master/slave replication (auto failover with replica sets)
    ● Sharding built-in
    ● Queries are JavaScript expressions / Run arbitrary JavaScript functions server-side
    ● Uses memory mapped files for data storage
    ● Performance over features
    ● Journaling (with --journal) is best turned on
    ● On 32-bit systems, limited to ~2.5 GB
    ● An empty database takes up 192 MB
    ● Has geospatial indexing
    Best used: If you need dynamic queries. If you prefer to define indexes, not map/reduce functions. If you need good performance on a big DB.
    For example: For most things that you would do with MySQL or PostgreSQL, but where having predefined columns really holds you back.
  • 28. MongoDB -> JSON
    The MongoDB examples assume a collection named users that contains documents of the following prototype:
    { _id: ObjectId("509a8fb2f3f4948bd2f983a0"), user_id: "abc123", age: 55, status: 'A' }
  • 29. MongoDB -> insert
    SQL:
    CREATE TABLE users ( id MEDIUMINT NOT NULL AUTO_INCREMENT, user_id Varchar(30), age Number, status char(1), PRIMARY KEY (id) )
    MongoDB:
    db.users.insert( { user_id: "abc123", age: 55, status: "A" } )
    The collection is implicitly created on the first insert operation. The primary key _id is automatically added if the _id field is not specified.
  • 30. MongoDB -> Alter, Index, Select
    SQL:     ALTER TABLE users ADD join_date DATETIME
    MongoDB: db.users.update( { }, { $set: { join_date: new Date() } }, { multi: true } )
    SQL:     CREATE INDEX idx_user_id_asc ON users(user_id)
    MongoDB: db.users.ensureIndex( { user_id: 1 } )
    SQL:     SELECT user_id, status FROM users WHERE status = "A"
    MongoDB: db.users.find( { status: "A" }, { user_id: 1, status: 1, _id: 0 } )
    SQL:     SELECT * FROM users WHERE status = "A" OR age = 50
    MongoDB: db.users.find( { $or: [ { status: "A" } , { age: 50 } ] } )
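How a query document and a projection document are read can be sketched in plain Python. This is a toy model over a list of dicts, not MongoDB's implementation and not the pymongo API; `matches`, `find` and the sample data are hypothetical, and only equality tests, $or and simple inclusion projections are covered.

```python
def matches(document, query):
    """Return True if document satisfies a query of equality tests and $or."""
    for key, expected in query.items():
        if key == "$or":
            # $or holds a list of sub-queries; at least one must match.
            if not any(matches(document, sub) for sub in expected):
                return False
        elif document.get(key) != expected:
            return False
    return True

def find(collection, query, projection=None):
    """Filter a list of dicts, then keep only the projected fields."""
    results = []
    for document in collection:
        if not matches(document, query):
            continue
        if projection is None:
            results.append(dict(document))
            continue
        keep = [field for field, flag in projection.items() if flag]
        # _id is returned unless the projection explicitly sets it to 0.
        if projection.get("_id", 1) and "_id" not in keep:
            keep.insert(0, "_id")
        results.append({f: document[f] for f in keep if f in document})
    return results

# Hypothetical sample data shaped like the users prototype.
users = [
    {"_id": 1, "user_id": "abc123", "age": 55, "status": "A"},
    {"_id": 2, "user_id": "bcd001", "age": 50, "status": "D"},
    {"_id": 3, "user_id": "cde002", "age": 31, "status": "B"},
]

# SELECT user_id, status FROM users WHERE status = "A"
active = find(users, {"status": "A"}, {"user_id": 1, "status": 1, "_id": 0})
# SELECT * FROM users WHERE status = "A" OR age = 50
either = find(users, {"$or": [{"status": "A"}, {"age": 50}]})

print(active)
print([doc["_id"] for doc in either])
```

The first argument plays the role of the WHERE clause and the second the column list, which is the mapping the SQL and MongoDB pairs above rely on.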
  • 31. Job Trends from Indeed.com