MySQL Performance Optimization
Part I

Abhijit Mondal
Software Engineer at HolidayIQ
Contents
InnoDB or MyISAM or is there something better?

Choosing optimal data types

Normalization vs. Denormalization

Cache and Summary Tables

Explaining “EXPLAIN”
InnoDB vs. MyISAM
●   “You should use InnoDB for your tables unless you have a compelling need to use a
    different engine” - High Performance MySQL (Schwartz, Zaitsev, Tkachenko)
●   InnoDB :
    Pros-
             1. Row-based locking, which lets insert and update queries scale. The whole
    table is not locked when one client writes to selected rows; only those rows (and any
    gaps between them, to prevent phantom rows) are locked.
             2. Clustering by primary key for faster lookups and ordering.
             3. High concurrency.
             4. Transactional, crash-safe, better online backup capability.
             5. Adaptive hash indexes built on top of the B-Tree indexes for faster
    in-memory lookups.

    Cons-
            1. Slower writes (insert, update queries).
            2. Slower BLOB handling.
            3. COUNT(*) without a WHERE clause requires a full table or index scan
    (InnoDB does not keep a cached row count).
InnoDB vs. MyISAM
●   MyISAM :
    Pros-
            1. Faster reads and writes for small to medium sized tables.
            2. COUNT(*) queries are fast: a separate counter in the table's metadata keeps
    track of the number of rows.
            3. Better for full-text searching (built-in FULLTEXT indexes). InnoDB users
    can instead rely on an external engine such as Sphinx.

    Cons-
             1. Non-transactional; data-loss issues during crashes.
             2. Table-level locking: the entire table is locked for writes, though new rows
    can still be appended while a select query is being processed (concurrent inserts).
             3. Insert and update queries do not scale; concurrency issues.
●   MEMORY engine for temporary tables : hash indexes for faster select queries, all data
    kept in memory, data lost after a server restart. Example usage – mapping
    cities/attractions to regions/countries, caching data, temporary summary tables for
    joins (a sketch follows below).
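●   A minimal sketch of the MEMORY-engine usage above (table and column names are
    illustrative, not from the slides):
    CREATE TABLE city_country_map (
        city_id SMALLINT UNSIGNED NOT NULL,
        country_id SMALLINT UNSIGNED NOT NULL,
        PRIMARY KEY (city_id),                      -- hash index by default on MEMORY tables
        KEY country_idx (country_id) USING HASH
    ) ENGINE=MEMORY;
    -- populate the cache from a hypothetical base table:
    INSERT INTO city_country_map SELECT city_id, country_id FROM city;
    -- equality lookups are then served entirely from memory:
    SELECT city_id FROM city_country_map WHERE country_id = 91;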
Choosing the optimal data type
●   Always choose the smallest data type that is large enough for the largest value it
    will store. Smaller data types take up less space in memory and in the CPU cache.
●   Given a choice between an integer and a character type, prefer the integer: character
    sets and collation (sorting) rules make character comparisons more expensive.
●   Unless a field genuinely needs to store NULL values, declare it NOT NULL. NULL values
    make index construction, index statistics and value comparisons more complicated, and
    they require more space: when a nullable column is indexed it needs an extra byte per
    entry. InnoDB handles NULL better (only a single bit) than MyISAM.
●   TIMESTAMP vs. DATETIME: TIMESTAMP takes half as much space (4 bytes) as
    DATETIME (8 bytes) and also has an auto-updating feature.
●   Use UNSIGNED integer types for AUTO_INCREMENT primary key fields (unless
    negative integers are explicitly required). For storing cities in India (around 1000
    cities) use SMALLINT UNSIGNED, which holds values from 0 to 65535, enough for all
    the cities in India, whereas INT uses 32 bits compared to SMALLINT's 16 bits (see
    the sketch after this list).
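●   A minimal sketch of these data-type choices (a hypothetical table, not from the
    slides):
    CREATE TABLE city (
        city_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,  -- ~1000 cities fit easily in 16 bits
        city_name VARCHAR(64) NOT NULL,                     -- NOT NULL unless NULL is really needed
        created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,  -- 4 bytes, auto-initialized
        PRIMARY KEY (city_id)
    ) ENGINE=InnoDB;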
Choosing the optimal data type
●   VARCHAR vs. CHAR : VARCHAR is a variable-length data type while CHAR is fixed
    length. For shorter strings VARCHAR saves space, but updated rows may grow or
    shrink depending on the new value. VARCHAR uses 1 extra byte to store the length of
    the value if the length is 255 bytes or less, otherwise it uses 2 additional bytes.
    VARCHAR is therefore best suited to columns that are not updated frequently, since
    every update may require a dynamic size adjustment.
●   VARCHAR is suitable for storing city/state/country/region/attraction names, as these
    values are rarely updated. CHAR may be more suitable for fixed-length values such as
    MD5 password hashes, or for user names/activities that are updated and inserted
    frequently (see the sketch below).
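●   A minimal sketch contrasting the two (hypothetical table and columns, not from the
    slides):
    CREATE TABLE user_profile (
        user_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
        city_name VARCHAR(64) NOT NULL,      -- rarely updated; variable length saves space
        password_md5 CHAR(32) NOT NULL,      -- an MD5 hex digest is always 32 characters
        PRIMARY KEY (user_id)
    ) ENGINE=InnoDB;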
Choosing the optimal data type
●   ENUM : for string values with a small, fixed set of possible values, use the ENUM
    data type. E.g. gender (M or F), is_active (1 or 0), day of week, etc.
    CREATE TABLE activity (id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    activity VARCHAR(20),
    day_of_week ENUM('sun','mon','tue','wed','thu','fri','sat'));
    ENUM values are stored internally as integers (1 or 2 bytes depending on the number
    of members), so comparisons are faster and they take less space.
    But joins between an ENUM column and a VARCHAR or CHAR column are less
    efficient, because the ENUM must first be converted to a string before the
    comparison is done.
●   BLOB and TEXT fields cannot be fully indexed; only a prefix of a fixed length can
    be indexed.
●   Use SET to combine many true/false values into a single column.
    CREATE TABLE test1 (perms SET('can_read','can_write','can_delete'));
    INSERT INTO test1 (perms) VALUES ('can_read,can_delete');
    SELECT perms FROM test1 WHERE FIND_IN_SET('can_delete', perms);
●   Identifiers used in table joins should have the same data type on both sides, to
    improve performance by avoiding type conversion.
    SELECT COUNT(*) FROM destination JOIN attractions USING(destinationid, active,
    countryid);
Normalization vs. Denormalization
●   Normalization :
        Pros :
        1. Normalized updates are usually faster than denormalized updates.
        2. No duplicate data so there is less data to change.
        3. Tables are usually smaller so they fit better in memory and perform better.
        4. Lack of redundant data means less need for GROUP BY or DISTINCT
    queries.

        Cons :
        1. JOINs are required to retrieve values from normalized tables. This is usually
    expensive, whereas a denormalized table could have answered the same query with a
    single index.
●   E.g. find users and their reviews where the review was given between 4th March and
    30th June, ordered by the user's age. An expensive join is required.

    SELECT u.user_name, r.review FROM user u JOIN review r USING(user_id) WHERE
    r.date_reviewed BETWEEN '2012-03-04' AND '2012-06-30' ORDER BY u.age LIMIT 100;
Normalization vs. Denormalization
●   Denormalization :
            Pros :
            1. No JOINs are required for denormalized data. Even a full table scan without
    an index can be faster than joins whose working set doesn't fit in memory.

             Cons :
             1. Duplicate data. A denormalized table has many large rows that are almost
    identical except for a single column; this happens when a many-to-many relation is
    mapped into a single table.
             2. Inconsistencies may arise during updates, and updates are expensive.
●   E.g. the same query as before: find users and their reviews where the review was
    given between 4th March and 30th June, ordered by the user's age. An index on
    (date_reviewed, age) greatly improves this query (a sketch of the denormalized table
    follows below).

    SELECT user_name, review FROM user_review WHERE date_reviewed
    BETWEEN '2012-03-04' AND '2012-06-30' ORDER BY age LIMIT 100;
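●   A minimal sketch of the denormalized user_review table assumed above (the column
    list is illustrative, not from the slides):
    CREATE TABLE user_review (
        user_id       INT UNSIGNED NOT NULL,
        user_name     VARCHAR(64)  NOT NULL,
        age           TINYINT UNSIGNED NOT NULL,
        review_id     INT UNSIGNED NOT NULL,
        review        TEXT,
        date_reviewed DATE NOT NULL,
        PRIMARY KEY (review_id),
        KEY date_age_idx (date_reviewed, age)   -- the index used by the query above
    ) ENGINE=InnoDB;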
Normalization vs. Denormalization
●   When the same tables are joined frequently, it can be better to denormalize one of
    them by duplicating data from the other. Inserts, updates and deletes can be kept
    consistent by creating triggers. E.g. for the user and reviews tables, copy review,
    review_id and date_reviewed from reviews into user, then create insert, update and
    delete triggers on the reviews table (an update trigger along the same lines is
    sketched after this slide).
●   DELIMITER #
    CREATE TRIGGER `after_insert_in_reviews` AFTER INSERT ON reviews
    FOR EACH ROW BEGIN
        INSERT INTO user(user_id, review_id, review, date_reviewed)
        VALUES (NEW.user_id, NEW.review_id, NEW.review, NEW.date_reviewed);
    END#
    DELIMITER ;
●   DELIMITER #
    CREATE TRIGGER `after_delete_in_reviews` AFTER DELETE ON reviews
    FOR EACH ROW BEGIN
       DELETE FROM user WHERE review_id = OLD.review_id;
    END#
    DELIMITER ;
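●   The update trigger mentioned above is not shown on the slides; a minimal sketch
    along the same lines, assuming the same denormalized columns in user:
    DELIMITER #
    CREATE TRIGGER `after_update_in_reviews` AFTER UPDATE ON reviews
    FOR EACH ROW BEGIN
        UPDATE user
        SET review = NEW.review, date_reviewed = NEW.date_reviewed
        WHERE review_id = OLD.review_id;
    END#
    DELIMITER ;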
Summary and Cache Tables
●   Consider this situation:
         1. There are 3 tables: user, reviews and destination. We want to count the
    reviews for all destinations in a particular city, once grouped by destination for a
    particular user age range, and once grouped by user gender. So we write the two
    queries as:
●   SELECT destination.destname, count(review.review_id) as review_count from user
    join reviews join destination where user.age between 20 and 30 and
    user.userid=reviews.user_id and reviews.destination_id=destination.destid and
    destination.city='Bangalore' group by destination.destid;
●   SELECT user.gender, count(review.review_id) as review_count from user join
    reviews join destination where user.age between 20 and 30 and
    user.userid=reviews.user_id and reviews.destination_id=destination.destid and
    destination.city='Bangalore' group by user.gender;
●   Instead of doing expensive joins on 3 large tables every time the summary criteria
    change, we can create a summary table and update it periodically using a cron job.
Summary and Cache Tables
●   CREATE TABLE user_rev_dest_summary SELECT * FROM user JOIN reviews JOIN
    destination WHERE user.userid = reviews.user_id AND
    reviews.destination_id = destination.destid;
●   ALTER TABLE user_rev_dest_summary ADD INDEX city_index(city, age);
    (the equality column city comes before the range column age so that both parts of
    the index can be used by the queries below.)
●   SELECT destname, COUNT(review_id) AS review_count FROM
    user_rev_dest_summary WHERE age BETWEEN 20 AND 30 AND city='Bangalore'
    GROUP BY destid;
●   SELECT gender, COUNT(review_id) AS review_count FROM user_rev_dest_summary
    WHERE age BETWEEN 20 AND 30 AND city='Bangalore' GROUP BY gender;
●   With the summary table our query performance improves greatly, but if the user,
    destination or reviews tables are updated frequently the summary data becomes stale,
    so we need to decide how often to refresh the summary table (a refresh sketch
    follows below).
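●   One possible way to refresh the summary table periodically (not on the slides; this
    sketch uses the MySQL event scheduler, which must be enabled with
    SET GLOBAL event_scheduler = ON; a cron job running the same statements works
    equally well):
    DELIMITER #
    CREATE EVENT refresh_user_rev_dest_summary
    ON SCHEDULE EVERY 1 HOUR
    DO BEGIN
        -- rebuild the summary from the base tables; readers see slightly stale data
        -- only until the rebuild finishes
        TRUNCATE TABLE user_rev_dest_summary;
        INSERT INTO user_rev_dest_summary
        SELECT * FROM user JOIN reviews JOIN destination
        WHERE user.userid = reviews.user_id
          AND reviews.destination_id = destination.destid;
    END#
    DELIMITER ;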
Explaining “Explain”
●   EXPLAIN output columns : id, select_type, table, type, possible_keys, key, key_len,
    ref, rows, Extra (a sample EXPLAIN output table is shown on the slide).
Explaining “Explain”
●   EXPLAIN output columns : Important columns are type, possible_keys, key, rows
    and Extra.
●   EXPLAIN extended select dest.`Destination_name`, attr.`attractionid`,
    attr.`attractionname` from destination as dest,attractions as attr,hotels_by_locality
    as hl where dest.`Destination_id`=attr.`destinationid` and dest.`CountryID`='1' and
    dest.`other_destination`='0' and attr.`active`='1' and hl.`typeid`=attr.`attractionid`;




●   Types of “type” : From best to worst
         1. const - The table has at most one matching row, which is read at the start of
    the query. const tables are very fast because they are read only once.
    const is used when you compare all parts of a PRIMARY KEY or UNIQUE index
    to constant values.
         SELECT * FROM attractions WHERE attraction_id=8385;
Explaining “Explain”
●   Types of “type” : contd.
         2. eq_ref - One row is read from this table for each combination of rows from
    the previous tables. It is used when all parts of an index are used by the join and the
    index is a PRIMARY KEY or UNIQUE NOT NULL index.
    SELECT * from resort join city using(CityID); (CityID is the primary key of city).

        3. ref - All rows with matching index values are read from this table for each
    combination of rows from the previous tables.
    SELECT * from resort join city using(StateID); (index on city.StateID, but many
    rows in city share the same StateID).

        4. fulltext - The join is performed using a FULLTEXT index.
        5. range - Only rows that are in a given range are retrieved, using an index to
    select the rows. The key column in the output row indicates which index is used.
        SELECT * from reviews where date_reviewed between '2012-06-30' and
    '2012-08-07'; (note: an index on date_reviewed cannot be used if the column is
    wrapped in a function such as DATE(); see the sketch below).
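●   A quick illustration of the DATE() note above, assuming an index exists on
    reviews.date_reviewed:
    -- can use the index on date_reviewed (range scan):
    SELECT * FROM reviews
    WHERE date_reviewed BETWEEN '2012-06-30' AND '2012-08-07';
    -- cannot use the index, because the indexed column is wrapped in a function:
    SELECT * FROM reviews
    WHERE DATE(date_reviewed) BETWEEN '2012-06-30' AND '2012-08-07';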
Explaining “Explain”
●   Types of “type” : contd.
        6. index - This join type is the same as ALL, except that only the index tree is
    scanned. This usually is faster than ALL because the index file usually is smaller
    than the data file.
        SELECT StateID from resort; (covering index on StateID).
        7. ALL - A full table scan is done for each combination of rows from the
    previous tables. Avoid this by adding an index to the appropriate table.
●   Common “Extra” values :
         1. Using filesort - MySQL must do an extra pass to find out how to retrieve the
    rows in sorted order. The sort is done by going through all rows according to the
    join type and storing the sort key and a pointer to the row for all rows that match
    the WHERE clause.
         SELECT resort.Location from resort order by resort.StateID; (the ORDER BY
    cannot be resolved from an index here, so MySQL sorts the rows explicitly; a fix is
    sketched below).
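●   One way to remove the filesort above (not on the slides): a composite index that
    both orders by StateID and covers Location (assuming Location is a VARCHAR of
    moderate length):
    ALTER TABLE resort ADD INDEX state_location_idx (StateID, Location);
    -- EXPLAIN should now show "Using index" instead of "Using filesort":
    SELECT resort.Location FROM resort ORDER BY resort.StateID;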
Explaining “Explain”
●   Common “Extra” values : contd.
         2. Using index - The column information is retrieved using only the information
    in the index tree, without an additional seek to read the actual row. This strategy
    can be used when the query uses only columns that are part of a single index
    (covering indexes).
         SELECT resort.StateID from resort order by resort.Destination_id;
    (covering index on (Destination_id, StateID): StateID is read from the index after
    sorting by Destination_id; the index creation is sketched after this list).
         3. Using temporary - To resolve the query, MySQL needs to create a temporary
    table to hold the result. This typically happens if the query contains GROUP BY
    and ORDER BY clauses that list columns differently.
         4. Using where - A WHERE clause is used to restrict which rows to match
    against the next table or send to the client. Even if you are using an index for all
    parts of a WHERE clause, you may see Using where if the column can be NULL.
         SELECT resort.Location from resort where StateID IS NOT NULL;
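●   The covering index assumed in example 2 above could be created as follows (a sketch,
    not from the slides):
    ALTER TABLE resort ADD INDEX dest_state_idx (Destination_id, StateID);
    -- EXPLAIN for the query in example 2 should then show "Using index":
    SELECT resort.StateID FROM resort ORDER BY resort.Destination_id;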
References
●   High Performance MySQL by Baron Schwartz, Peter Zaitsev and Vadim
    Tkachenko.
●   http://net.tutsplus.com/tutorials/other/top-20-mysql-best-practices/
●   http://www.mysqlperformanceblog.com/2009/01/12/should-you-move-from-myisam-to-in
●   http://www.techrepublic.com/blog/10things/10-ways-to-screw-up-your-database-design/18



                                Thank You
