Slick Data Sharding How to Develop Scalable Data Applications With Drupal Tobby Hagler, Phase2 Technology
Don't Forget... Official DrupalCon London Party: Batman Live World Arena Tour. Buses leave the main entrance of Fairfield Halls at 4pm.
Overview: Purpose (reasons for sharding). Problems and examples of a need for sharding. Types of scaling and sharding. Sharding options in Drupal.
Scale: Horizontal vs. Vertical. Horizontal scale: add more machines of the same type. Vertical scale: bigger and badder machines.
Sharding: What is sharding? Types of sharding: partitioning and federation. How sharding helps. Sharding vs. the typical monolithic Drupal database.
What Is Sharding? Simply put, sharding is physically breaking a large data set into smaller pieces (shards). The trick is putting them back together again…
Reasons for Sharding: Sharding to scale your application. Sharding for shared application data. Leveraging specialized technologies. Caching is a form of federated sharding.
How Sharding Helps: Scales your application by reducing the size of the data set in any single database. Secures sensitive data by isolating it elsewhere. Segregates data.
Be Sure You've Tried Everything Else: Memcached. Boost module. Load-balanced web servers. MySQL master/slave replication. Turning Views into custom queries.
More Things To Try... Moar memory! Move .htaccess rules into the vhost config. Apache tuning. MySQL tuning. Replace search with Apache Solr. Optimize PHP (custom compile). Apache Drupal module. Replace Apache with nginx. Switch to third-party services for comments. Replace contrib modules with custom development.
Typical Balanced Environment
Types of Sharding: Partitioning (horizontal): divides something into two parts; "unshuffle"; reduced index size; hard to do. Federation (vertical): a set of things; uses logical divisions; split up across physically different machines.
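The two styles can be contrasted in a few lines of PHP. This is a minimal sketch, not Drupal code: the a-m/n-z split on user names (one of the partition keys mentioned in the speaker notes) and the table-to-connection map, including the 'shared' and 'hr' keys and the resume table, are hypothetical.

```php
<?php
// Horizontal partitioning: every shard holds the same table with the same
// schema, but only some of its rows. Here rows are routed by the first letter
// of the user name (hypothetical shard names).
function user_partition(string $username): string
{
    $first = strtolower(substr($username, 0, 1));
    // Anything that sorts at or before "m" (including digits) goes to the first shard.
    return $first <= 'm' ? 'users_a_m' : 'users_n_z';
}

// Federation: whole tables are grouped by logical affiliation and each group
// lives on its own server. Table names and connection keys are hypothetical.
const FEDERATION_MAP = [
    'users'    => 'shared',
    'sessions' => 'shared',
    'resume'   => 'hr',
];

function federated_connection(string $table): string
{
    // Tables that are not federated stay in the site's own database.
    return FEDERATION_MAP[$table] ?? 'default';
}

echo user_partition('Smith'), ' ', federated_connection('resume'), PHP_EOL;
```

The partition function needs a key from each row; the federation map needs only the table name, which is why federation is usually the easier of the two to retrofit.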
Horizontal Partitioning: Scaling your application's performance. Distributed data load. This is the shard of last resort.
Even/Odd Partitions: This is not master/master replication. Rows are divided between physical databases. Requires a custom database API to split rows properly. Applies to node loads, entity loads, etc. Achieved by setting auto_increment to step by N, with a different starting offset on each database; the application distributes writes in round-robin fashion and uses keyed lookups to distribute reads and reassemble the data.
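The even/odd scheme can be simulated in memory to show how the pieces fit together. This is a sketch under stated assumptions: each array stands in for a separate MySQL server configured with auto_increment_increment = N and auto_increment_offset = shard index + 1, and the class and method names are hypothetical, not part of any Drupal API.

```php
<?php
// In-memory simulation of N-way (even/odd when N = 2) horizontal partitioning.
// Each element of $shards stands in for a separate database server whose
// auto_increment starts at (shard index + 1) and steps by N.
final class PartitionedTable
{
    /** @var array<int, array<int, array>> rows keyed by id, one array per shard */
    private array $shards = [];
    /** @var array<int, int> next id each shard will hand out */
    private array $nextId = [];
    private int $writeCursor = 0;

    public function __construct(private int $shardCount)
    {
        for ($i = 0; $i < $shardCount; $i++) {
            $this->shards[$i] = [];
            $this->nextId[$i] = $i + 1; // auto_increment_offset
        }
    }

    // Writes rotate round-robin across shards; the chosen shard assigns the id.
    public function insert(array $row): int
    {
        $shard = $this->writeCursor;
        $this->writeCursor = ($this->writeCursor + 1) % $this->shardCount;
        $id = $this->nextId[$shard];
        $this->nextId[$shard] += $this->shardCount; // auto_increment_increment
        $this->shards[$shard][$id] = ['id' => $id] + $row;
        return $id;
    }

    // Reads are keyed: the id alone identifies the shard that holds the row.
    public function shardFor(int $id): int
    {
        return ($id - 1) % $this->shardCount;
    }

    public function load(int $id): ?array
    {
        return $this->shards[$this->shardFor($id)][$id] ?? null;
    }

    // Reassembly: read every shard, then merge and re-sort in the application.
    public function loadAll(): array
    {
        $rows = array_merge(...array_map('array_values', $this->shards));
        usort($rows, fn (array $a, array $b): int => $a['id'] <=> $b['id']);
        return $rows;
    }
}

$nodes = new PartitionedTable(2);
foreach (['First', 'Second', 'Third'] as $title) {
    $nodes->insert(['title' => $title]);
}
```

Single-row loads stay cheap because the id encodes the shard, but any listing has to touch every shard and re-sort in PHP; that is the "putting it back together" cost.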
Horizontally Partitioned Databases
Federation: Vertically partitioning data by logical affiliation. Sharding for shared application data. Manageability: distributing data sets. Security: allows exposing certain bits of data to other applications without exposing all of it.
Vertically Scaled Databases
Application Sharding: Not just sharding data. Shard the components of your site.
Sample Use Cases: Collecting resumes within your existing site. Building an ideation tool.
Sharding Resume Data: Accepting resumes for a large corporation. Users submit resumes via Webform. Submitted data is processed into a separate database. Resume data is processed by internal HR software to evaluate potential employees.
Sharding Schemas: Same physical database, different schemas: uses database prefixing in settings.php. ~ or ~ Different physical databases: uses db_set_active() to switch database connections.
Database Prefixes: Handled in settings.php. Uses MySQL's dot separator to target different schemas. Requires that the MySQL user Drupal connects as has the necessary privileges on every schema involved. Ex: db_1.users and db_2.users
Database Prefixes (Drupal 6)
$db_prefix = array(
  'default'        => '',
  'users'          => 'shared_.',
  'sessions'       => 'shared_.',
  'role'           => 'shared_.',
  'authmap'        => 'shared_.',
  'users_roles'    => 'shared_.',
  'profile_fields' => 'shared_.',
  'profile_values' => 'shared_.',
);
Database Prefixes (Drupal 7)
$databases = array(
  'default' => array(
    'default' => array(
      // Connection details (driver, database, username, password, host) go here as usual.
      'prefix' => array(
        'default'     => '',
        'users'       => 'shared_.',
        'sessions'    => 'shared_.',
        'role'        => 'shared_.',
        'authmap'     => 'shared_.',
        'users_roles' => 'shared_.',
      ),
    ),
  ),
);
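To see what the prefix arrays above do, here is a simplified sketch of the substitution: each {table} placeholder in a query gets its table's prefix, or the 'default' prefix, prepended. A 'shared_.' prefix therefore produces shared_.users, which MySQL reads as the users table in the shared_ schema. The function name is hypothetical; it mimics the idea, not Drupal's own implementation.

```php
<?php
// Replace each {table} placeholder with prefix + table name, using the
// per-table prefix if one is set and the 'default' prefix otherwise.
function apply_prefixes(string $sql, array $prefixes): string
{
    return preg_replace_callback(
        '/\{(\w+)\}/',
        function (array $match) use ($prefixes): string {
            $table = $match[1];
            $prefix = $prefixes[$table] ?? ($prefixes['default'] ?? '');
            return $prefix . $table;
        },
        $sql
    );
}

$prefixes = ['default' => '', 'users' => 'shared_.', 'sessions' => 'shared_.'];
// Prints: SELECT n.title, u.name FROM node n INNER JOIN shared_.users u ON u.uid = n.uid
echo apply_prefixes('SELECT n.title, u.name FROM {node} n INNER JOIN {users} u ON u.uid = n.uid', $prefixes), PHP_EOL;
```

Because the join crosses two schemas on one server, it only works if the MySQL user has privileges on both, which is the permissions caveat above.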
Database Prefixes: Tips, Tricks, and Caveats. You can share user data between Drupal 6 and Drupal 7 with table alters and strict prevention of Drupal 7 logins or user saves. Users should log in with the lower version of Drupal.
Different Physical Databases: Set up additional connections in settings.php. Change connections using db_set_active(). Use db_set_active() to switch back when done. Watch for schema caching and watchdog errors.
Different Databases (Drupal 6)
$db_url = array(
  'default' => 'mysql://user:pass@host1/db1',
  'second'  => 'mysql://user:pass@host2/db2',
  'third'   => 'mysql://user:pass@host3/db3',
);
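Each Drupal 6 $db_url entry packs the driver, credentials, host and database name into one URL, while Drupal 7 expects the same information as a keyed array. The sketch below, a hypothetical helper and not a Drupal function, uses PHP's parse_url() to show how the two formats line up.

```php
<?php
// Hypothetical helper: split a Drupal 6 style database URL into a Drupal 7
// style connection array. parse_url() does the heavy lifting; user name and
// password are URL-decoded, since they may contain escaped characters.
function db_url_to_connection_info(string $url): array
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'], $parts['path'])) {
        throw new InvalidArgumentException("Cannot parse database URL: $url");
    }
    $info = [
        'driver'   => $parts['scheme'],
        'database' => ltrim($parts['path'], '/'),
        'username' => urldecode($parts['user'] ?? ''),
        'password' => urldecode($parts['pass'] ?? ''),
        'host'     => $parts['host'],
    ];
    if (isset($parts['port'])) {
        $info['port'] = $parts['port'];
    }
    return $info;
}

// Same credentials as the 'second' entry in the $db_url example.
print_r(db_url_to_connection_info('mysql://user:pass@host2/db2'));
```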
Different Databases (Drupal 7)
$other_database = array(
  'database' => 'databasename',
  'username' => 'username',
  'password' => 'password',
  'host'     => 'localhost',
  'driver'   => 'mysql',
);
// Register the connection under a key unique to your module.
Database::addConnectionInfo('moduleKey', 'default', $other_database);
db_set_active('moduleKey');
// Execute queries
db_set_active();
Switching Databases
// Load the table definition while the default database is still active.
$schema = drupal_get_schema('table_name');
db_set_active('database_key');
// Execute queries
drupal_write_record('table_name', $data);
// Switch back to the default connection.
db_set_active();
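Forgetting the final db_set_active() leaves the rest of the request talking to the wrong database, so it can help to wrap the switch in try/finally. The sketch below uses a hypothetical ConnectionRegistry class in place of Drupal's connection handling so that it runs on its own; in Drupal 7 the same shape would call db_set_active($key), which returns the key that was active before the switch. The schema should still be loaded before calling the wrapper, as on the slide.

```php
<?php
// Hypothetical stand-in for Drupal's active-connection bookkeeping, used only
// so this sketch runs outside Drupal.
final class ConnectionRegistry
{
    private static string $active = 'default';

    // Like db_set_active(): switch connections and return the previous key.
    public static function setActive(string $key = 'default'): string
    {
        $previous = self::$active;
        self::$active = $key;
        return $previous;
    }

    public static function active(): string
    {
        return self::$active;
    }
}

// Run $work against another connection and always restore the previous one,
// even if $work throws. Anything read from the default database (such as the
// cached schema) should be loaded before calling this.
function with_connection(string $key, callable $work): mixed
{
    $previous = ConnectionRegistry::setActive($key);
    try {
        return $work();
    } finally {
        ConnectionRegistry::setActive($previous);
    }
}

$usedKey = with_connection('hr', fn () => ConnectionRegistry::active());
```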
Saving Data in Another Database: hook_schema() and drupal_write_record(). Keeps the website database smaller. Can keep sensitive data offsite. Partitioned tables can limit internal users' access to your website database and protect it.
Saving Data in Another Database: Resume data is submitted via a form. The form's _submit function accepts the final data. The schema loads the table definition. Connect to the HR instance of MySQL. Write the new record. Upload any files to private file space. Switch the database back. The HR director can query new resumes.
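The "schema loads the table definition" step matters because drupal_write_record() builds its query from the table's schema: values for columns the schema does not declare are not written. The simplified sketch below mimics only that filtering, so form noise such as the submit button value never reaches the HR table. The function name, the resume schema and its field names are hypothetical, and the real function does considerably more (serial columns, serialization, updates).

```php
<?php
// Keep only submitted values whose keys are columns in the table schema,
// in schema order. A simplified take on what drupal_write_record() does.
function build_record(array $schema, array $values): array
{
    $record = [];
    foreach (array_keys($schema['fields']) as $column) {
        if (array_key_exists($column, $values)) {
            $record[$column] = $values[$column];
        }
    }
    return $record;
}

// Hypothetical table definition for resumes stored in the HR database.
$resume_schema = [
    'fields' => [
        'rid'        => ['type' => 'serial', 'unsigned' => true, 'not null' => true],
        'first_name' => ['type' => 'varchar', 'length' => 64, 'not null' => true],
        'last_name'  => ['type' => 'varchar', 'length' => 64, 'not null' => true],
        'title'      => ['type' => 'varchar', 'length' => 128],
    ],
    'primary key' => ['rid'],
];

$submitted = [
    'first_name' => 'John',
    'last_name'  => 'Smith',
    'title'      => 'Web Developer',
    'op'         => 'Submit',  // form button, not a column
    'form_token' => 'abc123',  // form API noise, not a column
];
$record = build_record($resume_schema, $submitted);
```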
Using MongoDB: MongoDB is a NoSQL database. "Schema-less": the data schema is defined in code. Fast. Document-based. Simpler to scale out than MySQL, since MongoDB does the horizontal partitioning for you.
MongoUK: 10gen conference in London, UK, September 19, 2011. 10gen.com/conferences/mongouk-sept-2011
MongoDB and Drupal: drupal.org/project/mongodb. The 7.x version allows field storage, cache, sessions, and blocks to be stored in MongoDB. Allows connections to your own collections.
MongoDB Data: Four levels of objects: connection, database (schema), collection, and cursor (query results). Non-relational database. Collections tend to be denormalized.
MongoDB Documents
Resumes.Resume:
{
  first_name: "John",
  last_name: "Smith",
  title: "Web Developer",
  address: { city: "London", country: "UK" },
  skills: ["PHP", "Drupal", "MySQL"],
  ssn: 123456789
}
Querying MongoDB Documents
// Find applicants whose last name is Smith; return only their names.
$applicant = $applicants->find(
  array('last_name' => 'Smith'),
  array('first_name' => 1, 'last_name' => 1)
);
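To make the query's semantics concrete, the sketch below emulates in plain PHP what the find() call asks MongoDB to do: keep documents whose last_name matches, then return only the projected fields plus _id (MongoDB includes _id unless it is explicitly excluded). The ssn field is never sent back. This illustrates the behaviour only; it is not the driver, the function name is hypothetical, and the second sample document is made up.

```php
<?php
// Emulate a simple equality query plus inclusion projection on plain arrays.
function find_projected(array $documents, array $criteria, array $projection): array
{
    $results = [];
    foreach ($documents as $doc) {
        foreach ($criteria as $field => $value) {
            if (($doc[$field] ?? null) !== $value) {
                continue 2; // every criterion must match
            }
        }
        // _id is returned unless the projection sets it to 0.
        $fields = array_filter($projection + ['_id' => 1]);
        $results[] = array_intersect_key($doc, $fields);
    }
    return $results;
}

$resumes = [
    ['_id' => 1, 'first_name' => 'John', 'last_name' => 'Smith', 'title' => 'Web Developer', 'ssn' => 123456789],
    ['_id' => 2, 'first_name' => 'Jane', 'last_name' => 'Doe', 'title' => 'DBA', 'ssn' => 987654321],
];

$smiths = find_projected($resumes, ['last_name' => 'Smith'], ['first_name' => 1, 'last_name' => 1]);
```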
MongoDB Sharing via REST: Simple REST interface, included as part of MongoDB. Sleepy Mongoose: a REST interface for MongoDB (Python). MongoDB REST (Node.js).
Ideation REST Interface
Get a list of all idea documents: http://127.0.0.1:28017/ideation/ideas/
Get all comments for a specific idea: http://127.0.0.1:28017/ideation/comments/?filter__id=4a8acf6e7fbadc242de5b4f3&limit=10&offset=20
Will likely need a dedicated MongoDB REST interface.
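Query URLs like these can be built rather than hand-concatenated. The helper below is hypothetical; it only reproduces the URL shape used on the slide (host and port 127.0.0.1:28017, database, collection, trailing slash, then query parameters). The parameter names are copied from the slide, and whether a given MongoDB release honours each one, such as offset, is an assumption to check.

```php
<?php
// Build a URL for MongoDB's simple REST interface in the shape used on the slide.
function mongo_rest_url(
    string $database,
    string $collection,
    array $params = [],
    string $base = 'http://127.0.0.1:28017'
): string {
    // The trailing slash after the collection name matters.
    $url = rtrim($base, '/') . '/' . rawurlencode($database) . '/' . rawurlencode($collection) . '/';
    return $params === [] ? $url : $url . '?' . http_build_query($params);
}

echo mongo_rest_url('ideation', 'comments', [
    'filter__id' => '4a8acf6e7fbadc242de5b4f3',
    'limit'      => 10,
    'offset'     => 20,
]), PHP_EOL;
```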
Applications on Separate Web Tiers: Application sharding is data sharding. Separate Drupal instances. Use mod_proxy as a pass-through. Can use multiple load-balanced environments.
Proxied Web Clusters
Questions?
Contact thagler@phase2technology  @phase2tech 703-548-6050 d.o: tobby Slides: agileapproach.com

Slick Data Sharding: Slides from DrupalCon London

Editor's Notes

  • #3: Just a reminder that the Official DrupalCon Party is tonight. Buses are leaving here starting at 4pm, but will be leaving continuously for a while, which is good since all of you have places to be for the next 50 minutes…
  • #4: Discuss what data sharding is, when you might need to shard your data, and what effects this has on your site or application HOW: Horizontal/partitioning and Vertical/Federation
  • #5: Horizontal - More machines Vertical - Bigger machines Vertical will always eventually reach a limit
  • #6: What is it – I’ll cover the different types and ways you can shard your data How does sharding help? How does it hurt? In short, WHEN is sharding right for me? Why not just keep scaling vertically?
  • #7: Breaking apart your data is the easy part. The hard part is putting it back together again seamlessly. This was one of several broken plates that came from my wife’s great grandmother. I didn’t do it?
  • #8: It's easier to scale smaller pieces, which makes it easier to scale horizontally. Take an application that shares sensitive data and split that data out. Moving your cache to memcache IS sharding, and so is using Varnish or a CDN like Akamai (forms of federated sharding).
  • #9: Reduce your table indices. The more data you have, the larger your table index overhead will be. Reduce that and you gain performance: a table with a million rows will generally perform better than a table with 10 million rows. Share your data with other applications or users. This is great for taking CVs or form data that will be processed by an internal (proprietary) system. Sometimes physically storing sensitive data (user information, credit card numbers, etc.) in a different database can be a good idea. Don't store these things in a database that can be accessed via non-SSL web servers.
  • #10: You guys are here to hear about scaling, so let's talk about all the other things you can do to scale. Load balancers: Apache's mod_proxy and mod_proxy_balancer modules are a cheap way to load balance. There are plenty of cloud-based as well as hardware balancers you can use. Drupal 7 offers the concept of slave-safe queries (even in Views 3).
  • #11: Have you performance tested? Is your problem data or application? Make sure that the size of your data is your problem… Compile PHP and Apache without default modules. Gentoo joke. Do you really need PDFLib or LibXML? Memory is cheap, DBAs are not.
  • #12: Load balancers: Apache's mod_proxy and load balancing modules are a cheap way to load balance. There are plenty of cloud-based as well as hardware balancers you can use. Drupal 7 offers the concept of slave-safe queries (even in Views 3).
  • #13: Make the individual pieces smaller vs. make the whole smaller. A partition is a single piece split in half: even/odd IDs, letters of the alphabet for user names. Reduces index size. A "federation" is defined as a "set of things": logical divisions such as states, counties, countries. These tend to be discrete or atomic.
  • #14: Reasons to choose horizontal partitioning Everything includes memcached, load balanced web servers, master/slave MySQL replication This is the sharding technique of last resort
  • #15: The total number of rows in each table is reduced. This reduces index size, which generally improves search performance
  • #16: This is why in theory horizontal scale sounds great – you have N-number of database clusters
  • #17: Manageability – have you seen the number of tables in a Drupal install, especially in an install with tons of modules
  • #18: The secondary databases no longer need to be MySQL Notice how the secondary database clusters are starting to look more like cache clusters
  • #19: Disqus for commenting. Edge-side includes for CDNs. These are examples of application sharding.
  • #20: Want my website to collect resumes Want to dump resumes into my HR database, but don’t want all my HR data exposed to the web
  • #21: Suppose your corporation’s web site sees thousands of applications per month or week. It might be a good idea to shard this data for scale. But also, you can shard it for data repurposing with your HR department’s software. Maybe you don’t want those guys with administrative access on the site… Keep personal information secure and off your company’s main website
  • #24: This takes place in settings.php In this example we are sharing user data between multiple sites or applications. Profile field data will be available to both.
  • #25: This takes place in settings.php Since profiles are integrated as fields, you may not have those tables
  • #28: Note: This scheme will only work with databases of the same type. You can't mix PostgreSQL and MySQL connections here. You can use different connection strings with different usernames, etc.
  • #29: This does not HAVE to take place in settings.php, but it should be there if at all possible. moduleKey can be anything unique to your module.
  • #30: Setting the schema is not part of this, but it is strongly advised. drupal_get_schema() statically caches the table definition. db_set_active() switches database connections, and THEN the schema is loaded from the static cache first, then the database cache, then from code. If it can't find the cache tables after you've switched database connections, it tries to throw an error, then cascades down a dark path of errors after it can't find the system table, etc.
  • #31: What are the advantages to switching database connections? Can still use Drupal’s schema and database APIs Smaller database for your website helps with master/slave replication (faster), backups are more manageable, less overhead
  • #32: From Drupal’s perspective, here’s how that looks
  • #33: Mongo abstracts away the need to scale horizontally yourself: Mongo does the horizontal partitioning for you. This scales the application vertically.
  • #34: I’m not affiliated with 10gen, I just wanted to mention their conference since we’re all here in London. They’ll have several Drupal-related sessions.
  • #35: Out of the box, MongoDB module already does some things to help speed up and scale your site
  • #37: Here’s a sample document that contains resume data. It’s stored in BSON – binary JSON
  • #38: This is a sample query to return all users with the last name "Smith". $applicants is a collection object. $applicant is a cursor object that you can loop through. You can use findOne() to get a single result: $user = $users->findOne(array('last_name' => 'Smith'), array('first_name' => 1, 'last_name' => 1));
  • #39: THERE’S NO WEB SERVER INVOLVED AT ALL In addition to performance, you can share your MongoDB data via REST. For use in additional services Can share your data using REST and JSON to display content without costly queries
  • #40: This gets a JSON object. Note the trailing slash after the collection name. You might need another REST interface like Sleepy Mongoose for more advanced REST data.