SQL or NoSQL How to choose Venu Anuganti Jan 2011 http://venublog.com/
Who am I Data Architect, Database Kernel / Internals Engineer Implement and Scale SQL, NoSQL, Analytics and Data Warehouse solutions Large scale data handling for Games, Social Networking, SaaS, Click Tracking, Recommendation, Advertisement, Mobile and SEM marketing Blog: http://venublog.com/
Agenda Buzz around SQL and NoSQL How to design and implement Data Flow Architecture How to choose Data Store Solution Performance, Scalability and Availability SQL vs NoSQL Where SQL and NoSQL fit Types of SQL and NoSQL data stores Evaluation & Decision Making
Buzzzzzzz Why everyone is talking about NoSQL What is happening to SQL Does that mean end of SQL ? NoSQL era begins ? Why nobody talks about large SQL implementations ?
Evolution of Data Architecture
Data Architecture No standard solution that fits all Business and its data define the architecture It’s all about solving problems You need to find the right tool that does the job
Traditional Architecture Relational database is everything SQL Embedded Client-Server based Data Stack Web, CDN, Load Balancers, Application, Database and Storage
Traditional Scalability … Scale-up Memory and hardware have limitations Scale-out Scaling reads Cache is the king Query cache Memcache OLAP Pre-fetching Replication Scaling writes Redundant disk arrays, RAID Sharding
Common Problems… Relational model is heavy : Parsing, Locking, Logging, Buffer pool and threads Not every case can work within single node SMP Sharding does not solve all problems Cross shard or join between shards Need to update across multiple shards within a transaction Shard failure Online schema changes without taking the shard offline Add or replace shards in-line
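The sharding problems listed above can be illustrated with a minimal sketch. It assumes a hypothetical setup of four shards modelled as in-memory dicts and a simple hash-modulo routing rule; the names `shard_for`, `put`, `get` and `total_score` are illustrative and not taken from any particular product. A single-key lookup touches one shard, while an aggregate across users has to visit every shard and merge the results in the application.

```python
import zlib

NUM_SHARDS = 4  # assumed shard count, for illustration only

# Each shard is modelled as a plain dict: user_id -> row.
shards = [dict() for _ in range(NUM_SHARDS)]

def shard_for(user_id: str) -> int:
    """Route a key to a shard using a stable hash (crc32 is the same on every run)."""
    return zlib.crc32(user_id.encode("utf-8")) % NUM_SHARDS

def put(user_id: str, row: dict) -> None:
    shards[shard_for(user_id)][user_id] = row

def get(user_id: str):
    # A single-key lookup touches exactly one shard.
    return shards[shard_for(user_id)].get(user_id)

def total_score() -> int:
    # A cross-shard aggregate must visit every shard (scatter-gather)
    # and merge the partial results in the application layer.
    return sum(row["score"] for shard in shards for row in shard.values())

for i in range(10):
    put(f"user{i}", {"score": i})
```

Note that with modulo routing, changing `NUM_SHARDS` remaps most keys, which is one reason adding or replacing shards online is hard.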
Evolution Data is growing rapidly day by day Motivated by the needs of large web applications Hardware is not advancing as fast as data is growing Things are moving to Cloud and API driven Social networking and Cloud make it hard to scale the traditional way
Data is the Business A lot of new business models are DATA centric Real-time and Interactive Big Data Millions of users, clients, customers, applications, … Terabytes to petabytes of data day to day Businesses can only grow if they can properly make use of data Statistics, Reporting Real-time Re-targeting Recommendation Examples of data driven companies Facebook, Twitter, LinkedIn, Zynga, Groupon, Quora, Apple AppStore, FourSquare, any API Driven, almost all new emerging companies
Solution that works Data architecture is not just choosing the right data store, but should be a solution, with: Low In Cost (preferably open source, no hidden cost..) Simple To Implement High Performance Highly Available Highly Scalable Highly Reliable Highly Recoverable Rapid Development Zero Learning Curve Ability to do online changes (schema or node or automatic) Less Operational Maintenance No day-to-day firefighting
NoSQL Solution Emerges A lot of companies emerged to solve data centric problems Big Table: Google started to implement a massively distributed scalable system, followed by many; the first step into the world of massive data scale Many companies followed, building scale-out architecture using commodity hardware ACID was seen as bad for scaling, so the relaxed consistency model came into the picture Google Big Table and Amazon Dynamo are notable
Relaxed Consistency Consistency is a major bottleneck for scalability People started implementing eventual consistency CAP Theorem (Consistency, Availability and Partition-tolerance) Consistency: “Is the data I’m looking at now the same if I look at it somewhere else?” Availability: “What happens if my database goes down?” Partitioning: “What if my data is on a different node?” SQL – CA NoSQL – AP http://venublog.com/2010/04/07/cap-theorem-eventual-consistency-nosql/
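To make the consistency question above concrete, here is a toy sketch of eventual consistency. It assumes a hypothetical two-replica store where writes land on one replica and reach the other only when `sync()` runs; the class and method names are invented for illustration, and real systems replicate in the background with far more machinery. A read from the lagging replica answers “is the data the same if I look somewhere else?” with “not yet”.

```python
class Replica:
    def __init__(self):
        self.data = {}

class EventuallyConsistentStore:
    """Toy two-replica store: a write is applied to the primary at once
    and to the secondary only when sync() is called."""

    def __init__(self):
        self.primary = Replica()
        self.secondary = Replica()
        self.pending = []  # replication log not yet applied to the secondary

    def write(self, key, value):
        self.primary.data[key] = value
        self.pending.append((key, value))

    def read(self, key, replica="secondary"):
        node = self.primary if replica == "primary" else self.secondary
        return node.data.get(key)

    def sync(self):
        # Apply the outstanding writes in order, then clear the log.
        for key, value in self.pending:
            self.secondary.data[key] = value
        self.pending.clear()

store = EventuallyConsistentStore()
store.write("cart:42", ["book"])
stale = store.read("cart:42")   # the secondary has not seen the write yet
store.sync()
fresh = store.read("cart:42")   # after replication both replicas agree
```

The store stays available for reads on either replica, at the price of possibly returning old data until replication catches up.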
Data Stores
Data Stores 3 Major Data Store Solutions SQL, OLTP  Relational, transactional processing Analytics, OLAP Data Warehousing, Analytics and reporting NoSQL Non relational, distributed, high performance and highly scalable
SQL Stores Disk based storage Data is stored as table (row by row and columns – row store) Mainly B-tree as the indexing mechanism
SQL Stores … Dynamic locking/ Lock free for concurrency control Write-ahead log (WAL) / transactional log for crash recovery SQL as the access language
SQL Stores Proven and widely adopted MySQL PostgreSQL VoltDB Clustrix MySQL Cluster ScaleDB ScaleBase DbShards Oracle SQL Server DB2 Sybase & … Supports ACID Crash recovery DDL, DML, DCL
Analytic Stores Data warehousing, mainly for large sets of data Data marts, Dimensional, Fact and Aggregate tables ETL, BI, Reporting, Analytics Columnar and Compression is the key OLAP Cubes built-in or middle-tier Mostly SQL and also MDX driven
Analytic Stores Columnar data warehouse solutions Greenplum (+ DCA appliance) Vertica (Breakthrough, I love it) Aster ParAccel Infobright (MySQL based) InfiniDB (open source, Calpont appliance) Netezza (appliance) XtremeData dbX (appliance) Teradata
NoSQL Stores Does not mean No to SQL Actually Not only SQL Data store that may not require fixed table schemas Mainly derived from Google BigTable and Amazon Dynamo
NoSQL Stores … Non relational, schema free Distributed, ability to horizontally scale  Simple CLI or API protocol Eventually consistent, depends … Limitations of SQL to scale large data Ability to dynamically define new attributes
NoSQL Stores … Multiple Types based on storage architecture Key Value, KV  Document Graph Column Family
NoSQL Stores Key-Value Stores Dynamo Clones Membase Riak Redis Tokyo Cabinet Voldemort Document Stores MongoDB CouchDB Column Family BigTable Clones Cassandra HBase HyperTable Graph Databases Neo4J InfoGrid AllegroGraph FlockDB
What they are good at  &  How to choose
Basic Decision Principles Do not over architect from day-1, it’s overkill Startups can’t afford to spend time Understand the business and implement with simple well known solutions to begin with Do not copy other companies’ models, just take inspiration from how they solved their problems Engineering talent is crucial, make sure you have the right resources Evaluate and implement new solutions as the business grows
Basic Decision Principles … High availability & disaster recovery is a must Understand pros and cons of each and every design model, and weigh towards the best interest of the company Remember some of the big outage stories Tumblr, Foursquare & Twitter Lean towards community winner and widely adopted Do not lean towards only performance, unless you can recreate the state of the data
SQL – Good High Performance OLTP, Transactions, ACID Structured, SQL Access, portability and tools Small amounts of data, typically < 500G per server, supports inline UPDATE, DELETE, multi-condition/rows Relational model at data store, application independent Many tables with different types of data
SQL – Good Simple or complex aggregation Statistics, reports at data store level Need access to more than one tuple of information Results based on multiple search conditions SELECT foo FROM bar where X=1 and Y=2 Fetching of ordered or array of data Compatible with many tools
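The query on this slide can be run as-is against any SQL engine. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table `bar`, its columns and the sample rows are made up for illustration. It shows a multi-condition, ordered lookup and an aggregation computed inside the data store rather than in the application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bar (foo TEXT, X INTEGER, Y INTEGER)")
conn.executemany(
    "INSERT INTO bar (foo, X, Y) VALUES (?, ?, ?)",
    [("a", 1, 2), ("b", 1, 3), ("c", 1, 2), ("d", 2, 2)],
)

# Results based on multiple search conditions, fetched in order.
matches = [row[0] for row in conn.execute(
    "SELECT foo FROM bar WHERE X = ? AND Y = ? ORDER BY foo", (1, 2))]

# Aggregation at the data store level: row count per value of X.
counts = conn.execute(
    "SELECT X, COUNT(*) FROM bar GROUP BY X ORDER BY X").fetchall()
```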
SQL – Bad SQL complexity, parsing cost Learning and relational model design Performance and Scalability Strictly single node Sharding causes more trouble operationally Operational maintenance, fire fighting Puts a brake on rapid development cycles
NoSQL - Good Fits very well for volatile data High read or write throughput Automatic horizontal scalability (Consistent hashing) Simple to implement, no investment for developers to design and implement relational model Application logic defines object model Support of MVCC in some form Compaction and un-compaction happens at top tier In-memory or disk based or combination
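Consistent hashing, mentioned above as the basis for automatic horizontal scaling, can be sketched in a few lines. This is a minimal illustration, assuming MD5 as the ring hash and 64 virtual nodes per server; the `HashRing` class and the node names are hypothetical and do not reflect how any specific store implements it. The point to observe is that adding a node only moves the keys that the new node takes over.

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes   # virtual nodes per server (assumed value)
        self._points = []      # sorted hash positions on the ring
        self._owner = {}       # hash position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add(self, node: str) -> None:
        # Place several virtual nodes per server to even out the load.
        for i in range(self.vnodes):
            point = self._hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def node_for(self, key: str) -> str:
        # First ring position clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]

ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"key{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.add("node-d")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

With hash-modulo placement most keys would move when a node is added; here every moved key lands on the new node and the rest stay where they were.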
NoSQL - Good Rapid development cycles, programmer friendly Reduces the footprint at data store level NoSQL in general faster than SQL Supports INSERT, DELETE, SELECT Data is distributed by KEY over nodes Lists, sets, queues, pub-sub are also supported by some NoSQL – Redis S3 can handle large blobs; not all NoSQL can handle it
NoSQL - Bad Packing and Un-packing of each key Lack of relation from one key to another Need the whole value for the key to read or write any partial information No security or authentication Data store is merely a storage layer, can’t be used for: Analytics Reporting Aggregation Ordered values
SQL/NoSQL – Good and Bad Performance mainly depends on amount of memory When disk bound, both take a hit SQL has an advantage due to sequential access and read-ahead Optimization towards frequently accessed data SQL engines maintain LRU SQL Engines are proven and widely in use NoSQL is fairly new, but marching …
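The packing and partial-update costs above can be shown with a small sketch. It assumes a hypothetical key-value store modelled as a dict of opaque byte strings, with JSON as the packing format; the keys and fields are invented for illustration. Changing one field means reading, unpacking, modifying and rewriting the whole value, and a lookup by attribute has to scan every key.

```python
import json

kv = {}  # stand-in for a key-value store: key -> opaque bytes

def put(key: str, obj: dict) -> None:
    kv[key] = json.dumps(obj).encode("utf-8")    # pack the whole object

def get(key: str) -> dict:
    return json.loads(kv[key].decode("utf-8"))   # unpack the whole object

put("user:1", {"name": "Ann", "city": "Oslo", "friends": ["user:2"]})
put("user:2", {"name": "Bob", "city": "Oslo", "friends": []})

# Updating one field is a read-modify-write of the entire value.
profile = get("user:1")
profile["city"] = "Bergen"
put("user:1", profile)

# Without a secondary index, a query by attribute scans and unpacks every value.
in_bergen = [key for key in kv if get(key)["city"] == "Bergen"]
```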
Cache MySQL HandlerSocket InnoDB bufferpool acts as cache No explicit cache needed, no write invalidation Write Through Cache (WTC) is a good candidate for high reads or writes Gaming world really needs this Membase or periodic flush to persistent storage layer Flash cache can also help to scale IO bound workloads We might see them pitching in private cloud slowly
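A write-through cache, as recommended above for read- or write-heavy workloads, can be sketched as follows. The sketch assumes a plain dict as the persistent layer; `WriteThroughCache` and the sample keys are hypothetical names for illustration. Every write goes to the store and the cache together, so reads never see a stale cached value and no separate invalidation step is needed.

```python
class WriteThroughCache:
    def __init__(self, backing_store: dict):
        self.cache = {}
        self.store = backing_store   # stand-in for the persistent layer
        self.store_reads = 0         # counts reads that fell through to the store

    def put(self, key, value):
        # Write-through: update the persistent store and the cache together.
        self.store[key] = value
        self.cache[key] = value

    def get(self, key):
        if key in self.cache:
            return self.cache[key]   # cache hit, store is not touched
        self.store_reads += 1
        value = self.store.get(key)
        if value is not None:
            self.cache[key] = value  # populate the cache on a miss
        return value

db = {"level:7": "boss"}
cache = WriteThroughCache(db)
cache.put("score:ann", 100)
first = cache.get("level:7")    # miss: loaded from the store
second = cache.get("level:7")   # hit: served from memory
```

The periodic-flush variant mentioned on the slide would instead buffer writes in memory and push them to the store in batches, trading durability for write throughput.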
Document Store Document Stores Support a more complex data model than KV Good at handling content management, session, profile data Multi index support Dynamic schemas, Nested schemas Auto distributed, eventual consistency MVCC (CouchDB) or atomic (MongoDB) MongoDB, SimpleDB: widely adopted in this space Use Case: Search by complex patterns & CRUD apps
Column Family Store HBase (Apache), Cassandra (Facebook) and HyperTable (Baidu) HBase – CA Cassandra – AP Model consists of rows and columns Scalability: Splitting of both rows and columns Rows are split across nodes using primary key, range Columns are distributed using groups Horizontal and vertical partitioning can be used simultaneously Extension of document store HBase uses HDFS; Pig, Hive, Cascading can help Use case: Grouping of frequently used and un-used over data centers / stream of writes
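The row and column splitting described above can be sketched with a toy model. It assumes hypothetical range boundaries (`SPLIT_POINTS`) for row keys and a nested dict for column groups; none of the names correspond to the HBase or Cassandra APIs. Rows are assigned to nodes by key range (horizontal partitioning) and each row's columns are grouped into families that can be read independently (vertical partitioning).

```python
import bisect

# Assumed layout: rows are range-partitioned by row key across nodes,
# and each row's columns are grouped into column families.
SPLIT_POINTS = ["h", "p"]                  # hypothetical range boundaries
NODES = ["node-1", "node-2", "node-3"]

def node_for_row(row_key: str) -> str:
    # key < "h" -> node-1, "h" <= key < "p" -> node-2, otherwise node-3.
    return NODES[bisect.bisect_right(SPLIT_POINTS, row_key)]

# table[row_key][column_family][column] = value
table = {}

def put(row_key: str, family: str, column: str, value) -> None:
    table.setdefault(row_key, {}).setdefault(family, {})[column] = value

def read_family(row_key: str, family: str) -> dict:
    # Reading one family does not touch the row's other families.
    return table.get(row_key, {}).get(family, {})

put("alice", "profile", "name", "Alice")
put("alice", "activity", "last_login", "2011-01-10")
put("zed", "profile", "name", "Zed")
```

Because the row key decides placement, range scans over adjacent keys stay on one node, while frequently and rarely used columns can be kept in separate families.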
Graph Store Social Graph Relationship between entities Data modeling on social networks Common Use Cases List of friends Recommendation system Following Followers Common Connections FAN IN/OUT
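The use cases above map onto simple set operations over adjacency lists. The sketch below is an in-memory illustration only; the `following` and `followers` maps, the function names and the sample users are all hypothetical, and a real graph database would add persistence, indexing and traversal queries. It shows fan-out and fan-in, common connections, and a friends-of-friends recommendation.

```python
from collections import defaultdict

following = defaultdict(set)   # fan-out: user -> users they follow
followers = defaultdict(set)   # fan-in: user -> users who follow them

def follow(src: str, dst: str) -> None:
    following[src].add(dst)
    followers[dst].add(src)

def common_connections(a: str, b: str) -> set:
    # Users that both a and b follow.
    return following[a] & following[b]

def recommend(user: str) -> set:
    # Friends-of-friends the user does not already follow.
    candidates = set()
    for friend in following[user]:
        candidates |= following[friend]
    return candidates - following[user] - {user}

for src, dst in [("ann", "bob"), ("ann", "cat"), ("bob", "cat"),
                 ("bob", "dan"), ("cat", "dan"), ("eve", "cat")]:
    follow(src, dst)
```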
Cloud Data Stores Database Cloud Services Xeround (MySQL) Microsoft SQL Azure Database (SQL Server) SimpleDB (NoSQL) Google App Engine (NoSQL) SalesForce Database.com (Oracle) ClearDB (MySQL)
SUMMARY Pick the right data model for the right problem Understand the data storage and use cases on read, write and growth patterns; and then come up with a plan to implement. Use case dictates everything. Compare pros and cons, and weigh towards the one that helps the business Public cloud, private cloud or data center also dictates what model to choose You need the right people
Finally … SQL Works great, can’t scale for large data NoSQL Works great, can’t fit for all SQL + NoSQL
Questions ? http://venublog.com/ [email_address] Twitter: @vanuganti
  • 1. SQL or NoSQL How to choose Venu Anuganti Jan 2011 http://venublog.com/
  • 2. Who am I Data Architect, Database Kernel / Internals Engineer Implement and Scale SQL, NoSQL, Analytics and Data Warehouse solutions Large scale data handling for Games, Social Networking, SaaS, Click Tracking, Recommendation, Advertisement, Mobile and SEM marketing Blog: http://venublog.com/
  • 3. Agenda Buzz around SQL and NoSQL How to design and implement Data Flow Architecture How to choose Data Store Solution Performance, Scalability and Availability SQL vs NoSQL Where SQL and NoSQL fits Types of SQL and NoSQL data stores Evaluation & Decision Making
  • 4. Buzzzzzzz Why everyone is talking about NoSQL What is happening to SQL Does that mean end of SQL ? NoSQL era begins ? Why nobody talks about large SQL implementations ?
  • 5. Evolution of Data Architecture
  • 6. Data Architecture No standard solution that fits all Business and its data define the architecture It’s all about solving problems You need to find the right tool that does the job
  • 7. Traditional Architecture Relational database is everything SQL Embedded Client-Server based Data Stack Web, CDN, Load Balancers, Application, Database and Storage
  • 8. Traditional Scalability … Scale-up Memory and hardware has limitations Scale-out Scaling reads Cache is the king Query cache Memcache Olap Pre-fetching Replication Scaling writes Redundant disk arrays, RAID Sharding
  • 9. Common Problems… Relational model is heavy : Parsing, Locking, Logging, Buffer pool and threads Not every case can work within single node SMP Sharding does not solve all problems Cross shard or join between shards Need to update across multiple shards within a transaction Shard failure Online schema changes without taking the shard offline Add or replace shards in-line
  • 10. Evolution Data is growing rapidly day by day Motivated by the needs of large web applications Hardware is not advancing at the pace of data growth Things are moving to Cloud and API driven Social networking and Cloud make it hard to scale the traditional way
  • 11. Data is the Business Lot of new business models are DATA centric Real-time and Interactive Big Data Millions of user base, clients, customers, applications, … Terabytes to petabytes of data day to day Business can only grow if they can properly make use of data Statistics, Reporting Real-time Re-targeting Recommendation Examples of data driven companies Facebook, Twitter, LinkedIn, Zynga, Groupon, Quora, Apple AppStore, FourSquare, any API Driven, almost all new emerging companies
  • 12. Solution that works Data architecture is not just choosing a right data store, but should be a solution, with: Low In Cost (preferably open source, no hidden cost..) Simple To Implement High Performance Highly Available Highly Scalable Highly Reliable Highly Recoverable Rapid Development Zero Learning Curve Ability to do online changes (schema or node or automatic) Less Operational Maintenance No firefighting on day to day
  • 13. NoSQL Solution Emerges Lot of companies emerged to solve data centric problems Big Table: Google started to implement massively distributed scalable system, followed by many, first foot step to the world of massive data scale Many companies followed building scale-out architecture using commodity hardware ACID was termed as bad for scaling, so relaxed consistency model came into picture Google Big Table and Amazon Dynamo are notable
  • 14. Relaxed Consistency Consistency is a major bottleneck for scalability People started implementing eventual consistency CAP Theorem (Consistency, Availability and Partition-tolerance) Consistency: “Is the data I’m looking at now the same if I look at it somewhere else?” Availability: “What happens if my database goes down?” Partitioning: “What if my data is on different node?” SQL – CA NoSQL – AP http://venublog.com/2010/04/07/cap-theorem-eventual-consistency-nosql/
  • 16. Data Stores 3 Major Data Store Solutions SQL, OLTP Relational, transactional processing Analytics, OLAP Data Warehousing, Analytics and reporting NoSQL Non relational, distributed, high performance and highly scalable
  • 17. SQL Stores Disk based storage Data is stored as table (row by row and columns – row store) Mainly B-tree as the indexing mechanism
  • 18. SQL Stores … Dynamic locking/ Lock free for concurrency control Write-ahead log (WAL) / transactional log for crash recovery SQL as the access language
  • 19. SQL Stores Proven and widely adopted MySQL PostgreSQL VoltDB Clustrix MySQL Cluster ScaleDB ScaleBase DbShards Oracle SQL Server DB2 Sybase & … Supports ACID Crash recovery DDL, DML, DCL
  • 20. Analytic Stores Data warehousing, mainly for large sets of data Data marts, Dimensional, Fact and Aggregate tables ETL, BI, Reporting, Analytics Columnar and Compression is the key OLAP Cubes built-in or middle-tier Mostly SQL and also MDX driven
  • 21. Analytic Stores Columnar data warehouse solutions GreenPlum (+ DCA appliance) Vertica (Break through, I love it) Aster ParAccel InfoBright (MySQL based) InfiniDB (open source, Calpont appliance) Netezza (appliance) XtremeData dbX (appliance) TeraData
  • 22. NoSQL Stores Does not mean No to SQL Actually Not only SQL Data store that may not require fixed table schemas Mainly derived from Google BigTable and Amazon Dynamo
  • 23. NoSQL Stores … Non relational, schema free Distributed, ability to horizontally scale Simple CLI or API protocol Eventually consistent, depends … Limitations of SQL to scale large data Ability to dynamically define new attributes
  • 24. NoSQL Stores … Multiple Types based on storage architecture Key Value, KV Document Graph Column Family
  • 25. NoSQL Stores Key-Value Stores Dynamo Clones Membase Riak Redis Tokyo Cabinet Voldemort Document Stores MongoDB CouchDB Column Family BigTable Clones Cassandra HBase HyperTable Graph Databases Neo4J InfoGrid AllegroGraph FlockDB
  • 26. What they are good at & How to choose
  • 27. Basic Decision Principles Do not over architect from day-1, it’s overkill Startups can’t afford to spend time Understand business and implement with simple well known solutions to begin with Do not follow the models, just inspire from the problem solving Engineering talent is crucial, make sure you have right resources Evaluate and implement new solutions as the business grows
  • 28. Basic Decision Principles … High availability & disaster recovery is a must Understand pros and cons of each and every design model, and weigh towards the best interest of the company Remember some of the big outage stories Tumblr, Foursquare & Twitter Lean towards community winner and widely adopted Do not lean towards only performance, unless you can recreate the state of the data
  • 29. SQL – Good High Performance OLTP, Transactions, ACID Structured, SQL Access , portability and tools Small amounts of data, typically < 500G per server, supports inline UPDATE, DELETE, multi-condition/rows Relational model at data store, application independent Many tables with different types of data
  • 30. SQL – Good Simple or complex aggregation Statistics, reports at data store level Need access to more than one tuple of information Results based on multiple search conditions SELECT foo FROM bar where X=1 and Y=2 Fetching of ordered or array of data Compatible with many tools
  • 31. SQL – Bad SQL complexity, parsing cost Learning and relational model design Performance and Scalability Strictly single node Sharding causes more trouble operationally Operational maintenance, fire fighting Puts a brake on rapid development cycles
  • 32. NoSQL - Good Fits very well for volatile data High read or write throughput Automatic horizontal scalability (Consistent hashing) Simple to implement, no investment for developers to design and implement relational model Application logic defines object model Support of MVCC in some form Compaction and un-compaction happens at top tier In-memory or disk based or combination
  • 33. NoSQL - Good Rapid development cycles, programmer friendly Reduces the footprint at data store level NoSQL in general faster than SQL Supports INSERT, DELETE, SELECT Data is distributed by KEY over nodes Lists, sets, queues, pub-sub are also supported by some NoSQL – Redis S3 can handle large blobs; not all NoSQL can handle it
  • 34. NoSQL - Bad Packing and Un-packing of each key Lack of relation from one key to another Need whole value from the key; to read/write any partial information No security or authentication Data store is merely a storage layer, can’t be used for: Analytics Reporting Aggregation Ordered values
  • 35. SQL/NoSQL – Good and Bad Performance mainly depends on amount of memory Disk bound both takes a hit SQL has advantage due to sequential and read-ahead Optimization towards frequently accessed data SQL engines maintain LRU SQL Engines are proven and widely in use NoSQL is pretty much new; but marching …
  • 36. Cache MySQL HandlerSocket InnoDB bufferpool acts as cache No explicit cache needed, no write invalidation Write Through Cache (WTC) is a good candidate for high reads or writes Gaming world really need this Membase or periodic flush to persistent storage layer Flash cache can also help to scale IO bound workloads We might see them pitching in private cloud slowly
  • 37. Document Store Document Stores Supports more complex data models than KV Good at handling content management, session, profile data Multi index support Dynamic schemas, Nested schemas Auto distributed, eventual consistency MVCC (CouchDB) or atomic (MongoDB) MongoDB, SimpleDB: widely adopted in this space Use Case: Search by complex patterns & CRUD apps
  • 38. Column Family Store HBase (Apache), Cassandra (Facebook) and HyperTable (Baidu) HBase – CA Cassandra – AP Model consists of rows and columns Scalability: Splitting of both rows and columns Rows are split across nodes using primary key, range Columns are distributed using groups Horizontal and vertical partitioning can be used simultaneously Extension of document store HBase uses HDFS; Pig, Hive, Cascading can help Use case: Grouping of frequently used and un-used over data centers / stream of writes
  • 39. Graph Store Social Graph Relationship between entities Data modeling on social networks Common Use Cases List of friends Recommendation system Following Followers Common Connections FAN IN/OUT
  • 40. Cloud Data Stores Database Cloud Services Xeround (MySQL) Microsoft SQL Azure Database (SQL Server) SimpleDB (NoSQL) Google App Engine (NoSQL) SalesForce Database.com (Oracle) ClearDB (MySQL)
  • 41. SUMMARY Pick right data model for the right problem Understand the data storage and use cases on read, write and growth patterns; and then come up with a plan to implement. Use case dictates everything. Compare pros and cons, and weigh towards the one that helps business Public cloud, private cloud or data center also dictates what model to choose You need right people
  • 42. Finally …  SQL Works great, can’t scale for large data NoSQL Works great, can’t fit for all  SQL + NoSQL
  • 43. Questions ? http://venublog.com/ [email_address] Twitter: @vanuganti
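
Slides 32 and 33 credit NoSQL stores with automatic horizontal scalability through consistent hashing, with data distributed by key over nodes. The block below is a minimal sketch of that idea, not the implementation used by any store named in the deck. The class name `HashRing`, the node names, the virtual-node count of 64, and the choice of SHA-256 are all assumptions made for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring (illustrative only; names are hypothetical)."""

    def __init__(self, nodes, vnodes=64):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) points on the ring
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key):
        # Any stable hash works; SHA-256 is an arbitrary choice here.
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        # Each node owns several virtual points to even out the key spread.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def node_for(self, key):
        # A key belongs to the first ring point clockwise from its hash.
        if not self._ring:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.add("node-d")  # scale out by one node
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all to the new node")
```

The point of the design is that adding a node relocates only the keys that the new node takes over. With naive `hash(key) % node_count` placement, most keys would move whenever the node count changes.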

Editor's Notes

  • #3: MySQL Employee 2000-2004 Database Companies MySQL SOLID ANTs Data Server ScaleDB Part of Yahoo’s cloud initiatives like Sherpa and Mobstor and a platform MySQL Geek Still contribute randomly to MySQL source
  • #5: To answer all these, we need to understand how the traditional data architecture is and how it is currently used and the future of data
  • #6: When web is read-only, things used to scale with one or more systems with caching or LB in the front But as things change to real-time and interactive, the same architecture can’t keep up Talk about how Facebook, Twitter, LinkedIn is evolving Public cloud sucks in performance, but offers elasticity to grow ; but you need to design systems to balance hardware, performance and scalability
  • #7: If Facebook, Twitter or someone else uses NoSQL, does not mean everyone has to use it If someone scales using MySQL, does not mean everyone can use the same concept
  • #8: Caching was used for scaling reads
  • #9: Caching was used for scaling reads
  • #10: Caching was used for scaling reads
  • #11: When web is read-only, things used to scale with one or more systems with caching or LB in the front But as things change to real-time and interactive, the same architecture can’t keep up Talk about how Facebook, Twitter, LinkedIn is evolving Public cloud sucks in performance, but offers elasticity to grow ; but you need to design systems to balance hardware, performance and scalability
  • #13: Not everyone wants to hear about systems going down for hours and hours or even days Like FourSquare, Tumblr
  • #15: Typical OLTP system needs C & A Replication is also eventual consistency Eventually consistent
  • #16: Now let’s understand different types of data stores
  • #18: Widely adopted for years
  • #19: Widely adopted for years
  • #20: Widely adopted for years
  • #21: Widely adopted for years
  • #22: DCA Data Computing Appliance Talk about analytics and how crucial they are now
  • #24: Bunch of cloud based solutions, which are bit surprising
  • #25: Bunch of cloud based solutions, which are bit surprising
  • #26: Bunch of cloud based solutions, which are bit surprising
  • #28: Before getting into how to design and implement, let’s understand some basics of design, what to achieve
  • #29: Twitter – MySQL crash and no proper backups in place.. Rolling restore Tumblr … it was down for close to 16 hours or so Foursquare was down for 12 hrs or so You don’t want to be in a situation where you don’t know where the problem is
  • #30: Employee or user can update his profile fields Guaranteed durability
  • #31: Employee or user can update his profile fields Guaranteed durability
  • #32: Employee or user can update his profile fields Guaranteed durability
  • #33: Gaming is a classic example for volatile data
  • #34: Gaming is a classic example for volatile data
  • #35: Gaming is a classic example for volatile data
  • #36: Gaming is a classic example for volatile data
  • #37: Gaming is a classic example for volatile data
  • #38: Gaming is a classic example for volatile data
  • #39: Gaming is a classic example for volatile data
  • #40: Gaming is a classic example for volatile data
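
Slide 36 suggests a write-through cache for workloads with high reads or writes, and the notes above name gaming as the classic case of volatile data. Slide 35 adds that engines keep frequently accessed data in an LRU. The sketch below combines the two under stated assumptions: `WriteThroughCache` is a hypothetical class, and the `store` argument is a plain dict standing in for the persistent layer (Membase or a database in the deck's terms).

```python
from collections import OrderedDict


class WriteThroughCache:
    """Toy write-through cache with LRU eviction (illustrative only)."""

    def __init__(self, store, capacity=1024):
        self._store = store          # dict-like stand-in for persistent storage
        self._capacity = capacity
        self._cache = OrderedDict()  # least recently used entry is first

    def _remember(self, key, value):
        self._cache[key] = value
        self._cache.move_to_end(key)         # mark as most recently used
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)  # evict least recently used

    def put(self, key, value):
        # Write-through: the store is updated first, then the cache, so the
        # cache never holds a value the store lacks and needs no invalidation.
        self._store[key] = value
        self._remember(key, value)

    def get(self, key):
        if key in self._cache:
            self._cache.move_to_end(key)
            return self._cache[key]
        value = self._store[key]  # raises KeyError if the key was never stored
        self._remember(key, value)
        return value


store = {}
scores = WriteThroughCache(store, capacity=2)
scores.put("player:1", 100)
scores.put("player:2", 250)
scores.put("player:3", 75)  # evicts player:1 from the cache, not from the store
```

Because every write reaches the store before `put` returns, losing the cache loses no data. The trade-off is that write latency is bounded by the store; the deck's alternative of a periodic flush would lower that latency at the cost of a window of possible loss.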