1. SQL to NoSQL
Lessons learnt migrating a large and highly relational database into a "classic" NoSQL store
Enda Farrell, @endafarrell

2. “It’s not you, it’s me …”
This doesn’t apply to you … possibly … probably …

3. Here’s what’s coming
  • What
  • Why
  • Complexity
  • Lessons: people, tools, data

4. What is this service?
Nokia’s “Ovi Places Registry” aims to be the largest validated point-of-interest repository in the world.

5. What kind of data?
  • Names
  • Categories
  • Tags
  • Location information: longitude and latitude, postal address
  • Contact data
8. What about it is large and highly relational?
  • 10s of millions of points of interest
  • Many, many 100s of millions of contributing records
  • MySQL DB is 600 GB on disk
  • 32 tables, 202 columns
  • 46 non-PRIMARY constraints

9. What usage patterns do you have?

10. “Classic” NoSQL? What did you use?
  • It isn’t CouchDB but a variation of a Nokia internal one
  • It’s a key-value store holding JSON
  • Without the key you cannot access the value
  • It isn’t a “document store”, as the store does nothing with the structure

11. Why did you port this to NoSQL?
  • Bigger and bigger
  • Nokia maps: web and phone
  • Yahoo! and soon Bing, => Facebook
  • Postponed sharding by bigger HDD
  • We learnt a lot over the last 3 years

12. Why did you port this to NoSQL? (continued)
  • SQL databases can be rigid
  • The world is a messy place
  • “State field”?
  • Integrating other organisations’ data
13. Complicated?
The SQL and NoSQL databases will need to run in parallel for some time
  • Ops & disk space
  • Truth, or system of record
  • Reconciliation
  • Synchronisation
  • Querying

14. Complicated ops & disk
  • Releases, QA, staging and live deployments are more complex when there are two concurrent data storage systems
  • What assumptions are other people making?
  • 2 x HDD

15. Complicated truth
When the two systems disagree, which one is “right”?

16. Complicated reconciliation
  • How do you know that your two data stores disagree?
  • Do you check each one on each read/write, or do you have some “batch” code to check equivalence?
  • Top tip: build a batch reconciler to check keys and revisions/etags (you _do_ have etags, don’t you?!)
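
The batch reconciler in the top tip could look roughly like the sketch below. It assumes each store can export a snapshot of `{key: etag}` pairs; the `Mismatch` type, the `reconcile` function and the sample keys are hypothetical names chosen for illustration, not part of the system described in the talk.

```python
from typing import Dict, List, NamedTuple


class Mismatch(NamedTuple):
    key: str
    reason: str  # "missing_in_nosql", "missing_in_sql" or "etag_differs"


def reconcile(sql_etags: Dict[str, str], nosql_etags: Dict[str, str]) -> List[Mismatch]:
    """Compare two {key: etag} snapshots and list every disagreement."""
    mismatches = []
    # Walk the union of keys so that records missing on either side are caught.
    for key in sorted(sql_etags.keys() | nosql_etags.keys()):
        if key not in nosql_etags:
            mismatches.append(Mismatch(key, "missing_in_nosql"))
        elif key not in sql_etags:
            mismatches.append(Mismatch(key, "missing_in_sql"))
        elif sql_etags[key] != nosql_etags[key]:
            mismatches.append(Mismatch(key, "etag_differs"))
    return mismatches


# Hypothetical snapshots exported from each store.
sql_side = {"place-1": "rev-3", "place-2": "rev-1", "place-3": "rev-7"}
nosql_side = {"place-1": "rev-3", "place-2": "rev-2", "place-4": "rev-1"}
report = reconcile(sql_side, nosql_side)
```

Comparing only keys and etags keeps the batch job cheap, since document bodies are never fetched. It tells you that two copies differ, not which copy is right; that is still the system-of-record question.
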
17. Complicated synchronisation
  • Have you ever tried to keep two different calendars synchronised?
  • Ever get two email clients telling you that you have different numbers of unread mails for the same email account?

18. Complicated querying
  • KV stores generally don’t do querying
  • Some NoSQL stores allow some, but usually more restricted than SQL
  • We used Solr for performance even though it isn’t as powerful as SQL
  • The synchronisation complexity is here to stay for us
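
To illustrate why a separate query index keeps the synchronisation problem alive, here is a minimal sketch. It does not use Solr; a small in-memory inverted index stands in for the external search engine, and the `IndexedStore` class and the `tags` field are assumptions made for the example.

```python
from collections import defaultdict


class IndexedStore:
    """A KV store plus a hand-maintained inverted index on one field ('tags')."""

    def __init__(self):
        self.docs = {}
        self.by_tag = defaultdict(set)

    def put(self, key, doc):
        # Remove stale index entries first, otherwise old tags keep matching.
        for tag in self.docs.get(key, {}).get("tags", []):
            self.by_tag[tag].discard(key)
        self.docs[key] = doc
        for tag in doc.get("tags", []):
            self.by_tag[tag].add(key)

    def find_by_tag(self, tag):
        # Only fields that were indexed can be queried; there is no ad hoc SQL.
        return sorted(self.by_tag.get(tag, set()))


store = IndexedStore()
store.put("place:1", {"tags": ["coffee"]})
store.put("place:2", {"tags": ["coffee", "park"]})
store.put("place:1", {"tags": ["tea"]})  # update: the index must follow
```

Every write now touches two structures. In a real deployment those are two separate systems, so a failure between the two steps leaves the index out of date, which is the lasting complexity the slide refers to.
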
19. Complexity
Complexity is often mistaken for “cleverness”

20. Lessons learnt: people
Why. It’s a question you will be asked and you will have to answer

21. Lesson learnt: people
  • The “DB~A” role is still needed
  • Here the “~A” is more to do with data/information architecture than with administration
  • Top tip: design your JSON. Print it out.

22. Lesson learnt: people
The effect on your team
  • You may have a team of enterprise Java types who are used to writing Eclipse-enabled code
  • In our case we wanted to keep the flexibility that JSON gives us, but it meant we no longer had the same sort of model objects

23. Lessons learnt: tools
Build “SQL to NoSQL” and “NoSQL to SQL” seeders
  • You will need to seed your NoSQL from your SQL. You probably have existing DAOs which can form the basis, but this assumes your entities are essentially the same (top tip: keep them so!)

24. Lessons learnt: tools
Build “SQL to NoSQL” and “NoSQL to SQL” seeders
  • “NoSQL to SQL” was a seeder we learnt the hard way.
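
A pair of seeders might be sketched as follows. This is an illustration under stated assumptions, not the talk's tooling: an in-memory SQLite database stands in for MySQL, a plain dict stands in for the key-value store, and the `place`/`tag` tables, key format and function names are all hypothetical.

```python
import json
import sqlite3

# Hypothetical relational source: two tables joined by place_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE place (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tag (place_id INTEGER REFERENCES place(id), label TEXT);
    INSERT INTO place VALUES (1, 'Cafe A'), (2, 'Park B');
    INSERT INTO tag VALUES (1, 'coffee'), (1, 'breakfast'), (2, 'park');
""")


def seed_kv_from_sql(conn):
    """SQL to NoSQL: one JSON document per place row, child rows folded into a list."""
    kv = {}
    for place_id, name in conn.execute("SELECT id, name FROM place ORDER BY id"):
        tags = [row[0] for row in conn.execute(
            "SELECT label FROM tag WHERE place_id = ? ORDER BY label", (place_id,))]
        kv[f"place:{place_id}"] = json.dumps({"name": name, "tags": tags}, sort_keys=True)
    return kv


def seed_sql_rows_from_kv(kv):
    """NoSQL to SQL: flatten each document back into (table, row) tuples."""
    rows = []
    for key in sorted(kv):
        place_id = int(key.split(":")[1])
        doc = json.loads(kv[key])
        rows.append(("place", (place_id, doc["name"])))
        rows.extend(("tag", (place_id, label)) for label in doc["tags"])
    return rows


store = seed_kv_from_sql(conn)
round_trip = seed_sql_rows_from_kv(store)
```

The reverse seeder is only this simple because the document mirrors the relational entities, which is the point of the "keep them so" tip.
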
25. Lessons learnt: tools
  • What’s your unit/integration test coverage?
  • 5 releases post initial launch, is your test data still exercising all code paths?

26. Lesson learnt: tools
  • You may find that not everything fits into a key-value engine
  • Even with a queryable index, some data sets really are relational ;-)
  • The downside is that you may therefore have to keep the SQL database long term

27. Lesson learnt: tools
Visualise your system
  • Monitoring: calls, load, response times
  • Volumetrics: num docs, HDD, milestones
  • Context: draw a systems context diagram (which reminds me …)

28. Lesson learnt: data
  • Key generation: it’s not sequence numbers nor “auto-increments” any more
  • Many are UUIDs, and they are long and ugly
  • But “guess and check” is ugly too
  • Consistent hash?
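
The slide leaves the key question open, so the following is only one possible reading of it: contrast a random UUID with a key derived deterministically by hashing stable attributes of the record. The attribute names (`supplier`, `supplier_record_id`) and both functions are assumptions for illustration.

```python
import hashlib
import uuid


def random_key() -> str:
    """Opaque key: unique without coordination, but cannot be recomputed later."""
    return uuid.uuid4().hex


def derived_key(supplier: str, supplier_record_id: str) -> str:
    """Deterministic key: hash of stable, normalised attributes (hypothetical choice)."""
    natural = f"{supplier.strip().lower()}|{supplier_record_id.strip()}"
    return hashlib.sha1(natural.encode("utf-8")).hexdigest()


# The same input always maps to the same key, so a client that knows the
# attributes can compute the key instead of guessing and checking.
key_a = derived_key("Acme ", "42")
key_b = derived_key("acme", "42")
```

A derived key is just as long and ugly as a UUID, and it only works if the chosen attributes never change for the lifetime of the record.
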
29. Lesson learnt: data
  • Make the revision/etag of the JSON data visible in the JSON
  • Not just in a header (assuming HTTP here)
  • The data will be taken off-platform; if you want to change it you will need to know that the revision is (or is not) still the same
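
A minimal sketch of why the revision belongs inside the document: a client that took the JSON off-platform can send it back, and the store can tell whether the copy is stale. The `_rev` field name, the integer revisions and `put_if_current` are assumptions for this example; a dict stands in for the store.

```python
import json


class StaleRevision(Exception):
    """Raised when the caller's copy is older than the stored one."""


def put_if_current(store: dict, key: str, new_doc: dict) -> dict:
    """Write new_doc only if its embedded revision matches the stored revision."""
    current = json.loads(store[key]) if key in store else None
    if current is not None and new_doc.get("_rev") != current["_rev"]:
        raise StaleRevision(f"{key}: have {new_doc.get('_rev')}, store has {current['_rev']}")
    # Bump the revision and keep it in the body, not only in a response header.
    saved = dict(new_doc, _rev=(current["_rev"] + 1 if current else 1))
    store[key] = json.dumps(saved)
    return saved


store = {}
v1 = put_if_current(store, "place:1", {"name": "Cafe A"})
v2 = put_if_current(store, "place:1", dict(v1, name="Cafe A (renamed)"))
try:
    # v1 was exported earlier and edited offline; it no longer matches the store.
    put_if_current(store, "place:1", dict(v1, name="stale edit"))
    rejected = False
except StaleRevision:
    rejected = True
```

Because the revision travels with the data, the check still works after the document has passed through files, queues or other systems that drop HTTP headers.
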
30. Lesson learnt: data
Version of the JSON “schema” in the KV
  • You have many docs
  • You might have to “upgrade” the structure of the KV doc
  • Keep the schema version in the JSON
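
One way to use an embedded schema version is a chain of small upgrade steps applied when a document is read. The sketch below assumes a `schema_version` field and two invented structural changes; none of these names or changes come from the talk.

```python
def upgrade_1_to_2(doc):
    """Hypothetical change: a single 'phone' string becomes a 'phones' list."""
    out = {k: v for k, v in doc.items() if k != "phone"}
    out["phones"] = [doc["phone"]] if doc.get("phone") else []
    out["schema_version"] = 2
    return out


def upgrade_2_to_3(doc):
    """Hypothetical change: 'lat'/'lon' move under a 'location' object."""
    out = {k: v for k, v in doc.items() if k not in ("lat", "lon")}
    out["location"] = {"lat": doc["lat"], "lon": doc["lon"]}
    out["schema_version"] = 3
    return out


UPGRADES = {1: upgrade_1_to_2, 2: upgrade_2_to_3}
LATEST = 3


def upgrade(doc):
    """Apply upgrade steps one version at a time until the doc is current."""
    while doc["schema_version"] < LATEST:
        doc = UPGRADES[doc["schema_version"]](doc)
    return doc


old = {"schema_version": 1, "name": "Cafe A", "phone": "+00 000000", "lat": 52.5, "lon": 13.4}
new = upgrade(old)
```

With many documents in the store, upgrading lazily on read avoids one large migration, at the cost of code that has to understand every version still in circulation.
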
31. Lesson learnt: data
Many little, not few big?
  • Easier to replicate
  • “Big” docs can be tough on networks
  • Trade-off with more client calls (esp. error handling)
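
The trade-off can be made concrete with a small sketch: one large document is split into a parent plus several small child documents, and the reader then has to cope with a child that is missing. The key layout, the `contributions` field and both functions are hypothetical, and a dict stands in for the store.

```python
def split_place(place_key, big_doc):
    """Split one large place document into a small parent plus one doc per contribution."""
    docs = {}
    child_keys = []
    for i, contribution in enumerate(big_doc["contributions"]):
        child_key = f"{place_key}:contribution:{i}"
        docs[child_key] = contribution
        child_keys.append(child_key)
    parent = {k: v for k, v in big_doc.items() if k != "contributions"}
    parent["contribution_keys"] = child_keys
    docs[place_key] = parent
    return docs


def load_place(store, place_key):
    """Reassemble the place; missing children are reported, not silently dropped."""
    parent = store[place_key]
    found, missing = [], []
    for child_key in parent["contribution_keys"]:
        (found if child_key in store else missing).append(child_key)
    return {**parent, "contributions": [store[k] for k in found]}, missing


big = {"name": "Cafe A", "contributions": [{"source": "s1"}, {"source": "s2"}]}
store = split_place("place:1", big)
complete, none_missing = load_place(store, "place:1")

# Simulate a child that has not replicated yet.
del store["place:1:contribution:1"]
partial, missing = load_place(store, "place:1")
```

Each small document replicates and transfers cheaply, but the read path now makes several calls and needs an explicit answer for what to do when one of them fails.
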
32. Probably good ideas
  • If you’re _thinking_ about doing this, do use one of the open source ones
  • Get one that replicates easily
  • Build POCs

Thank you!
Other questions?
@endafarrell
http://endafarrell.net

Lessons learnt converting from SQL to NoSQL

  • 1. SQL to NoSQLLessons learnt migrating a large and highly-relational database into a "classic" NoSQLEnda Farrell @endafarrell
  • 2. “It’s not you, it’s me …”This doesn’t apply to you… possibly … probably …2
  • 4. What is this service?Nokia’s “Ovi Places Registry” aims to be the largest validated point of interest repository in the world4
  • 5. What kind of data?Names Categories TagsLocation information longitude and latitudepostal addressContact data5
  • 6. 6
  • 7. 7
  • 8. What about it is large and highly relational?10s of millions points of interestMany many 100s of millions of contributing recordsMySQL DB is 600 GB on disk32 tables, 202 columns46 non-PRIMARY constraints8
  • 9. What usage patterns do you have?9
  • 10. “Classic” NoSQL? What did you use?It isn’t CouchDB but a variation of a Nokia internal oneIt’s a Key Value store holding JSON Without the key you cannot access the value It isn’t a “document store” as the store does nothing with the structure10
  • 11. Why did you port this to NoSQL?bigger and biggerNokia maps – web and phoneYahoo! and soon Bing, => FacebookPostponed sharding by bigger HDDWe learnt a lot over the last 3 years11
  • 12. Why did you port this to NoSQL? (continued)SQL databases can be rigidThe world is a messy place“State field”?Integrating other organisations’ data12
  • 13. Complicated?The SQL and NoSQL databases will need to run in parallel for some timeOps &Disk spaceTruth or System of Record Reconciliation SyncronisationQuerying13
  • 14. Complicated Ops & diskReleases, QA, staging and live deployments are more complex when there are two concurrent data storage systemsWhat assumptions are other people making?2 x HDD14
  • 15. Complicated truthWhen the two systems disagree, which one is “right”?15
  • 16. Complicated reconciliationHow do you know that your two data stores disagree?Do you check each on on each read/write, or do you have some “batch” code to check equivalence?Top tip: build a batch reconciler to check keys and revision/etagsyou _do_ have etags don’t you?! 16
  • 17. Complicated synchronisation Have you ever tried to keep two different calendars synchronised?Ever get two email clients telling you you have different numbers of unread mail for the same email account?17
  • 18. Complicated querying KV stores generally don’t do queryingSome NoSQL stores allow some, but usually more restricted than SQLWe used Solr for performance even though it isn’t as powerful as SQLThe synchronisation complexity is here to stay for us 18
  • 19. ComplexityComplexity is often mistaken for “cleverness”19
  • 20. Lessons learnt: peopleWhy. It’s a question you will be asked and you will have to answer20
  • 21. Lesson learnt: people The “DB~A” role is still neededHere the “~A” is more to do with data/information architecture than with administration Top tip: design your JSON. Print it out.21
  • 22. Lesson learnt: peopleThe effect on your teamYou may have a team of enterprise Java-types who are used to writing Eclipse-enabled codeIn our case we wanted to keep the flexibility that JSON gives us, but it meant we no longer had the same sort of model objects22
  • 23. Lessons learnt: tools. Build “SQL to NoSQL” and “NoSQL to SQL” seeders. You will need to seed your NoSQL from your SQL. You probably have existing DAOs which can form the basis, but this assumes your entities are essentially the same (top tip: keep them so!).
  • 24. Lessons learnt: tools. Build “SQL to NoSQL” and “NoSQL to SQL” seeders. “NoSQL to SQL” was a seeder we learnt about the hard way.
  • 25. Lessons learnt: tools. What’s your unit/integration test coverage? Five releases after the initial launch, is your test data still exercising all code paths?
  • 26. Lesson learnt: tools. You may find that not everything fits into a Key Value engine. Even with a queryable index, some data sets really are relational ;-) The downside is that you may therefore have to keep the SQL database long term.
  • 27. Lesson learnt: tools. Visualise your system. Monitoring: calls, load, response times. Volumetrics: number of docs, HDD, milestones. Context: draw a systems context diagram (which reminds me …)
  • 28. Lesson learnt: data. Key generation: it’s not sequence numbers nor “auto-increments” anymore. Many are UUIDs and they are long and ugly. But “guess and check” is ugly too. Consistent hash?
  • 29. Lesson learnt: data. Make the revision/etag of the JSON data visible in the JSON, not just in a header (assuming HTTP here). The data will be taken off-platform; if you want to change it you will need to know that the revision is (or is not) still the same.
  • 30. Lesson learnt: data. Version of the JSON “schema” in the KV. You have many docs. You might have to “upgrade” the structure of the KV doc. Keep the schema version in the JSON.
  • 31. Lesson learnt: data. Many little, not few big? Easier to replicate. “Big” docs can be tough on networks. Trade-off with more client calls (esp. error handling).
  • 32. Probably good ideas. If you’re _thinking_ about doing this, do use one of the open source ones. Get one that replicates easily. Build POCs.
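
The batch reconciler suggested on slide 16 could look roughly like the Python sketch below. It is an illustration only, not the code used for the registry: the snapshot dictionaries, the `poi-N` keys and the `rev` field name are all assumptions. It also assumes the revision is stored inside the JSON value itself, as slide 29 recommends, so it can be read without any HTTP header.

```python
import json
from typing import Dict, List, NamedTuple


class Report(NamedTuple):
    missing_in_nosql: List[str]
    missing_in_sql: List[str]
    revision_mismatch: List[str]


def revisions_from_docs(docs: Dict[str, str]) -> Dict[str, str]:
    """Read the revision kept inside each JSON value.

    The field name "rev" is a hypothetical choice for this sketch.
    """
    return {key: json.loads(body)["rev"] for key, body in docs.items()}


def reconcile(sql_revs: Dict[str, str], nosql_revs: Dict[str, str]) -> Report:
    """Compare two {key: revision} snapshots, one per data store."""
    sql_keys, nosql_keys = set(sql_revs), set(nosql_revs)
    return Report(
        missing_in_nosql=sorted(sql_keys - nosql_keys),
        missing_in_sql=sorted(nosql_keys - sql_keys),
        revision_mismatch=sorted(
            k for k in sql_keys & nosql_keys if sql_revs[k] != nosql_revs[k]
        ),
    )


# Hypothetical snapshots; a real job would stream these from each store.
sql_snapshot = {"poi-1": "rev-3", "poi-2": "rev-1", "poi-3": "rev-7"}
nosql_docs = {
    "poi-1": '{"rev": "rev-3", "name": "Cafe"}',
    "poi-2": '{"rev": "rev-2", "name": "Museum"}',
    "poi-4": '{"rev": "rev-1", "name": "Station"}',
}

report = reconcile(sql_snapshot, revisions_from_docs(nosql_docs))
print(report)
```

Comparing only keys and revisions keeps the batch job cheap; full document comparison is then needed only for the keys the report flags.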

Editor's Notes

  • #3: There may be large chunks of this talk that simply do not apply to your situation. If so, sorry. Perhaps you can identify situations that you will want to continue to avoid? I’ve tried to be “general” in that I hope it isn’t team/project/tool specific, but I’ve also tried to be at least a little specific in the lessons learnt. The main goal of this talk is to give you food for thought so that when you do this there will be a little less to be uncertain about, or fewer “gotchas”. I probably have not done a brilliant job of removing all of the “my situation” specifics, so if it’s confusing, please be kind ;-)
  • #4: There may be large chunks of this talk that simply do not apply to your situation. If so, sorry. Perhaps you can identify situations that you will want to continue to avoid? I probably have not done a brilliant job of removing all of the “my situation” specifics, so if it’s confusing, please be kind ;-)
  • #5: Nokia’s “Places Registry” aims to be the largest validated “point of interest” repository in the world. Question: hands up if you have ever owned a Nokia phone? Question: who knows that you have free navigation on the Nokia devices? You know that navigation needs some sort of destination and that they are often interesting places (not just your friend’s address). Hence Nokia built an ecosystem of applications/services to deal with this, and has bought some big names in the Geo/Mapping/SatNav world.
  • #6: Names: one mandatory 'default' name and many optional alternative names in multiple languages. Categories: at least one predefined core category. Tags: free text keywords. Location information: the longitude and latitude of the place on Earth as well as the postal address. Contact data: information such as phone number, email address and website URL that may help matching and de-duplication.
  • #7: The registry is where all place information comes from for maps on the web and on Nokia devices, and soon on Windows Phone devices too.
  • #9: Very large? The financial industries will think not … Is it something that any of you will be dealing with? I think so … XX = I can’t say this number ;-o There’s a “Where 2.0” conference coming up soon and those folks have asked that I don’t steal their thunder!
  • #10: These are odd graphs. It’s an unusually quiet Monday morning, but it clearly shows a background/underlying pattern. OPR is often a “write heavy” application. Read operations generally take place in constant time, and the response times are generally consistent. Write times are pretty quick (200ms or so) for updates to existing POIs and much longer for registrations of new POIs (we check the existing places to see if we already know about that place, a process known as “matching”). The fact that we are (a) write heavy and (b) write costly was something that affected our decision to move to NoSQL.
  • #11: There are more complications here: I had to build another KV (which had the same API as the Nokia vShards) which could be replicated across multiple data centres (as I am in two and will soon be in three), because the Nokia vShards does not yet support multiple data centres. Apologies if you were hoping to see CouchDB-specific lessons.
  • #12: Learnt about the business, the first version
  • #13: Moving to custom JPA-based DAOs helped enormously, but with so many rows being written to many different tables (some of which were UNINDEXED to speed up insert performance), things were slow.
  • #15: If everything “just works” this won’t be an issue, but … when problems arise, the ops team still see a regular SQL database, and unless you have “real dev-ops” (another top tip) your app teams will have to check people’s assumptions.
  • #16: In the first releases, when the NoSQL is a synchronised “slave” of the SQL, the answer is clear. In later releases, if you have a single “cutover” release, the answer is clear. If the two co-exist side by side, you will want to have a predefined policy for handling this, but how do you know that your two systems disagree?
  • #17: If you’re at this conference and you’re in the state of putting a new storage engine under an existing application, you probably have more code than you want. Code to check two stores on each request is extraordinarily hard to write and to cover with unit/integration tests.
  • #18: It can be easy if you can have some sort of cross-system two-phase commit.
  • #21: What's wrong with what you have? It will take some time, during which things will get worse. Do you have to maintain operational capability/up time/processing/new features/customers/clients etc. while rebuilding? If so, there may well be senior people in your organisation who do not understand the pros and cons of NoSQL and will be skeptical.
  • #22: It’s just as easy to make bad KV JSON design decisions as it is to make bad schema designs!The team argued over the JSON and we changed the structure a few times. This then meant re-writes of code that accessed things. For other reasons we did not use model objects – and these changes therefore took more and more time.
  • #24: We needed it as we had a wetware/pilot error when one DBA splatted our production database and also managed to corrupt the backup. Good work there. 1.2 million POIs were lost from the SQL database – but as they were in the NoSQL we were able eventually to repair.What are seeders?
  • #25: We needed it as we had a wetware/pilot error when one DBA splatted our production database and also managed to corrupt the backup. Good work there. 1.2 million POIs were lost from the SQL database – but as they were in the NoSQL we were able eventually to