NoSQL
SQL anti patterns and NoSQL alternatives

           Gleicon Moraes

    http://zenmachine.wordpress.com
         http://github.com/gleicon
Doing it wrong, Junior !
SQL Anti patterns
and related stuff


  The eternal tree (rows refer to the table itself - think
  threaded discussion)
  Dynamic table creation (and dynamic query building)
  Table as cache (let's save it in another table)
  Table as queue (wtf)
  Table as log file (table cleaning slave required)
  Stoned Procedures (living la vida business)
  Row Alignment (the careful gentleman)
  Extreme JOINs (app requires a warmed up cache)
  Your schema must be printed on an A3 sheet
  Your ORM issues full queries for dataset iterations
The eternal tree
Problem: Most threaded discussion examples use something
like a table that contains all threads and answers, relating to
each other by an id. Usually the developer will come up with
his own binary-tree version to manage this mess.

id - parent_id - author - text
1 - 0 - gleicon - hello world
2 - 1 - elvis - shout !

NoSQL alternative: Document storage:
{ thread_id: 1, title: 'the meeting', author: 'gleicon', replies: [
     {
       author: 'elvis', text: 'shout', replies: [{...}]
     }
   ]
}
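
A minimal sketch of the nested-document idea, using a plain Python dict in place of a real document store. The helper count_replies and the second-level reply are made up for illustration; they are not from the slides.

```python
# A thread stored as one nested document: each reply carries its own replies.
# The inner reply by "gleicon" is invented here to show a second level.
thread = {
    "thread_id": 1,
    "title": "the meeting",
    "author": "gleicon",
    "replies": [
        {
            "author": "elvis",
            "text": "shout",
            "replies": [
                {"author": "gleicon", "text": "hello again", "replies": []},
            ],
        },
    ],
}


def count_replies(node):
    """Count every reply below this node by walking the nested lists."""
    return sum(1 + count_replies(reply) for reply in node["replies"])


print(count_replies(thread))
```

The whole discussion is read in one fetch and walked in application code, with no self-join on parent_id.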
Dynamic table creation
Problem: To avoid huge tables, one must come up with a
"dynamic schema". For example, let's think about a document
management company, which is adding new facilities across the
country. For each storage facility, a new table is created:

item_id - row - column - stuff
1 - 10 - 20 - cat food
2 - 12 - 32 - trout

Now you have to come up with "dynamic queries", which will
probably query a "central storage" table and issue a huge join
to check if you have enough cat food across the country.

NoSQL alternative:
- Document storage, modeling a facility as a document
- Key/Value, modeling each facility as a SET
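
A rough sketch of "a facility as a document", again with plain Python dicts standing in for a document store. The facility names, the qty field and the helper total_stock are assumptions added for the example.

```python
# One document per facility; item positions live inside the document.
# Facility names and the "qty" field are made-up sample data.
facilities = {
    "facility:north": {
        "items": [
            {"item_id": 1, "row": 10, "column": 20, "stuff": "cat food", "qty": 3},
            {"item_id": 2, "row": 12, "column": 32, "stuff": "trout", "qty": 1},
        ]
    },
    "facility:south": {
        "items": [
            {"item_id": 1, "row": 4, "column": 7, "stuff": "cat food", "qty": 5},
        ]
    },
}


def total_stock(stuff):
    """Sum the quantity of one kind of item over every facility document."""
    return sum(
        item["qty"]
        for facility in facilities.values()
        for item in facility["items"]
        if item["stuff"] == stuff
    )


print(total_stock("cat food"))
```

Adding a facility means adding a document, not creating a table and rewriting the queries.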
Table as cache
Problem: Complex queries demand that a result be stored in a
separate table, so it can be queried quickly. Worse than views.


NoSQL alternative:
- Really ?
- Memcached
- Redis + AOF + EXPIRE
- Denormalization
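
What a cache with expiration buys you can be sketched in-process. This is only a stand-in for something like Redis with EXPIRE, not a client for it; the class name ExpiringCache and the injectable clock are assumptions made so the example runs without a server.

```python
import time


class ExpiringCache:
    """Tiny in-process stand-in for a key/value cache with per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, deadline)

    def set(self, key, value, ttl_seconds):
        # Store the value together with the moment it stops being valid.
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, deadline = entry
        if self._clock() >= deadline:
            del self._data[key]  # expired: behave like a cache miss
            return None
        return value
```

Expired entries vanish on their own, so nobody has to write the job that cleans the "cache table".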
Table as queue
Problem: A table which holds messages to be completed.
Worse, they must be ordered.

NoSQL alternative:
- RestMQ, Resque
- Any other message broker
- Redis (LISTS - LPUSH + RPOP)
- Use the right tool
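
The LPUSH + RPOP pairing gives first-in, first-out ordering: push on the left, pop from the right. A small sketch with collections.deque standing in for a Redis list; the function names mirror the Redis commands but are plain local helpers.

```python
from collections import deque

# Stand-in for a Redis list used as a queue.
queue = deque()


def lpush(item):
    """Push onto the left end, like LPUSH."""
    queue.appendleft(item)


def rpop():
    """Pop from the right end, like RPOP; None when the queue is empty."""
    return queue.pop() if queue else None


for message in ("first", "second", "third"):
    lpush(message)

# Messages come out in the order they went in.
drained = [rpop(), rpop(), rpop(), rpop()]
print(drained)
```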
Table as log file
Problem: A table in which data gets written as a log file. From
time to time it needs to be purged. Truncating this table once
a day usually is the first task assigned to new DBAs.

NoSQL alternative:
- MongoDB capped collection
- Redis, and an RRD pattern
- Riak
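
The idea behind a capped collection or an RRD-style log is a fixed-size buffer that drops the oldest entry on its own. A sketch of that behaviour with a bounded deque; the size of 3 and the event names are arbitrary sample values.

```python
from collections import deque

# Keep only the 3 most recent entries; older ones fall off automatically.
log = deque(maxlen=3)

for n in range(1, 6):
    log.append(f"event {n}")

print(list(log))
```

No purge job and no daily TRUNCATE: the structure itself enforces the size limit.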
Stoned procedures
Problem: Stored procedures hold most of your application's
logic. Also, some triggers are used to - well - trigger important
data events.

Stored procedures and triggers have the magic property of
vanishing from our memories and being impossible to keep versioned.

NoSQL alternative:
- Now be careful not to use map/reduce as stoned
procedures.
- Use your preferred language for business stuff, and leave event
handling to pub/sub or message queues.
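
A sketch of the second point: the business logic lives in ordinary application code, and "data events" are published to whoever subscribed instead of firing from triggers. This is an in-process toy, not a real broker; the topic name order.created and the handlers are invented for the example.

```python
from collections import defaultdict

# topic -> list of callbacks; a toy replacement for a pub/sub broker.
subscribers = defaultdict(list)


def subscribe(topic, handler):
    subscribers[topic].append(handler)


def publish(topic, payload):
    """Call every handler registered for the topic; return how many ran."""
    handlers = subscribers[topic]
    for handler in handlers:
        handler(payload)
    return len(handlers)


audit_trail = []
subscribe("order.created", lambda order: audit_trail.append(("audit", order["id"])))
subscribe("order.created", lambda order: audit_trail.append(("email", order["id"])))

delivered = publish("order.created", {"id": 42})
print(delivered, audit_trail)
```

The handlers sit in version control next to the rest of the code, which is exactly what triggers make hard.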
Row Alignment
Problem: Extra columns are created but not used, just in case.
Usually they are named a1, a2, a3, a4 and called padding.

There's good will behind that, especially when version 1 of the
software needed an extra column in a 150M-row database
and it took 2 days to run an ALTER TABLE.

NoSQL alternative:
- Document based databases such as MongoDB and CouchDB,
where new attributes are local to the document. Also, having no
schema helps

- Column based databases may not be the best choice if
column creation requires restarts/migrations
Extreme JOINs
Problem: Business stuff modeled as tables. Table inheritance
(Product -> SubProduct_A). To find the complete data for a
user plan, one must issue gigantic queries with lots of JOINs.

NoSQL alternative:
- Document storage, such as MongoDB
- Denormalization
- Serialized objects
Your schema fits on an A3 sheet
Problem: Huge data schemas are difficult to manage. Extreme
specialization creates tables which converge to a key/value
model. Normal forms get priority over common sense.

Product_A      Product_B
id - desc      id - desc

NoSQL alternative:
- Denormalization
- Another schema ?
- Document store for flattening model
- Key/Value
Your ORM ...
Problem: Your ORM issues full queries for dataset iterations,
your ORM maps and creates tables which mimic your
classes, even the inheritance, and the performance is bad
because the queries are huge, etc, etc

NoSQL alternative:
Apart from denormalization and good old common sense,
ORMs are trying to bridge two things with distinct impedance.

Nothing in the relational model maps cleanly to
classes and objects. Not even the basic unit, which is the
domain (set) of each column. Black Magic ?
No silver bullet
- Consider alternatives

- Think outside the norm

- Denormalize

- Simplify
Cycle of changes - Product A
1.   There was the database model
2.   Then, the cache was needed. Performance was no good.
3.   Cache key: query, value: resultset
4.   High or nonexistent expiration time [w00t]

(Now there's a turning point. Data didn't need to change often.
Denormalization was a given with cache)

5. The cache needs to be warmed or the app won't work.
6. Key/Value storage was a natural choice. No data on MySQL
anymore.
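
Step 3 ("cache key: query, value: resultset") might look roughly like the sketch below. The function run_query is a hypothetical stand-in for the real database call, and hashing the query text with SHA-1 to build the key is an assumption, not something the slides specify.

```python
import hashlib

cache = {}   # stand-in for the key/value store
calls = []   # records every time the "database" is actually hit


def cache_key(query, params):
    """Derive a short, stable key from the query text and its parameters."""
    raw = query + "|" + repr(params)
    return "q:" + hashlib.sha1(raw.encode("utf-8")).hexdigest()


def run_query(query, params):
    """Hypothetical stand-in for a real database call."""
    calls.append((query, params))
    return [("cat food", 8)]


def cached_query(query, params=()):
    key = cache_key(query, params)
    if key not in cache:
        cache[key] = run_query(query, params)  # miss: fill the cache
    return cache[key]
```

With no expiration, the cache quietly becomes the primary copy of the data, which is the turning point the slide describes.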
Cycle of changes - Product B
1. Postgres DB storing crawler results.
2. There was a counter in each row, and updating this counter
   caused contention errors.
3. Memcache for reads. Performance is better.
4. First MongoDB test, no more deadlocks from counter
   update.
5. Data model was simplified, the entire crawled doc was
   stored.
Stuff to think about
Ask yourself whether the data you use isn't already denormalized (cached).

Most of the anti-patterns contain signs that the NoSQL route
(or at least a partial NoSQL route) may simplify things.

Are you dependent on cache ? Does your application fail
when there is no cache ? Does it just slow down ?

Are you ready to think more about your data ?

Think about the way to put and to get back your data from the
database (be it SQL or NoSQL).
Extra - MongoDB and Redis
The next two slides are here to show what it is like to use
MongoDB and Redis for the same task.

There is more to managing your data than stuffing it inside a
database. You gotta plan ahead for searches and migrations.

This example is about storing books and searching among
them. MongoDB makes it simpler: you just use its query
language. Redis requires that you keep track of tags and ids to
use SET operations to recover which books you want.

Check http://rediscookbook.org and http://cookbook.mongodb.org/
for recipes on data handling.
MongoDB/Redis recap - Books
MongoDB:
{
 'id': 1,
 'title': 'Dive Into Python',
 'author': 'Mark Pilgrim',
 'tags': ['python', 'programming', 'computing']
}

{
 'id': 2,
 'title': 'Programming Erlang',
 'author': 'Joe Armstrong',
 'tags': ['erlang', 'programming', 'computing',
          'distributedcomputing', 'FP']
}

{
 'id': 3,
 'title': 'Programming in Haskell',
 'author': 'Graham Hutton',
 'tags': ['haskell', 'programming', 'computing', 'FP']
}

Redis:
SET book:1 "{'title': 'Dive Into Python', 'author': 'Mark Pilgrim'}"
SET book:2 "{'title': 'Programming Erlang', 'author': 'Joe Armstrong'}"
SET book:3 "{'title': 'Programming in Haskell', 'author': 'Graham Hutton'}"

SADD tag:python 1
SADD tag:erlang 2
SADD tag:haskell 3
SADD tag:programming 1 2 3
SADD tag:computing 1 2 3
SADD tag:distributedcomputing 2
SADD tag:FP 2 3
MongoDB/Redis recap - Books
MongoDB:
Search tags for erlang or haskell:
db.books.find({"tags":
     { $in: ['erlang', 'haskell']
   }
})

Search tags for erlang AND haskell (no results):
db.books.find({"tags":
     { $all: ['erlang', 'haskell']
   }
})

This search yields 3 results:
db.books.find({"tags":
     { $all: ['programming', 'computing']
   }
})

Redis:
SINTER 'tag:erlang' 'tag:haskell'
0 results

SINTER 'tag:programming' 'tag:computing'
3 results: 1, 2, 3

SUNION 'tag:erlang' 'tag:haskell'
2 results: 2 and 3

SDIFF 'tag:programming' 'tag:haskell'
2 results: 1 and 2 (haskell is excluded)
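
The Redis side of the recap can be checked without a server, since SINTER, SUNION and SDIFF are ordinary set intersection, union and difference. A sketch with Python sets holding the same tag-to-book-id data; the variable names are made up for the example.

```python
# Same data as the SADD commands: tag -> set of book ids.
tags = {
    "python": {1},
    "erlang": {2},
    "haskell": {3},
    "programming": {1, 2, 3},
    "computing": {1, 2, 3},
    "distributedcomputing": {2},
    "FP": {2, 3},
}

# SINTER tag:erlang tag:haskell -> books tagged with both (none)
erlang_and_haskell = tags["erlang"] & tags["haskell"]

# SINTER tag:programming tag:computing -> books tagged with both
programming_and_computing = tags["programming"] & tags["computing"]

# SUNION tag:erlang tag:haskell -> books tagged with either
erlang_or_haskell = tags["erlang"] | tags["haskell"]

# SDIFF tag:programming tag:haskell -> programming books that are not haskell
programming_not_haskell = tags["programming"] - tags["haskell"]

print(sorted(erlang_or_haskell), sorted(programming_not_haskell))
```

The intersection plays the role of MongoDB's $all and the union plays the role of $in; the difference is that with Redis you maintain the tag sets yourself.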
NoSQL and SQL Anti Patterns

  • 1. NoSQL SQL anti patterns and NoSQL alternatives Gleicon Moraes http://zenmachine.wordpress.com http://github.com/gleicon
  • 2. Doing it wrong, Junior !
  • 3. SQL Anti patterns and related stuff The eternal tree (rows refer to the table itself - think threaded discussion) Dynamic table creation (and dynamic query building) Table as cache (lets save it in another table) Table as queue (wtf) Table as log file (table cleaning slave required) Stoned Procedures (living la vida business) Row Alignment (the careful gentleman) Extreme JOINs (app requires a warmed up cache) Your scheme must be printed in an A3 sheet. Your ORM issue full queries for Dataset iterations
  • 4. The eternal tree Problem: Most threaded discussion example uses something like a table which contains all threads and answers, relating to each other by an id. Usually the developer will come up with his own binary-tree version to manage this mess. id - parent_id -author - text 1 - 0 - gleicon - hello world 2 - 1 - elvis - shout ! NoSQL alternative: Document storage: { thread_id:1, title: 'the meeting', author: 'gleicon', replies:[ { 'author': elvis, text:'shout', replies:[{...}] } ] }
  • 5. Dynamic table creation Problem: To avoid huge tables, one must come with a "dynamic schema". For example, lets think about a document management company, which is adding new facilities over the country. For each storage facility, a new table is created: item_id - row - column - stuff 1 - 10 - 20 - cat food 2 - 12 - 32 - trout Now you have to come up with "dynamic queries", which will probably query a "central storage" table and issue a huge join to check if you have enough cat food over the country. NoSQL alternative: - Document storage, modeling a facility as a document - Key/Value, modeling each facility as a SET
  • 6. Table as cache Problem: Complex queries demand that a result be stored in a separated table, so it can be queried quickly. Worst than views NoSQL alternative: - Really ? - Memcached - Redis + AOF + EXPIRE - Denormalization
  • 7. Table as queue Problem: A table which holds messages to be completed. Worse, they must be ordered. NoSQL alternative: - RestMQ, Resque - Any other message broker - Redis (LISTS - LPUSH + RPOP) - Use the right tool
  • 8. Table as log file Problem: A table in which data gets written as a log file. From time to time it needs to be purged. Truncating this table once a day usually is the first task assigned to new DBAs. NoSQL alternative: - MongoDB capped collection - Redis, and a RRD pattern - RIAK
  • 9. Stoned procedures Problem: Stored procedures hold most of your applications logic. Also, some triggers are used to - well - trigger important data events. SP and triggers has the magic property of vanishing of our memories and being impossible to keep versioned. NoSQL alternative: - Now be careful so you dont use map/reduce as stoned procedures. - Use your preferred language for business stuff, and let event handling to pub/sub or message queues.
  • 10. Row Alignment Problem: Extra rows are created but not used, just in case. Usually they are named as a1, a2, a3, a4 and called padding. There's good will behind that, specially when version 1 of the software needed an extra column in a 150M lines database and it took 2 days to run an ALTER TABLE. NoSQL alternative: - Document based databases as MongoDB and CouchDB, where new atributes are local to the document. Also, having no schema helps - Column based databases may be not the best choice if column creation need restart/migrations
  • 11. Extreme JOINs
    Problem: Business stuff modeled as tables. Table inheritance (Product -> SubProduct_A). To find the complete data for a user plan, one must issue gigantic queries with lots of JOINs.
    NoSQL alternative:
    - Document storage, such as MongoDB
    - Denormalization
    - Serialized objects
  • 12. Your schema fits in an A3 sheet
    Problem: Huge data schemas are difficult to manage. Extreme specialization creates tables that converge to a key/value model. Normal forms get priority over common sense.
    Product_A: id - desc
    Product_B: id - desc
    NoSQL alternative:
    - Denormalization
    - Another schema?
    - Document store for flattening the model
    - Key/Value
  • 13. Your ORM ...
    Problem: Your ORM issues full queries for dataset iterations, your ORM maps and creates tables that mimic your classes, even the inheritance, and the performance is bad because the queries are huge, etc, etc.
    NoSQL alternative: Apart from denormalization and good old common sense, ORMs are trying to bridge two things with distinct impedance. Nothing in the relational model maps cleanly to classes and objects, not even the basic unit, which is the domain (set) of each column. Black magic?
  • 14. No silver bullet
    - Consider alternatives
    - Think outside the norm
    - Denormalize
    - Simplify
  • 15. Cycle of changes - Product A
    1. There was the database model.
    2. Then the cache was needed. Performance was no good.
    3. Cache key: query, value: resultset.
    4. High or nonexistent expiration time [w00t]
    (Now there's a turning point. Data didn't need to change often. Denormalization was a given with the cache.)
    5. The cache needs to be warmed or the app won't work.
    6. Key/Value storage was a natural choice. No data on MySQL anymore.
  • 16. Cycle of changes - Product B
    1. Postgres DB storing crawler results.
    2. There was a counter in each row, and updating this counter caused contention errors.
    3. Memcache for reads. Performance is better.
    4. First MongoDB test, no more deadlocks from counter updates.
    5. Data model was simplified, the entire crawled doc was stored.
  • 17. Stuff to think about
    Consider whether the data you use isn't already denormalized (cached).
    Most of the anti-patterns contain signs that the NoSQL route (or at least a partial NoSQL route) may simplify things.
    Are you dependent on cache? Does your application fail when there is no cache? Does it just slow down?
    Are you ready to think more about your data?
    Think about the way to put your data into the database and get it back (be it SQL or NoSQL).
  • 18. Extra - MongoDB and Redis
    The next two slides show what it is like to use MongoDB and Redis for the same task. There is more to managing your data than stuffing it inside a database. You gotta plan ahead for searches and migrations.
    This example is about storing books and searching among them. MongoDB makes it simpler: you just use its query language. Redis requires that you keep track of tags and ids so you can use SET operations to recover the books you want.
    Check http://rediscookbook.org and http://cookbook.mongodb.org/ for recipes on data handling.
  • 19. MongoDB/Redis recap - Books
    MongoDB:
    { 'id': 1, 'title': 'Dive Into Python', 'author': 'Mark Pilgrim', 'tags': ['python', 'programming', 'computing'] }
    { 'id': 2, 'title': 'Programming Erlang', 'author': 'Joe Armstrong', 'tags': ['erlang', 'programming', 'computing', 'distributedcomputing', 'FP'] }
    { 'id': 3, 'title': 'Programming in Haskell', 'author': 'Graham Hutton', 'tags': ['haskell', 'programming', 'computing', 'FP'] }
    Redis:
    SET book:1 "{'title': 'Dive Into Python', 'author': 'Mark Pilgrim'}"
    SET book:2 "{'title': 'Programming Erlang', 'author': 'Joe Armstrong'}"
    SET book:3 "{'title': 'Programming in Haskell', 'author': 'Graham Hutton'}"
    SADD tag:python 1
    SADD tag:erlang 2
    SADD tag:haskell 3
    SADD tag:programming 1 2 3
    SADD tag:computing 1 2 3
    SADD tag:distributedcomputing 2
    SADD tag:FP 2 3
  • 20. MongoDB/Redis recap - Books
    MongoDB:
    Search tags for erlang or haskell:
    db.books.find({"tags": { $in: ['erlang', 'haskell'] } })
    Search tags for erlang AND haskell (no results):
    db.books.find({"tags": { $all: ['erlang', 'haskell'] } })
    This search yields 3 results:
    db.books.find({"tags": { $all: ['programming', 'computing'] } })
    Redis:
    SINTER 'tag:erlang' 'tag:haskell'
    0 results
    SINTER 'tag:programming' 'tag:computing'
    3 results: 1, 2, 3
    SUNION 'tag:erlang' 'tag:haskell'
    2 results: 2 and 3
    SDIFF 'tag:programming' 'tag:haskell'
    2 results: 1 and 2 (haskell is excluded)
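The Redis tag searches in the book recap are ordinary set operations, so they can be checked with plain Python sets. This is a minimal sketch that assumes the tag memberships listed in the recap; the helpers `sinter`, `sunion` and `sdiff` are illustrative stand-ins named after the Redis commands, not a Redis client API.

```python
# Tag name -> set of book ids, mirroring the SADD commands in the recap.
tags = {
    "tag:python": {1},
    "tag:erlang": {2},
    "tag:haskell": {3},
    "tag:programming": {1, 2, 3},
    "tag:computing": {1, 2, 3},
    "tag:distributedcomputing": {2},
    "tag:FP": {2, 3},
}


def sinter(*names):
    """Ids present in every named set (AND)."""
    return set.intersection(*(tags[name] for name in names))


def sunion(*names):
    """Ids present in at least one named set (OR)."""
    return set.union(*(tags[name] for name in names))


def sdiff(first, *rest):
    """Ids in the first set that appear in none of the others."""
    return tags[first].difference(*(tags[name] for name in rest))


both = sinter("tag:erlang", "tag:haskell")             # no book has both tags
everything = sinter("tag:programming", "tag:computing")
either = sunion("tag:erlang", "tag:haskell")
not_haskell = sdiff("tag:programming", "tag:haskell")
```

The intersection plays the role of MongoDB's `$all` and the union the role of `$in`; the difference is that with Redis the application maintains the tag sets itself.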