SlideShare a Scribd company logo
How to choose
a database
Vsevolod Solovyov
Why bother
• It's the most fundamental thing about the
project
• Even programming languages are switched
more often
• With wrong choice you'll suffer. But why?
Different cases
• I certainly know
what I need
• I'm not sure,
something will do
• This tech is cool!
• Serious business:
risk/benefit
management
• Small project:

??
• Pet project:

fun/learning
management
I know what I need
• Huge time series DB
• Cross-datacenter replication
• Petabytes of data
• ...
• This talk is not for you
This tech is COOL
• Pet projects
• Experiments
• Beware otherwise
Consider this
• Data correctness (ACID, enforced schema)
• Easy data modeling
• Operational complexity
• Migrations
• Scaling
• Project use cases — no one knows them
yet!
Data correctness
• Silently losing data is not much fun
• Schema-less is a lie
• Heterogeneous data is hard to analyze,
change, display
• Especially true in data-heavy projects
Data modeling
• In document-oriented DB (e.g. MongoDB)
we need to specially craft "tables"
according to anticipated queries.

And re-craft them when queries change!
• Much easier in RDBMS, just dump it in
adequate tables and slap some indexes
• Datomic is best here: 

Entity-Attribute-Value
Migrations
• Often overlooked part
• Keep them in repository!
• Transactional DDL
• Developing migrations in REPL
• Downgrade migrations are useless
Migration tools
• Native migrations (SQL, CQL, Datalog, etc)
are best
• Don't do auto-migrations ever
• Don't use tools that give you auto-
migrations
• Use something like nomad, migrate
Effort
Migration complexity
ORM migration tools Native (SQL, etc)
Tools learning curve
Complex migrations
• Create new column/table/database
• Read from the old place and write to both
• Migrate all old data to the new place
• Read from both places and compare
• Clean up old code and data
Performance
Data model matters here
Read documentation!
Low-level details
• Pages
• Rows, columns, column families
• MVCC
• TOAST
• Index types: B-Tree, hash, GIN, GiST, BRIN
• Sharding hash/range, sharding key
Know your tools!
• Log slow queries (pg_stat, slowlog, etc)
• EXPLAIN
• EXPLAIN (ANALYZE,VERBOSE, BUFFERS)
• htop, iotop, perf, etc
ORMs considered
harmful
• They provoke massive data over-fetch
• Easy-to-miss 1+N queries
• Hard to refactor and move parts of data to
other DBs
• Very leaky abstraction
Over-fetch
clustering = Clustering.query.get(33)
depth2cid = defaultdict(list)
for cl in clustering.clusters:
depth2cid[cl.level].append(cl.id)
Over-fetch
clustering = Clustering.query.get(33)
depth2cid = defaultdict(list)
for cl in clustering.clusters:
depth2cid[cl.level].append(cl.id)
11.4 Gb RAM
50 seconds
Proper-fetch
query = (
db.session
.query(Cluster.id, Cluster.level)
.filter(Cluster.clustering_id == 33))
depth2cid = defaultdict(list)
for cid, level in query:
depth2cid[level].append(cid)
54 Mb RAM
1 second
Properer-fetch
7 Mb RAM
1.2 seconds
query = (
db.session
.query(Cluster.id, Cluster.level)
.filter(Cluster.clustering_id == 33)
.yield_per(1000))
depth2cid = defaultdict(list)
for cid, level in query:
depth2cid[level].append(cid)
Scale
Distributed FUD
• http://jepsen.io/
• CORDS: Redundancy does not imply fault
tolerance - the morning paper
• Do you need it really? RAM is plentiful
Buy bigger server
Scale
Buy bigger
server
Scale
Well...
By that time you will
know what you need
Horizontally scalable
• Citus
• VoltDB
• Cassandra
• CockroachDB
• ...
Pull out data bit-by-bit
A
AAvailability
Availability
• Services mostly die from other problems
• Untested "available" DB can be a problem
• Properly available system (CAP-available) is
a pain and resource sink
NoDB
• Images
• Machine learning models
• ...
• File system, S3, B2, etc
At Cap'n Obvious
• Experiment at home
• Don't bring new DB in a big project just
because it's interesting
• Not sure? Postgres to the rescue
Ad

More Related Content

Similar to How to choose a database (20)

Data Modeling for NoSQL
Data Modeling for NoSQLData Modeling for NoSQL
Data Modeling for NoSQL
Tony Tam
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineering
Thang Bui (Bob)
 
FP Days: Down the Clojure Rabbit Hole
FP Days: Down the Clojure Rabbit HoleFP Days: Down the Clojure Rabbit Hole
FP Days: Down the Clojure Rabbit Hole
Christophe Grand
 
Data modeling trends for analytics
Data modeling trends for analyticsData modeling trends for analytics
Data modeling trends for analytics
Ike Ellis
 
Make Life Suck Less (Building Scalable Systems)
Make Life Suck Less (Building Scalable Systems)Make Life Suck Less (Building Scalable Systems)
Make Life Suck Less (Building Scalable Systems)
guest0f8e278
 
Lag Sucks! GDC 2012
Lag Sucks! GDC 2012Lag Sucks! GDC 2012
Lag Sucks! GDC 2012
realjenius
 
Build a modern data platform.pptx
Build a modern data platform.pptxBuild a modern data platform.pptx
Build a modern data platform.pptx
Ike Ellis
 
02-Lifecycle.pptx
02-Lifecycle.pptx02-Lifecycle.pptx
02-Lifecycle.pptx
Shree Shree
 
Stig: Social Graphs & Discovery at Scale
Stig: Social Graphs & Discovery at ScaleStig: Social Graphs & Discovery at Scale
Stig: Social Graphs & Discovery at Scale
DATAVERSITY
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
Yan Cui
 
cours database pour etudiant NoSQL (1).pptx
cours database pour etudiant NoSQL (1).pptxcours database pour etudiant NoSQL (1).pptx
cours database pour etudiant NoSQL (1).pptx
ssuser1fde9c
 
From ddd to DDD : My journey from data-driven development to Domain-Driven De...
From ddd to DDD : My journey from data-driven development to Domain-Driven De...From ddd to DDD : My journey from data-driven development to Domain-Driven De...
From ddd to DDD : My journey from data-driven development to Domain-Driven De...
Thibaud Desodt
 
Intro to Big Data
Intro to Big DataIntro to Big Data
Intro to Big Data
Zohar Elkayam
 
Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)
Don Demcsak
 
Scaling etl with hadoop shapira 3
Scaling etl with hadoop   shapira 3Scaling etl with hadoop   shapira 3
Scaling etl with hadoop shapira 3
Gwen (Chen) Shapira
 
Introduction to Data Science NoSQL.pptx
Introduction to Data Science  NoSQL.pptxIntroduction to Data Science  NoSQL.pptx
Introduction to Data Science NoSQL.pptx
tarakesh7199
 
Large scale computing
Large scale computing Large scale computing
Large scale computing
Bhupesh Bansal
 
NoSQL.pptx
NoSQL.pptxNoSQL.pptx
NoSQL.pptx
RithikRaj25
 
Using JPA applications in the era of NoSQL: Introducing Hibernate OGM
Using JPA applications in the era of NoSQL: Introducing Hibernate OGMUsing JPA applications in the era of NoSQL: Introducing Hibernate OGM
Using JPA applications in the era of NoSQL: Introducing Hibernate OGM
PT.JUG
 
Sharing a Startup’s Big Data Lessons
Sharing a Startup’s Big Data LessonsSharing a Startup’s Big Data Lessons
Sharing a Startup’s Big Data Lessons
George Stathis
 
Data Modeling for NoSQL
Data Modeling for NoSQLData Modeling for NoSQL
Data Modeling for NoSQL
Tony Tam
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineering
Thang Bui (Bob)
 
FP Days: Down the Clojure Rabbit Hole
FP Days: Down the Clojure Rabbit HoleFP Days: Down the Clojure Rabbit Hole
FP Days: Down the Clojure Rabbit Hole
Christophe Grand
 
Data modeling trends for analytics
Data modeling trends for analyticsData modeling trends for analytics
Data modeling trends for analytics
Ike Ellis
 
Make Life Suck Less (Building Scalable Systems)
Make Life Suck Less (Building Scalable Systems)Make Life Suck Less (Building Scalable Systems)
Make Life Suck Less (Building Scalable Systems)
guest0f8e278
 
Lag Sucks! GDC 2012
Lag Sucks! GDC 2012Lag Sucks! GDC 2012
Lag Sucks! GDC 2012
realjenius
 
Build a modern data platform.pptx
Build a modern data platform.pptxBuild a modern data platform.pptx
Build a modern data platform.pptx
Ike Ellis
 
02-Lifecycle.pptx
02-Lifecycle.pptx02-Lifecycle.pptx
02-Lifecycle.pptx
Shree Shree
 
Stig: Social Graphs & Discovery at Scale
Stig: Social Graphs & Discovery at ScaleStig: Social Graphs & Discovery at Scale
Stig: Social Graphs & Discovery at Scale
DATAVERSITY
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
Yan Cui
 
cours database pour etudiant NoSQL (1).pptx
cours database pour etudiant NoSQL (1).pptxcours database pour etudiant NoSQL (1).pptx
cours database pour etudiant NoSQL (1).pptx
ssuser1fde9c
 
From ddd to DDD : My journey from data-driven development to Domain-Driven De...
From ddd to DDD : My journey from data-driven development to Domain-Driven De...From ddd to DDD : My journey from data-driven development to Domain-Driven De...
From ddd to DDD : My journey from data-driven development to Domain-Driven De...
Thibaud Desodt
 
Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)
Don Demcsak
 
Scaling etl with hadoop shapira 3
Scaling etl with hadoop   shapira 3Scaling etl with hadoop   shapira 3
Scaling etl with hadoop shapira 3
Gwen (Chen) Shapira
 
Introduction to Data Science NoSQL.pptx
Introduction to Data Science  NoSQL.pptxIntroduction to Data Science  NoSQL.pptx
Introduction to Data Science NoSQL.pptx
tarakesh7199
 
Large scale computing
Large scale computing Large scale computing
Large scale computing
Bhupesh Bansal
 
Using JPA applications in the era of NoSQL: Introducing Hibernate OGM
Using JPA applications in the era of NoSQL: Introducing Hibernate OGMUsing JPA applications in the era of NoSQL: Introducing Hibernate OGM
Using JPA applications in the era of NoSQL: Introducing Hibernate OGM
PT.JUG
 
Sharing a Startup’s Big Data Lessons
Sharing a Startup’s Big Data LessonsSharing a Startup’s Big Data Lessons
Sharing a Startup’s Big Data Lessons
George Stathis
 

More from Vsevolod Solovyov (6)

Data science: з печі до столу
Data science: з печі до столуData science: з печі до столу
Data science: з печі до столу
Vsevolod Solovyov
 
How to debug
How to debugHow to debug
How to debug
Vsevolod Solovyov
 
Data science from the trenches
Data science from the trenchesData science from the trenches
Data science from the trenches
Vsevolod Solovyov
 
Будни data science/NLP стартапа
Будни data science/NLP стартапаБудни data science/NLP стартапа
Будни data science/NLP стартапа
Vsevolod Solovyov
 
Introduction to information retrieval
Introduction to information retrievalIntroduction to information retrieval
Introduction to information retrieval
Vsevolod Solovyov
 
Apache Kafka and stream processing peculiarities [ru]
Apache Kafka and stream processing peculiarities [ru]Apache Kafka and stream processing peculiarities [ru]
Apache Kafka and stream processing peculiarities [ru]
Vsevolod Solovyov
 
Data science: з печі до столу
Data science: з печі до столуData science: з печі до столу
Data science: з печі до столу
Vsevolod Solovyov
 
Data science from the trenches
Data science from the trenchesData science from the trenches
Data science from the trenches
Vsevolod Solovyov
 
Будни data science/NLP стартапа
Будни data science/NLP стартапаБудни data science/NLP стартапа
Будни data science/NLP стартапа
Vsevolod Solovyov
 
Introduction to information retrieval
Introduction to information retrievalIntroduction to information retrieval
Introduction to information retrieval
Vsevolod Solovyov
 
Apache Kafka and stream processing peculiarities [ru]
Apache Kafka and stream processing peculiarities [ru]Apache Kafka and stream processing peculiarities [ru]
Apache Kafka and stream processing peculiarities [ru]
Vsevolod Solovyov
 
Ad

Recently uploaded (20)

Deloitte Analytics - Applying Process Mining in an audit context
Deloitte Analytics - Applying Process Mining in an audit contextDeloitte Analytics - Applying Process Mining in an audit context
Deloitte Analytics - Applying Process Mining in an audit context
Process mining Evangelist
 
Medical Dataset including visualizations
Medical Dataset including visualizationsMedical Dataset including visualizations
Medical Dataset including visualizations
vishrut8750588758
 
Calories_Prediction_using_Linear_Regression.pptx
Calories_Prediction_using_Linear_Regression.pptxCalories_Prediction_using_Linear_Regression.pptx
Calories_Prediction_using_Linear_Regression.pptx
TijiLMAHESHWARI
 
chapter 4 Variability statistical research .pptx
chapter 4 Variability statistical research .pptxchapter 4 Variability statistical research .pptx
chapter 4 Variability statistical research .pptx
justinebandajbn
 
03 Daniel 2-notes.ppt seminario escatologia
03 Daniel 2-notes.ppt seminario escatologia03 Daniel 2-notes.ppt seminario escatologia
03 Daniel 2-notes.ppt seminario escatologia
Alexander Romero Arosquipa
 
chapter3 Central Tendency statistics.ppt
chapter3 Central Tendency statistics.pptchapter3 Central Tendency statistics.ppt
chapter3 Central Tendency statistics.ppt
justinebandajbn
 
GenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.aiGenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.ai
Inspirient
 
C++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptxC++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptx
aquibnoor22079
 
183409-christina-rossetti.pdfdsfsdasggsag
183409-christina-rossetti.pdfdsfsdasggsag183409-christina-rossetti.pdfdsfsdasggsag
183409-christina-rossetti.pdfdsfsdasggsag
fardin123rahman07
 
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
Molecular methods diagnostic and monitoring of infection  -  Repaired.pptxMolecular methods diagnostic and monitoring of infection  -  Repaired.pptx
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
7tzn7x5kky
 
Defense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptxDefense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptx
Greg Makowski
 
Ch3MCT24.pptx measure of central tendency
Ch3MCT24.pptx measure of central tendencyCh3MCT24.pptx measure of central tendency
Ch3MCT24.pptx measure of central tendency
ayeleasefa2
 
VKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptxVKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptx
Vinod Srivastava
 
VKS-Python Basics for Beginners and advance.pptx
VKS-Python Basics for Beginners and advance.pptxVKS-Python Basics for Beginners and advance.pptx
VKS-Python Basics for Beginners and advance.pptx
Vinod Srivastava
 
Flip flop presenation-Presented By Mubahir khan.pptx
Flip flop presenation-Presented By Mubahir khan.pptxFlip flop presenation-Presented By Mubahir khan.pptx
Flip flop presenation-Presented By Mubahir khan.pptx
mubashirkhan45461
 
Principles of information security Chapter 5.ppt
Principles of information security Chapter 5.pptPrinciples of information security Chapter 5.ppt
Principles of information security Chapter 5.ppt
EstherBaguma
 
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Abodahab
 
Digilocker under workingProcess Flow.pptx
Digilocker  under workingProcess Flow.pptxDigilocker  under workingProcess Flow.pptx
Digilocker under workingProcess Flow.pptx
satnamsadguru491
 
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
Simran112433
 
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
ThanushsaranS
 
Deloitte Analytics - Applying Process Mining in an audit context
Deloitte Analytics - Applying Process Mining in an audit contextDeloitte Analytics - Applying Process Mining in an audit context
Deloitte Analytics - Applying Process Mining in an audit context
Process mining Evangelist
 
Medical Dataset including visualizations
Medical Dataset including visualizationsMedical Dataset including visualizations
Medical Dataset including visualizations
vishrut8750588758
 
Calories_Prediction_using_Linear_Regression.pptx
Calories_Prediction_using_Linear_Regression.pptxCalories_Prediction_using_Linear_Regression.pptx
Calories_Prediction_using_Linear_Regression.pptx
TijiLMAHESHWARI
 
chapter 4 Variability statistical research .pptx
chapter 4 Variability statistical research .pptxchapter 4 Variability statistical research .pptx
chapter 4 Variability statistical research .pptx
justinebandajbn
 
chapter3 Central Tendency statistics.ppt
chapter3 Central Tendency statistics.pptchapter3 Central Tendency statistics.ppt
chapter3 Central Tendency statistics.ppt
justinebandajbn
 
GenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.aiGenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.ai
Inspirient
 
C++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptxC++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptx
aquibnoor22079
 
183409-christina-rossetti.pdfdsfsdasggsag
183409-christina-rossetti.pdfdsfsdasggsag183409-christina-rossetti.pdfdsfsdasggsag
183409-christina-rossetti.pdfdsfsdasggsag
fardin123rahman07
 
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
Molecular methods diagnostic and monitoring of infection  -  Repaired.pptxMolecular methods diagnostic and monitoring of infection  -  Repaired.pptx
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
7tzn7x5kky
 
Defense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptxDefense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptx
Greg Makowski
 
Ch3MCT24.pptx measure of central tendency
Ch3MCT24.pptx measure of central tendencyCh3MCT24.pptx measure of central tendency
Ch3MCT24.pptx measure of central tendency
ayeleasefa2
 
VKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptxVKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptx
Vinod Srivastava
 
VKS-Python Basics for Beginners and advance.pptx
VKS-Python Basics for Beginners and advance.pptxVKS-Python Basics for Beginners and advance.pptx
VKS-Python Basics for Beginners and advance.pptx
Vinod Srivastava
 
Flip flop presenation-Presented By Mubahir khan.pptx
Flip flop presenation-Presented By Mubahir khan.pptxFlip flop presenation-Presented By Mubahir khan.pptx
Flip flop presenation-Presented By Mubahir khan.pptx
mubashirkhan45461
 
Principles of information security Chapter 5.ppt
Principles of information security Chapter 5.pptPrinciples of information security Chapter 5.ppt
Principles of information security Chapter 5.ppt
EstherBaguma
 
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Abodahab
 
Digilocker under workingProcess Flow.pptx
Digilocker  under workingProcess Flow.pptxDigilocker  under workingProcess Flow.pptx
Digilocker under workingProcess Flow.pptx
satnamsadguru491
 
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
Simran112433
 
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
ThanushsaranS
 
Ad

How to choose a database