SlideShare a Scribd company logo
ELASTICSEARCH –
SCALABILITY AND
MULTITENANCY
Bozhidar Bozhanov
ABOUT ME
• Founder at LogSentinel, an information security startup
• LogSentinel SIEM – product that indexes billions of logs with Elasticsearch
• https://techblog.bozho.net
• https://twitter.com/bozhobg
SCALABILITY AND MULTITENANCY
• Scalability – how to process millions (billions) of documents on multiple machines
• Multitenancy – how to have our system support multiple users/organizations while
segregating their data
• One can exist without the other
• Both are architectural and implementation tasks, not (just) work for Ops.
• „We’ ll push the data in whatever form and Ops will take care of the scaling “
ELASTICSEARCH BSICS
• “You know, for search”
• Indexing documents (document = anything)
• Full-text search and keyword search
• Allows for large clusters
• Licensing issues
USE-CASE: TIME-SERIES DATA
• Indexing events (logs, metrics, etc.)
• Wide-spread and widely applicable scenario
• Documents almost always have a timestamp
SHARDS
ZOOM-IN
LIMITING FACTORS
• One shard shouldn’t be to large
• Ideally between 10 and 50 GB; otherwise recovery after failure may not work
• The number of shards on a node is limited by RAM
• Lucene segments are append-only
• A large number of segments reduce performance
MULTITENANCY
• Cluster-per-tenant
• Heavy for administrations
• No real multitenancy
• Expensive
• Index-per-tenant
• Also heave for administration
• Doesn’t scale well
• Tenant-based routing
• Recommended in most cases
TENANT-BASED ROUTING
• _routing=<tenantId> or _routing=<tenantOwnedResourceId>
• E.g.. userId or dataSourceId
• Routing parameter designates which shard to be used for storing the document
• _routing for search requests tells Elasticsearch where to look for the data =>
faster search
• shard_num = hash(_routing) % num_primary_shards
• mappings._routing.required: true
STRUCTURE OF INDEXED DATA
• One field can have only one type
• The type is determined on index creation or on first indexed document with that
field
• User1 creates custom param “duration” of type String
• User2 wants to create “duration” of a numeric type -> error
• Solution: custom parameter hierarchies by type: params, numericParams,
dateParams, …
SCALABILITY
• „We add more machines and it’s good“?
• Recommended shard size (10-50 GB)
• We can’t change shards on a running index
• Lucene Segments are read-only:
• Deleting a document = bad
• Updating a document = bad
OPTIONS FOR STRUCTURING INDEXES
• We need a structure to allow indexing and searching in an arbitrarily large amount
of data
• One big, ever-growing index
• Convenient for small amounts of data, but faces all scalability problems
• Index-per-day / index-per-week / index-per-size
• Index-per-day-per-retention
• Rollover
• Deletion should be done by deleting whole indexes, not individual documents
MANY INDEXES FOR SEARCH, ONE FOR
INDEXING
• One search query can be directed to many indexes based on an index alias
• Supporting one (or several) active indexes for ingesting documents
• All other indexes– read-only
• This solves the problem with:
• Growing data and growing size of shards
• Deleting old data
EFFECTIVE INDEXING
• In real time (problem: too many requests to Elasticsearch)
• Storing in a database and indexing with a batch job
• Message queue (complex to implement) (we use Kafka)
• In-memory queue (might lose data)
• Batch-indexing when a given size or time threshold is reached
• Hybrid: bulk processing + database
• Quick indexing with in-memory queue + subsequent check based on the data in the database
• Avoid updates (=delete + insert)
CONCLUSION
• Elasticsearch is easy to get running
• …and complex for scaling
• Changes to a production setup are hard
• We must not throw scalability and multitenancy tasks to the Ops teams – they are
application problems
• Elasticsearch internals impose unintuitive limitations (“The law of leaky
abstractions”)
THANK YOU
Contacts: https://www.linkedin.com/in/bozhi
dar-bozhanov/
https://techblog.bozho.net
https://twitter.com/bozhobg
RESOURCES
• https://www.elastic.co/guide/en/elasticsearch/reference/current/size-your-shards.html
• https://techblog.bozho.net/elasticsearch-multitenancy-with-routing/
• https://techblog.bozho.net/near-real-time-indexing-with-elasticsearch/
• https://www.elastic.co/guide/en/elasticsearch/reference/master/tune-for-indexing-
speed.html
• https://www.loggly.com/blog/nine-tips-configuring-elasticsearch-for-high-performance/
• https://tech.ebayinc.com/engineering/elasticsearch-performance-tuning-practice-at-ebay/
Ad

More Related Content

What's hot (20)

Semi Structured Data
Semi Structured DataSemi Structured Data
Semi Structured Data
MariaDB plc
 
Securing Passwords
Securing PasswordsSecuring Passwords
Securing Passwords
Mandeep Singh
 
Big Data Overview Part 1
Big Data Overview Part 1Big Data Overview Part 1
Big Data Overview Part 1
William Simms
 
Active directory 101
Active directory 101Active directory 101
Active directory 101
Utkarsh Agrawal
 
Securing data and preventing data breaches
Securing data and preventing data breachesSecuring data and preventing data breaches
Securing data and preventing data breaches
MariaDB plc
 
MongoDB meetup at Hike
MongoDB meetup at HikeMongoDB meetup at Hike
MongoDB meetup at Hike
Bharvi Dixit
 
Market Trends in Microsoft Azure
Market Trends in Microsoft AzureMarket Trends in Microsoft Azure
Market Trends in Microsoft Azure
GlobalLogic Ukraine
 
Fast, Powerful and Scalable Analytics
Fast, Powerful and Scalable AnalyticsFast, Powerful and Scalable Analytics
Fast, Powerful and Scalable Analytics
MariaDB plc
 
Elasticsearch tuning
Elasticsearch tuningElasticsearch tuning
Elasticsearch tuning
NIKHIL DUBEY
 
Introduction to Fauna
Introduction to FaunaIntroduction to Fauna
Introduction to Fauna
alialaei7
 
Building Advanced RESTFul services
Building Advanced RESTFul servicesBuilding Advanced RESTFul services
Building Advanced RESTFul services
Ortus Solutions, Corp
 
FaunaDB security
FaunaDB securityFaunaDB security
FaunaDB security
alialaei7
 
Internet of Things Cologne 2015: MongoDB Technical Presentation
Internet of Things Cologne 2015: MongoDB Technical PresentationInternet of Things Cologne 2015: MongoDB Technical Presentation
Internet of Things Cologne 2015: MongoDB Technical Presentation
MongoDB
 
Test driving Azure Search and DocumentDB
Test driving Azure Search and DocumentDBTest driving Azure Search and DocumentDB
Test driving Azure Search and DocumentDB
Andrew Siemer
 
Building enterprise records management solutions for share point 2010
Building enterprise records management solutions for share point 2010Building enterprise records management solutions for share point 2010
Building enterprise records management solutions for share point 2010
Eric Shupps
 
Securing private keys
Securing private keysSecuring private keys
Securing private keys
Ahsan Habib
 
Survey of the Microsoft Azure Data Landscape
Survey of the Microsoft Azure Data LandscapeSurvey of the Microsoft Azure Data Landscape
Survey of the Microsoft Azure Data Landscape
Ike Ellis
 
Sebastian Cohnen – Building a Startup with NoSQL - NoSQL matters Barcelona 2014
Sebastian Cohnen – Building a Startup with NoSQL - NoSQL matters Barcelona 2014Sebastian Cohnen – Building a Startup with NoSQL - NoSQL matters Barcelona 2014
Sebastian Cohnen – Building a Startup with NoSQL - NoSQL matters Barcelona 2014
NoSQLmatters
 
Análisis del roadmap del Elastic Stack
Análisis del roadmap del Elastic StackAnálisis del roadmap del Elastic Stack
Análisis del roadmap del Elastic Stack
Elasticsearch
 
Getting Started with SQLite
Getting Started with SQLiteGetting Started with SQLite
Getting Started with SQLite
Mindfire Solutions
 
Semi Structured Data
Semi Structured DataSemi Structured Data
Semi Structured Data
MariaDB plc
 
Big Data Overview Part 1
Big Data Overview Part 1Big Data Overview Part 1
Big Data Overview Part 1
William Simms
 
Securing data and preventing data breaches
Securing data and preventing data breachesSecuring data and preventing data breaches
Securing data and preventing data breaches
MariaDB plc
 
MongoDB meetup at Hike
MongoDB meetup at HikeMongoDB meetup at Hike
MongoDB meetup at Hike
Bharvi Dixit
 
Market Trends in Microsoft Azure
Market Trends in Microsoft AzureMarket Trends in Microsoft Azure
Market Trends in Microsoft Azure
GlobalLogic Ukraine
 
Fast, Powerful and Scalable Analytics
Fast, Powerful and Scalable AnalyticsFast, Powerful and Scalable Analytics
Fast, Powerful and Scalable Analytics
MariaDB plc
 
Elasticsearch tuning
Elasticsearch tuningElasticsearch tuning
Elasticsearch tuning
NIKHIL DUBEY
 
Introduction to Fauna
Introduction to FaunaIntroduction to Fauna
Introduction to Fauna
alialaei7
 
FaunaDB security
FaunaDB securityFaunaDB security
FaunaDB security
alialaei7
 
Internet of Things Cologne 2015: MongoDB Technical Presentation
Internet of Things Cologne 2015: MongoDB Technical PresentationInternet of Things Cologne 2015: MongoDB Technical Presentation
Internet of Things Cologne 2015: MongoDB Technical Presentation
MongoDB
 
Test driving Azure Search and DocumentDB
Test driving Azure Search and DocumentDBTest driving Azure Search and DocumentDB
Test driving Azure Search and DocumentDB
Andrew Siemer
 
Building enterprise records management solutions for share point 2010
Building enterprise records management solutions for share point 2010Building enterprise records management solutions for share point 2010
Building enterprise records management solutions for share point 2010
Eric Shupps
 
Securing private keys
Securing private keysSecuring private keys
Securing private keys
Ahsan Habib
 
Survey of the Microsoft Azure Data Landscape
Survey of the Microsoft Azure Data LandscapeSurvey of the Microsoft Azure Data Landscape
Survey of the Microsoft Azure Data Landscape
Ike Ellis
 
Sebastian Cohnen – Building a Startup with NoSQL - NoSQL matters Barcelona 2014
Sebastian Cohnen – Building a Startup with NoSQL - NoSQL matters Barcelona 2014Sebastian Cohnen – Building a Startup with NoSQL - NoSQL matters Barcelona 2014
Sebastian Cohnen – Building a Startup with NoSQL - NoSQL matters Barcelona 2014
NoSQLmatters
 
Análisis del roadmap del Elastic Stack
Análisis del roadmap del Elastic StackAnálisis del roadmap del Elastic Stack
Análisis del roadmap del Elastic Stack
Elasticsearch
 

Similar to Elasticsearch - Scalability and Multitenancy (20)

Elasticsearch Introduction at BigData meetup
Elasticsearch Introduction at BigData meetupElasticsearch Introduction at BigData meetup
Elasticsearch Introduction at BigData meetup
Eric Rodriguez (Hiring in Lex)
 
Solving Office 365 Big Challenges using Cassandra + Spark
Solving Office 365 Big Challenges using Cassandra + Spark Solving Office 365 Big Challenges using Cassandra + Spark
Solving Office 365 Big Challenges using Cassandra + Spark
Anubhav Kale
 
ElasticSearch as (only) datastore
ElasticSearch as (only) datastoreElasticSearch as (only) datastore
ElasticSearch as (only) datastore
Tomas Sirny
 
Elastic & Azure & Episever, Case Evira
Elastic & Azure & Episever, Case EviraElastic & Azure & Episever, Case Evira
Elastic & Azure & Episever, Case Evira
Mikko Huilaja
 
Roaring with elastic search sangam2018
Roaring with elastic search sangam2018Roaring with elastic search sangam2018
Roaring with elastic search sangam2018
Vinay Kumar
 
Episerver and search engines
Episerver and search enginesEpiserver and search engines
Episerver and search engines
Mikko Huilaja
 
Basic Introduction to Crate @ ViennaDB Meetup
Basic Introduction to Crate @ ViennaDB MeetupBasic Introduction to Crate @ ViennaDB Meetup
Basic Introduction to Crate @ ViennaDB Meetup
Johannes Moser
 
An intro to Azure Data Lake
An intro to Azure Data LakeAn intro to Azure Data Lake
An intro to Azure Data Lake
Rick van den Bosch
 
Elasticsearch meetup final_2014_04
Elasticsearch meetup final_2014_04Elasticsearch meetup final_2014_04
Elasticsearch meetup final_2014_04
marc_harrison
 
BigData, NoSQL & ElasticSearch
BigData, NoSQL & ElasticSearchBigData, NoSQL & ElasticSearch
BigData, NoSQL & ElasticSearch
Sanura Hettiarachchi
 
Overview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data ServiceOverview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data Service
SATOSHI TAGOMORI
 
Introduction to Data Science NoSQL.pptx
Introduction to Data Science  NoSQL.pptxIntroduction to Data Science  NoSQL.pptx
Introduction to Data Science NoSQL.pptx
tarakesh7199
 
Revision
RevisionRevision
Revision
David Sherlock
 
Colorado Springs Open Source Hadoop/MySQL
Colorado Springs Open Source Hadoop/MySQL Colorado Springs Open Source Hadoop/MySQL
Colorado Springs Open Source Hadoop/MySQL
David Smelker
 
Webinar Slides: Tungsten Replicator for Elasticsearch - Real-time data loadin...
Webinar Slides: Tungsten Replicator for Elasticsearch - Real-time data loadin...Webinar Slides: Tungsten Replicator for Elasticsearch - Real-time data loadin...
Webinar Slides: Tungsten Replicator for Elasticsearch - Real-time data loadin...
Continuent
 
Module 2.2 Introduction to NoSQL Databases.pptx
Module 2.2 Introduction to NoSQL Databases.pptxModule 2.2 Introduction to NoSQL Databases.pptx
Module 2.2 Introduction to NoSQL Databases.pptx
NiramayKolalle
 
Presto: Fast SQL on Everything
Presto: Fast SQL on EverythingPresto: Fast SQL on Everything
Presto: Fast SQL on Everything
David Phillips
 
Database Technologies
Database TechnologiesDatabase Technologies
Database Technologies
Michel de Goede
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiri
datastack
 
An Introduction to Elastic Search.
An Introduction to Elastic Search.An Introduction to Elastic Search.
An Introduction to Elastic Search.
Jurriaan Persyn
 
Solving Office 365 Big Challenges using Cassandra + Spark
Solving Office 365 Big Challenges using Cassandra + Spark Solving Office 365 Big Challenges using Cassandra + Spark
Solving Office 365 Big Challenges using Cassandra + Spark
Anubhav Kale
 
ElasticSearch as (only) datastore
ElasticSearch as (only) datastoreElasticSearch as (only) datastore
ElasticSearch as (only) datastore
Tomas Sirny
 
Elastic & Azure & Episever, Case Evira
Elastic & Azure & Episever, Case EviraElastic & Azure & Episever, Case Evira
Elastic & Azure & Episever, Case Evira
Mikko Huilaja
 
Roaring with elastic search sangam2018
Roaring with elastic search sangam2018Roaring with elastic search sangam2018
Roaring with elastic search sangam2018
Vinay Kumar
 
Episerver and search engines
Episerver and search enginesEpiserver and search engines
Episerver and search engines
Mikko Huilaja
 
Basic Introduction to Crate @ ViennaDB Meetup
Basic Introduction to Crate @ ViennaDB MeetupBasic Introduction to Crate @ ViennaDB Meetup
Basic Introduction to Crate @ ViennaDB Meetup
Johannes Moser
 
Elasticsearch meetup final_2014_04
Elasticsearch meetup final_2014_04Elasticsearch meetup final_2014_04
Elasticsearch meetup final_2014_04
marc_harrison
 
Overview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data ServiceOverview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data Service
SATOSHI TAGOMORI
 
Introduction to Data Science NoSQL.pptx
Introduction to Data Science  NoSQL.pptxIntroduction to Data Science  NoSQL.pptx
Introduction to Data Science NoSQL.pptx
tarakesh7199
 
Colorado Springs Open Source Hadoop/MySQL
Colorado Springs Open Source Hadoop/MySQL Colorado Springs Open Source Hadoop/MySQL
Colorado Springs Open Source Hadoop/MySQL
David Smelker
 
Webinar Slides: Tungsten Replicator for Elasticsearch - Real-time data loadin...
Webinar Slides: Tungsten Replicator for Elasticsearch - Real-time data loadin...Webinar Slides: Tungsten Replicator for Elasticsearch - Real-time data loadin...
Webinar Slides: Tungsten Replicator for Elasticsearch - Real-time data loadin...
Continuent
 
Module 2.2 Introduction to NoSQL Databases.pptx
Module 2.2 Introduction to NoSQL Databases.pptxModule 2.2 Introduction to NoSQL Databases.pptx
Module 2.2 Introduction to NoSQL Databases.pptx
NiramayKolalle
 
Presto: Fast SQL on Everything
Presto: Fast SQL on EverythingPresto: Fast SQL on Everything
Presto: Fast SQL on Everything
David Phillips
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiri
datastack
 
An Introduction to Elastic Search.
An Introduction to Elastic Search.An Introduction to Elastic Search.
An Introduction to Elastic Search.
Jurriaan Persyn
 
Ad

More from Bozhidar Bozhanov (20)

Откриване на фалшиви клетки за подслушване
Откриване на фалшиви клетки за подслушванеОткриване на фалшиви клетки за подслушване
Откриване на фалшиви клетки за подслушване
Bozhidar Bozhanov
 
Wiretap Detector - detecting cell-site simulators
Wiretap Detector - detecting cell-site simulatorsWiretap Detector - detecting cell-site simulators
Wiretap Detector - detecting cell-site simulators
Bozhidar Bozhanov
 
Антикорупционен софтуер
Антикорупционен софтуерАнтикорупционен софтуер
Антикорупционен софтуер
Bozhidar Bozhanov
 
Nothing is secure.pdf
Nothing is secure.pdfNothing is secure.pdf
Nothing is secure.pdf
Bozhidar Bozhanov
 
Blockchain overview - types, use-cases, security and usabilty
Blockchain overview - types, use-cases, security and usabiltyBlockchain overview - types, use-cases, security and usabilty
Blockchain overview - types, use-cases, security and usabilty
Bozhidar Bozhanov
 
Електронна държава
Електронна държаваЕлектронна държава
Електронна държава
Bozhidar Bozhanov
 
Blockchain - what is it good for?
Blockchain - what is it good for?Blockchain - what is it good for?
Blockchain - what is it good for?
Bozhidar Bozhanov
 
Algorithmic and technological transparency
Algorithmic and technological transparencyAlgorithmic and technological transparency
Algorithmic and technological transparency
Bozhidar Bozhanov
 
Scaling horizontally on AWS
Scaling horizontally on AWSScaling horizontally on AWS
Scaling horizontally on AWS
Bozhidar Bozhanov
 
Alternatives for copyright protection online
Alternatives for copyright protection onlineAlternatives for copyright protection online
Alternatives for copyright protection online
Bozhidar Bozhanov
 
GDPR for developers
GDPR for developersGDPR for developers
GDPR for developers
Bozhidar Bozhanov
 
Политики, основани на данни
Политики, основани на данниПолитики, основани на данни
Политики, основани на данни
Bozhidar Bozhanov
 
Отворено законодателство
Отворено законодателствоОтворено законодателство
Отворено законодателство
Bozhidar Bozhanov
 
Overview of Message Queues
Overview of Message QueuesOverview of Message Queues
Overview of Message Queues
Bozhidar Bozhanov
 
Electronic governance steps in the right direction?
Electronic governance   steps in the right direction?Electronic governance   steps in the right direction?
Electronic governance steps in the right direction?
Bozhidar Bozhanov
 
Сигурност на електронното управление
Сигурност на електронното управлениеСигурност на електронното управление
Сигурност на електронното управление
Bozhidar Bozhanov
 
Opensource government
Opensource governmentOpensource government
Opensource government
Bozhidar Bozhanov
 
Биометрична идентификация
Биометрична идентификацияБиометрична идентификация
Биометрична идентификация
Bozhidar Bozhanov
 
Biometric identification
Biometric identificationBiometric identification
Biometric identification
Bozhidar Bozhanov
 
Регулации и технологии
Регулации и технологииРегулации и технологии
Регулации и технологии
Bozhidar Bozhanov
 
Откриване на фалшиви клетки за подслушване
Откриване на фалшиви клетки за подслушванеОткриване на фалшиви клетки за подслушване
Откриване на фалшиви клетки за подслушване
Bozhidar Bozhanov
 
Wiretap Detector - detecting cell-site simulators
Wiretap Detector - detecting cell-site simulatorsWiretap Detector - detecting cell-site simulators
Wiretap Detector - detecting cell-site simulators
Bozhidar Bozhanov
 
Антикорупционен софтуер
Антикорупционен софтуерАнтикорупционен софтуер
Антикорупционен софтуер
Bozhidar Bozhanov
 
Blockchain overview - types, use-cases, security and usabilty
Blockchain overview - types, use-cases, security and usabiltyBlockchain overview - types, use-cases, security and usabilty
Blockchain overview - types, use-cases, security and usabilty
Bozhidar Bozhanov
 
Електронна държава
Електронна държаваЕлектронна държава
Електронна държава
Bozhidar Bozhanov
 
Blockchain - what is it good for?
Blockchain - what is it good for?Blockchain - what is it good for?
Blockchain - what is it good for?
Bozhidar Bozhanov
 
Algorithmic and technological transparency
Algorithmic and technological transparencyAlgorithmic and technological transparency
Algorithmic and technological transparency
Bozhidar Bozhanov
 
Alternatives for copyright protection online
Alternatives for copyright protection onlineAlternatives for copyright protection online
Alternatives for copyright protection online
Bozhidar Bozhanov
 
Политики, основани на данни
Политики, основани на данниПолитики, основани на данни
Политики, основани на данни
Bozhidar Bozhanov
 
Отворено законодателство
Отворено законодателствоОтворено законодателство
Отворено законодателство
Bozhidar Bozhanov
 
Electronic governance steps in the right direction?
Electronic governance   steps in the right direction?Electronic governance   steps in the right direction?
Electronic governance steps in the right direction?
Bozhidar Bozhanov
 
Сигурност на електронното управление
Сигурност на електронното управлениеСигурност на електронното управление
Сигурност на електронното управление
Bozhidar Bozhanov
 
Биометрична идентификация
Биометрична идентификацияБиометрична идентификация
Биометрична идентификация
Bozhidar Bozhanov
 
Регулации и технологии
Регулации и технологииРегулации и технологии
Регулации и технологии
Bozhidar Bozhanov
 
Ad

Recently uploaded (20)

Technology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data AnalyticsTechnology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data Analytics
InData Labs
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
Datastucture-Unit 4-Linked List Presentation.pptx
Datastucture-Unit 4-Linked List Presentation.pptxDatastucture-Unit 4-Linked List Presentation.pptx
Datastucture-Unit 4-Linked List Presentation.pptx
kaleeswaric3
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
Image processinglab image processing image processing
Image processinglab image processing  image processingImage processinglab image processing  image processing
Image processinglab image processing image processing
RaghadHany
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
Automation Hour 1/28/2022: Capture User Feedback from Anywhere
Automation Hour 1/28/2022: Capture User Feedback from AnywhereAutomation Hour 1/28/2022: Capture User Feedback from Anywhere
Automation Hour 1/28/2022: Capture User Feedback from Anywhere
Lynda Kane
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
Automation Dreamin': Capture User Feedback From Anywhere
Automation Dreamin': Capture User Feedback From AnywhereAutomation Dreamin': Capture User Feedback From Anywhere
Automation Dreamin': Capture User Feedback From Anywhere
Lynda Kane
 
#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018
#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018
#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018
Lynda Kane
 
Buckeye Dreamin 2024: Assessing and Resolving Technical Debt
Buckeye Dreamin 2024: Assessing and Resolving Technical DebtBuckeye Dreamin 2024: Assessing and Resolving Technical Debt
Buckeye Dreamin 2024: Assessing and Resolving Technical Debt
Lynda Kane
 
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc
 
Learn the Basics of Agile Development: Your Step-by-Step Guide
Learn the Basics of Agile Development: Your Step-by-Step GuideLearn the Basics of Agile Development: Your Step-by-Step Guide
Learn the Basics of Agile Development: Your Step-by-Step Guide
Marcel David
 
2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
Technology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data AnalyticsTechnology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data Analytics
InData Labs
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
Datastucture-Unit 4-Linked List Presentation.pptx
Datastucture-Unit 4-Linked List Presentation.pptxDatastucture-Unit 4-Linked List Presentation.pptx
Datastucture-Unit 4-Linked List Presentation.pptx
kaleeswaric3
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
Image processinglab image processing image processing
Image processinglab image processing  image processingImage processinglab image processing  image processing
Image processinglab image processing image processing
RaghadHany
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
Automation Hour 1/28/2022: Capture User Feedback from Anywhere
Automation Hour 1/28/2022: Capture User Feedback from AnywhereAutomation Hour 1/28/2022: Capture User Feedback from Anywhere
Automation Hour 1/28/2022: Capture User Feedback from Anywhere
Lynda Kane
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
Automation Dreamin': Capture User Feedback From Anywhere
Automation Dreamin': Capture User Feedback From AnywhereAutomation Dreamin': Capture User Feedback From Anywhere
Automation Dreamin': Capture User Feedback From Anywhere
Lynda Kane
 
#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018
#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018
#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018
Lynda Kane
 
Buckeye Dreamin 2024: Assessing and Resolving Technical Debt
Buckeye Dreamin 2024: Assessing and Resolving Technical DebtBuckeye Dreamin 2024: Assessing and Resolving Technical Debt
Buckeye Dreamin 2024: Assessing and Resolving Technical Debt
Lynda Kane
 
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc
 
Learn the Basics of Agile Development: Your Step-by-Step Guide
Learn the Basics of Agile Development: Your Step-by-Step GuideLearn the Basics of Agile Development: Your Step-by-Step Guide
Learn the Basics of Agile Development: Your Step-by-Step Guide
Marcel David
 
2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 

Elasticsearch - Scalability and Multitenancy

  • 2. ABOUT ME • Founder at LogSentinel, an information security startup • LogSentinel SIEM – product that indexes billions of logs with Elasticsearch • https://techblog.bozho.net • https://twitter.com/bozhobg
  • 3. SCALABILITY AND MULTITENANCY • Scalability – how to process millions (billions) of documents on multiple machines • Multitenancy – how to have our system support multiple users/organizations while segregating their data • One can exist without the other • Both are architectural and implementation tasks, not (just) work for Ops. • „We’ ll push the data in whatever form and Ops will take care of the scaling “
  • 4. ELASTICSEARCH BSICS • “You know, for search” • Indexing documents (document = anything) • Full-text search and keyword search • Allows for large clusters • Licensing issues
  • 5. USE-CASE: TIME-SERIES DATA • Indexing events (logs, metrics, etc.) • Wide-spread and widely applicable scenario • Documents almost always have a timestamp
  • 8. LIMITING FACTORS • One shard shouldn’t be to large • Ideally between 10 and 50 GB; otherwise recovery after failure may not work • The number of shards on a node is limited by RAM • Lucene segments are append-only • A large number of segments reduce performance
  • 9. MULTITENANCY • Cluster-per-tenant • Heavy for administrations • No real multitenancy • Expensive • Index-per-tenant • Also heave for administration • Doesn’t scale well • Tenant-based routing • Recommended in most cases
  • 10. TENANT-BASED ROUTING • _routing=<tenantId> or _routing=<tenantOwnedResourceId> • E.g.. userId or dataSourceId • Routing parameter designates which shard to be used for storing the document • _routing for search requests tells Elasticsearch where to look for the data => faster search • shard_num = hash(_routing) % num_primary_shards • mappings._routing.required: true
  • 11. STRUCTURE OF INDEXED DATA • One field can have only one type • The type is determined on index creation or on first indexed document with that field • User1 creates custom param “duration” of type String • User2 wants to create “duration” of a numeric type -> error • Solution: custom parameter hierarchies by type: params, numericParams, dateParams, …
  • 12. SCALABILITY • „We add more machines and it’s good“? • Recommended shard size (10-50 GB) • We can’t change shards on a running index • Lucene Segments are read-only: • Deleting a document = bad • Updating a document = bad
  • 13. OPTIONS FOR STRUCTURING INDEXES • We need a structure to allow indexing and searching in an arbitrarily large amount of data • One big, ever-growing index • Convenient for small amounts of data, but faces all scalability problems • Index-per-day / index-per-week / index-per-size • Index-per-day-per-retention • Rollover • Deletion should be done by deleting whole indexes, not individual documents
  • 14. MANY INDEXES FOR SEARCH, ONE FOR INDEXING • One search query can be directed to many indexes based on an index alias • Supporting one (or several) active indexes for ingesting documents • All other indexes– read-only • This solves the problem with: • Growing data and growing size of shards • Deleting old data
  • 15. EFFECTIVE INDEXING • In real time (problem: too many requests to Elasticsearch) • Storing in a database and indexing with a batch job • Message queue (complex to implement) (we use Kafka) • In-memory queue (might lose data) • Batch-indexing when a given size or time threshold is reached • Hybrid: bulk processing + database • Quick indexing with in-memory queue + subsequent check based on the data in the database • Avoid updates (=delete + insert)
  • 16. CONCLUSION • Elasticsearch is easy to get running • …and complex for scaling • Changes to a production setup are hard • We must not throw scalability and multitenancy tasks to the Ops teams – they are application problems • Elasticsearch internals impose unintuitive limitations (“The law of leaky abstractions”)
  • 18. RESOURCES • https://www.elastic.co/guide/en/elasticsearch/reference/current/size-your-shards.html • https://techblog.bozho.net/elasticsearch-multitenancy-with-routing/ • https://techblog.bozho.net/near-real-time-indexing-with-elasticsearch/ • https://www.elastic.co/guide/en/elasticsearch/reference/master/tune-for-indexing- speed.html • https://www.loggly.com/blog/nine-tips-configuring-elasticsearch-for-high-performance/ • https://tech.ebayinc.com/engineering/elasticsearch-performance-tuning-practice-at-ebay/