Is your Elasticsearch Cluster
Production Ready?
Itamar Syn-Hershko
http://code972.com | @synhershko
http://BigDataBoutique.co.il
Me?
http://bdbq.co.il
What does it take?
• Cluster deployed using best
practices
• Thorough monitoring
• Inspect. Fix. Repeat.
• Good capacity planning
• Memory management
• Indexing and sharding strategy
• Security
Cluster Topology
Master-eligible
nodes (3)
Data nodes
(sizing by data)
Client nodes, aka
coordinating nodes
(scalable, sizing by
traffic)
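The three roles above map to per-node settings. A minimal sketch, assuming the pre-7.x boolean settings node.master, node.data and node.ingest, plus a majority quorum for the master-eligible nodes (as used with discovery.zen.minimum_master_nodes). None of these names appear on the slide, and the helper functions are hypothetical:

```python
def node_settings(role):
    """Return elasticsearch.yml settings for a dedicated node role (pre-7.x style)."""
    roles = {
        "master": {"node.master": True, "node.data": False, "node.ingest": False},
        "data": {"node.master": False, "node.data": True, "node.ingest": False},
        # A client (coordinating) node holds no data and is not master-eligible.
        "client": {"node.master": False, "node.data": False, "node.ingest": False},
    }
    return roles[role]


def master_quorum(master_eligible_nodes):
    """Majority of the master-eligible nodes, used to avoid split brain."""
    return master_eligible_nodes // 2 + 1


if __name__ == "__main__":
    print(node_settings("client"))
    print(master_quorum(3))
```

With 3 master-eligible nodes the quorum is 2, so the cluster survives the loss of one of them.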
Deployments
• Prefer immutable images & scripted deployments
• For AWS see https://github.com/synhershko/elasticsearch-cloud-deploy/
• GCP coming soon
Backups
• Very efficient
• Very important
• Several storages supported
• To a shared file system
• HDFS
• Azure / GCP / AWS repositories via plugins
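A minimal sketch of the shared file system case, building the two snapshot API requests without sending them. The repository name, mount path and snapshot naming scheme are assumptions for illustration; the location is expected to be listed under path.repo on every node:

```python
import json
from datetime import date


def register_fs_repository(repo_name, location):
    """Build the request that registers a shared file system snapshot repository."""
    body = {"type": "fs", "settings": {"location": location}}
    return "PUT", f"/_snapshot/{repo_name}", json.dumps(body)


def create_snapshot(repo_name, day):
    """Build the request that takes a snapshot named after the given day."""
    snapshot_name = f"snapshot-{day:%Y.%m.%d}"
    return "PUT", f"/_snapshot/{repo_name}/{snapshot_name}", ""


if __name__ == "__main__":
    # Hypothetical repository name and mount point.
    print(register_fs_repository("my_backup", "/mnt/es-backups"))
    print(create_snapshot("my_backup", date(2016, 11, 23)))
```

Snapshots are incremental at the segment level, which is why taking them often stays cheap.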
What to monitor (on the cluster, per
host)?
• CPU load
• Memory utilization
• Heap utilization
• GC time
• Disk utilization
• Disk IOPS
• Merges
• Deleted docs
• Requests per sec (indexing, search)
• Load average < number of cores
• Network in / out
• Thread pool rejections
• Number of nodes
• Cache sizes
• Cache evictions
• Cluster state / health
• Number of shards per type
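Two of the metrics above (heap utilization and thread pool rejections) can be read from the node stats API. A minimal sketch that summarizes an already-fetched GET /_nodes/stats response; the sample document is made up and trimmed to the fields used, and the function name is hypothetical:

```python
def summarize_node_stats(nodes_stats):
    """Extract heap usage and non-zero thread pool rejections per node."""
    summary = {}
    for node in nodes_stats["nodes"].values():
        rejected = {
            pool: stats["rejected"]
            for pool, stats in node["thread_pool"].items()
            if stats["rejected"] > 0
        }
        summary[node["name"]] = {
            "heap_used_percent": node["jvm"]["mem"]["heap_used_percent"],
            "rejected": rejected,
        }
    return summary


# Made-up sample in the shape of a node stats response.
SAMPLE = {
    "nodes": {
        "abc123": {
            "name": "data-1",
            "jvm": {"mem": {"heap_used_percent": 71}},
            "thread_pool": {
                "bulk": {"rejected": 12},
                "search": {"rejected": 0},
            },
        }
    }
}

if __name__ == "__main__":
    print(summarize_node_stats(SAMPLE))
```

Rejection counters are cumulative since node start, so alerting is usually done on their rate of change rather than the raw value.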
X-Pack monitoring (aka Marvel)
Grafana
dashboards
• More fine-grained, cluster-wide view
• Provided with metrics polling script (Python)
https://github.com/synhershko/elasticsearch-grafana-monitoring
Monitoring Destination
• To the same cluster
• To a different cluster (Recommended)
• External systems (e.g. graphite) – only if already in org
• X-Pack subscribers can now send metrics to Elastic Cloud
Typical garbage collection sawtooth
CPU
monitoring
Correlating metrics
• Shards on the same node have issues?
• During merges?
• CPU and GC
• HTTP traffic and indexing or search operations
Threadpools & Throughput
Boosting slow operations
• Search or Indexing heavy?
• Measure operations also from applications side!
• Slow searches
• Queries need optimization
• Scoring (not using filters)
• Numeric ranges pre-5
• Scripts
• Slow indexing
• Sharding strategy
• Use bulk indexing (optimize for 10-15MB of data, regardless of
number of documents / operations)
• Slow analyzers affect both! (e.g. n-grams)
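Batching by payload size rather than by document count can be sketched as follows. This only builds newline-delimited bulk bodies; the 10MB default follows the range on the slide, and the action line assumes typeless indexing (on older versions the type would go in the URL or in the action metadata). The function name is hypothetical:

```python
import json


def bulk_batches(index, docs, max_bytes=10 * 1024 * 1024):
    """Yield newline-delimited bulk bodies, each at most roughly max_bytes in size."""
    lines, size = [], 0
    for doc in docs:
        action = json.dumps({"index": {"_index": index}})
        source = json.dumps(doc)
        # Two lines per document, each followed by a newline.
        entry_size = len(action.encode("utf-8")) + len(source.encode("utf-8")) + 2
        if lines and size + entry_size > max_bytes:
            yield "\n".join(lines) + "\n"
            lines, size = [], 0
        lines.extend([action, source])
        size += entry_size
    if lines:
        yield "\n".join(lines) + "\n"


if __name__ == "__main__":
    docs = [{"n": i} for i in range(10)]
    for body in bulk_batches("logs", docs, max_bytes=100):
        print(len(body), "bytes")
```

A single document larger than max_bytes still goes out in a batch of its own, so the limit is a target rather than a hard cap.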
Don’t use NGrams!
• Being used for “contains” search
• You ain’t gonna need it, use WordDelimiter Token Filter instead
• Useful for fuzzy search / auto-correction
• Best used via Elasticsearch’s Suggesters
• Useful for languages without spaces, or with compound
words
• min_gram, max_gram
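To see why n-grams are expensive, it helps to count how many terms a single token expands into. A plain-Python illustration of character n-grams between min_gram and max_gram; it mimics the idea, not the exact output order of Elasticsearch's tokenizer:

```python
def ngrams(token, min_gram, max_gram):
    """Return every character n-gram of token with length in [min_gram, max_gram]."""
    return [
        token[start:start + size]
        for size in range(min_gram, max_gram + 1)
        for start in range(len(token) - size + 1)
    ]


if __name__ == "__main__":
    print(ngrams("search", 2, 3))
    print(len(ngrams("elasticsearch", 1, 13)))
```

A 13-character token with min_gram 1 and max_gram 13 produces 91 terms, all of which are analyzed, indexed and stored.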
Caches
• Query cache
• Request cache
• Measure evictions rate & cache usage
Memory Allocation
• ES_HEAP_SIZE (pre-5.x; since 5.0 set -Xms / -Xmx in jvm.options or via ES_JAVA_OPTS)
• DocValues used?
• Fielddata usage
• Query cache (for queries in filter context)
• Request cache (for aggregations and count queries)
• Never over 32GB!
• Default cache sizes not always fit usage
• Set appropriate static configs in elasticsearch.yml
• At least 50% of memory to file-system cache
• Usually more
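The two rules of thumb above (heap never over 32GB, at least half of memory left to the file-system cache) can be combined into a simple calculation. A sketch only: the 31GB cap is an assumption chosen to stay safely under the 32GB line, and the function name is hypothetical:

```python
def recommended_heap_gb(ram_gb, max_heap_gb=31):
    """Half of RAM for the JVM heap, capped below 32GB; the rest is left to the OS cache."""
    return min(ram_gb // 2, max_heap_gb)


if __name__ == "__main__":
    for ram in (16, 64, 128):
        print(ram, "GB RAM ->", recommended_heap_gb(ram), "GB heap")
```

On a 128GB machine this leaves 97GB to the file-system cache, which matches the "usually more" note above.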
Server Sizing
• Master nodes
• 1-2 cores, 2-4 GB memory, 50% ES_HEAP_SIZE
• Data nodes
• > 4 cores, measure and preserve disk/mem ratio (can start with
1/24)
• ES_HEAP_SIZE as per previous slide
• Client nodes
• CPU and network heavy, 4GB memory should be enough for most
use cases
Index Management Patterns
• A Monolith Index
• Search façade on top of your data
• Record linkage
• Anomaly detection
• Rolling indexes (time based events)
• Centralized logging
• Auditing
• IoT
logs-2016.11.19 logs-2016.11.20 logs-2016.11.21 logs-2016.11.22 logs-2016.11.23
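A minimal sketch of the rolling pattern shown above: deriving the daily index name and picking the indices that fell out of a retention window. The 3-day retention and the function names are assumptions for illustration:

```python
from datetime import date, datetime, timedelta


def daily_index(prefix, day):
    """Name of the index that receives events for the given day."""
    return f"{prefix}-{day:%Y.%m.%d}"


def expired_indices(prefix, today, retention_days, existing):
    """Return the existing daily indices that are older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in existing:
        if not name.startswith(prefix + "-"):
            continue
        day = datetime.strptime(name[len(prefix) + 1:], "%Y.%m.%d").date()
        if day <= cutoff:
            expired.append(name)
    return sorted(expired)


if __name__ == "__main__":
    today = date(2016, 11, 23)
    existing = [daily_index("logs", today - timedelta(days=n)) for n in range(5)]
    print(expired_indices("logs", today, 3, existing))
```

Dropping a whole index is far cheaper than deleting documents from it, which is the main operational benefit of time-based indices.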
Optimal shard size
• A few million documents per shard, for search performance
• A bit more if only doing aggregations
• 5-8GB on disk max, for startup times and network
reallocation
• doc_values are enabled by default, turn off for non-aggs fields to
save space
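Since the shard count is fixed at index creation, the size limit above translates into a quick estimate. A sketch assuming the 8GB upper bound from the slide and an expected index size known up front; the function name is hypothetical:

```python
import math


def shard_count(expected_index_gb, max_shard_gb=8):
    """Number of primary shards needed to keep each shard at or under max_shard_gb."""
    return max(1, math.ceil(expected_index_gb / max_shard_gb))


if __name__ == "__main__":
    for size_gb in (4, 40, 100):
        print(size_gb, "GB ->", shard_count(size_gb), "primary shards")
```

The estimate covers disk size only; the per-shard document count above should be checked separately.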
Sharding
• Index Shards
• Resharding / auto-sharding not supported
• Index-level sharding
• Avoid using types (deprecated > 6.x)
• Multi-tenancy
• Rollover API (> 5.x)
• Cluster level
• Cluster per project
• Cross-cluster search capability
Multitenancy
• Silos – Every tenant gets its own index
• Index sizes vary
• Potentially wasting resources
• Pool – All tenants are in one big index
• Sharding isn’t dynamic
• Effects on tf/idf, aggregations, throughput
• Hybrid – Big tenants in their own index, pool(s) for small
ones
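The hybrid layout can be sketched as a routing function. The threshold, the number of pools and the index naming are all assumptions for illustration, and tenant IDs are assumed to be lowercase so that they are valid in index names:

```python
import zlib


def index_for_tenant(tenant_id, doc_count, big_tenant_threshold=1_000_000, pools=4):
    """Dedicated index for big tenants, one of a few shared pool indices otherwise."""
    if doc_count >= big_tenant_threshold:
        return f"tenant-{tenant_id}"
    # crc32 gives a stable assignment across processes (unlike the built-in hash()).
    pool = zlib.crc32(tenant_id.encode("utf-8")) % pools
    return f"tenants-pool-{pool}"


if __name__ == "__main__":
    print(index_for_tenant("acme", 5_000_000))
    print(index_for_tenant("smallco", 1_200))
```

Documents in a pool index still need a tenant field, and every query against a pool has to filter on it.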
Use Explicit Mapping
(aka Avoid Schemaless)
• In one of two ways:
• Disable dynamic mapping in settings (index.mapper.dynamic: false). Will
refuse indexing.
• Create catch-all dynamic template with enabled:false mapping
• Why?
• Avoids hundreds of fields by mistake
• Saves effort on indexing and disk space
• Defaults are bad anyhow, don’t rely on them
• Prefer using index templates (especially for rolling indices)
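A minimal sketch of an explicit mapping shipped through an index template. It uses the mapping-level "dynamic": "strict" option, which rejects documents containing unmapped fields; this is a related option, not one of the two settings named above. The body assumes 7.x-style typeless mappings sent to the legacy _template endpoint, and the field names are hypothetical:

```python
import json


def logs_template():
    """Index template body with an explicit mapping; unknown fields are rejected."""
    return {
        "index_patterns": ["logs-*"],
        "mappings": {
            "dynamic": "strict",
            "properties": {
                "@timestamp": {"type": "date"},
                "level": {"type": "keyword"},
                "message": {"type": "text"},
            },
        },
    }


if __name__ == "__main__":
    # Would be sent as the body of: PUT /_template/logs
    print(json.dumps(logs_template(), indent=2))
```

Because the template matches logs-*, every new rolling index picks up the same mapping without any per-index setup.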
Re-balancing is your enemy
• Lock down shard rebalancing
• cluster.routing.rebalance.enable
• none
• cluster.routing.allocation.enable
• primaries
• new_primaries
• none
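Both settings above are dynamic and can be changed through the cluster settings API. A sketch that validates the value and builds the request body for PUT /_cluster/settings without sending it; the helper name is hypothetical:

```python
import json

ALLOWED_VALUES = {
    "cluster.routing.rebalance.enable": {"all", "primaries", "replicas", "none"},
    "cluster.routing.allocation.enable": {"all", "primaries", "new_primaries", "none"},
}


def cluster_settings_body(setting, value, persistent=True):
    """Build the JSON body for a cluster settings update, rejecting unknown values."""
    if value not in ALLOWED_VALUES.get(setting, ()):
        raise ValueError(f"unsupported value {value!r} for {setting!r}")
    scope = "persistent" if persistent else "transient"
    return json.dumps({scope: {setting: value}})


if __name__ == "__main__":
    print(cluster_settings_body("cluster.routing.rebalance.enable", "none"))
    print(cluster_settings_body("cluster.routing.allocation.enable", "new_primaries"))
```

Persistent settings survive a full cluster restart; transient ones are convenient for temporary changes such as a rolling upgrade.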
More safe configs
• action.destructive_requires_name: true (replaces the older action.disable_delete_all_indices: true)
• action.auto_create_index: false
Deep paging (don’t!)
• Don’t from-size
• search_after (> 5.x)
• Scroll and sliced-scroll (> 5.x)
• Not for normal operation
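A minimal sketch of search_after paging: each request carries the sort values of the last hit from the previous page instead of a growing from offset. The query, field names and helper name are assumptions; the sort should end with a unique tiebreaker field:

```python
def next_page_body(query, sort, size, last_hit=None):
    """Build a search body; pass the last hit of the previous page to continue after it."""
    body = {"query": query, "sort": sort, "size": size}
    if last_hit is not None:
        # Each hit of a sorted search carries its sort values in the "sort" array.
        body["search_after"] = last_hit["sort"]
    return body


if __name__ == "__main__":
    query = {"match_all": {}}
    sort = [{"@timestamp": "asc"}, {"event_id": "asc"}]  # hypothetical fields
    print(next_page_body(query, sort, 100))
    last_hit = {"_id": "42", "sort": [1479859200000, "evt-42"]}
    print(next_page_body(query, sort, 100, last_hit))
```

Unlike from/size, the cost of each page stays roughly constant no matter how deep the client goes.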
Deletions
• Deletions have an overhead
• Slow searches
• Segmentation
• More work on segment merging
• Non-exact tf/idf
• Every document update is a deletion
• No need to avoid it completely, just design accordingly
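The share of deleted documents can be tracked from the index stats API. A sketch that takes one index entry from an already-fetched GET /<index>/_stats response; the sample numbers are made up and the function name is hypothetical:

```python
def deleted_ratio(index_stats):
    """Fraction of documents in the primaries that are marked as deleted."""
    docs = index_stats["primaries"]["docs"]
    total = docs["count"] + docs["deleted"]
    return docs["deleted"] / total if total else 0.0


if __name__ == "__main__":
    # Made-up sample in the shape of response["indices"]["my-index"].
    sample = {"primaries": {"docs": {"count": 900, "deleted": 100}}}
    print(deleted_ratio(sample))
```

A ratio that keeps climbing on an update-heavy index is a hint that merges are not keeping up.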
Geographic Distribution
• Never with the same cluster!
• Cross-cluster search (formerly Tribe Node)
• For geographic sharding
• Different indexes in different regions
• xDCR for HA / DR
• Can be solved by infra – replicating queues (Kafka), DBs
• Solution coming in X-Pack
Your ingestion architecture?
• Favor external ingestion, relieve Elastic from that responsibility
• Upgrade Logstash to 5.x
• Consider using Filebeat instead of Logstash for log-tailing
• Prefer Logstash machines over ingest nodes
• Use queues (Kafka, Redis) to protect against surges
Security
Protecting your cluster
• Don’t bind to a public IP
• Use only private IP/DNSs, preferably in subnets (e.g. AWS VPC)
• network.host in elasticsearch.yml
• Proxy all client requests to ES
• Disable HTTP where not needed
• + Don’t use default ports
• Secure publicly available client nodes
• Access via VPN only
• At the very least SSL + authentication if VPN not an option
• Disable dynamic scripting (pre-5.x)
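The first rule above can be checked automatically before a node is deployed. A sketch using the standard ipaddress module to test a literal IP intended for network.host; hostnames and special values such as _local_ are not resolved and are simply reported as not verified. The function name is hypothetical:

```python
import ipaddress


def is_private_bind_address(host):
    """True only if host is an IP literal in a private or loopback range."""
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal (hostname or special value): cannot be verified here.
        return False
    # 0.0.0.0 and :: bind every interface, including public ones.
    return address.is_private and not address.is_unspecified


if __name__ == "__main__":
    for host in ("10.0.0.5", "127.0.0.1", "8.8.8.8", "0.0.0.0"):
        print(host, is_private_bind_address(host))
```

A check like this fits well into the scripted deployments recommended earlier in the deck.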
Securing Indexes and Documents
• Heavy Kibana user?
• Authentication and authorization
• Index, Document and Field level security
• Requires X-Pack Security
• Application level authentication and authorization
• Application filtering of content (fields, documents)
• Index level (e.g. index per tenant)
• Document level (using permissions)
• Inter-node comms, encryption at rest (X-Pack only)
Upcoming in ES land
• Elasticsearch 6
• Machine Learning
• Anomaly detection on time series data
• Enterprise Cloud
• Elastic Cloud deployed on-premise
• Any plugin authors in the crowd?
Elasticsearch Training
Elasticsearch for Developers &
Maintaining Elasticsearch in Production
• September (10,11,17/9)
• November (12,13,16/11)
http://bdbq.co.il/courses
Consultancy and Development services
http://bdbq.co.il/services/elasticsearch
Questions?
@synhershko on social (Twitter, github, …)
Blog at http://code972.com
Training and consultancy at
http://BigDataBoutique.co.il
White and Red Clean Car Business Pitch Presentation.pptx
canumatown
 
DNS Resolvers and Nameservers (in New Zealand)
DNS Resolvers and Nameservers (in New Zealand)DNS Resolvers and Nameservers (in New Zealand)
DNS Resolvers and Nameservers (in New Zealand)
APNIC
 
OSI TCP IP Protocol Layers description f
OSI TCP IP Protocol Layers description fOSI TCP IP Protocol Layers description f
OSI TCP IP Protocol Layers description f
cbr49917
 
highend-srxseries-services-gateways-customer-presentation.pptx
highend-srxseries-services-gateways-customer-presentation.pptxhighend-srxseries-services-gateways-customer-presentation.pptx
highend-srxseries-services-gateways-customer-presentation.pptx
elhadjcheikhdiop
 
Mobile database for your company telemarketing or sms marketing campaigns. Fr...
Mobile database for your company telemarketing or sms marketing campaigns. Fr...Mobile database for your company telemarketing or sms marketing campaigns. Fr...
Mobile database for your company telemarketing or sms marketing campaigns. Fr...
DataProvider1
 
Determining Glass is mechanical textile
Determining  Glass is mechanical textileDetermining  Glass is mechanical textile
Determining Glass is mechanical textile
Azizul Hakim
 
Perguntas dos animais - Slides ilustrados de múltipla escolha
Perguntas dos animais - Slides ilustrados de múltipla escolhaPerguntas dos animais - Slides ilustrados de múltipla escolha
Perguntas dos animais - Slides ilustrados de múltipla escolha
socaslev
 
project_based_laaaaaaaaaaearning,kelompok 10.pptx
project_based_laaaaaaaaaaearning,kelompok 10.pptxproject_based_laaaaaaaaaaearning,kelompok 10.pptx
project_based_laaaaaaaaaaearning,kelompok 10.pptx
redzuriel13
 
5-Proses-proses Akuisisi Citra Digital.pptx
5-Proses-proses Akuisisi Citra Digital.pptx5-Proses-proses Akuisisi Citra Digital.pptx
5-Proses-proses Akuisisi Citra Digital.pptx
andani26
 
IT Services Workflow From Request to Resolution
IT Services Workflow From Request to ResolutionIT Services Workflow From Request to Resolution
IT Services Workflow From Request to Resolution
mzmziiskd
 
Understanding the Tor Network and Exploring the Deep Web
Understanding the Tor Network and Exploring the Deep WebUnderstanding the Tor Network and Exploring the Deep Web
Understanding the Tor Network and Exploring the Deep Web
nabilajabin35
 
Top Vancouver Green Business Ideas for 2025 Powered by 4GoodHosting
Top Vancouver Green Business Ideas for 2025 Powered by 4GoodHostingTop Vancouver Green Business Ideas for 2025 Powered by 4GoodHosting
Top Vancouver Green Business Ideas for 2025 Powered by 4GoodHosting
steve198109
 
APNIC Update, presented at NZNOG 2025 by Terry Sweetser
APNIC Update, presented at NZNOG 2025 by Terry SweetserAPNIC Update, presented at NZNOG 2025 by Terry Sweetser
APNIC Update, presented at NZNOG 2025 by Terry Sweetser
APNIC
 
Reliable Vancouver Web Hosting with Local Servers & 24/7 Support
Reliable Vancouver Web Hosting with Local Servers & 24/7 SupportReliable Vancouver Web Hosting with Local Servers & 24/7 Support
Reliable Vancouver Web Hosting with Local Servers & 24/7 Support
steve198109
 
(Hosting PHising Sites) for Cryptography and network security
(Hosting PHising Sites) for Cryptography and network security(Hosting PHising Sites) for Cryptography and network security
(Hosting PHising Sites) for Cryptography and network security
aluacharya169
 
Smart Mobile App Pitch Deck丨AI Travel App Presentation Template
Smart Mobile App Pitch Deck丨AI Travel App Presentation TemplateSmart Mobile App Pitch Deck丨AI Travel App Presentation Template
Smart Mobile App Pitch Deck丨AI Travel App Presentation Template
yojeari421237
 
Computers Networks Computers Networks Computers Networks
Computers Networks Computers Networks Computers NetworksComputers Networks Computers Networks Computers Networks
Computers Networks Computers Networks Computers Networks
Tito208863
 
Best web hosting Vancouver 2025 for you business
Best web hosting Vancouver 2025 for you businessBest web hosting Vancouver 2025 for you business
Best web hosting Vancouver 2025 for you business
steve198109
 

Is your Elastic Cluster Stable and Production Ready?

  • 1. Is your Elasticsearch Cluster Production Ready? Itamar Syn-Hershko http://code972.com | @synhershko http://BigDataBoutique.co.il
  • 3. What does it take? • Cluster deployed using best practices • Thorough monitoring • Inspect. Fix. Repeat. • Good capacity planning • Memory management • Indexing and sharding strategy • Security
  • 4. Cluster Topology Master-eligible nodes (3) Data nodes (sizing by data) Client nodes, aka coordinating nodes (scalable, sizing by traffic)
  • 5. Deployments • Prefer immutable images & scripted deployments • For AWS see https://github.com/synhershko/elasticsearch-cloud-deploy/ • GCP coming soon
  • 6. Backups • Very efficient • Very important • Several storages supported • To a shared file system • HDFS • Azure / GCP / AWS repositories via plugins
  • 7. What to monitor (on the cluster, per host)? • CPU load • Memory utilization • Heap utilization • GC time • Disk utilization • Disk IOPs • Merges • Deleted docs • Requests per sec (indexing, search) • Load average < number of cores • Network in / out • Thread pool rejections • Number of nodes • Cache sizes • Cache evictions • Cluster state / health • Number of shards per type
  • 9. Grafana dashboards • More fine-grained, cluster-wide view • Provided with metrics polling script (Python) https://github.com/synhershko/elasticsearch-grafana-monitoring
  • 10. Monitoring Destination • To the same cluster • To a different cluster (Recommended) • External systems (e.g. graphite) – only if already in org • X-Pack subscribers can now send metrics to Elastic Cloud
  • 13. Correlating metrics • Shards on the same node have issues? • During merges? • CPU and GC • HTTP traffic and indexing or search operations
  • 15. Boosting slow operations • Search or indexing heavy? • Also measure operations from the application side! • Slow searches • Queries need optimization • Scoring (not using filters) • Numeric ranges pre-5 • Scripts • Slow indexing • Sharding strategy • Use bulk indexing (optimize for 10-15MB of data per request, regardless of number of documents / operations) • Slow analyzers affect both searching and indexing! (e.g. n-grams)
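The bulk-indexing advice in slide 15 (size each request by bytes, not by document count) can be sketched as a small batching helper. This is a minimal illustration, not the speaker's code: the function name `batch_by_bytes` and the 10 MB default are assumptions, and the size is measured on the JSON-serialized document only, ignoring the per-action metadata line a real bulk request carries.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def batch_by_bytes(
    docs: Iterable[Dict[str, Any]],
    max_bytes: int = 10 * 1024 * 1024,  # assumed target, low end of the 10-15MB range
) -> Iterator[List[Dict[str, Any]]]:
    """Group documents into batches whose serialized size stays under max_bytes.

    A single document larger than max_bytes is still emitted, alone in its batch.
    """
    batch: List[Dict[str, Any]] = []
    size = 0
    for doc in docs:
        doc_size = len(json.dumps(doc).encode("utf-8"))
        # Flush the current batch before it would exceed the byte budget.
        if batch and size + doc_size > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(doc)
        size += doc_size
    if batch:
        yield batch
```

Each yielded batch would then be handed to whatever bulk client the application already uses; that part is deliberately left out here.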
  • 16. Don’t use NGrams! • Being used for “contains” search • You ain’t gonna need it, use WordDelimiter Token Filter instead • Useful for fuzzy search / auto-correction • Best used via Elasticsearch’s Suggesters • Useful for languages without spaces, or with compound words • min_gram, max_gram
  • 17. Caches • Query cache • Request cache • Measure evictions rate & cache usage
  • 18. Memory Allocation • ES_HEAP_SIZE • DocValues used? • Fielddata usage • Query cache (for queries in filter context) • Request cache (for aggregations and count queries) • Never over 32GB! • Default cache sizes don’t always fit your usage • Set appropriate static configs in elasticsearch.yml • Leave at least 50% of memory to the file-system cache • Usually more
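The two heap rules in slide 18 (leave at least half of RAM to the file-system cache, and never go over 32GB) combine into a simple calculation. The sketch below is only an illustration: the function name `suggest_heap_gb` is hypothetical, and the 31 GB cap is an assumed safety margin below the 32GB limit the slide states.

```python
def suggest_heap_gb(
    ram_gb: float,
    heap_cap_gb: float = 31.0,       # assumption: stay a little under the 32GB limit
    max_heap_fraction: float = 0.5,  # at least 50% of RAM is left to the file-system cache
) -> float:
    """Return an upper bound for the heap size on a machine with ram_gb of RAM."""
    if ram_gb <= 0:
        raise ValueError("ram_gb must be positive")
    return min(ram_gb * max_heap_fraction, heap_cap_gb)
```

Since the slide adds "usually more" for the file-system cache, the result is best read as a ceiling rather than a target.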
  • 19. Server Sizing • Master nodes • 1-2 cores, 2-4 GB memory, 50% ES_HEAP_SIZE • Data nodes • > 4 cores, measure and preserve disk/mem ratio (can start with 1/24) • ES_HEAP_SIZE as per previous slide • Client nodes • CPU and network heavy, 4GB memory should be enough for most use cases
  • 20. Index Management Patterns • A Monolith Index • Search façade on top of your data • Record linkage • Anomaly detection • Rolling indexes (time based events) • Centralized logging • Auditing • IoT • Example rolling indexes: logs-2016.11.19 logs-2016.11.20 logs-2016.11.21 logs-2016.11.22 logs-2016.11.23
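The rolling-index pattern in slide 20 names one index per day, as in the `logs-2016.11.20` examples. A minimal sketch of generating those names follows; the helper names `daily_index_name` and `recent_indices` are hypothetical, and the `logs` prefix is taken from the slide.

```python
from datetime import date, timedelta
from typing import List


def daily_index_name(day: date, prefix: str = "logs") -> str:
    """Build the index name for a given day, e.g. logs-2016.11.20."""
    return f"{prefix}-{day:%Y.%m.%d}"


def recent_indices(end: date, days: int, prefix: str = "logs") -> List[str]:
    """List the daily index names for the `days` days ending at `end`, oldest first."""
    return [
        daily_index_name(end - timedelta(days=offset), prefix)
        for offset in range(days - 1, -1, -1)
    ]
```

A search over the last few days can then target only the indexes returned by `recent_indices` instead of the whole data set.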
  • 21. Optimal shard size • A few million documents per shard, for search performance • A bit more if only doing aggregations • 5-8GB on disk max, for startup times and network reallocation • doc_values are enabled by default, turn off for non-aggs fields to save space
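The 5-8GB-per-shard guideline in slide 21 translates into a primary shard count once the expected index size is known. A rough sketch, assuming the size estimate is available up front; the function name `shards_needed` is hypothetical, and the default uses the upper end of the slide's range.

```python
import math


def shards_needed(index_size_gb: float, max_shard_gb: float = 8.0) -> int:
    """Smallest number of primary shards that keeps each shard at or under max_shard_gb."""
    if index_size_gb < 0 or max_shard_gb <= 0:
        raise ValueError("sizes must be non-negative, and max_shard_gb positive")
    # Even an empty index needs one primary shard.
    return max(1, math.ceil(index_size_gb / max_shard_gb))
```

For example, an expected 40 GB index gives 5 shards at the 8 GB limit and 8 shards at the stricter 5 GB limit. Because resharding is not supported, this number has to be chosen before the index is created.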
  • 22. Sharding • Index Shards • Resharding / auto-sharding not supported • Index-level sharding • Avoid using types (deprecated > 6.x) • Multi-tenancy • Rollover API (> 5.x) • Cluster level • Cluster per project • Cross-cluster search capability
  • 23. Multitenancy • Silos – Every tenant gets their own index • Index sizes vary • Potentially wasting resources • Pool – All tenants are in one big index • Sharding isn’t dynamic • Effects on tf/idf, aggregations, throughput • Hybrid – Big tenants in their own index, pool(s) for small ones
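The hybrid approach in slide 23 comes down to a routing decision per tenant. The sketch below is illustrative only: the document-count threshold, the `tenant-<id>` naming scheme, and the `tenants-pool` index name are all assumptions, since the slides do not say how a "big" tenant is defined.

```python
def index_for_tenant(
    tenant_id: str,
    doc_count: int,
    big_tenant_threshold: int = 1_000_000,  # assumed cut-off for a "big" tenant
    pool_index: str = "tenants-pool",       # hypothetical shared index name
) -> str:
    """Pick a dedicated index for big tenants and the shared pool for small ones."""
    if doc_count >= big_tenant_threshold:
        return f"tenant-{tenant_id}"
    return pool_index
```

Documents written to the pool index would still need a tenant field to filter on, which is where the application-level filtering from the security slides comes in.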
  • 24. Use Explicit Mapping (aka Avoid Schemaless) • In one of two ways: • Disable dynamic mapping in settings (index.mapper.dynamic: false). Will refuse indexing. • Create catch-all dynamic template with enabled:false mapping • Why? • Avoids hundreds of fields by mistake • Saves effort on indexing and disk space • Defaults are bad anyhow, don’t rely on them • Prefer using index templates (especially for rolling indices)
  • 25. Re-balancing is your enemy • Lock down shard rebalancing • cluster.routing.rebalance.enable • none • cluster.routing.allocation.enable • primaries • new_primaries • none
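The settings named in slide 25 can be sent as the body of a cluster settings update. The sketch below only builds and validates that body; the helper name `lockdown_settings` is hypothetical, the choice of the `persistent` section is an assumption, and the allocation setting is left untouched unless a value is passed, because restricting allocation also stops replicas from being assigned.

```python
from typing import Dict, Optional

# Restrictive values listed on the slide for cluster.routing.allocation.enable.
ALLOCATION_VALUES = {"primaries", "new_primaries", "none"}


def lockdown_settings(allocation: Optional[str] = None) -> Dict[str, Dict[str, str]]:
    """Build a settings body that disables rebalancing and optionally limits allocation."""
    settings = {"cluster.routing.rebalance.enable": "none"}
    if allocation is not None:
        if allocation not in ALLOCATION_VALUES:
            raise ValueError(f"unsupported allocation value: {allocation!r}")
        settings["cluster.routing.allocation.enable"] = allocation
    # Assumption: applied as a persistent setting; a transient one may suit maintenance windows better.
    return {"persistent": settings}
```

Serializing the result with `json.dumps` gives a body that can be sent with any HTTP client the team already uses.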
  • 26. More safe configs • action.disable_delete_all_indices: true • action.auto_create_index: false
  • 27. Deep paging (don’t!) • Don’t from-size • search_after (> 5.x) • Scroll and sliced-scroll (> 5.x) • Not for normal operation
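The idea behind `search_after` in slide 27 is to continue from the sort values of the last hit rather than skipping `from` documents on every request. The sketch below simulates that in memory so it can run without a cluster: `search_page` is a hypothetical stand-in for a real search call, and the `timestamp` and `id` sort fields are assumptions.

```python
from typing import Any, Dict, Iterator, List, Optional, Tuple

Doc = Dict[str, Any]
SortKey = Tuple[Any, Any]


def sort_key(doc: Doc) -> SortKey:
    # A unique tiebreaker (id) keeps the ordering stable between pages.
    return (doc["timestamp"], doc["id"])


def search_page(docs: List[Doc], size: int, search_after: Optional[SortKey] = None) -> List[Doc]:
    """Stand-in for a sorted search: return up to `size` docs strictly after the cursor."""
    ordered = sorted(docs, key=sort_key)
    if search_after is not None:
        ordered = [d for d in ordered if sort_key(d) > search_after]
    return ordered[:size]


def iterate_all(docs: List[Doc], size: int = 2) -> Iterator[Doc]:
    """Walk the full result set page by page, carrying the last hit's sort values forward."""
    cursor: Optional[SortKey] = None
    while True:
        page = search_page(docs, size, cursor)
        if not page:
            break
        yield from page
        cursor = sort_key(page[-1])
```

Each page costs roughly the same regardless of how deep the walk has gone, which is the property from-size paging lacks.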
  • 28. Deletions • Deletions have an overhead • Slow searches • Segmentation • More work on segment merging • Non-exact tf/idf • Every document update is a deletion • No need to avoid it completely, just design accordingly
  • 29. Geographic Distribution • Never with the same cluster! • Cross-cluster search (formerly Tribe Node) • For geographic sharding • Different indexes in different regions • xDCR for HA / DR • Can be solved by infra – replicating queues (Kafka), DBs • Solution coming in X-Pack
  • 30. Your ingestion architecture? • Favor external ingestion, relieve Elastic from that responsibility • Upgrade Logstash to 5.x • Consider using FileBeat instead of logstash for log-tailing • Prefer logstash machines over ingest nodes • Use queues (Kafka, Redis) to protect against surges
  • 32. Protecting your cluster • Don’t bind to a public IP • Use only private IP/DNSs, preferably in subnets (e.g. AWS VPC) • network.host in elasticsearch.yml • Proxy all client requests to ES • Disable HTTP where not needed • + Don’t use default ports • Secure publicly available client nodes • Access via VPN only • At the very least SSL + authentication if VPN not an option • Disable dynamic scripting (pre-5.x)
  • 33. Securing Indexes and Documents • Heavy Kibana user? • Authentication and authorization • Index, Document and Field level security • Requires X-Pack Security • Application level authentication and authorization • Application filtering of content (fields, documents) • Index level (e.g. index per tenant) • Document level (using permissions) • Inter-node comms, encryption at rest (X-Pack only)
  • 34. Upcoming in ES land • Elasticsearch 6 • Machine Learning • Anomaly detection on time series data • Enterprise Cloud • Elastic Cloud deployed on-premise • Any plugin authors in the crowd?
  • 35. Elasticsearch Training Elasticsearch for Developers & Maintaining Elasticsearch in Production • September (10,11,17/9) • November (12,13,16/11) http://bdbq.co.il/courses Consultancy and Development services http://bdbq.co.il/services/elasticsearch
  • 36. Questions? @synhershko on social (Twitter, github, …) Blog at http://code972.com Training and consultancy at http://BigDataBoutique.co.il