Elastic meetup june16

1
Miguel Bosin
Support Engineer, @miguelbosin
Hot/Warm Architecture + Sizing

2
Intro
• Miguel Bosin
– Support engineer
– Joined in 2015
– Interested in technology
– Passionate about support
• Elastic
– Founded in 2012
– Distributed company
– Elasticsearch: what is it?
– Open source: ES, LS, Kibana and Beats
– Commercial: X-Pack

4
What is it?
• Open source
• Distributed and scalable
• Highly available
• Document-oriented (JSON)
• RESTful
• Full-text search engine with real-time search and analytics capabilities

5
Agenda
1. Elastic overview
2. Elasticsearch basic architecture
3. Sizing introduction
4. Hot/Warm architecture

6
Overview of Elastic's current products

8
Elasticsearch terminology
• A node is a single Elasticsearch instance, a single JVM
• Multiple nodes can form a cluster
• A cluster or a node can manage multiple indices
• An index is a container for data
• A shard is a single piece of an Elasticsearch index
• A shard is either a primary or a replica
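To make the primary/replica relationship concrete, here is a minimal Python sketch that counts the shard copies of one index and the number of nodes needed to place all of them. It relies on one rule: Elasticsearch never allocates a replica on the same node as its primary. The function names are hypothetical, for illustration only.

```python
def total_shards(primaries: int, replicas: int) -> int:
    """Total shard copies in an index: every primary plus its replica copies."""
    if primaries < 1 or replicas < 0:
        raise ValueError("need at least one primary and a non-negative replica count")
    return primaries * (1 + replicas)


def min_nodes_for_full_allocation(replicas: int) -> int:
    """A replica is never placed on the same node as its primary,
    so each copy of a given shard needs a node of its own."""
    if replicas < 0:
        raise ValueError("replica count must be non-negative")
    return replicas + 1


if __name__ == "__main__":
    # Example: an index with 5 primaries and 1 replica per primary
    print(total_shards(5, 1))                 # 10 shard copies
    print(min_nodes_for_full_allocation(1))   # 2 nodes to allocate every replica
```

With fewer nodes than `replicas + 1`, some replicas simply stay unassigned.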

9
Elasticsearch terminology II

10
Elasticsearch terminology III

11
Elasticsearch Architecture: Node roles
Master node:
• coordinates the cluster
• is the only node able to apply changes to the cluster state
• publishes the updated cluster state to all nodes
Data node:
• performs indexing
• can allocate shards locally
• knows the cluster state

12
Elasticsearch Architecture: Node roles II
Client node:
• does NOT perform indexing or allocate shards locally
• does NOT perform cluster management operations
• knows the cluster state
• acts as a smart load balancer (e.g. load balancing Kibana searches)
• redirects operations to the nodes that hold the relevant data
• calculates aggregation results

13
Elasticsearch Architecture: Node roles III
Node roles are set in elasticsearch.yml
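As a sketch of what "roles are set in elasticsearch.yml" means for the three roles above, the Python snippet below renders the per-role lines. It assumes the ES 2.x-era boolean settings `node.master` and `node.data` (consistent with the `--node.box_type` syntax used later in this deck); later releases changed how roles are configured. The dictionary and function names are hypothetical.

```python
# Role name -> elasticsearch.yml settings, assuming ES 2.x-era setting names.
NODE_ROLE_SETTINGS = {
    "master": {"node.master": True, "node.data": False},  # dedicated master
    "data": {"node.master": False, "node.data": True},    # data-only node
    "client": {"node.master": False, "node.data": False}, # client (coordinating) node
}


def render_role_config(role: str) -> str:
    """Return the elasticsearch.yml lines for a dedicated node role."""
    settings = NODE_ROLE_SETTINGS[role]  # KeyError for an unknown role
    # YAML booleans are written in lowercase
    return "\n".join(f"{key}: {str(value).lower()}" for key, value in settings.items())


if __name__ == "__main__":
    print(render_role_config("client"))
    # node.master: false
    # node.data: false
```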

16
Architecture special case: dedicated master nodes

17
Dedicated master nodes: why / minimum_master_nodes
• Indexing and searching data is CPU-, memory-, and I/O-intensive work that can put pressure on a node’s resources; a dedicated master node is kept away from that load
• Avoiding split brain: two concurrent master nodes in the same cluster can lead to DATA LOSS
• Set discovery.zen.minimum_master_nodes to the quorum of master-eligible nodes:
(master_eligible_nodes / 2) + 1, using integer division
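The quorum formula is easy to get wrong with real division, so here is a minimal Python sketch of it using integer division. The function name is hypothetical; the formula is the one on the slide.

```python
def minimum_master_nodes(master_eligible_nodes: int) -> int:
    """Quorum for discovery.zen.minimum_master_nodes: (n / 2) + 1 with integer division."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(n, "master-eligible ->", minimum_master_nodes(n))
```

With 3 master-eligible nodes the quorum is 2, so the cluster can lose one master and still elect a new one, while two isolated halves can never both reach quorum. With only 2 master-eligible nodes the quorum is also 2, so losing either one blocks elections, which is one reason 3 dedicated masters are a common choice.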

19
Sizing: general factors (server capacity)
• Disks (SSD vs. HDD)
• RAM
– 1/2 of total RAM for the ES heap (the rest is left to the OS file system cache)
– Max ES heap size: 30.5 GB
• # CPU cores
– ES thread pools concept
**1 shard → gets 1 thread → 1 core**
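The two RAM rules combine into a simple calculation, sketched here in Python under the slide's numbers: half of the machine's RAM goes to the heap, capped at 30.5 GB. The cap is commonly explained by the JVM's compressed object pointers, which stop working a little above 30 GB of heap; the exact threshold depends on the JVM, so treat 30.5 as the slide's rule of thumb. The function name is hypothetical.

```python
HEAP_CEILING_GB = 30.5  # slide's ceiling; stays below the compressed-oops threshold


def es_heap_size_gb(total_ram_gb: float) -> float:
    """Half of the machine's RAM for the ES heap, never above the ceiling."""
    if total_ram_gb <= 0:
        raise ValueError("total RAM must be positive")
    return min(total_ram_gb / 2, HEAP_CEILING_GB)


if __name__ == "__main__":
    for ram in (16, 32, 64, 128):
        heap = es_heap_size_gb(ram)
        # whatever is not heap is left to the OS file system cache
        print(f"{ram} GB RAM -> {heap} GB heap, {ram - heap} GB left for the OS")
```

On large machines the cap wins, and the extra memory still helps by caching index files.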

20
Sizing: Elasticsearch factors (logging case)
• Size of shards
• Number of shards on each node
• Retention period of the data
• Mapping configuration
– Which fields are searchable, whether _source is enabled, etc.
• Average size of the documents
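A rough way to see how several of these factors interact in the logging case is a back-of-the-envelope storage estimate: documents per day × average document size × retention days × (1 + replicas). This is a simplified model, not the deck's method. `overhead_factor` is a hypothetical stand-in for mapping choices (for example `_source` and indexed fields) and should come from your own measurements.

```python
def estimate_storage_gb(
    docs_per_day: int,
    avg_doc_size_bytes: float,
    retention_days: int,
    replicas: int = 1,
    overhead_factor: float = 1.0,  # hypothetical: measured on-disk size / raw size
) -> float:
    """Simplified on-disk estimate for time-based logging data, in GiB."""
    if min(docs_per_day, avg_doc_size_bytes, retention_days) <= 0 or replicas < 0:
        raise ValueError("inputs must be positive (replicas non-negative)")
    raw_bytes = docs_per_day * avg_doc_size_bytes * retention_days
    # every primary copy is stored once more per replica
    total_bytes = raw_bytes * (1 + replicas) * overhead_factor
    return total_bytes / 1024**3


if __name__ == "__main__":
    # Example: ~1M docs/day of 1 KiB each, kept 30 days, 1 replica
    print(round(estimate_storage_gb(1_048_576, 1024, 30, replicas=1), 1))
```

Dividing the result by the ideal shard size from the capacity test gives a first guess at the shard count, and dividing that by the number of data nodes gives the shards per node.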

21
Sizing: Capacity planning test I
• FIRST: test on a single node, with a single index that has one shard and no replicas
• THEN: index as many documents as you can and run some typical queries
• At some point, queries will slow down past a threshold where they no longer meet your requirements
• That document count is the ideal number of documents a single shard is able to hold
• NEXT: find the ideal number of primary shards (by dividing your dataset size by the ideal shard size)
• FINALLY: add replicas for HA and to improve read throughput
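The NEXT and FINALLY steps reduce to a small calculation, sketched below in Python. Rounding up is an assumption on my part: it keeps every shard at or below the measured ideal size. Sizes can be in any unit (GB or document counts) as long as both arguments use the same one. The function name is hypothetical.

```python
import math


def shard_plan(dataset_size: float, ideal_shard_size: float, replicas: int = 1) -> tuple[int, int]:
    """Return (primary shards, total shard copies) for a dataset.

    primaries = dataset size / ideal shard size, rounded up so that
    no shard exceeds the ideal size measured in the single-shard test.
    """
    if dataset_size <= 0 or ideal_shard_size <= 0 or replicas < 0:
        raise ValueError("sizes must be positive and replicas non-negative")
    primaries = math.ceil(dataset_size / ideal_shard_size)
    return primaries, primaries * (1 + replicas)


if __name__ == "__main__":
    # Example: 500 GB of data, test showed ~40 GB per shard is the limit
    print(shard_plan(500, 40, replicas=1))  # (13, 26)
```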

22
Sizing: Capacity planning test II
Each experiment tries to accomplish a discrete goal and builds upon the previous one:
1. Determine various disk utilization
2. Determine the breaking point of a shard
3. Determine the saturation point of a node
4. Test the desired configuration on a two-node cluster

24
Hot / Warm architecture
When to use it?
• Larger time-based data analytics use cases with Elasticsearch
• Using time-based indices
• When you are able to run an architecture with 3 different types of nodes

25
Hot / Warm architecture: Types of nodes
Master, hot and warm nodes:
• Master nodes: 3 dedicated master nodes
• Hot data nodes: perform all indexing and also hold the most recent daily indices (the data queried most frequently). Powerful machines with SSD storage
• Warm data nodes: handle a large number of read-only indices that are not queried frequently. Very large attached spinning disks
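With daily time-based indices, deciding which indices belong on hot or warm nodes is just a function of index age. The Python sketch below assumes a hypothetical naming scheme `logs-YYYY.MM.DD` and a hypothetical hot window of 2 days; both are illustrations, not values from the deck.

```python
from datetime import date, datetime, timedelta

HOT_DAYS = 2  # hypothetical: how many days a daily index stays on hot nodes


def daily_index_name(day: date, prefix: str = "logs") -> str:
    """One index per day, e.g. logs-2016.06.16 (hypothetical naming scheme)."""
    return f"{prefix}-{day:%Y.%m.%d}"


def box_type_for(index_name: str, today: date, prefix: str = "logs") -> str:
    """'hot' for the most recent daily indices, 'warm' for older read-only ones."""
    if not index_name.startswith(prefix + "-"):
        raise ValueError(f"not a {prefix}-* daily index: {index_name}")
    day = datetime.strptime(index_name[len(prefix) + 1:], "%Y.%m.%d").date()
    age_days = (today - day).days
    return "hot" if age_days < HOT_DAYS else "warm"


if __name__ == "__main__":
    today = date(2016, 6, 16)
    for offset in range(4):
        name = daily_index_name(today - timedelta(days=offset))
        print(name, box_type_for(name, today))
```

Today's and yesterday's indices come out "hot" and everything older comes out "warm"; the tagging on the next slide is what lets Elasticsearch act on that label.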

26
Hot / Warm architecture: tagging
Which node is doing what?
• ES needs to know which servers contain the hot nodes and which servers contain the warm nodes
• This can be achieved by assigning arbitrary tags (hot/warm) to each server
• Tag the node with node.box_type: xxx in elasticsearch.yml
• OR start a node using ./bin/elasticsearch --node.box_type xxx
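A node tag only matters once an index asks for it. One way to do that, assuming Elasticsearch's index-level shard allocation filtering (`index.routing.allocation.require.<attribute>`), is to update an index's settings so its shards must live on nodes tagged `box_type: warm`. The Python sketch builds that `PUT /<index>/_settings` request with the standard library but does not send it; the cluster URL and the index name are hypothetical.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a local node, as in the force merge example


def move_index_request(index: str, box_type: str) -> urllib.request.Request:
    """Build (not send) a request that requires an index's shards on nodes tagged box_type."""
    body = {"index.routing.allocation.require.box_type": box_type}
    return urllib.request.Request(
        f"{ES_URL}/{index}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    req = move_index_request("logs-2016.06.14", "warm")  # hypothetical index name
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
    # urllib.request.urlopen(req) would send it to a running cluster
```

New indices can be pinned to hot nodes the same way with `"hot"`, and later relocated to warm nodes by changing the value; Elasticsearch then moves the shards.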

27
Hot / Warm architecture: Force Merge API
Optimizing your indices on the warm nodes
• The force merge API forces a merge of one or more indices, optimizing them for faster search operations
• The merge relates to the number of segments a Lucene index holds within each shard
• The force merge operation reduces the number of segments by merging them:
$ curl -XPOST 'http://localhost:9200/my_index/_forcemerge'
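The same call can be built programmatically. This Python sketch adds the force merge API's `max_num_segments` parameter, here set to 1. That is a common choice for indices that will no longer be written to, such as read-only warm indices. It should not be used on indices that are still being indexed into. The request is built but not sent; `my_index` comes from the curl example, and the base URL is an assumption.

```python
import urllib.parse
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: same local node as the curl example


def force_merge_request(index: str, max_num_segments: int = 1) -> urllib.request.Request:
    """Build (not send) POST /<index>/_forcemerge, merging each shard down to N segments."""
    if max_num_segments < 1:
        raise ValueError("max_num_segments must be at least 1")
    query = urllib.parse.urlencode({"max_num_segments": max_num_segments})
    return urllib.request.Request(f"{ES_URL}/{index}/_forcemerge?{query}", method="POST")


if __name__ == "__main__":
    req = force_merge_request("my_index")
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would trigger the merge on a running cluster
```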

28
Hot / Warm architecture: Demo time!!
DEMO
