Montreal Elasticsearch Meetup
Loïc Bertron
Director of Research & Development @Cedrom-SNI
Working on Big Data for Cedrom-SNI: social media, TV & radio aggregation
Introduced Elasticsearch at Cedrom-SNI
Cedrom-SNI
10k+ different sources, 750k+ new docs/day
Our job: ingesting, enriching, and extracting analytics and intelligence from docs
loic.bertron@cedrom-sni.com
linkedin.com/in/loicbertron
@loicbertron
Who am I ?
« Elasticsearch easily brings advanced search features to any application or website, and scales to large amounts of data. »
ElasticSearch
Simple: Plug & Play - Schema free - RESTful API
Elastic: Automatically discovers all other instances
Strong: Replication & Load balancing - Scales massively - Lucene based
Fast: Requests executed in parallel - Real time
Full featured: Search, Analytics, Facets, Percolator, Geo search, Suggest, Plugins…
What is ElasticSearch ?
Document as JSON
• Object representing your data
• Grouped in an index
• One index can have multiple types of documents
{
    "message": "Introducing #ElasticSearch",
    "post_date": "2014-03-12T18:30:00",
    "author": {
        "first_name": "Loïc",
        "email": "loic.bertron@cedrom-sni.com"
    },
    "employee_at_Cedrom": true,
    "Tags": ["Meetup", "Montreal"]
}
API
• REST API: http://host:port/[index]/[type]/[_action/id]
• HTTP methods: GET, POST, PUT, DELETE
• Documents
    • http://node1:9200/twitter/tweet/1 (POST)
    • http://node1:9200/twitter/tweet/1 (GET)
    • http://node1:9200/twitter/tweet/1 (DELETE)
• Search
    • http://node1:9200/twitter/tweet/_search (GET)
    • http://node1:9200/twitter/_search (GET)
    • http://node1:9200/_search (GET)
• Metadata
    • http://node1:9200/twitter/_status (GET)
    • http://node1:9200/_shutdown (POST)
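The URL pattern above can be sketched as a small helper. This is only an illustration: the function name `endpoint` and its parameters are made up for this example and are not part of any Elasticsearch client.

```python
def endpoint(host, port, index=None, doc_type=None, tail=None):
    """Build http://host:port/[index]/[type]/[_action or id].

    Hypothetical helper: every path segment is optional, as in the slide.
    """
    parts = [p for p in (index, doc_type, tail) if p is not None]
    return "http://{}:{}/{}".format(host, port, "/".join(parts))


# One document, one index-wide search, one cluster-wide search.
print(endpoint("node1", 9200, "twitter", "tweet", "1"))
print(endpoint("node1", 9200, "twitter", tail="_search"))
print(endpoint("node1", 9200, tail="_search"))
```

Leaving out the type, or the type and the index, widens the scope of the request, which is what the three search URLs on the slide show.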
Index a document
$ curl -X PUT http://node1:9200/twitter/tweet/1 -d '{
    "user": "loicbertron",
    "post_date": "2014-03-12T18:30:00",
    "message": "Introducing #ElasticSearch"
}'
{
    "ok": true,
    "_index": "twitter",
    "_type": "tweet",
    "_id": "1",
    "_version": 1
}
Update a document
$ curl -X PUT http://node1:9200/twitter/tweet/1 -d '{
    "user": "loicbertron",
    "post_date": "2014-03-12T18:40:00",
    "message": "Introducing #ElasticSearch to the #Community"
}'
{
    "ok": true,
    "_index": "twitter",
    "_type": "tweet",
    "_id": "1",
    "_version": 2
}
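The two responses differ only in `_version`: sending a PUT to an ID that already exists replaces the document and bumps its version. A minimal in-memory sketch of that behaviour follows. `TinyIndex` is a hypothetical stand-in written for this example, not Elasticsearch code.

```python
class TinyIndex:
    """Hypothetical in-memory stand-in for one index."""

    def __init__(self, name):
        self.name = name
        self.docs = {}  # (type, id) -> (version, source)

    def put(self, doc_type, doc_id, source):
        # A missing document counts as version 0, so the first PUT yields 1.
        version = self.docs.get((doc_type, doc_id), (0, None))[0] + 1
        self.docs[(doc_type, doc_id)] = (version, source)
        return {"_index": self.name, "_type": doc_type,
                "_id": doc_id, "_version": version}


twitter = TinyIndex("twitter")
first = twitter.put("tweet", "1", {"message": "Introducing #ElasticSearch"})
second = twitter.put("tweet", "1",
                     {"message": "Introducing #ElasticSearch to the #Community"})
print(first["_version"], second["_version"])
```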
Search for documents
$ curl -XGET http://node1:9200/twitter/tweet/_search -d '{
    "query": {
        "term": { "message": "elasticsearch" }
    }
}'
Note: a term query is not analyzed, so the term is written in lowercase to match the token stored in the index.
$ curl -XGET 'http://node1:9200/twitter/tweet/_search?q=elasticsearch'
Search for documents
{
    "took": 24,
    "timed_out": false,
    "_shards": { "total": 2, "successful": 2, "failed": 0 },
    "hits": {
        "total": 1,
        "max_score": 0.227,
        "hits": [ {
            "_index": "twitter",
            "_type": "tweet",
            "_id": "1",
            "_score": 0.227,
            "_source": {
                "user": "loicbertron",
                "post_date": "2014-03-12T18:40:00",
                "message": "Introducing #ElasticSearch to the #Community"
            }
        } ]
    }
}
Callouts on this response, one per slide: execution time (took), number of matching documents (hits.total), infos, score (_score), document (_source)
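A short sketch of how a client might read the fields called out on these slides. The response text is the one shown above; the function name `summarize` and the shape of its return value are choices made for this example only.

```python
import json

raw = """
{
  "took": 24,
  "timed_out": false,
  "_shards": {"total": 2, "successful": 2, "failed": 0},
  "hits": {
    "total": 1,
    "max_score": 0.227,
    "hits": [
      {"_index": "twitter", "_type": "tweet", "_id": "1", "_score": 0.227,
       "_source": {"user": "loicbertron",
                   "post_date": "2014-03-12T18:40:00",
                   "message": "Introducing #ElasticSearch to the #Community"}}
    ]
  }
}
"""


def summarize(resp):
    """Pull out execution time, match count, and one row per hit."""
    hits = resp["hits"]
    rows = [(h["_id"], h["_score"], h["_source"]["message"]) for h in hits["hits"]]
    return {"took_ms": resp["took"], "total": hits["total"], "rows": rows}


summary = summarize(json.loads(raw))
print(summary["took_ms"], summary["total"], summary["rows"][0][0])
```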
Search operand
Terms       quebec
            quebec ontario
Phrases     "city of montréal"
Proximity   "montreal collusion"~5
Fuzzy       schwarzenegger~0.8
Wildcards   queb*
Boosting    Quebec^5 montreal
Range       [2011/03/12 TO 2014/03/12]
            [java TO json]
Boolean     quebec AND NOT montreal
            +quebec -montreal
            (quebec OR ottawa) AND NOT toronto
Fields      title:montreal^10 OR body:montreal
$ curl -XGET 'http://node1:9200/twitter/tweet/_search?q=<Your Query>'
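Several of these operands contain characters that have a meaning in URLs or in the shell (spaces, `^`, `:`, quotes), so the query usually needs percent-encoding before it goes into `?q=`. A minimal sketch, assuming the same `node1` endpoint as the slides; `search_url` is a hypothetical helper name.

```python
from urllib.parse import quote


def search_url(query, base="http://node1:9200/twitter/tweet/_search"):
    """Return the URI-search URL for a query-string query, percent-encoded."""
    return base + "?q=" + quote(query, safe="")


print(search_url("quebec AND NOT montreal"))
print(search_url("title:montreal^10 OR body:montreal"))
```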
Query DSL
$ curl -XGET http://node1:9200/twitter/tweet/_search -d '{
    "query": {
        "filtered": {
            "query": {
                "bool": {
                    "must": [
                        {
                            "match": {
                                "author.first_name": {
                                    "query": "loic",
                                    "fuzziness": 0.1
                                }
                            }
                        },
                        {
                            "multi_match": {
                                "query": "elasticsearch",
                                "fields": ["title^10", "body"]
                            }
                        }
                    ]
                }
            },
            "filter": {
                "and": [
                    {"terms": {"tags": ["search", "scale", "store"]}},
                    {"range": {"created_at": {"from": "2013"}}},
                    {"term": {"featured": true}}
                ]
            }
        }
    }
}'
Highlighted in turn on the slides: the fuzzy match clause, the multi_match clause with a boosted title field, and the filter.
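Building the request body as a data structure rather than as a hand-written string avoids quoting mistakes. The sketch below assembles the same filtered query with the two `must` clauses in a list; `filtered_query` is a hypothetical helper, and the last lines show why a list is the safer form: a JSON object cannot usefully carry two `must` keys.

```python
import json


def filtered_query(must_clauses, filters):
    """Assemble a filtered query: scored bool/must clauses plus 'and' filters."""
    return {"query": {"filtered": {
        "query": {"bool": {"must": list(must_clauses)}},
        "filter": {"and": list(filters)},
    }}}


body = filtered_query(
    must_clauses=[
        {"match": {"author.first_name": {"query": "loic", "fuzziness": 0.1}}},
        {"multi_match": {"query": "elasticsearch", "fields": ["title^10", "body"]}},
    ],
    filters=[
        {"terms": {"tags": ["search", "scale", "store"]}},
        {"range": {"created_at": {"from": "2013"}}},
        {"term": {"featured": True}},
    ],
)
payload = json.dumps(body, indent=2)  # what would be passed to curl -d

# With duplicate keys, Python's json parser keeps only the last one.
dup = json.loads('{"must": {"a": 1}, "must": {"b": 2}}')
print(dup)
```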
Facets
[Figure: search result pages showing term facets and range facets]
Tag Cloud
$ curl -XPOST http://node1:9200/articles/_search -d '{
    "aggregations": {
        "tag_cloud": { "terms": {"field": "tags"} }
    }
}'
"aggregations": {
    "tag_cloud": [
        {"terms": "Quebec", "count": 5},
        {"terms": "Montréal", "count": 3},
        ...
    ]
}
Stats
$ curl -XPOST 'http://node1:9200/students/_search?search_type=count' -d '{
    "facets": {
        "scores-per-subject": {
            "terms_stats": {
                "key_field": "subject",
                "value_field": "score"
            }
        }
    }
}'
"facets": {
    "scores-per-subject": {
        "_type": "terms_stats",
        "missing": 0,
        "terms": [ {
            "term": "math",
            "count": 4,
            "total_count": 4,
            "min": 25.0,
            "max": 92.0,
            "total": 267.0,
            "mean": 66.75
        }, … ]
    }
}
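What the terms_stats facet computes can be sketched in a few lines: group documents by the key field, then compute count, min, max, total and mean of the value field per group. The individual scores below are hypothetical, chosen only so that the "math" row adds up to the totals shown on the slide.

```python
from collections import defaultdict


def terms_stats(docs, key_field, value_field):
    """Group docs by key_field and compute basic statistics on value_field."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[key_field]].append(doc[value_field])
    return [
        {"term": term, "count": len(vals), "min": min(vals), "max": max(vals),
         "total": sum(vals), "mean": sum(vals) / len(vals)}
        for term, vals in sorted(groups.items())
    ]


# Hypothetical student scores (not from the original data set).
students = [
    {"subject": "math", "score": 25.0}, {"subject": "math", "score": 92.0},
    {"subject": "math", "score": 70.0}, {"subject": "math", "score": 80.0},
    {"subject": "history", "score": 60.0},
]
stats = terms_stats(students, "subject", "score")
math_row = [row for row in stats if row["term"] == "math"][0]
print(math_row)
```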
Advanced facets : Aggregations
{
    "rank": "21",
    "city": "Boston",
    "state": "MA",
    "population2012": "636479",
    "population2010": "617594",
    "land_area": "48.277",
    "density": "12793",
    "ansi": "619463",
    "location": {
        "lat": "42.332",
        "lon": "71.0202"
    }
}
curl -XGET "node1:9200/cities/_search?pretty" -d '{
    "aggs": {
        "mean_density_by_state": {
            "terms": { "field": "state" },
            "aggs": {
                "mean_density": {
                    "avg": { "field": "density" }
                }
            }
        }
    }
}'
"aggregations": {
    "mean_density_by_state": {
        "terms": [ {
            "term": "CA",
            "doc_count": 69,
            "mean_density": { "value": 5558.623188405797 }
        }, {
            "term": "TX",
            "doc_count": 32,
            "mean_density": { "value": 2496.625 }
        }, {
            "term": "FL",
            "doc_count": 20,
            "mean_density": { "value": 4006.6 }
        }, {
            "term": "CO",
            "doc_count": 11,
            …
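The nested aggregation above is a bucket aggregation (terms on `state`) with a metric aggregation (avg on `density`) computed inside each bucket. A rough sketch of the same idea in plain Python follows. Only the Boston row comes from the sample document; the other cities, the state code "XX" and the function name are hypothetical.

```python
def mean_density_by_state(docs):
    """Bucket docs by state, then average density inside each bucket."""
    buckets = {}
    for doc in docs:
        # The sample document stores numbers as strings, so convert first.
        buckets.setdefault(doc["state"], []).append(float(doc["density"]))
    # Largest buckets first, like the doc_count ordering in the response.
    ordered = sorted(buckets.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [{"term": state, "doc_count": len(vals),
             "mean_density": {"value": sum(vals) / len(vals)}}
            for state, vals in ordered]


cities = [
    {"city": "Boston", "state": "MA", "density": "12793"},  # from the slide
    {"city": "City A", "state": "XX", "density": "1000"},   # hypothetical
    {"city": "City B", "state": "XX", "density": "3000"},   # hypothetical
]
result = mean_density_by_state(cities)
print(result)
```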
Facets
• Terms
• Terms Stats
• Statistical
• Range
• Histogram
• Date Histogram
• Filter
• Query
• Geo Distance
Architecture
[Figure: a cluster with a single node, Node 1, and no index yet. Cluster state: green]
$ curl -XPUT localhost:9200/twitter -d '{
    "index": {
        "number_of_shards": 2,
        "number_of_replicas": 1
    }
}'
[Figure: Node 1 now holds Shard 0 and Shard 1. The replicas have no second node to go to. Cluster state: yellow]
[Figure: adding a second node. Node 2 receives a copy of Shard 0 and Shard 1. Cluster state: green]
[Figure sequence: Node 3 and Node 4 join, and the four shard copies are spread over the four nodes]
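The green and yellow states in these slides follow from one rule: a replica is never placed on the node that already holds another copy of the same shard. A toy model of that rule is sketched below. The function and its simplifications are assumptions made for this example; real allocation takes many more factors into account.

```python
def cluster_health(num_nodes, num_shards=2, num_replicas=1):
    """Return 'green', 'yellow' or 'red' for a toy cluster (illustrative only)."""
    if num_nodes < 1:
        return "red"      # no node can hold the primaries
    if num_shards == 0:
        return "green"    # no index yet, nothing to allocate
    # Each copy of a shard needs its own node, so at most
    # (num_nodes - 1) replicas of a shard can be assigned.
    assignable = min(num_replicas, num_nodes - 1)
    return "green" if assignable == num_replicas else "yellow"


for nodes in (1, 2, 4):
    print(nodes, cluster_health(nodes))
```

With the settings from the slide (2 shards, 1 replica), one node gives yellow and two or more nodes give green.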
Architecture
$ curl -X PUT http://node1:9200/twitter/tweet/1 -d '{
    "user": "loicbertron",
    "post_date": "2014-03-12T18:30:00",
    "message": "Introducing #ElasticSearch"
}'
[Figure sequence: the request reaches the cluster, Doc 1 is stored in Shard 0, then copied to the second copy of Shard 0]
{
    "ok": true,
    "_index": "twitter",
    "_type": "tweet",
    "_id": "1",
    "_version": 1
}
$ curl -X PUT http://node1:9200/twitter/tweet/2 -d '{
    "user": "loicbertron",
    "post_date": "2014-03-12T18:45:00",
    "message": "The crowd is on fire #ElasticSearch"
}'
[Figure sequence: Doc 2 is stored in Shard 1, then copied to the second copy of Shard 1]
{
    "ok": true,
    "_index": "twitter",
    "_type": "tweet",
    "_id": "2",
    "_version": 1
}
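Which shard a document lands on is decided by hashing its routing value (the document ID by default) modulo the number of primary shards. The sketch below shows the principle only: CRC32 is used as a stand-in hash, which is an assumption of this example, since Elasticsearch uses its own hash function. The shard numbers it prints will therefore not necessarily match the slides.

```python
import zlib


def route(doc_id, number_of_shards=2):
    """Pick a shard for doc_id: hash(id) modulo the number of primary shards.

    Stand-in hash (CRC32); Elasticsearch's real hash function differs.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


for doc_id in ("1", "2", "3"):
    print(doc_id, "-> shard", route(doc_id))
```

Because the shard number depends on `number_of_shards`, the number of primary shards cannot be changed after the index is created without reindexing.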
Architecture
$ curl -XGET http://node1:9200/twitter/tweet/_search -d '{
    "query": {
        "term": { "message": "elasticsearch" }
    }
}'
[Figure sequence: the search request reaches the cluster, is executed against one copy of Shard 0 and one copy of Shard 1, and the hits (Doc 1, Doc 2) are gathered and returned]
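The search walk-through is a scatter/gather: each shard returns its own best hits and the results are merged into one globally sorted list. A minimal sketch of the gather step is below. The score for Doc 2 is hypothetical, and `merge_shard_hits` is a name invented for this example.

```python
import heapq


def merge_shard_hits(per_shard_hits, size=10):
    """Merge per-shard lists of (score, doc_id) into the global top `size`."""
    all_hits = (hit for shard in per_shard_hits for hit in shard)
    return heapq.nlargest(size, all_hits, key=lambda hit: hit[0])


shard0 = [(0.227, "1")]   # score taken from the response shown earlier
shard1 = [(0.310, "2")]   # hypothetical score
top = merge_shard_hits([shard0, shard1])
print(top)
```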
Architecture
[Figure sequence: Node 1 leaves the cluster, taking one copy of Shard 0 (Doc 1) with it. The remaining copy of Shard 0 is replicated again on the surviving nodes (Node 2, Node 3, Node 4), so both shards are back to two copies]
$ curl -X PUT http://node1:9200/twitter/tweet/3 -d '{
    "user": "loicbertron",
    "post_date": "2014-03-12T19:00:00",
    "message": "A third message about #ElasticSearch"
}'
[Figure sequence: Doc 3 is stored in Shard 0 next to Doc 1, then copied to the second copy of Shard 0]
{
    "ok": true,
    "_index": "twitter",
    "_type": "tweet",
    "_id": "3",
    "_version": 1
}
$ curl -XGET http://node1:9200/twitter/tweet/_search -d '{
    "query": {
        "term": { "message": "elasticsearch" }
    }
}'
[Figure sequence: the search runs against the remaining nodes and finds Doc 1, Doc 2 and Doc 3. The last slides show the cluster shrinking further, down to Node 2 and Node 4]
How do users see search?
User, Query, List of results, Result
How does a search engine work?
1. Fetch document field
2. Pick configured analyzer
3. Parse text into tokens
4. Apply token filters
5. Store into index
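The five steps above can be sketched as a tiny inverted index: each token points to the set of documents that contain it. The analyzer here is a toy (split on non-word characters, then lowercase) and all function names are made up for this example.

```python
import re
from collections import defaultdict


def analyze(text):
    """Toy analyzer: tokenize on non-word characters, then lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def build_index(docs, field):
    """Steps 1 to 5: fetch the field, analyze it, store token -> doc ids."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for token in analyze(doc[field]):
            index[token].add(doc_id)
    return index


docs = {
    "1": {"message": "Introducing #ElasticSearch to the #Community"},
    "2": {"message": "The crowd is on fire #ElasticSearch"},
}
index = build_index(docs, "message")
print(sorted(index["elasticsearch"]), sorted(index["crowd"]))
```

Only the lowercased token is stored, so an exact lookup for "ElasticSearch" finds nothing while "elasticsearch" finds both documents.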
Analyzer
curl -XGET "http://localhost:9200/docs/_analyze?analyzer=standard&pretty=1" -d "Édith Piaf vedette du feu d'artifice"
Analyzer
{
"tokens" : [ {
"token" : "édith",
"start_offset" : 0,
"end_offset" : 5,
"type" : "<ALPHANUM>",
"position" : 1
}, {
"token" : "piaf",
"start_offset" : 6,
"end_offset" : 10,
"type" : "<ALPHANUM>",
"position" : 2
}, {
"token" : "vedette",
"start_offset" : 11,
"end_offset" : 18,
"type" : "<ALPHANUM>",
"position" : 3
}, {
"token" : "du",
"start_offset" : 19,
"end_offset" : 21,
"type" : "<ALPHANUM>",
"position" : 4
}, {
"token" : "feu",
"start_offset" : 22,
"end_offset" : 25,
"type" : "<ALPHANUM>",
"position" : 5
}, {
"token" : "d'artifice",
"start_offset" : 26,
"end_offset" : 36,
"type" : "<ALPHANUM>",
"position" : 6
} ]
}
An analyzer is composed of a single tokenizer and zero or more token filters
Analyzer
Tokenizer
Cuts a string into tokens:
Whitespace tokenizer:
«Édith Piaf» -> «Édith», «Piaf»
Standard tokenizer (also drops punctuation; lowercasing is done by a token filter):
«Édith Piaf!» -> «Édith», «Piaf»
Filters
Modify, delete or add tokens
Asciifolding filter:
«Édith Piaf» -> «Edith Piaf»
Stemmer filter (English):
«stemming» -> «stem»
«fishing», «fished», «fisher» -> «fish»
«cats», «catlike» -> «cat»
Phonetic:
«quick» -> «Q200»
«quik» -> «Q200»
Edge nGram:
«Montreal» -> [«Mon», «Mont», «Montr»]
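A tokenizer followed by a chain of token filters can be sketched as plain function composition. The filters below are simplified stand-ins written for this example: the folding step relies on Unicode decomposition and covers fewer characters than the real asciifolding filter, and the gram sizes (3 to 5) are chosen to reproduce the Montreal example.

```python
import unicodedata


def whitespace_tokenizer(text):
    return text.split()


def lowercase(tokens):
    return [token.lower() for token in tokens]


def asciifolding(tokens):
    # Decompose accented characters, then drop the combining marks.
    return ["".join(c for c in unicodedata.normalize("NFKD", token)
                    if not unicodedata.combining(c))
            for token in tokens]


def edge_ngram(tokens, min_gram=3, max_gram=5):
    # Prefixes of each token, from min_gram up to max_gram characters.
    return [token[:n] for token in tokens
            for n in range(min_gram, min(max_gram, len(token)) + 1)]


def analyze(text, tokenizer, filters):
    """Run one tokenizer, then each token filter in order."""
    tokens = tokenizer(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


print(analyze("Édith Piaf", whitespace_tokenizer, [lowercase, asciifolding]))
print(edge_ngram(["Montreal"]))
```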
Analyzer
{
"tokens" : [ {
"token" : "edith",
"start_offset" : 0,
"end_offset" : 5,
"type" : "<ALPHANUM>",
"position" : 1
}, {
"token" : "piaf",
"start_offset" : 6,
"end_offset" : 10,
"type" : "<ALPHANUM>",
"position" : 2
}, {
"token" : "vedet",
"start_offset" : 11,
"end_offset" : 18,
"type" : "<ALPHANUM>",
"position" : 3
}, {
"token" : "feu",
"start_offset" : 22,
"end_offset" : 25,
"type" : "<ALPHANUM>",
"position" : 5
}, {
"token" : "artific",
"start_offset" : 26,
"end_offset" : 36,
"type" : "<ALPHANUM>",
"position" : 6
} ]
}
Regular search engine usage
1. Documents get indexed
2. I come back often to the search page to run my query
3. I hope my document is ranked well enough to appear at the top of the results page
4. If not, I will never see my document
Percolator
1. Register my query
2. When a document gets indexed, the percolator looks for a match against the registered queries
Real-time updates!
Percolator
$ curl -XPUT 'http://node1:9200/twitter/.percolator/elasticsearch' -d '{
    "query": {
        "match": {
            "message": "elasticsearch"
        }
    }
}'
$ curl -X GET http://node1:9200/twitter/tweet/_percolate -d '{
    "doc": {
        "user": "loicbertron",
        "post_date": "2014-03-12T19:00:00",
        "message": "A third message about #ElasticSearch"
    }
}'
{
    "took" : 19,
    "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
    },
    "total" : 1,
    "matches" : [
        {
             "_index" : "twitter",
             "_id" : "elasticsearch"
        }
    ]
}
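The percolator turns search around: queries are stored first, and each incoming document is matched against them. A toy sketch of that idea follows. `TinyPercolator` is hypothetical, supports only single-term matches on one field, and the "hadoop" query is added purely for contrast.

```python
import re


class TinyPercolator:
    """Hypothetical stand-in: stores term queries, matches documents to them."""

    def __init__(self):
        self.queries = {}  # query id -> (field, lowercased term)

    def register(self, query_id, field, term):
        self.queries[query_id] = (field, term.lower())

    def percolate(self, doc):
        """Return the ids of all registered queries that match doc."""
        matches = []
        for query_id, (field, term) in self.queries.items():
            tokens = [t.lower() for t in re.findall(r"\w+", doc.get(field, ""))]
            if term in tokens:
                matches.append(query_id)
        return matches


percolator = TinyPercolator()
percolator.register("elasticsearch", "message", "elasticsearch")
percolator.register("hadoop", "message", "hadoop")  # hypothetical second query
matches = percolator.percolate({
    "user": "loicbertron",
    "message": "A third message about #ElasticSearch",
})
print(matches)
```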
Inner objects
{
    "name": "Jules Verne",
    "biography": "One of the greatest authors",
    "books": [
        {
            "title": "Vingt mille lieues sous les mers",
            "genre": "Novel",
            "publisher": "Hetzel"
        },
        {
            "title": "Les Châteaux en Californie",
            "genre": "Drama",
            "publisher": "Marc Soriano"
        }
    ]
}
Parents / Children
curl -XPUT node1:9200/authors/bare_author/1 -d '{
    "name": "Jules Verne",
    "biography": "One of the greatest authors"
}'
curl -XPOST 'node1:9200/authors/book/1?parent=1' -d '{
    "title": "Les Châteaux en Californie",
    "genre": "Drama",
    "publisher": "Marc Soriano"
}'
curl -XPOST 'node1:9200/authors/book/2?parent=1' -d '{
    "title": "Vingt mille lieues sous les mers",
    "genre": "Novel",
    "publisher": "Hetzel"
}'
Other features
• Suggest API: Did you mean?, Autocomplete, …
• Results Highlight
• More like this
• Backup Data: Snapshot / Restore
    • File System
    • Amazon S3
    • HDFS
    • Google Compute Engine
    • Microsoft Azure
• Hadoop connector
Clients
• Perl
• Python
• Ruby
• PHP
• JavaScript
• Java
• .NET
• Scala
• Clojure
• Erlang
• EventMachine
• Bash
• OCaml
• Smalltalk
• ColdFusion
Who’s using it ?
Questions
Thank you
Thank you David Pilato for his presentation: https://speakerdeck.com/dadoonet/tours-jug-elasticsearch
Thank you Kevin Kluge for his presentation: https://speakerdeck.com/elasticsearch/elasticsearch-in-20-minutes
Bonus :)
Suggest
curl -s -XPOST 'localhost:9200/_search?search_type=count' -d '{
  "suggest" : {
    "my-title-suggestions-1" : {
      "text" : "devloping",
      "term" : {
        "size" : 3,
        "field" : "title"  
      }
    }
  }
}'
Suggest
"suggest": {
    "my-title-suggestions-1": [
      {
        "text": "devloping",
        "offset": 0,
        "length": 9,
        "options": [
          {
            "text": "developing",
            "freq": 77,
            "score": 0.8888889
          },
          {
            "text": "deloping",
            "freq": 1,
            "score": 0.875
          },
          {
            "text": "deploying",
            "freq": 2,
            "score": 0.7777778
          }
        ]
      }
    ]
}
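The suggestions above are terms from the index that lie within a small edit distance of the misspelled input. The sketch below ranks a hypothetical vocabulary by Levenshtein distance. The score formula, 1 - distance / shorter length, is an assumption that happens to reproduce the three scores in the response; it is not taken from the Elasticsearch documentation, and the real suggester also takes term frequency into account.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def suggest(text, vocabulary, size=3, max_edits=2):
    """Return up to `size` terms within max_edits of text, best score first."""
    scored = []
    for term in vocabulary:
        distance = levenshtein(text, term)
        if 0 < distance <= max_edits:
            score = 1 - distance / min(len(text), len(term))  # assumed formula
            scored.append((score, term))
    return [term for _, term in sorted(scored, reverse=True)[:size]]


# Hypothetical vocabulary standing in for the terms of the "title" field.
vocabulary = ["developing", "deloping", "deploying", "search"]
print(suggest("devloping", vocabulary))
```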
More Like This
curl -XGET 'http://node1:9200/twitter/tweet/1/_mlt?mlt_fields=tag,content&min_doc_freq=1'
{
    "more_like_this" : {
        "fields" : ["name.first", "name.last"],
        "like_text" : "text like this one",
        "min_term_freq" : 1,
        "max_query_terms" : 12,
        "percent_terms_to_match" : 0.95
    }
}
Highlight
{
    "query" : {...},
    "highlight" : {
        "number_of_fragments" : 3,
        "fragment_size" : 150,
        "tag_schema" : "styled",
        "fields" : {
            "_all" : { "pre_tags" : ["<em>"], "post_tags" : ["</em>"] },
            "bio.title" : { "number_of_fragments" : 0 },
            "bio.author" : { "number_of_fragments" : 0 },
            "bio.content" : { "number_of_fragments" : 5, "order" : "score" }
        }
    }
}
Hadoop
• Java library for integrating Elasticsearch and Hadoop
• Pig, Hive, Cascading, MapReduce
• Search and Real Time Analytics with Elasticsearch, Hadoop as Data Lake
• Scales with Hadoop
Ad

More Related Content

What's hot (20)

MongoDB .local Munich 2019: Best Practices for Working with IoT and Time-seri...
MongoDB .local Munich 2019: Best Practices for Working with IoT and Time-seri...MongoDB .local Munich 2019: Best Practices for Working with IoT and Time-seri...
MongoDB .local Munich 2019: Best Practices for Working with IoT and Time-seri...
MongoDB
 
Mongo db presentation
Mongo db presentationMongo db presentation
Mongo db presentation
Julie Sommerville
 
Building a Scalable Inbox System with MongoDB and Java
Building a Scalable Inbox System with MongoDB and JavaBuilding a Scalable Inbox System with MongoDB and Java
Building a Scalable Inbox System with MongoDB and Java
antoinegirbal
 
Webinar: Building Your First App with MongoDB and Java
Webinar: Building Your First App with MongoDB and JavaWebinar: Building Your First App with MongoDB and Java
Webinar: Building Your First App with MongoDB and Java
MongoDB
 
ElasticSearch - Introduction to Aggregations
ElasticSearch - Introduction to AggregationsElasticSearch - Introduction to Aggregations
ElasticSearch - Introduction to Aggregations
enterprisesearchmeetup
 
Webinar: General Technical Overview of MongoDB for Dev Teams
Webinar: General Technical Overview of MongoDB for Dev TeamsWebinar: General Technical Overview of MongoDB for Dev Teams
Webinar: General Technical Overview of MongoDB for Dev Teams
MongoDB
 
Managing Social Content with MongoDB
Managing Social Content with MongoDBManaging Social Content with MongoDB
Managing Social Content with MongoDB
MongoDB
 
Liferay Search: Best Practices to Dramatically Improve Relevance - Liferay Sy...
Liferay Search: Best Practices to Dramatically Improve Relevance - Liferay Sy...Liferay Search: Best Practices to Dramatically Improve Relevance - Liferay Sy...
Liferay Search: Best Practices to Dramatically Improve Relevance - Liferay Sy...
André Ricardo Barreto de Oliveira
 
Back to Basics Webinar 5: Introduction to the Aggregation Framework
Back to Basics Webinar 5: Introduction to the Aggregation FrameworkBack to Basics Webinar 5: Introduction to the Aggregation Framework
Back to Basics Webinar 5: Introduction to the Aggregation Framework
MongoDB
 
Harnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, Germany
Harnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, GermanyHarnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, Germany
Harnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, Germany
André Ricardo Barreto de Oliveira
 
Online | MongoDB Atlas on GCP Workshop
Online | MongoDB Atlas on GCP Workshop Online | MongoDB Atlas on GCP Workshop
Online | MongoDB Atlas on GCP Workshop
Natasha Wilson
 
Back to Basics Webinar 3: Schema Design Thinking in Documents
 Back to Basics Webinar 3: Schema Design Thinking in Documents Back to Basics Webinar 3: Schema Design Thinking in Documents
Back to Basics Webinar 3: Schema Design Thinking in Documents
MongoDB
 
elasticsearch - advanced features in practice
elasticsearch - advanced features in practiceelasticsearch - advanced features in practice
elasticsearch - advanced features in practice
Jano Suchal
 
MongoDB .local Chicago 2019: Using Client Side Encryption in MongoDB 4.2
MongoDB .local Chicago 2019: Using Client Side Encryption in MongoDB 4.2MongoDB .local Chicago 2019: Using Client Side Encryption in MongoDB 4.2
MongoDB .local Chicago 2019: Using Client Side Encryption in MongoDB 4.2
MongoDB
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB
 
Webinar: Exploring the Aggregation Framework
Webinar: Exploring the Aggregation FrameworkWebinar: Exploring the Aggregation Framework
Webinar: Exploring the Aggregation Framework
MongoDB
 
Beyond the Basics 2: Aggregation Framework
Beyond the Basics 2: Aggregation Framework Beyond the Basics 2: Aggregation Framework
Beyond the Basics 2: Aggregation Framework
MongoDB
 
MongoDB In Production At Sailthru
MongoDB In Production At SailthruMongoDB In Production At Sailthru
MongoDB In Production At Sailthru
ibwhite
 
Curiosity, outil de recherche open source par PagesJaunes
Curiosity, outil de recherche open source par PagesJaunesCuriosity, outil de recherche open source par PagesJaunes
Curiosity, outil de recherche open source par PagesJaunes
PagesJaunes
 
Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding
Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based ShardingWebinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding
Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding
MongoDB
 
MongoDB .local Munich 2019: Best Practices for Working with IoT and Time-seri...
MongoDB .local Munich 2019: Best Practices for Working with IoT and Time-seri...MongoDB .local Munich 2019: Best Practices for Working with IoT and Time-seri...
MongoDB .local Munich 2019: Best Practices for Working with IoT and Time-seri...
MongoDB
 
Building a Scalable Inbox System with MongoDB and Java
Building a Scalable Inbox System with MongoDB and JavaBuilding a Scalable Inbox System with MongoDB and Java
Building a Scalable Inbox System with MongoDB and Java
antoinegirbal
 
Webinar: Building Your First App with MongoDB and Java
Webinar: Building Your First App with MongoDB and JavaWebinar: Building Your First App with MongoDB and Java
Webinar: Building Your First App with MongoDB and Java
MongoDB
 
ElasticSearch - Introduction to Aggregations
ElasticSearch - Introduction to AggregationsElasticSearch - Introduction to Aggregations
ElasticSearch - Introduction to Aggregations
enterprisesearchmeetup
 
Webinar: General Technical Overview of MongoDB for Dev Teams
Webinar: General Technical Overview of MongoDB for Dev TeamsWebinar: General Technical Overview of MongoDB for Dev Teams
Webinar: General Technical Overview of MongoDB for Dev Teams
MongoDB
 
Managing Social Content with MongoDB
Managing Social Content with MongoDBManaging Social Content with MongoDB
Managing Social Content with MongoDB
MongoDB
 
Liferay Search: Best Practices to Dramatically Improve Relevance - Liferay Sy...
Liferay Search: Best Practices to Dramatically Improve Relevance - Liferay Sy...Liferay Search: Best Practices to Dramatically Improve Relevance - Liferay Sy...
Liferay Search: Best Practices to Dramatically Improve Relevance - Liferay Sy...
André Ricardo Barreto de Oliveira
 
Back to Basics Webinar 5: Introduction to the Aggregation Framework
Back to Basics Webinar 5: Introduction to the Aggregation FrameworkBack to Basics Webinar 5: Introduction to the Aggregation Framework
Back to Basics Webinar 5: Introduction to the Aggregation Framework
MongoDB
 
Harnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, Germany
Montreal Elasticsearch Meetup

  • 2. Who am I? Loïc Bertron, Director of Research & Development at Cedrom-SNI. Working on Big Data for Cedrom-SNI: social media, TV and radio aggregation. Introduced Elasticsearch at Cedrom-SNI. Cedrom-SNI: 10k+ different sources, 750k+ new docs per day. Our job: ingesting, enriching, and extracting analytics and intelligence from docs. Contact: loic.bertron@cedrom-sni.com, linkedin.com/in/loicbertron, @loicbertron
  • 3. ElasticSearch: « ElasticSearch easily brings advanced search features to any application or website, and scales to large amounts of data. »
  • 4. What is ElasticSearch? Simple: plug and play, schema-free, RESTful API. Elastic: automatically discovers all other instances. Strong: replication and load balancing, scales massively, Lucene-based. Fast: requests executed in parallel, real time. Full-featured: search, analytics, facets, percolator, geo search, suggest, plugins…
  • 5. Document as JSON
      • Object representing your data
      • Grouped in an index
      • One index can have multiple types of documents
      {
          "message": "Introducing #ElasticSearch",
          "post_date": "2014-03-12T18:30:00",
          "author": {
              "first_name": "Loïc",
              "email": "loic.bertron@cedrom-sni.com"
          },
          "employee_at_Cedrom": true,
          "Tags": ["Meetup", "Montreal"]
      }
  • 6. API
      • REST API: http://host:port/[index]/[type]/[_action/id]
      • HTTP methods: GET, POST, PUT, DELETE
      • Documents
          • http://node1:9200/twitter/tweet/1 (POST)
          • http://node1:9200/twitter/tweet/1 (GET)
          • http://node1:9200/twitter/tweet/1 (DELETE)
      • Search
          • http://node1:9200/twitter/tweet/_search (GET)
          • http://node1:9200/twitter/_search (GET)
          • http://node1:9200/_search (GET)
      • Metadata
          • http://node1:9200/twitter/_status (GET)
          • http://node1:9200/_shutdown (POST)
  • 7. Index a document
      $ curl -X PUT http://node1:9200/twitter/tweet/1 -d '{
          "user": "loicbertron",
          "post_date": "2014-03-12T18:30:00",
          "message": "Introducing #ElasticSearch"
      }'
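The same PUT request can be prepared from Python's standard library. This is a minimal sketch, assuming the `node1:9200` host used on the slides; the helper name `build_index_request` is hypothetical, and the code only builds the request without sending it.

```python
import json
import urllib.request


def build_index_request(base_url, index, doc_type, doc_id, document):
    """Build (but do not send) a PUT request that indexes one document.

    Hypothetical helper; the URL layout follows the slides:
    http://host:port/[index]/[type]/[id]
    """
    url = f"{base_url}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


tweet = {
    "user": "loicbertron",
    "post_date": "2014-03-12T18:30:00",
    "message": "Introducing #ElasticSearch",
}
request = build_index_request("http://node1:9200", "twitter", "tweet", 1, tweet)
print(request.get_method(), request.full_url)
# To actually send it to a running node: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes the URL and body easy to inspect before anything reaches the cluster.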
  • 9. Update a document
      $ curl -X PUT http://node1:9200/twitter/tweet/1 -d '{
          "user": "loicbertron",
          "post_date": "2014-03-12T18:40:00",
          "message": "Introducing #ElasticSearch to the #Community"
      }'
  • 11. Search for documents
      $ curl -XGET http://node1:9200/twitter/tweet/_search -d '{
          "query": {
              "term": { "message": "ElasticSearch" }
          }
      }'
      $ curl -XGET 'http://node1:9200/twitter/tweet/_search?q=elasticsearch'
  • 12-17. Search for documents (the same response appears on slides 12 to 17, each slide highlighting one part)
      {
          "took": 24,
          "timed_out": false,
          "_shards": {
              "total": 2,
              "successful": 2,
              "failed": 0
          },
          "hits": {
              "total": 1,
              "max_score": 0.227,
              "hits": [
                  {
                      "_index": "twitter",
                      "_type": "tweet",
                      "_id": "1",
                      "_score": 0.227,
                      "_source": {
                          "user": "loicbertron",
                          "post_date": "2014-03-12T18:40:00",
                          "message": "Introducing #ElasticSearch to the #Community"
                      }
                  }
              ]
          }
      }
      Highlighted parts: execution time ("took"), number of matching documents ("hits.total"), infos ("_index", "_type", "_id"), score ("_score"), document ("_source").
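To make the highlighted parts concrete, here is a small sketch that pulls them out of the response with Python's `json` module. The response text is the one from the slides; the function name `summarize` and the shape of its return value are illustrative choices, not part of any Elasticsearch client.

```python
import json

# Response body as shown on the slides.
raw_response = """
{
  "took": 24,
  "timed_out": false,
  "_shards": {"total": 2, "successful": 2, "failed": 0},
  "hits": {
    "total": 1,
    "max_score": 0.227,
    "hits": [
      {
        "_index": "twitter",
        "_type": "tweet",
        "_id": "1",
        "_score": 0.227,
        "_source": {
          "user": "loicbertron",
          "post_date": "2014-03-12T18:40:00",
          "message": "Introducing #ElasticSearch to the #Community"
        }
      }
    ]
  }
}
"""


def summarize(response):
    """Extract execution time, match count, and per-hit info from a search response."""
    hits = response["hits"]
    return {
        "took_ms": response["took"],  # execution time in milliseconds
        "total": hits["total"],       # number of matching documents
        "results": [
            {
                "id": hit["_id"],
                "score": hit["_score"],
                "message": hit["_source"]["message"],
            }
            for hit in hits["hits"]
        ],
    }


summary = summarize(json.loads(raw_response))
print(summary)
```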
  • 18. Search operators
      Terms        quebec
                   quebec ontario
      Phrases      "city of montréal"
      Proximity    "montreal collusion"~5
      Fuzzy        schwarzenegger~0.8
      Wildcards    queb*
      Boosting     Quebec^5 montreal
      Range        [2011/03/12 TO 2014/03/12]
                   [java TO json]
      Boolean      quebec AND NOT montreal
                   +quebec -montreal
                   (quebec OR ottawa) AND NOT toronto
      Fields       title:montreal^10 OR body:montreal
      $ curl -XGET 'http://node1:9200/twitter/tweet/_search?q=<Your Query>'
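Queries with spaces, parentheses, or quotes have to be URL-encoded before they go into the `q` parameter. A minimal sketch using only the standard library; the helper name `search_url` is hypothetical and the host is the `node1:9200` placeholder from the slides.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def search_url(base_url, index, doc_type, query):
    """Return a _search URL with the query string safely encoded in `q`."""
    return f"{base_url}/{index}/{doc_type}/_search?" + urlencode({"q": query})


query = "(quebec OR ottawa) AND NOT toronto"
url = search_url("http://node1:9200", "twitter", "tweet", query)
print(url)

# Decoding the URL again gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["q"][0]
print(decoded)
```

The encoded URL can then be passed to curl or any HTTP client without shell quoting surprises.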
  • 19-22. Query DSL (slides 20 to 22 highlight, in turn, the "match" clause, the "multi_match" clause, and the "filter" block)
      $ curl -XGET http://node1:9200/twitter/tweet/_search -d '{
          "query": {
              "filtered": {
                  "query": {
                      "bool": {
                          "must": [
                              { "match": { "author.first_name": { "query": "loic", "fuzziness": 0.1 } } },
                              { "multi_match": { "query": "elasticsearch", "fields": ["title^10", "body"] } }
                          ]
                      }
                  },
                  "filter": {
                      "and": [
                          { "terms": { "tags": ["search", "scale", "store"] } },
                          { "range": { "created_at": { "from": "2013" } } },
                          { "term": { "featured": true } }
                      ]
                  }
              }
          }
      }'
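Building the request body as a data structure avoids hand-editing nested braces. The sketch below assembles the query from this slide as a Python dict; the helper name `filtered_query` is hypothetical. A Python dict cannot carry the same key twice, so the two `must` clauses are passed as a list, which the `bool` query accepts. The `filtered` query and `and` filter belong to the Elasticsearch 1.x era of this talk; later releases replaced them with a `bool` query that has its own `filter` clause.

```python
import json


def filtered_query(must_clauses, filters):
    """Assemble a 1.x-style filtered query: scored clauses plus unscored filters."""
    return {
        "query": {
            "filtered": {
                "query": {"bool": {"must": list(must_clauses)}},
                "filter": {"and": list(filters)},
            }
        }
    }


body = filtered_query(
    must_clauses=[
        {"match": {"author.first_name": {"query": "loic", "fuzziness": 0.1}}},
        {"multi_match": {"query": "elasticsearch", "fields": ["title^10", "body"]}},
    ],
    filters=[
        {"terms": {"tags": ["search", "scale", "store"]}},
        {"range": {"created_at": {"from": "2013"}}},
        {"term": {"featured": True}},
    ],
)

# The JSON text is what would be passed to curl with -d.
payload = json.dumps(body, indent=2)
print(payload)
```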
  • 25. Tag Cloud
      $ curl -XPOST http://node1:9200/articles/_search -d '{
          "aggregations": {
              "tag_cloud": {
                  "terms": { "field": "tags" }
              }
          }
      }'
      Response:
      "aggregations": {
          "tag_cloud": [
              { "terms": "Quebec", "count": 5 },
              { "terms": "Montréal", "count": 3 },
              ...
          ]
      }
  • 26. Stats
      $ curl -XPOST 'http://node1:9200/students/_search?search_type=count' -d '{
          "facets": {
              "scores-per-subject": {
                  "terms_stats": {
                      "key_field": "subject",
                      "value_field": "score"
                  }
              }
          }
      }'
      Response:
      "facets": {
          "scores-per-subject": {
              "_type": "terms_stats",
              "missing": 0,
              "terms": [
                  {
                      "term": "math",
                      "count": 4,
                      "total_count": 4,
                      "min": 25.0,
                      "max": 92.0,
                      "total": 267.0,
                      "mean": 66.75
                  },
                  […]
              ]
          }
      }
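What the `terms_stats` facet computes can be reproduced locally: group documents by the key field, then summarize the value field per group. The sketch below is a plain-Python illustration, not Elasticsearch code. The four individual math scores are hypothetical values chosen so that the aggregates match the numbers on the slide (count 4, min 25, max 92, total 267, mean 66.75); the physics entry is invented as well.

```python
from collections import defaultdict


def terms_stats(docs, key_field, value_field):
    """Group docs by key_field and compute count/min/max/total/mean of value_field."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[key_field]].append(doc[value_field])
    return {
        key: {
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "total": sum(values),
            "mean": sum(values) / len(values),
        }
        for key, values in groups.items()
    }


# Hypothetical documents: only the aggregates for "math" come from the slide.
students = [
    {"subject": "math", "score": 25.0},
    {"subject": "math", "score": 92.0},
    {"subject": "math", "score": 70.0},
    {"subject": "math", "score": 80.0},
    {"subject": "physics", "score": 55.0},
]

stats = terms_stats(students, "subject", "score")
print(stats["math"])
```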
  • 27. Advanced facets: Aggregations (sample document)
      {
          "rank": "21",
          "city": "Boston",
          "state": "MA",
          "population2012": "636479",
          "population2010": "617594",
          "land_area": "48.277",
          "density": "12793",
          "ansi": "619463",
          "location": {
              "lat": "42.332",
              "lon": "71.0202"
          }
      }
  • 28. Advanced facets: Aggregations (request)
      $ curl -XGET "node1:9200/cities/_search?pretty" -d '{
          "aggs": {
              "mean_density_by_state": {
                  "terms": { "field": "state" },
                  "aggs": {
                      "mean_density": {
                          "avg": { "field": "density" }
                      }
                  }
              }
          }
      }'
  • 29. Advanced facets: Aggregations (response, cut off on the slide)
      "aggregations": {
          "mean_density_by_state": {
              "terms": [
                  { "term": "CA", "doc_count": 69, "mean_density": { "value": 5558.623188405797 } },
                  { "term": "TX", "doc_count": 32, "mean_density": { "value": 2496.625 } },
                  { "term": "FL", "doc_count": 20, "mean_density": { "value": 4006.6 } },
                  { "term": "CO", "doc_count": 11,
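The nesting in this request (a `terms` bucket per state, with an `avg` sub-aggregation inside each bucket) can be mimicked in a few lines of Python. This is an illustrative sketch only: the Boston entry comes from the sample document, the two California cities are hypothetical, and densities are converted from strings because the sample document stores them as strings. Buckets are ordered by document count, largest first, as in the response shown.

```python
from collections import defaultdict


def mean_density_by_state(cities):
    """Bucket cities by state, then average the density inside each bucket."""
    buckets = defaultdict(list)
    for city in cities:
        buckets[city["state"]].append(float(city["density"]))
    # Largest buckets first; ties broken alphabetically for a stable order.
    ordered = sorted(buckets.items(), key=lambda item: (-len(item[1]), item[0]))
    return [
        {
            "term": state,
            "doc_count": len(values),
            "mean_density": {"value": sum(values) / len(values)},
        }
        for state, values in ordered
    ]


cities = [
    {"city": "Boston", "state": "MA", "density": "12793"},  # from the slide
    {"city": "City A", "state": "CA", "density": "4000"},   # hypothetical
    {"city": "City B", "state": "CA", "density": "6000"},   # hypothetical
]

result = mean_density_by_state(cities)
print(result)
```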
  • 32. Architecture: creating an index with 2 shards and 1 replica
      $ curl -XPUT localhost:9200/twitter -d '{
          "index": {
              "number_of_shards": 2,
              "number_of_replicas": 1
          }
      }'
      [Diagram: a cluster with a single node, Node 1, holding Shard 0 and Shard 1. Cluster state: Yellow.]
  • 33. Architecture: adding a second node
      [Diagram: Node 1 holds Shard 0 and Shard 1; Node 2 holds a second copy of Shard 0 and Shard 1. Cluster state: Green.]
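The Yellow-to-Green change follows from one rule: a replica is never placed on the same node as another copy of the same shard, so with one node the replicas have nowhere to go. The sketch below is a deliberately simplified model of that rule, not Elasticsearch's allocator; the function name `cluster_health` and its return shape are illustrative.

```python
def cluster_health(node_count, number_of_shards, number_of_replicas):
    """Return (status, unassigned_copies) under a simplified allocation model.

    Assumption: every copy of a shard (primary or replica) needs its own node,
    and nodes have unlimited capacity.
    """
    copies_per_shard = 1 + number_of_replicas
    placeable = min(copies_per_shard, node_count)
    unassigned = number_of_shards * (copies_per_shard - placeable)
    if placeable == 0:
        return "red", unassigned      # not even the primaries can be placed
    if unassigned == 0:
        return "green", unassigned    # every primary and replica is assigned
    return "yellow", unassigned       # primaries assigned, some replicas are not


print(cluster_health(1, 2, 1))  # ('yellow', 2): single node, replicas unassigned
print(cluster_health(2, 2, 1))  # ('green', 0): second node takes the replicas
```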
  • 34-39. Architecture
      [Diagram sequence: Node 3 and then Node 4 join the cluster, and the copies of Shard 0 and Shard 1 are spread across the nodes.]
  • 40-44. Architecture: indexing document 1
      $ curl -X PUT http://node1:9200/twitter/tweet/1 -d '{
          "user": "loicbertron",
          "post_date": "2014-03-12T18:30:00",
          "message": "Introducing #ElasticSearch"
      }'
      [Diagram sequence: on the four-node cluster, Doc 1 is written to one copy of Shard 0, then to the other copy.]
      Response:
      {
          "ok": true,
          "_index": "twitter",
          "_type": "tweet",
          "_id": "1",
          "_version": "1"
      }
  • 45-49. Architecture: indexing document 2
      $ curl -X PUT http://node1:9200/twitter/tweet/2 -d '{
          "user": "loicbertron",
          "post_date": "2014-03-12T18:45:00",
          "message": "The crowd is on fire #ElasticSearch"
      }'
      [Diagram sequence: Doc 2 is written to one copy of Shard 1, then to the other copy.]
      Response:
      {
          "ok": true,
          "_index": "twitter",
          "_type": "tweet",
          "_id": "2",
          "_version": "1"
      }
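Which shard a document lands on is decided by hashing its routing value (the document id by default) and taking the result modulo the number of primary shards. The sketch below illustrates that rule with `zlib.crc32` as a stand-in hash: this is not the hash function Elasticsearch uses, so the shard numbers it prints will not necessarily match the diagram. The function name `pick_shard` is hypothetical.

```python
import zlib


def pick_shard(doc_id, number_of_shards):
    """Map a document id to a shard number in [0, number_of_shards).

    Stand-in hash: zlib.crc32 is used only for illustration and is NOT
    the hash function Elasticsearch applies to routing values.
    """
    return zlib.crc32(str(doc_id).encode("utf-8")) % number_of_shards


for doc_id in ("1", "2", "3"):
    print(doc_id, "-> shard", pick_shard(doc_id, 2))
```

Because the shard number depends on the shard count, the same id always maps to the same shard as long as that count stays fixed, which is why any node can find a document by id without a lookup table.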
  • 50-55. Architecture: searching
      $ curl -XGET http://node1:9200/twitter/tweet/_search -d '{
          "query": {
              "term": { "message": "ElasticSearch" }
          }
      }'
      [Diagram sequence: the search request runs against the four-node cluster, where both copies of Shard 0 hold Doc 1 and both copies of Shard 1 hold Doc 2.]
  • 56. Cluster Shard 1Shard 1 Shard 0 Doc 1 Doc 2 Doc 2 Architecture Node 2 Node 3 Node 4
  • 57. Cluster Shard 1 Node 2 Shard 1 Doc 2 Doc 2 Shard 0 Doc 1 Architecture Node 3 Node 4 Shard 0 Doc 1
  • 59. Cluster Shard 1 Shard 1 Doc 2 Doc 2 Shard 0 Doc 1 Architecture Node 2 Node 3 Node 4 $ curl -X PUT http://node1:9200/twitter/tweet/3 -d '{ "user": "loicbertron", "post_date": "2014-03-12T19:00:00", "message": "A third message about #ElasticSearch" }' Shard 0 Doc 1
  • 60. Cluster Shard 1 Shard 1 Doc 2 Doc 2 Shard 0 Doc 1 Doc 3 Architecture Node 2 Node 3 Node 4 $ curl -X PUT http://node1:9200/twitter/tweet/3 -d '{ "user": "loicbertron", "post_date": "2014-03-12T19:00:00", "message": "A third message about #ElasticSearch" }' Shard 0 Doc 1
  • 61. Cluster Shard 1 Shard 1 Doc 2 Doc 2 Shard 0 Doc 1 Doc 3 Architecture Node 2 Node 3 Node 4 $ curl -X PUT http://node1:9200/twitter/tweet/3 -d '{ "user": "loicbertron", "post_date": "2014-03-12T19:00:00", "message": "A third message about #ElasticSearch" }' Shard 0 Doc 1 Doc 3
  • 62. Cluster Shard 1 Shard 1 Doc 2 Doc 2 Shard 0 Doc 1 Doc 3 { "ok":true, "_index":"twitter", "_type":"tweet", "_id":"3", "_version":"1" } Architecture Node 2 Node 3 Node 4 Shard 0 Doc 1 Doc 3
  • 63. Cluster Shard 1 Shard 1 Doc 2 Doc 2 Shard 0 Doc 1 Doc 3 $ curl -XGET http://node1:9200/twitter/tweet/_search -d '{ "query": { "term": { "message": "ElasticSearch" } } }' Architecture Node 2 Node 3 Node 4 Shard 0 Doc 1 Doc 3
  • 65. Cluster Shard 1Shard 1 Doc 2 Doc 2 Architecture Node 2 Node 4
  • 66. How do users see search? Result, User, Query, List of results
  • 67. How does a search engine work? 1. Fetch the document field 2. Pick the configured analyzer 3. Parse the text into tokens 4. Apply token filters 5. Store into the index
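The five steps above can be sketched as a toy inverted index. This is only an illustration under simplifying assumptions: the `analyze` and `build_index` helpers are hypothetical, the analyzer just splits on non-word characters and lowercases, and nothing here reflects how Lucene actually stores its index.

```python
import re
from collections import defaultdict


def analyze(text):
    """Toy analyzer: split on non-word characters, then lowercase each token."""
    tokens = re.findall(r"\w+", text)
    return [t.lower() for t in tokens]


def build_index(docs, field):
    """Return an inverted index: token -> sorted list of document IDs."""
    postings = defaultdict(set)
    for doc_id, doc in docs.items():
        for token in analyze(doc[field]):   # steps 1-4: fetch the field, analyze it
            postings[token].add(doc_id)     # step 5: store into the index
    return {tok: sorted(ids) for tok, ids in postings.items()}


docs = {
    1: {"message": "Introducing #ElasticSearch"},
    2: {"message": "The crowd is on fire #ElasticSearch"},
}
index = build_index(docs, "message")
print(index["elasticsearch"])  # [1, 2]
```

Note that the toy index stores lowercase tokens, so looking up the exact string "ElasticSearch" finds nothing. A term query, which is not analyzed, behaves the same way against a field indexed with a lowercasing analyzer.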
  • 69. Analyzer { "tokens" : [ { "token" : "édith", "start_offset" : 0, "end_offset" : 5, "type" : "<ALPHANUM>", "position" : 1 }, { "token" : "piaf", "start_offset" : 6, "end_offset" : 10, "type" : "<ALPHANUM>", "position" : 2 }, { "token" : "vedette", "start_offset" : 11, "end_offset" : 18, "type" : "<ALPHANUM>", "position" : 3 }, { "token" : "du", "start_offset" : 19, "end_offset" : 21, "type" : "<ALPHANUM>", "position" : 4 }, { "token" : "feu", "start_offset" : 22, "end_offset" : 25, "type" : "<ALPHANUM>", "position" : 5 }, { "token" : "d'artifice", "start_offset" : 26, "end_offset" : 36, "type" : "<ALPHANUM>", "position" : 6 } ] }
  • 70. Analyzer: composed of a single tokenizer and zero or more filters
  • 71. Tokenizer: cuts a string into words and can transform them. Whitespace tokenizer: «Édith piaf» -> «Édith», «piaf». Standard tokenizer: «Édith piaf!» -> «Édith», «piaf» (punctuation is dropped; lowercasing to «édith» comes from the lowercase token filter)
  • 72. Filters: modify, delete or add tokens. Asciifolding filter: «Édith Piaf» -> «Edith Piaf». Stemmer filter (English): «stemming» -> «stem»; «fishing», «fished», «fisher» -> «fish»; «cats», «catlike» -> «cat». Phonetic: «quick» -> «Q200», «quik» -> «Q200». Edge nGram: «Montreal» -> [«Mon», «Mont», «Montr»]
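Three of the filters listed above are simple enough to sketch in a few lines. The function names are hypothetical, the `min_gram`/`max_gram` defaults are picked only to reproduce the slide's «Montreal» example, and the phonetic code assumes the slide's «Q200» is a Soundex code. These are illustrations of the ideas, not the Elasticsearch implementations.

```python
import unicodedata


def ascii_fold(token):
    """Strip diacritics by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def edge_ngrams(token, min_gram=3, max_gram=5):
    """Return the prefixes of token with lengths min_gram..max_gram."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def soundex(word):
    """American Soundex: first letter plus three digits for the consonants."""
    codes = {}
    groups = (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
              ("l", "4"), ("mn", "5"), ("r", "6"))
    for letters, digit in groups:
        for ch in letters:
            codes[ch] = digit
    word = word.lower()
    result = word[0].upper()
    prev = codes.get(word[0], "")
    for ch in word[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            result += code
        if ch not in "hw":          # h and w do not separate equal codes
            prev = code
    return (result + "000")[:4]


print(ascii_fold("Édith Piaf"))            # Edith Piaf
print(edge_ngrams("Montreal"))             # ['Mon', 'Mont', 'Montr']
print(soundex("quick"), soundex("quik"))   # Q200 Q200
```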
  • 73. Analyzer { "tokens" : [ { "token" : "edith", "start_offset" : 0, "end_offset" : 5, "type" : "<ALPHANUM>", "position" : 1 }, { "token" : "piaf", "start_offset" : 6, "end_offset" : 10, "type" : "<ALPHANUM>", "position" : 2 }, { "token" : "vedet", "start_offset" : 11, "end_offset" : 18, "type" : "<ALPHANUM>", "position" : 3 }, { "token" : "feu", "start_offset" : 22, "end_offset" : 25, "type" : "<ALPHANUM>", "position" : 5 }, { "token" : "artific", "start_offset" : 26, "end_offset" : 36, "type" : "<ALPHANUM>", "position" : 6 } ] }
  • 74. Regular search engine usage: 1. Documents get indexed 2. I often come back to the search page to rerun my query 3. I hope my document ranks well enough to appear at the top of the results page 4. If not, I will never see my document
  • 75. Percolator: 1. Register my query 2. When a document gets indexed, the percolator looks for a match against the registered queries
  • 76. Percolator: Real Time Updates
  • 77. Percolator curl -XPUT 'http://node1:9200/twitter/.percolator/elasticsearch' -d '{ "query" : { "match" : { "message" : "elasticsearch" } } }'
  • 78. Percolator $ curl -X GET http://node1:9200/twitter/tweet/_percolate -d '{ "doc" : { "user": "loicbertron", "post_date": "2014-03-12T19:00:00", "message": "A third message about #ElasticSearch" } }'
  • 79. Percolator {     "took" : 19,     "_shards" : {         "total" : 5,         "successful" : 5,         "failed" : 0     },     "total" : 1,     "matches" : [         {              "_index" : "twitter",              "_id" : "elasticsearch"         }     ] }
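The percolator inverts the usual flow: queries are stored, and each document is checked against them. A minimal in-memory sketch of that idea, assuming a toy analyzer and a match query that succeeds when any of its terms appears in the field. The `ToyPercolator` class and its methods are hypothetical and unrelated to the real implementation.

```python
import re


def analyze(text):
    """Toy analyzer: split on non-word characters and lowercase."""
    return [t.lower() for t in re.findall(r"\w+", text)]


class ToyPercolator:
    def __init__(self):
        # query id -> (field name, set of analyzed query terms)
        self.queries = {}

    def register(self, query_id, field, text):
        """Store a match-style query under the given ID."""
        self.queries[query_id] = (field, set(analyze(text)))

    def percolate(self, doc):
        """Return the IDs of all registered queries that match the document."""
        matches = []
        for query_id, (field, terms) in self.queries.items():
            doc_terms = set(analyze(doc.get(field, "")))
            if terms & doc_terms:   # at least one query term is present
                matches.append(query_id)
        return matches


percolator = ToyPercolator()
percolator.register("elasticsearch", "message", "elasticsearch")
doc = {
    "user": "loicbertron",
    "post_date": "2014-03-12T19:00:00",
    "message": "A third message about #ElasticSearch",
}
print(percolator.percolate(doc))  # ['elasticsearch']
```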
  • 80. Inner objects { "name": "Jules Verne", "biography": "One of the greatest authors", "books": [ { "title": "Vingt mille lieues sous les mers", "genre": "Novel", "publisher": "Hetzel" }, { "title": "Les Châteaux en Californie", "genre": "Drama", "publisher": "Marc Soriano" } ] }
  • 81. Parents / Children curl -XPUT 'node1:9200/authors/bare_author/1' -d '{ "name": "Jules Verne", "biography": "One of the greatest authors" }' curl -XPOST 'node1:9200/authors/book/1?parent=1' -d '{ "title": "Les Châteaux en Californie", "genre": "Drama", "publisher": "Marc Soriano" }' curl -XPOST 'node1:9200/authors/book/2?parent=1' -d '{ "title": "Vingt mille lieues sous les mers", "genre": "Novel", "publisher": "Hetzel" }'
  • 82. Other features • Suggest API: Did you mean?, Autocomplete, … • Results Highlight • More like this • Backup Data: Snapshot / Restore (File System, Amazon S3, HDFS, Google Compute Engine, Microsoft Azure) • Hadoop connector
  • 83. Clients • Perl • Python • Ruby • PHP • JavaScript • Java • .NET • Scala • Clojure • Erlang • EventMachine • Bash • OCaml • Smalltalk • ColdFusion
  • 86. Thank you. Thank you to David Pilato for his presentation: https://speakerdeck.com/dadoonet/tours-jug-elasticsearch Thank you to Kevin Kluge for his presentation: https://speakerdeck.com/elasticsearch/elasticsearch-in-20-minutes
  • 88. Suggest curl -s -XPOST 'localhost:9200/_search?search_type=count' -d '{   "suggest" : {     "my-title-suggestions-1" : {       "text" : "devloping",       "term" : {         "size" : 3,         "field" : "title"         }     }   } }'
  • 89. Suggest "suggest": {     "my-title-suggestions-1": [       {         "text": "devloping",         "offset": 0,         "length": 9,         "options": [           {             "text": "developing",             "freq": 77,             "score": 0.8888889           },           {             "text": "deloping",             "freq": 1,             "score": 0.875           },           {             "text": "deploying",             "freq": 2,             "score": 0.7777778           }         ]       }
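The idea behind "did you mean" term suggestions can be sketched with plain edit distance. This is only an illustration: the `suggest` helper, its `max_edits` parameter and the ranking rule (fewest edits first, then highest frequency) are assumptions, the frequencies are copied from the response above, and no attempt is made to reproduce the `score` values Elasticsearch returns.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance computed with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution
        prev = cur
    return prev[-1]


def suggest(text, term_freqs, size=3, max_edits=2):
    """Return up to `size` indexed terms within `max_edits` of `text`."""
    candidates = []
    for term, freq in term_freqs.items():
        distance = edit_distance(text, term)
        if term != text and distance <= max_edits:
            # Sort key: fewest edits first, then most frequent term.
            candidates.append((distance, -freq, term))
    return [term for _, _, term in sorted(candidates)[:size]]


# Hypothetical term dictionary for the "title" field.
term_freqs = {"developing": 77, "deloping": 1, "deploying": 2, "search": 40}
print(suggest("devloping", term_freqs))
# ['developing', 'deloping', 'deploying']
```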
  • 90. More Like This curl -XGET 'http://node1:9200/twitter/tweet/1/_mlt?mlt_fields=tag,content&min_doc_freq=1' { "more_like_this" : { "fields" : ["name.first", "name.last"], "like_text" : "text like this one", "min_term_freq" : 1, "max_query_terms" : 12, "percent_terms_to_match" : 0.95 } }
  • 92. {     "query" : {...},     "highlight" : {         "number_of_fragments" : 3,         "fragment_size" : 150,         "tag_schema" : "styled",         "fields" : {             "_all" : { "pre_tags" : ["<em>"], "post_tags" : ["</em>"] },             "bio.title" : { "number_of_fragments" : 0 },             "bio.author" : { "number_of_fragments" : 0 },             "bio.content" : { "number_of_fragments" : 5, "order" : "score" }         }     } } Highlight
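What the `pre_tags` and `post_tags` settings above do can be sketched as wrapping each matched term in a pair of tags. The `highlight` function below is hypothetical and ignores fragments, ordering and analysis. It only shows the wrapping step, with `<em>` tags as in the `_all` field settings.

```python
import re


def highlight(text, terms, pre_tag="<em>", post_tag="</em>"):
    """Wrap every whole-word, case-insensitive occurrence of a term in tags."""
    if not terms:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        flags=re.IGNORECASE,
    )
    return pattern.sub(lambda m: pre_tag + m.group(0) + post_tag, text)


print(highlight("The crowd is on fire #ElasticSearch", ["elasticsearch"]))
# The crowd is on fire #<em>ElasticSearch</em>
```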
  • 94. Hadoop • Java library for integrating Elasticsearch and Hadoop • Pig, Hive, Cascading, MapReduce • Search and Real Time Analytics with Elasticsearch, Hadoop as Data Lake • Scales with Hadoop