Elasticsearch presentation
Indexing a restaurant directory
● Title
● Description
● Price
● Address
● Type
Creating an index without a mapping
PUT restaurant
{
"settings": {
"index": {
"number_of_shards": 3,
"number_of_replicas": 2
}
}
}
Indexing without a mapping
PUT restaurant/restaurant/1
{
"title": 42,
"description": "Un restaurant gastronomique où tout plat coûte 42 euros",
"price": 42,
"adresse": "10 rue de l'industrie, 31000 TOULOUSE",
"type": "gastronomie"
}
The risk of indexing without a mapping
PUT restaurant/restaurant/2
{
"title": "Pizza de l'ormeau",
"description": "Dans cette pizzeria on trouve des pizzas très bonnes et très variés",
"price": 10,
"adresse": "1 place de l'ormeau, 31400 TOULOUSE",
"type": "italien"
}
{
"error": {
"root_cause": [
{
"type": "mapper_parsing_exception",
"reason": "failed to parse [title]"
}
],
"type": "mapper_parsing_exception",
"reason": "failed to parse [title]",
"caused_by": {
"type": "number_format_exception",
"reason": "For input string: \"Pizza de l'ormeau\""
}
},
"status": 400
}
Inferred mapping
GET /restaurant/_mapping
{
"restaurant": {
"mappings": {
"restaurant": {
"properties": {
"adresse": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"description": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"price": {
"type": "long"
},
"title": {
"type": "long"
},
"type": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
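The inferred mapping follows from the first document indexed: `title` received the number 42 and was mapped as `long`, which is why the string "Pizza de l'ormeau" was rejected. The Python sketch below approximates that default dynamic mapping. It is a simplified model (no date detection, numeric detection or dynamic templates), and `infer_field_mapping` / `accepts` are hypothetical helpers, not Elasticsearch APIs.

```python
def infer_field_mapping(value):
    """Approximate the default dynamic mapping Elasticsearch infers for a JSON value.

    Simplified model: date detection, numeric detection and dynamic templates are ignored.
    """
    if isinstance(value, bool):  # test bool first: in Python, bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        # strings become full-text fields with a "keyword" sub-field
        return {"type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}
    if isinstance(value, dict):
        return {"properties": {k: infer_field_mapping(v) for k, v in value.items()}}
    raise TypeError(f"unsupported value: {value!r}")


def accepts(field_mapping, value):
    """Rough check of whether a value can be indexed into an existing field.

    Only models the case from the slides: a "long" field rejects strings
    that do not parse as integers.
    """
    if field_mapping.get("type") == "long" and isinstance(value, str):
        try:
            int(value)
        except ValueError:
            return False
    return True


first_doc = {"title": 42, "price": 42, "type": "gastronomie"}
mapping = infer_field_mapping(first_doc)["properties"]
print(mapping["title"])                                # {'type': 'long'}
print(accepts(mapping["title"], "Pizza de l'ormeau"))  # False
```

Declaring the mapping explicitly, as on the next slide, removes this dependency on whichever document happens to be indexed first.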
Creating a mapping
PUT /restaurant
{
"settings": {
"index": {"number_of_shards": 3, "number_of_replicas": 2}
},
"mappings": {
"restaurant": {
"properties": {
"title": {"type": "text"},
"description": {"type": "text"},
"price": {"type": "integer"},
"adresse": {"type": "text"},
"type": { "type": "keyword"}
}
}
}
}
Indexing a few restaurants
POST /restaurant/restaurant/_bulk
{"index": {"_id": 1}}
{"title": 42, "description": "Un restaurant gastronomique où tout plat coûte 42 euros", "price": 42, "adresse": "10 rue de l'industrie, 31000 TOULOUSE", "type": "gastronomie"}
{"index": {"_id": 2}}
{"title": "Pizza de l'ormeau", "description": "Dans cette pizzeria on trouve des pizzas très bonnes et très variés", "price": 10, "adresse": "1 place de l'ormeau, 31400 TOULOUSE", "type": "italien"}
{"index": {"_id": 3}}
{"title": "Chez l'oncle chan", "description": "Restaurant asiatique très copieux pour un prix contenu", "price": 14, "adresse": "13 route de labège, 31400 TOULOUSE", "type": "asiatique"}
Basic search
GET /restaurant/_search
{
"query": {
"match": {
"description": "asiatique"
}
}
}
{
"hits": {
"total": 1,
"max_score": 0.6395861,
"hits": [
{
"_source": {
"title": "Chez l'oncle chan",
"description": "Restaurant asiatique très copieux pour un prix contenu",
"price": 14,
"adresse": "13 route de labège, 31400 TOULOUSE",
"type": "asiatique"
}
}
]
}
}
Where our mapping falls short
GET /restaurant/_search
{
"query": {
"match": {
"description": "asiatiques"
}
}
}
{
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
What is an analyzer?
● Transforms a character string into tokens
○ e.g. “Le chat est rouge” (“The cat is red”) -> [“le”, “chat”, “est”, “rouge”]
● The tokens are used to build an inverted index
What is an inverted index?
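An inverted index maps each token to the documents that contain it, so a search only has to look up the query tokens. A minimal Python sketch of the idea; the tokenizer is a crude stand-in for the standard analyzer (lowercase, split on non-word characters, so it handles apostrophes differently), and the function names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Crude stand-in for the standard analyzer: lowercase, split on non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def build_inverted_index(docs):
    """Map each token to the sorted list of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}


def search(index, query):
    """Ids of documents containing at least one query token (OR semantics, no scoring)."""
    hits = set()
    for token in tokenize(query):
        hits.update(index.get(token, []))
    return sorted(hits)


docs = {
    1: "Un restaurant gastronomique où tout plat coûte 42 euros",
    2: "Dans cette pizzeria on trouve des pizzas très bonnes et très variés",
    3: "Restaurant asiatique très copieux pour un prix contenu",
}
index = build_inverted_index(docs)
print(index["très"])                # [2, 3]
print(search(index, "asiatique"))   # [3]
print(search(index, "asiatiques"))  # []: no stemming, so the plural finds nothing
```

The last lookup reproduces the failure seen earlier: without a language-aware analyzer, "asiatiques" and "asiatique" are unrelated tokens.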
Explanation: the default analyzer
GET /_analyze
{
"analyzer": "standard",
"text": "Un restaurant asiatique très copieux"
}
{
"tokens": [{
"token": "un",
"start_offset": 0, "end_offset": 2,
"type": "<ALPHANUM>", "position": 0
},{
"token": "restaurant",
"start_offset": 3, "end_offset": 13,
"type": "<ALPHANUM>", "position": 1
},{
"token": "asiatique",
"start_offset": 14, "end_offset": 23,
"type": "<ALPHANUM>", "position": 2
},{
"token": "très",
"start_offset": 24, "end_offset": 28,
"type": "<ALPHANUM>", "position": 3
},{
"token": "copieux",
"start_offset": 29, "end_offset": 36,
"type": "<ALPHANUM>", "position": 4
}
]
}
Explanation: the “french” analyzer
GET /_analyze
{
"analyzer": "french",
"text": "Un restaurant asiatique très copieux"
}
{
"tokens": [
{
"token": "restaurant",
"start_offset": 3, "end_offset": 13,
"type": "<ALPHANUM>", "position": 1
},{
"token": "asiat",
"start_offset": 14, "end_offset": 23,
"type": "<ALPHANUM>", "position": 2
},{
"token": "trè",
"start_offset": 24, "end_offset": 28,
"type": "<ALPHANUM>", "position": 3
},{
"token": "copieu",
"start_offset": 29, "end_offset": 36,
"type": "<ALPHANUM>", "position": 4
}
]
}
Anatomy of an analyzer
Elasticsearch breaks analysis down into three steps:
● Character filtering (e.g. stripping HTML tags)
● Tokenization
● Token filtering:
○ Removing tokens (stop words such as “un”, “le”, “la”)
○ Transforming tokens (stemming, lemmatization...)
○ Adding tokens (synonyms)
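The three steps compose as a simple pipeline. The sketch below models it in Python with toy components: an HTML-stripping character filter, a whitespace tokenizer, then lowercase, stop and synonym token filters. All names are hypothetical, and unlike Elasticsearch, the synonym filter here simply appends the synonym instead of stacking it at the same position.

```python
import re


def html_strip(text):
    """Character filter: replace HTML tags with spaces (simplified)."""
    return re.sub(r"<[^>]+>", " ", text)


def whitespace_tokenizer(text):
    """Tokenizer: split on whitespace."""
    return text.split()


def lowercase(tokens):
    return [t.lower() for t in tokens]


def stop(stopwords):
    """Token filter factory: drop tokens found in `stopwords`."""
    def apply(tokens):
        return [t for t in tokens if t not in stopwords]
    return apply


def synonyms(mapping):
    """Token filter factory: emit each token, followed by its synonym if it has one."""
    def apply(tokens):
        out = []
        for t in tokens:
            out.append(t)
            if t in mapping:
                out.append(mapping[t])
        return out
    return apply


def analyze(text, char_filters, tokenizer, token_filters):
    """Run character filters, then the tokenizer, then token filters, in order."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


tokens = analyze(
    "<p>Le menu du jour</p>",
    char_filters=[html_strip],
    tokenizer=whitespace_tokenizer,
    token_filters=[lowercase, stop({"le", "du"}), synonyms({"menu": "formule"})],
)
print(tokens)  # ['menu', 'formule', 'jour']
```

The order of token filters matters: here stop words are removed after lowercasing, otherwise “Le” would slip through.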
Breaking down the french analyzer
GET /_analyze
{
"tokenizer": "standard",
"filter": [
{
"type": "elision",
"articles_case": true,
"articles": [
"l", "m", "t", "qu", "n", "s", "j", "d", "c",
"jusqu", "quoiqu", "lorsqu", "puisqu"
]
}, {
"type": "stop", "stopwords": "_french_"
}, {
"type": "stemmer", "language": "french"
}
],
"text": "ce n'est qu'un restaurant asiatique très copieux"
}
“ce n’est qu’un restaurant asiatique très copieux”
-> standard tokenizer: [“ce”, “n’est”, “qu’un”, “restaurant”, “asiatique”, “très”, “copieux”]
-> elision: [“ce”, “est”, “un”, “restaurant”, “asiatique”, “très”, “copieux”]
-> stopwords: [“restaurant”, “asiatique”, “très”, “copieux”]
-> french stemming: [“restaurant”, “asiat”, “trè”, “copieu”]
Specifying the analyzer in the mapping
PUT /restaurant
{
"settings": {
"index": {
"number_of_shards": 3,
"number_of_replicas": 2
}
},
"mappings": {
"restaurant": {
"properties": {
"title": {"type": "text", "analyzer": "french"},
"description": {"type": "text", "analyzer": "french"},
"price": {"type": "integer"},
"adresse": {"type": "text", "analyzer": "french"},
"type": { "type": "keyword"}
}
}
}
}
Typo-tolerant search
GET /restaurant/restaurant/_search
{
"query": {
"match": {
"description": "asiatuques"
}
}
}
{
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
One solution: the ngram token filter
GET /_analyze
{
"tokenizer": "standard",
"filter": [
{
"type": "ngram",
"min_gram": 3,
"max_gram": 7
}
],
"text": "asiatuque"
}
[
"asi",
"asia",
"asiat",
"asiatu",
"asiatuq",
"sia",
"siat",
"siatu",
"siatuq",
"siatuqu",
"iat",
"iatu",
"iatuq",
"iatuqu",
"iatuque",
"atu",
"atuq",
"atuqu",
"atuque",
"tuq",
"tuqu",
"tuque",
"uqu",
"uque",
"que"
]
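The output order above (every gram starting at a given offset, from shortest to longest, then the next offset) can be reproduced in a few lines of Python. This is a sketch of the idea, not Lucene's implementation; `ngrams` is a hypothetical helper. It also shows why a typo can still match: the misspelled and the correct word share several grams.

```python
def ngrams(token, min_gram, max_gram):
    """For each start offset, emit substrings of length min_gram..max_gram."""
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            if start + size > len(token):
                break  # not enough characters left for this (or any longer) gram
            grams.append(token[start:start + size])
    return grams


grams = ngrams("asiatuque", 3, 7)
print(len(grams))  # 25
shared = set(ngrams("asiatuques", 3, 7)) & set(ngrams("asiatique", 3, 7))
print(sorted(shared))  # ['asi', 'asia', 'asiat', 'iat', 'que', 'sia', 'siat']
```

Short grams such as “que” are also shared with unrelated words (“gastronomique”), which is the source of the noise shown a few slides later.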
Creating a custom analyzer that uses the ngram filter
PUT /restaurant
{
"settings": {
"analysis": {
"filter": {"custom_ngram": {"type": "ngram", "min_gram": 3, "max_gram": 7}},
"analyzer": {"ngram_analyzer": {"tokenizer": "standard", "filter": ["asciifolding", "custom_ngram"]}}
}
},
"mappings": {
"restaurant": {
"properties": {
"title": {"type": "text", "analyzer": "ngram_analyzer"},
"description": {"type": "text", "analyzer": "ngram_analyzer"},
"price": {"type": "integer"},
"adresse": {"type": "text", "analyzer": "ngram_analyzer"},
"type": {"type": "keyword"}
}
}
}
}
GET /restaurant/restaurant/_search
{
"query": {
"match": {
"description": "asiatuques"
}
}
}
{
"hits": {
"hits": [
{
"_score": 0.60128295,
"_source": {
"title": "Chez l'oncle chan",
"description": "Restaurant asiatique très copieux pour un prix contenu",
"price": 14,
"adresse": "13 route de labège, 31400 TOULOUSE",
"type": "asiatique"
}
}, {
"_score": 0.46237043,
"_source": {
"title": 42,
"description": "Un restaurant gastronomique où tout plat coûte 42 euros",
"price": 42,
"adresse": "10 rue de l'industrie, 31000 TOULOUSE",
"type": "gastronomie"
Noise introduced by ngrams
GET /restaurant/restaurant/_search
{
"query": {
"match": {
"description": "gastronomique"
}
}
}
{
"hits": {
"hits": [
{
"_score": 0.6277555,
"_source": {
"title": 42,
"description": "Un restaurant gastronomique où tout plat coûte 42 euros",
"price": 42,
"adresse": "10 rue de l'industrie, 31000 TOULOUSE",
"type": "gastronomie"
}
},{
"_score": 0.56373334,
"_source": {
"title": "Chez l'oncle chan",
"description": "Restaurant asiatique très copieux pour un prix contenu",
"price": 14,
"adresse": "13 route de labège, 31400 TOULOUSE",
"type": "asiatique"
}
},
Specifying several analyzers for one field
PUT /restaurant
{
"settings": {
"analysis": {
"filter": {"custom_ngram": {"type": "ngram", "min_gram": 3, "max_gram": 7}},
"analyzer": {"ngram_analyzer": {"tokenizer": "standard", "filter": ["asciifolding", "custom_ngram"]}
}
}
},
"mappings": {
"restaurant": {
"properties": {
"title": {"type": "text", "analyzer": "french"},
"description": {
"type": "text", "analyzer": "french",
"fields": {
"ngram": { "type": "text", "analyzer": "ngram_analyzer"}
}
},
"price": {"type": "integer"},
Using several fields in one search
GET /restaurant/restaurant/_search
{
"query": {
"multi_match": {
"query": "gastronomique",
"fields": [
"description^4",
"description.ngram"
]
}
}
}
{
"hits": {
"hits": [
{
"_score": 2.0649285,
"_source": {
"title": 42,
"description": "Un restaurant gastronomique où tout plat coûte 42 euros",
"price": 42,
"adresse": "10 rue de l'industrie, 31000 TOULOUSE",
"type": "gastronomie"
}
},
{
"_score": 0.56373334,
"_source": {
"title": "Chez l'oncle chan",
"description": "Restaurant asiatique très copieux pour un prix contenu",
"price": 14,
"adresse": "13 route de labège, 31400 TOULOUSE",
"type": "asiatique"
}
},
{
"_index": "restaurant",
To ignore or not to ignore stopwords, that is the question
POST /restaurant/restaurant/_bulk
{"index": {"_id": 1}}
{"title": 42, "description": "Un restaurant gastronomique donc cher ou tout plat coûte cher (42 euros)", "price": 42, "adresse": "10 rue de l'industrie, 31000 TOULOUSE", "type": "gastronomie"}
{"index": {"_id": 2}}
{"title": "Pizza de l'ormeau", "description": "Dans cette pizzeria on trouve des pizzas très bonnes et très variés", "price": 10, "adresse": "1 place de l'ormeau, 31400 TOULOUSE", "type": "italien"}
{"index": {"_id": 3}}
{"title": "Chez l'oncle chan", "description": "Restaurant asiatique très copieux et pas cher", "price": 14, "adresse": "13 route de labège, 31400 TOULOUSE", "type": "asiatique"}
Stopwords are not necessarily meaningless
GET /restaurant/restaurant/_search
{
"query": {
"match_phrase": {
"description": "pas cher"
}
}
}
{
"hits": {
"hits": [
{
"_source": {
"title": 42,
"description": "Un restaurant gastronomique donc cher ou tout plat coûte cher (42 euros)",
"price": 42,
"adresse": "10 rue de l'industrie, 31000 TOULOUSE",
"type": "gastronomie"
}
},{
"_source": {
"title": "Chez l'oncle chan",
"description": "Restaurant asiatique très copieux et pas cher",
"price": 14,
"adresse": "13 route de labège, 31400 TOULOUSE",
"type": "asiatique"
}
}
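Why does the gastronomic restaurant match? With the french analyzer, “pas” is removed both at index time and at query time, so the phrase “pas cher” is reduced to “cher”, which its description contains. A minimal Python sketch of position-based phrase matching, assuming a tiny illustrative stopword set (not Elasticsearch's `_french_` list); helper names are hypothetical.

```python
import re

STOPWORDS = {"un", "et", "ou", "pas"}  # illustrative subset only, not the real _french_ list


def analyze(text, stopwords=frozenset()):
    """Lowercase, split on non-word characters, drop stopwords but keep original positions."""
    tokens = [t for t in re.split(r"\W+", text.lower()) if t]
    return [(pos, tok) for pos, tok in enumerate(tokens) if tok not in stopwords]


def match_phrase(doc, phrase, stopwords=frozenset()):
    """True if every analyzed phrase term appears at the same relative position in doc."""
    doc_positions = {}
    for pos, tok in analyze(doc, stopwords):
        doc_positions.setdefault(tok, set()).add(pos)
    query = analyze(phrase, stopwords)
    if not query:
        return False
    first_pos, first_tok = query[0]
    for start in doc_positions.get(first_tok, set()):
        if all(start + (pos - first_pos) in doc_positions.get(tok, set())
               for pos, tok in query[1:]):
            return True
    return False


gastro = "Un restaurant gastronomique donc cher ou tout plat coûte cher (42 euros)"
asiat = "Restaurant asiatique très copieux et pas cher"
print(match_phrase(gastro, "pas cher", STOPWORDS))  # True: the query is reduced to "cher"
print(match_phrase(gastro, "pas cher"))             # False: "pas" must precede "cher"
print(match_phrase(asiat, "pas cher"))              # True
```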
Modifying the french analyzer to keep stopwords
PUT /restaurant
{
"settings": {
"analysis": {
"filter": {
"french_elision": {
"type": "elision",
"articles_case": true,
"articles": ["l", "m", "t", "qu", "n", "s", "j", "d", "c", "jusqu", "quoiqu", "lorsqu", "puisqu"]
},
"french_stemmer": {"type": "stemmer", "language": "light_french"}
},
"analyzer": {
"custom_french": {
"tokenizer": "standard",
"filter": [
"french_elision",
"lowercase",
"french_stemmer"
]
}
GET /restaurant/restaurant/_search
{
"query": {
"match_phrase": {
"description": "pas cher"
}
}
}
{
"hits": {
"hits": [
{
"_source": {
"title": "Chez l'oncle chan",
"description": "Restaurant asiatique très copieux et pas cher",
"price": 14,
"adresse": "13 route de labège, 31400 TOULOUSE",
"type": "asiatique"
}
}
]
}
}
Searching with stopwords without degrading performance
GET /restaurant/restaurant/_search
{
"query": {
"match": {
"description": {
"query": "restaurant pas cher",
"cutoff_frequency": 0.01
}
}
}
}
GET /restaurant/restaurant/_search
{
"query": {
"bool": {
"must": {
"bool": {
"should": [
{"term": {"description": "restaurant"}},
{"term": {"description": "cher"}}]
}
},
"should": [
{"match": {
"description": "pas"
}}
]
}
}
}
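The two requests above are meant to be roughly equivalent: `cutoff_frequency` splits the query terms into rare terms (at least one must match) and frequent terms such as “pas” (optional, they only refine the score). The Python sketch below models that split with made-up document frequencies; the threshold rule (a relative cutoff times the number of documents, rounded up) is our reading of Lucene's common terms query, so treat it as an approximation. `cutoff_frequency` was deprecated in later Elasticsearch versions.

```python
import math


def split_by_frequency(terms, doc_freq, num_docs, cutoff):
    """Split query terms into (low_freq, high_freq) lists.

    A cutoff below 1 is relative (fraction of documents), otherwise absolute.
    A term is "high frequency" when its document frequency exceeds the threshold.
    """
    threshold = math.ceil(cutoff * num_docs) if cutoff < 1 else cutoff
    low = [t for t in terms if doc_freq.get(t, 0) <= threshold]
    high = [t for t in terms if doc_freq.get(t, 0) > threshold]
    return low, high


def to_bool_query(field, low, high):
    """Build the bool query of the second request: rare terms required, frequent ones optional."""
    return {"query": {"bool": {
        "must": {"bool": {"should": [{"term": {field: t}} for t in low]}},
        "should": [{"match": {field: t}} for t in high],
    }}}


# Hypothetical statistics for an index of 10,000 descriptions
doc_freq = {"restaurant": 40, "pas": 900, "cher": 30}
low, high = split_by_frequency(["restaurant", "pas", "cher"], doc_freq, 10_000, 0.01)
print(low, high)  # ['restaurant', 'cher'] ['pas']
print(to_bool_query("description", low, high))
```

Frequent terms are only scored on documents that already matched a rare term, which is where the performance gain comes from.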
Customizing the scoring
GET /restaurant/restaurant/_search
{
"query": {
"function_score": {
"query": {
"match": {
"adresse": "toulouse"
}
},
"functions": [{
"filter": { "terms": { "type": ["asiatique", "italien"]}},
"weight": 2
}]
}
}
}
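As we understand the defaults (`score_mode` and `boost_mode` both `multiply`), this `function_score` doubles the score of Asian and Italian restaurants and leaves the others unchanged, since a document matched by no function gets a factor of 1. A minimal Python model of that arithmetic; `weighted_score` is a hypothetical helper, not an Elasticsearch API.

```python
def weighted_score(query_score, doc_type, boosted_types=("asiatique", "italien"), weight=2.0):
    """Model of a function_score with one filtered `weight` function and default multiply modes.

    Documents matching the filter get query_score * weight; the others keep query_score
    (no matching function means a factor of 1).
    """
    factor = weight if doc_type in boosted_types else 1.0
    return query_score * factor


print(weighted_score(0.25, "italien"))      # 0.5
print(weighted_score(0.25, "gastronomie"))  # 0.25
```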
Customizing the scoring
GET /restaurant/restaurant/_search
{
"query": {
"function_score": {
"query": {
"match": {
"adresse": "toulouse"
}
},
"script_score": {
"script": {
"lang": "painless",
"inline": "_score * (1 + 10/doc['price'].value)"
}
}
}
}
}
{
"hits": {
"hits": [
{
"_score": 0.53484553,
"_source": {
"title": "Pizza de l'ormeau",
"price": 10,
"adresse": "1 place de l'ormeau, 31400 TOULOUSE",
"type": "italien"
}
}, {
"_score": 0.26742277,
"_source": {
"title": 42,
"price": 42,
"adresse": "10 rue de l'industrie, 31000 TOULOUSE",
"type": "gastronomie"
}
}, {
"_score": 0.26742277,
"_source": {
"title": "Chez l'oncle chan",
"price": 14,
"adresse": "13 route de labège, 31400 TOULOUSE",
"type": "asiatique"
}
}
]
}
}
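Why is the pizzeria exactly twice as high as the other two? The field is mapped as `integer`, so `doc['price'].value` is an integer and `10/doc['price'].value` is an integer division in Painless: 10/10 = 1, but 10/14 = 10/42 = 0. Writing `10.0/doc['price'].value` would give a gradual boost instead. Also note that `boost_mode` defaults to `multiply`, so the script result, which already contains `_score`, is multiplied by the query score again; `"boost_mode": "replace"` avoids counting it twice. A small Python model of the script (hypothetical helper), with a flag to compare both divisions:

```python
def script_score(score, price, integer_division=True):
    """Model of `_score * (1 + 10/doc['price'].value)`.

    In Painless, 10 / <long> is a truncating integer division; 10.0 / <long> is not.
    Python's // matches truncation here because both operands are positive.
    """
    ratio = 10 // price if integer_division else 10 / price
    return score * (1 + ratio)


for price in (10, 14, 42):
    print(price, script_score(1.0, price), round(script_score(1.0, price, False), 3))
```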
How to index multilingual documents
Three cases:
● Fields mixing several languages (e.g. {"message": "warning | attention | cuidado"})
○ Ngrams
○ Analyze the same field several times, with one analyzer per language
● One field per language:
○ Easy, since a different analyzer can be specified for each language
○ Beware of ending up with a sparse index
● One version of the document per language (preferred)
○ One index per language
○ Above all, do not use one type per language in the same index (the term statistics get mixed)
Handling synonyms
PUT /restaurant
{
"settings": {
"analysis": {
"filter": {
"french_elision": {
"type": "elision", "articles_case": true,
"articles": ["l", "m", "t", "qu", "n", "s", "j", "d", "c", "jusqu", "quoiqu", "lorsqu", "puisqu"]
},
"french_stemmer": {"type": "stemmer", "language": "light_french"},
"french_synonym": {"type": "synonym", "synonyms": ["sou marin => sandwitch", "formul, menu"]}
},
"analyzer": {
"french_with_synonym": {
"tokenizer": "standard",
"filter": ["french_elision", "lowercase", "french_stemmer", "french_synonym"]
}
}
}
},
"mappings": {
"restaurant": {
"properties": {
"title": {"type": "text", "analyzer": "french"},
"description": { "type": "text", "analyzer": "french", "search_analyzer": "french_with_synonym"},
"price": {"type": "integer"},
"adresse": {"type": "text", "analyzer": "french"},
"coord": {"type": "geo_point"},
Handling synonyms
GET /restaurant/restaurant/_search
{
"query": {
"match": {"description": "sous-marins"}
}
}
{
"hits": {
"hits": [
{
"_source": {
"title": "Subway",
"description": "service très rapide, rapport qualité/prix médiocre mais on peut choisir la composition de son sandwitch",
"price": 8,
"adresse": "211 route de narbonne, 31520 RAMONVILLE",
"type": "fastfood",
"coord": "43.5577519,1.4625753"
}
}
]
}
}
Geolocated data
PUT /restaurant
{
"mappings": {
"restaurant": {
"properties": {
"title": {"type": "text", "analyzer": "french"},
"description": {"type": "text", "analyzer": "french"
},
"price": {"type": "integer"},
"adresse": {"type": "text","analyzer": "french"},
"coord": {"type": "geo_point"},
"type": { "type": "keyword"}
}
}
}
}
Geolocated data
POST restaurant/restaurant/_bulk
{"index": {"_id": 1}}
{"title": "bistronomique", "description": "Un restaurant bon mais un petit peu cher, les desserts sont excellents", "price": 17, "adresse": "73 route de revel, 31400 TOULOUSE", "type": "français", "coord": "43.57417,1.4905748"}
{"index": {"_id": 2}}
{"title": "Pizza de l'ormeau", "description": "Dans cette pizzeria on trouve des pizzas très bonnes et très variés", "price": 10, "adresse": "1 place de l'ormeau, 31400 TOULOUSE", "type": "italien", "coord": "43.579225,1.4835248"}
{"index": {"_id": 3}}
{"title": "Chez l'oncle chan", "description": "Restaurant asiatique très copieux pour un prix contenu", "price": 14, "adresse": "18 rue des cosmonautes, 31400 TOULOUSE", "type": "asiatique", "coord": "43.5612759,1.4936073"}
{"index": {"_id": 4}}
{"title": "Un fastfood très connu", "description": "service très rapide, rapport qualité/prix médiocre", "price": 8, "adresse": "210 route de narbonne, 31520 RAMONVILLE", "type": "fastfood", "coord": "43.5536343,1.476165"}
{"index": {"_id": 5}}
{"title": "Subway", "description": "service très rapide, rapport qualité/prix médiocre mais on peut choisir la composition de son sandwitch", "price": 8, "adresse": "211 route de narbonne, 31520 RAMONVILLE", "type": "fastfood", "coord": "43.5577519,1.4625753"}
{"index": {"_id": 6}}
{"title": "L'évidence", "description": "restaurant copieux et pas cher, cependant c'est pas bon", "price": 12, "adresse": "38 route de revel, 31400 TOULOUSE", "type": "français", "coord": "43.5770109,1.4846573"}
Filtering and sorting on geolocated data
GET /restaurant/restaurant/_search
{
"query": {
"bool": {
"filter": [
{"term": {"type":"français"}},
{"geo_distance": {
"distance": "1km",
"coord": {"lat": 43.5739329, "lon": 1.4893669}
}}
]
}
},
"sort": [{
"geo_distance": {
"coord": {"lat": 43.5739329, "lon": 1.4893669},
"unit": "km"
}
}]
}
{
"hits": {
"hits": [
{
"_source": {
"title": "bistronomique",
"description": "Un restaurant bon mais un petit peu cher, les desserts sont excellents",
"price": 17,
"adresse": "73 route de revel, 31400 TOULOUSE",
"type": "français",
"coord": "43.57417,1.4905748"
},
"sort": [0.10081529266640063]
},{
"_source": {
"title": "L'évidence",
"description": "restaurant copieux et pas cher, cependant c'est pas bon",
"price": 12,
"adresse": "38 route de revel, 31400 TOULOUSE",
"type": "français",
"coord": "43.5770109,1.4846573"
},
"sort": [0.510960087579506]
},{
"_source": {
"title": "Chez Ingalls",
"description": "Contemporain et rustique, ce restaurant avec cheminée sert
savoyardes et des grillades",
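The `sort` values are distances in kilometres from the search point. They can be checked with the haversine formula: the Python sketch below (hypothetical helper names, mean Earth radius of 6371.0088 km assumed) applies the same type filter, 1 km radius and ascending sort to the six documents of the bulk request, and gives values very close to 0.1008 and 0.5110. The third hit, “Chez Ingalls”, is not part of the bulk request shown, so it does not appear here.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumption)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def filter_and_sort(docs, origin, max_km, doc_type):
    """Keep docs of the given type within max_km of origin, nearest first."""
    results = []
    for doc in docs:
        if doc["type"] != doc_type:
            continue
        lat, lon = map(float, doc["coord"].split(","))  # "lat,lon" string, as in the bulk request
        distance = haversine_km(origin[0], origin[1], lat, lon)
        if distance <= max_km:
            results.append((distance, doc["title"]))
    return sorted(results)


docs = [
    {"title": "bistronomique", "type": "français", "coord": "43.57417,1.4905748"},
    {"title": "Pizza de l'ormeau", "type": "italien", "coord": "43.579225,1.4835248"},
    {"title": "Chez l'oncle chan", "type": "asiatique", "coord": "43.5612759,1.4936073"},
    {"title": "Un fastfood très connu", "type": "fastfood", "coord": "43.5536343,1.476165"},
    {"title": "Subway", "type": "fastfood", "coord": "43.5577519,1.4625753"},
    {"title": "L'évidence", "type": "français", "coord": "43.5770109,1.4846573"},
]
for distance, title in filter_and_sort(docs, (43.5739329, 1.4893669), 1.0, "français"):
    print(f"{title}: {distance:.4f} km")
```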
The bool query explained
GET /restaurant/restaurant/_search
{
"query": {
"bool": {
"must": {"match": {"description": "sandwitch"}},
"should" : [
{"match": {"description": "bon"}},
{"match": {"description": "excellent"}}
],
"must_not": [
{"match_phrase": {
"description": "pas bon"
}}
],
"filter": [
{"range": {"price": {
"lte": 20
}}}
]
}
}
}
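In short: `must` clauses are required and scored, `should` clauses add to the score (and at least one becomes required when there is no `must` or `filter`), `must_not` excludes, and `filter` is required but not scored. A simplified Python model of those rules, where each clause is a hypothetical function returning a score, or None when it does not match; real Elasticsearch scoring is of course richer than a plain sum.

```python
def bool_query(doc, must=(), should=(), must_not=(), filters=(), minimum_should_match=None):
    """Simplified bool query: return the document's score, or None if it is excluded.

    Each clause is a function doc -> float score, or None when it does not match.
    """
    if any(clause(doc) is None for clause in must):
        return None
    if any(clause(doc) is None for clause in filters):
        return None
    if any(clause(doc) is not None for clause in must_not):
        return None
    should_scores = [s for s in (clause(doc) for clause in should) if s is not None]
    if minimum_should_match is None:
        # should clauses are optional next to must/filter, otherwise at least one is required
        minimum_should_match = 0 if (must or filters or not should) else 1
    if len(should_scores) < minimum_should_match:
        return None
    # must and should scores add up; filter and must_not never contribute to the score
    return sum(clause(doc) for clause in must) + sum(should_scores)


def contains(word, score=1.0):
    return lambda doc: score if word in doc["description"].lower().split() else None


def phrase(text, score=1.0):
    return lambda doc: score if text in doc["description"].lower() else None


def price_lte(limit):
    return lambda doc: 0.0 if doc["price"] <= limit else None


subway = {"description": "service très rapide, rapport qualité/prix médiocre mais on peut "
                         "choisir la composition de son sandwitch", "price": 8}
evidence = {"description": "restaurant copieux et pas cher, cependant c'est pas bon", "price": 12}
query = dict(must=[contains("sandwitch")],
             should=[contains("bon"), contains("excellent")],
             must_not=[phrase("pas bon")],
             filters=[price_lte(20)])
print(bool_query(subway, **query))    # 1.0: matches "sandwitch", no should clause
print(bool_query(evidence, **query))  # None: no "sandwitch"
```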
The bool query explained
GET /restaurant/restaurant/_search
{
"query": {
"bool": {
"should" : [
{"match": {"description": "bon"}},
{"match": {"description": "excellent"}},
{"match": {"description": "service rapide"}}
],
"minimum_should_match": 2
}
}
}
Offering advanced search to your users
GET /restaurant/restaurant/_search
{
"query": {
"simple_query_string": {
"fields": ["description", "title^2", "adresse", "type"],
"query": "-\"pas bon\" +(pizzi~2 | sandwitch)"
}
}
}
GET /restaurant/restaurant/_search
{
"query": {
"bool": {
"must_not": {
"multi_match": {
"fields": ["description", "title^2", "adresse", "type"],
"type": "phrase",
"query": "pas bon"
}
},
"should": [
{"multi_match": {
"fields": ["description", "title^2", "adresse", "type"],
"fuzziness": 2,
"max_expansions": 50,
"query": "pizzi"
}
},
{"multi_match": {
"fields": ["description", "title^2", "adresse", "type"],
"query": "sandwitch"
}
}
]
}
}
}
Aliases: giving yourself room to manoeuvre
PUT /restaurant_v1/
{
"mappings": {
"restaurant": {
"properties": {
"title": {"type": "text"},
"lat": {"type": "double"},
"lon": {"type": "double"}
}
}
}
}
POST /_aliases
{
"actions": [
{"add": {"index": "restaurant_v1", "alias": "restaurant_search"}},
{"add": {"index": "restaurant_v1", "alias": "restaurant_write"}}
]
}
Aliases, pipelines and reindexing
PUT /restaurant_v2
{
"mappings": {
"restaurant": {
"properties": {
"title": {"type": "text", "analyzer": "french"},
"position": {"type": "geo_point"}
}
}
}
}
PUT /_ingest/pipeline/fixing_position
{
"description": "move lat lon into position parameter",
"processors": [
{"rename": {"field": "lat", "target_field": "position.lat"}},
{"rename": {"field": "lon", "target_field": "position.lon"}}
]
}
POST /_aliases
{
"actions": [
{"remove": {"index": "restaurant_v1", "alias":
"restaurant_search"}},
{"remove": {"index": "restaurant_v1", "alias":
"restaurant_write"}},
{"add": {"index": "restaurant_v2", "alias":
"restaurant_search"}},
{"add": {"index": "restaurant_v2", "alias": "restaurant_write"}}
]
}
POST /_reindex
{
"source": {"index": "restaurant_v1"},
"dest": {"index": "restaurant_v2", "pipeline": "fixing_position"}
}
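Each document copied by `_reindex` goes through the `fixing_position` pipeline, whose two `rename` processors move `lat` and `lon` under `position`, the `geo_point` field of the new mapping. A Python sketch of that transformation; the helpers are hypothetical, the real processors run inside Elasticsearch.

```python
def rename(doc, field, target_field):
    """Move doc[field] to a possibly dotted target path, creating intermediate objects."""
    value = doc.pop(field)  # raises KeyError if the field is missing
    *parents, leaf = target_field.split(".")
    node = doc
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value
    return doc


def fixing_position(doc):
    """Apply the two processors of the pipeline, in order."""
    rename(doc, "lat", "position.lat")
    rename(doc, "lon", "position.lon")
    return doc


print(fixing_position({"title": "Subway", "lat": 43.5577519, "lon": 1.4625753}))
# {'title': 'Subway', 'position': {'lat': 43.5577519, 'lon': 1.4625753}}
```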
Analysis of fire department intervention data, 2005 to 2014
PUT /pompier
{
"mappings": {
"interventions": {
"properties": {
"date": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss"},
"type_incident": { "type": "keyword" },
"description_groupe": { "type": "keyword" },
"caserne": { "type": "integer"},
"ville": { "type": "keyword"},
"arrondissement": { "type": "keyword"},
"division": {"type": "integer"},
"position": {"type": "geo_point"},
"nombre_unites": {"type": "integer"}
}
}
}
}
Listing the different incident types
GET /pompier/interventions/_search
{
"size": 0,
"aggs": {
"type_incident": {
"terms": {"field": "type_incident", "size": 100}
}
}
}
{
"aggregations": {
"type_incident": {
"buckets": [
{"key": "Premier répondant", "doc_count": 437891},
{"key": "Appel de Cie de détection", "doc_count": 76157},
{"key": "Alarme privé ou locale", "doc_count": 60879},
{"key": "Ac.véh./1R/s.v./ext/29B/D", "doc_count": 41734},
{"key": "10-22 sans feu", "doc_count": 29283},
{"key": "Acc. sans victime sfeu - ext.", "doc_count": 27663},
{"key": "Inondation", "doc_count": 26801},
{"key": "Problèmes électriques", "doc_count": 23495},
{"key": "Aliments surchauffés", "doc_count": 23428},
{"key": "Odeur suspecte - gaz", "doc_count": 21158},
{"key": "Déchets en feu", "doc_count": 18007},
{"key": "Ascenseur", "doc_count": 12703},
{"key": "Feu de champ *", "doc_count": 11518},
{"key": "Structure dangereuse", "doc_count": 9958},
{"key": "10-22 avec feu", "doc_count": 9876},
{"key": "Alarme vérification", "doc_count": 8328},
{"key": "Aide à un citoyen", "doc_count": 7722},
{"key": "Fuite ext.:hydrocar. liq. div.", "doc_count": 7351},
{"key": "Ac.véh./1R/s.v./V.R./29B/D", "doc_count": 6232},
{"key": "Feu de véhicule extérieur", "doc_count": 5943},
{"key": "Fausse alerte 10-19", "doc_count": 4680},
{"key": "Acc. sans victime sfeu - v.r", "doc_count": 3494},
{"key": "Assistance serv. muni.", "doc_count": 3431},
{"key": "Avertisseur de CO", "doc_count": 2542},
{"key": "Fuite gaz naturel 10-22", "doc_count": 1928},
{"key": "Matières dangereuses / 10-22", "doc_count": 1905},
{"key": "Feu de bâtiment", "doc_count": 1880},
{"key": "Senteur de feu à l'extérieur", "doc_count": 1566},
{"key": "Surchauffe - véhicule", "doc_count": 1499},
{"key": "Feu / Agravation possible", "doc_count": 1281},
{"key": "Fuite gaz naturel 10-09", "doc_count": 1257},
{"key": "Acc.véh/1rép/vict/ext 29D04", "doc_count": 1015},
{"key": "Acc. véh victime sfeu - (ext.)", "doc_count": 971},
Sub-aggregations
GET /pompier/interventions/_search
{
"size": 0,
"aggs": {
"ville": {
"terms": {"field": "ville"},
"aggs": {
"arrondissement": {
"terms": {"field": "arrondissement"}
}
}
}
}
}
{
"aggregations": {"ville": {"buckets": [
{
"key": "Montréal", "doc_count": 768955,
"arrondissement": {"buckets": [
{"key": "Ville-Marie", "doc_count": 83010},
{"key": "Mercier / Hochelaga-Maisonneuve", "doc_count": 67272},
{"key": "Côte-des-Neiges / Notre-Dame-de-Grâce", "doc_count": 65933},
{"key": "Villeray / St-Michel / Parc Extension", "doc_count": 60951},
{"key": "Rosemont / Petite-Patrie", "doc_count": 59213},
{"key": "Ahuntsic / Cartierville", "doc_count": 57721},
{"key": "Plateau Mont-Royal", "doc_count": 53344},
{"key": "Montréal-Nord", "doc_count": 40757},
{"key": "Sud-Ouest", "doc_count": 39936},
{"key": "Rivière-des-Prairies / Pointe-aux-Trembles", "doc_count": 38139}
]}
}, {
"key": "Dollard-des-Ormeaux", "doc_count": 17961,
"arrondissement": {"buckets": [
{"key": "Indéterminé", "doc_count": 13452},
{"key": "Dollard-des-Ormeaux / Roxboro", "doc_count": 4477},
{"key": "Pierrefonds / Senneville", "doc_count": 10},
{"key": "Dorval / Ile Dorval", "doc_count": 8},
{"key": "Pointe-Claire", "doc_count": 8},
{"key": "Ile-Bizard / Ste-Geneviève / Ste-A-de-B", "doc_count": 6}
]}
}, {
"key": "Pointe-Claire", "doc_count": 17925,
"arrondissement": {"buckets": [
{"key": "Indéterminé", "doc_count": 13126},
{"key": "Pointe-Claire", "doc_count": 4766},
{"key": "Dorval / Ile Dorval", "doc_count": 12},
{"key": "Dollard-des-Ormeaux / Roxboro", "doc_count": 7},
{"key": "Kirkland", "doc_count": 7},
{"key": "Beaconsfield / Baie d'Urfé", "doc_count": 5},
{"key": "Ile-Bizard / Ste-Geneviève / Ste-A-de-B", "doc_count": 1},
{"key": "St-Laurent", "doc_count": 1}
Computing averages and sorting aggregation buckets
GET /pompier/interventions/_search
{
"size": 0,
"aggs": {
"avg_nombre_unites_general": {
"avg": {"field": "nombre_unites"}
},
"type_incident": {
"terms": {
"field": "type_incident",
"size": 5,
"order" : {"avg_nombre_unites": "desc"}
},
"aggs": {
"avg_nombre_unites": {
"avg": {"field": "nombre_unites"}
}
}
}
}
}
{
"aggregations": {
"type_incident": {
"buckets": [
{
"key": "Feu / 5e Alerte", "doc_count": 162,
"avg_nombre_unites": {"value": 70.9074074074074}
}, {
"key": "Feu / 4e Alerte", "doc_count": 100,
"avg_nombre_unites": {"value": 49.36}
}, {
"key": "Troisième alerte/autre que BAT", "doc_count": 1,
"avg_nombre_unites": {"value": 43.0}
}, {
"key": "Feu / 3e Alerte", "doc_count": 173,
"avg_nombre_unites": {"value": 41.445086705202314}
}, {
"key": "Deuxième alerte/autre que BAT", "doc_count": 8,
"avg_nombre_unites": {"value": 37.5}
}
]
},
"avg_nombre_unites_general": {"value": 2.1374461758713728}
}
}
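The request computes a global average and, per incident type, an average that is used to order the buckets. The Python sketch below reproduces that logic on a hypothetical handful of documents (field names taken from the `pompier` mapping, helper names ours). On a multi-shard index, ordering a `terms` aggregation by a sub-aggregation relies on per-shard top buckets and can therefore be approximate.

```python
from collections import defaultdict


def avg(values):
    return sum(values) / len(values)


def terms_ordered_by_avg(docs, group_field, value_field, size):
    """Group docs by group_field, average value_field in each group,
    and keep the `size` groups with the highest average."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[group_field]].append(doc[value_field])
    buckets = [{"key": key, "doc_count": len(vals), "avg_nombre_unites": avg(vals)}
               for key, vals in groups.items()]
    buckets.sort(key=lambda b: b["avg_nombre_unites"], reverse=True)
    return buckets[:size]


# Hypothetical sample
docs = [
    {"type_incident": "Inondation", "nombre_unites": 1},
    {"type_incident": "Inondation", "nombre_unites": 3},
    {"type_incident": "Feu de bâtiment", "nombre_unites": 12},
    {"type_incident": "Feu de bâtiment", "nombre_unites": 8},
    {"type_incident": "Ascenseur", "nombre_unites": 1},
]
print(avg([d["nombre_unites"] for d in docs]))  # 5.0
for bucket in terms_ordered_by_avg(docs, "type_incident", "nombre_unites", size=2):
    print(bucket)
```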
Percentiles
GET /pompier/interventions/_search
{
"size": 0,
"aggs": {
"unites_percentile": {
"percentiles": {
"field": "nombre_unites",
"percents": [25, 50, 75, 100]
}
}
}
}
{
"aggregations": {
"unites_percentile": {
"values": {
"25.0": 1.0,
"50.0": 1.0,
"75.0": 3.0,
"100.0": 275.0
}
}
}
}
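Reading the result: the median intervention mobilises a single unit, 75% of interventions need at most three units, and the largest one mobilised 275. Elasticsearch computes percentiles with an approximate algorithm (TDigest); the Python sketch below computes exact percentiles with linear interpolation on a hypothetical small sample, which is enough to understand the output.

```python
import math


def percentile(values, p):
    """Exact p-th percentile with linear interpolation between closest ranks."""
    data = sorted(values)
    if not data:
        raise ValueError("percentile of an empty list")
    rank = (p / 100) * (len(data) - 1)
    low, high = math.floor(rank), math.ceil(rank)
    return data[low] + (data[high] - data[low]) * (rank - low)


sample = [1, 1, 1, 1, 2, 3, 3, 4, 275]  # hypothetical "nombre_unites" values
print({p: percentile(sample, p) for p in (25, 50, 75, 100)})
# {25: 1.0, 50: 2.0, 75: 3.0, 100: 275.0}
```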
Histogram
GET /pompier/interventions/_search
{
"size": 0,
"query": {
"term": {"type_incident": "Inondation"}
},
"aggs": {
"unites_histogram": {
"histogram": {
"field": "nombre_unites",
"order": {"_key": "asc"},
"interval": 1
},
"aggs": {
"ville": {
"terms": {"field": "ville", "size": 1}
}
}
}
}
}
{
"aggregations": {
"unites_histogram": {
"buckets": [
{
"key": 1.0, "doc_count": 23507,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 19417}]}
},{
"key": 2.0, "doc_count": 1550,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 1229}]}
},{
"key": 3.0, "doc_count": 563,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 404}]}
},{
"key": 4.0, "doc_count": 449,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 334}]}
},{
"key": 5.0, "doc_count": 310,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 253}]}
},{
"key": 6.0, "doc_count": 215,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 173}]}
},{
"key": 7.0, "doc_count": 136,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 112}]}
},{
"key": 8.0, "doc_count": 35,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 30}]}
},{
"key": 9.0, "doc_count": 10,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 8}]}
},{
"key": 10.0, "doc_count": 11,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 8}]}
},{
"key": 11.0, "doc_count": 2,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 2}]}
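Each value falls in the bucket whose key is `floor((value - offset) / interval) * interval + offset` (the offset is 0 here), so with `interval: 1` every distinct number of units gets its own bucket. A Python sketch of the bucketing (hypothetical helper). Unlike Elasticsearch, which by default also returns empty buckets between the smallest and largest key, it only lists non-empty ones.

```python
import math
from collections import Counter


def histogram(values, interval, offset=0.0):
    """Count values per bucket; key = floor((v - offset) / interval) * interval + offset."""
    keys = (math.floor((v - offset) / interval) * interval + offset for v in values)
    return sorted(Counter(keys).items())  # ordered by key, like "order": {"_key": "asc"}


print(histogram([1, 1, 2, 3, 3, 3], interval=1))  # [(1.0, 2), (2.0, 1), (3.0, 3)]
print(histogram([1, 2, 3, 5], interval=2))        # [(0.0, 1), (2.0, 2), (4.0, 1)]
```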
Significant terms
GET /pompier/interventions/_search
{
"size": 0,
"query": {
"term": {"type_incident": "Inondation"}
},
"aggs": {
"ville": {
"significant_terms": {"field": "ville", "size": 5, "percentage": {}}
}
}
}
{
"aggregations": {
"ville": {
"doc_count": 26801,
"buckets": [
{
"key": "Ile-Bizard",
"score": 0.10029498525073746,
"doc_count": 68, "bg_count": 678
},
{
"key": "Montréal-Nord",
"score": 0.0826544804291675,
"doc_count": 416, "bg_count": 5033
},
{
"key": "Roxboro",
"score": 0.08181818181818182,
"doc_count": 27, "bg_count": 330
},
{
"key": "Côte St-Luc",
"score": 0.07654825526563974,
"doc_count": 487, "bg_count": 6362
},
{
"key": "Saint-Laurent",
"score": 0.07317073170731707,
"doc_count": 465, "bg_count": 6355
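With the `percentage` heuristic, a term's score is its foreground count divided by its background count, `doc_count / bg_count`. For Ile-Bizard: 68 / 678 ≈ 0.1003, i.e. about 10% of all interventions recorded there are floods. A Python check with the figures from the response (helper name is ours):

```python
def percentage_scores(buckets):
    """'percentage' heuristic: score = doc_count / bg_count, highest first."""
    scored = [(b["key"], b["doc_count"] / b["bg_count"]) for b in buckets]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)


buckets = [  # figures copied from the response above
    {"key": "Saint-Laurent", "doc_count": 465, "bg_count": 6355},
    {"key": "Ile-Bizard", "doc_count": 68, "bg_count": 678},
    {"key": "Côte St-Luc", "doc_count": 487, "bg_count": 6362},
    {"key": "Montréal-Nord", "doc_count": 416, "bg_count": 5033},
    {"key": "Roxboro", "doc_count": 27, "bg_count": 330},
]
for key, score in percentage_scores(buckets):
    print(f"{key}: {score:.4f}")
```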
Aggregations and geolocated data
GET /pompier/interventions/_search
{
"size": 0,
"query": {
"regexp": {"type_incident": "Feu.*"}
},
"aggs": {
"distance_from_here": {
"geo_distance": {
"field": "position",
"unit": "km",
"origin": {
"lat": 45.495902,
"lon": -73.554263
},
"ranges": [
{ "to": 2},
{"from":2, "to": 4},
{"from":4, "to": 6},
{"from": 6, "to": 8},
{"from": 8}]
}
}
}
}
{
"aggregations": {
"distance_from_here": {
"buckets": [
{
"key": "*-2.0",
"from": 0.0,
"to": 2.0,
"doc_count": 80
},
{
"key": "2.0-4.0",
"from": 2.0,
"to": 4.0,
"doc_count": 266
},
{
"key": "4.0-6.0",
"from": 4.0,
"to": 6.0,
"doc_count": 320
},
{
"key": "6.0-8.0",
"from": 6.0,
"to": 8.0,
"doc_count": 326
},
{
"key": "8.0-*",
"from": 8.0,
"doc_count": 1720
}
]
}
}
}
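In a range aggregation, `from` is inclusive and `to` is exclusive, so a fire exactly 2 km away is counted in the “2.0-4.0” bucket. A Python sketch of the bucketing on hypothetical distances (helper name is ours):

```python
def range_buckets(values, ranges):
    """Count values per range: 'from' inclusive, 'to' exclusive, missing bounds are open."""
    buckets = []
    for r in ranges:
        low = r.get("from", float("-inf"))
        high = r.get("to", float("inf"))
        buckets.append({**r, "doc_count": sum(low <= v < high for v in values)})
    return buckets


ranges = [{"to": 2}, {"from": 2, "to": 4}, {"from": 4, "to": 6}, {"from": 6, "to": 8}, {"from": 8}]
distances_km = [0.5, 1.99, 2.0, 3.5, 7.9, 8.0, 12.0]  # hypothetical distances
print([b["doc_count"] for b in range_buckets(distances_km, ranges)])  # [2, 2, 0, 1, 2]
```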
Any questions?
Offering advanced search to your users
GET /restaurant/restaurant/_search
{
"query": {
"simple_query_string": {
"fields": ["description", "title^2", "adresse", "type"],
"query": "\"service rapide\"~2"
}
}
}
{
"hits": {
"hits": [
{
"_source": {
"title": "Un fastfood très connu",
"description": "service très rapide, rapport qualité/prix médiocre",
"price": 8,
"adresse": "210 route de narbonne, 31520 RAMONVILLE",
"type": "fastfood",
"coord": "43.5536343,1.476165"
}
},{
"_source": {
"title": "Subway",
"description": "service très rapide, rapport qualité/prix médiocre mais on peut choisir la composition de son sandwitch",
"price": 8,
"adresse": "211 route de narbonne, 31520 RAMONVILLE",
GET /restaurant/restaurant/_search
{
"query": {
"match_phrase": {
"description": {
"slop": 2,
"query": "service rapide"
}
}
}
}
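Both requests should be equivalent: `"service rapide"~2` in `simple_query_string` is a phrase query with a slop of 2, i.e. the terms may be up to two position moves away from the exact phrase, so “service très rapide” (one extra word) matches. A simplified Python model of sloppy phrase matching (hypothetical helpers; Lucene's actual scorer is more involved):

```python
import itertools
import re


def tokenize(text):
    return [t for t in re.split(r"\W+", text.lower()) if t]


def min_slop(text, phrase):
    """Smallest slop allowing `phrase` to match `text`, or None if a term is missing.

    Each term's position is shifted by its offset in the phrase; for a combination
    of occurrences, the needed slop is max(shifted) - min(shifted).
    """
    positions = {}
    for pos, tok in enumerate(tokenize(text)):
        positions.setdefault(tok, []).append(pos)
    terms = tokenize(phrase)
    if not terms or any(t not in positions for t in terms):
        return None
    best = None
    for combo in itertools.product(*(positions[t] for t in terms)):
        if len(set(combo)) < len(combo):
            continue  # one token cannot match two phrase terms
        shifted = [pos - offset for offset, pos in enumerate(combo)]
        needed = max(shifted) - min(shifted)
        best = needed if best is None else min(best, needed)
    return best


def match_phrase(text, phrase, slop=0):
    needed = min_slop(text, phrase)
    return needed is not None and needed <= slop


description = "service très rapide, rapport qualité/prix médiocre"
print(min_slop(description, "service rapide"))         # 1
print(match_phrase(description, "service rapide", 2))  # True
print(min_slop("service rapide", "rapide service"))    # 2: swapping two terms costs 2
```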
Kibana Tutorial | Kibana Dashboard Tutorial | Kibana Elasticsearch | ELK Stac...Kibana Tutorial | Kibana Dashboard Tutorial | Kibana Elasticsearch | ELK Stac...
Kibana Tutorial | Kibana Dashboard Tutorial | Kibana Elasticsearch | ELK Stac...
Edureka!
 
Introduction to Elasticsearch
Introduction to ElasticsearchIntroduction to Elasticsearch
Introduction to Elasticsearch
Ismaeel Enjreny
 
My first 90 days with ClickHouse.pdf
My first 90 days with ClickHouse.pdfMy first 90 days with ClickHouse.pdf
My first 90 days with ClickHouse.pdf
Alkin Tezuysal
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
Flink Forward
 
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache FlinkTzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Ververica
 
Apache kafka performance(throughput) - without data loss and guaranteeing dat...
Apache kafka performance(throughput) - without data loss and guaranteeing dat...Apache kafka performance(throughput) - without data loss and guaranteeing dat...
Apache kafka performance(throughput) - without data loss and guaranteeing dat...
SANG WON PARK
 
Fast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteFast federated SQL with Apache Calcite
Fast federated SQL with Apache Calcite
Chris Baynes
 
Presto: Distributed sql query engine
Presto: Distributed sql query engine Presto: Distributed sql query engine
Presto: Distributed sql query engine
kiran palaka
 
Pulsar Functions Deep Dive_Sanjeev kulkarni
Pulsar Functions Deep Dive_Sanjeev kulkarniPulsar Functions Deep Dive_Sanjeev kulkarni
Pulsar Functions Deep Dive_Sanjeev kulkarni
StreamNative
 
Real-time Data Ingestion from Kafka to ClickHouse with Deterministic Re-tries...
Real-time Data Ingestion from Kafka to ClickHouse with Deterministic Re-tries...Real-time Data Ingestion from Kafka to ClickHouse with Deterministic Re-tries...
Real-time Data Ingestion from Kafka to ClickHouse with Deterministic Re-tries...
HostedbyConfluent
 
KubeCon + CloudNative Con NA 2021 | A New Generation of NATS
KubeCon + CloudNative Con NA 2021 | A New Generation of NATSKubeCon + CloudNative Con NA 2021 | A New Generation of NATS
KubeCon + CloudNative Con NA 2021 | A New Generation of NATS
NATS
 
Architecture at Scale
Architecture at ScaleArchitecture at Scale
Architecture at Scale
Elasticsearch
 
Using Kafka in your python application - Python fwdays 2020
Using Kafka in your python application - Python fwdays 2020Using Kafka in your python application - Python fwdays 2020
Using Kafka in your python application - Python fwdays 2020
Oleksandr Tarasenko
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internals
Kostas Tzoumas
 
Practical Elasticsearch - real world use cases
Practical Elasticsearch - real world use casesPractical Elasticsearch - real world use cases
Practical Elasticsearch - real world use cases
Itamar
 
Elasticsearch를 활용한 GIS 검색
Elasticsearch를 활용한 GIS 검색Elasticsearch를 활용한 GIS 검색
Elasticsearch를 활용한 GIS 검색
ksdc2019
 

More from LINAGORA (20)

Personal branding : e-recrutement et réseaux sociaux professionnels
Personal branding : e-recrutement et réseaux sociaux professionnels Personal branding : e-recrutement et réseaux sociaux professionnels
Personal branding : e-recrutement et réseaux sociaux professionnels
LINAGORA
 
Construisons ensemble le chatbot bancaire dedemain !
Construisons ensemble le chatbot bancaire dedemain !Construisons ensemble le chatbot bancaire dedemain !
Construisons ensemble le chatbot bancaire dedemain !
LINAGORA
 
ChatBots et intelligence artificielle arrivent dans les banques
ChatBots et intelligence artificielle arrivent dans les banques ChatBots et intelligence artificielle arrivent dans les banques
ChatBots et intelligence artificielle arrivent dans les banques
LINAGORA
 
Deep Learning in practice : Speech recognition and beyond - Meetup
Deep Learning in practice : Speech recognition and beyond - MeetupDeep Learning in practice : Speech recognition and beyond - Meetup
Deep Learning in practice : Speech recognition and beyond - Meetup
LINAGORA
 
Advanced Node.JS Meetup
Advanced Node.JS MeetupAdvanced Node.JS Meetup
Advanced Node.JS Meetup
LINAGORA
 
Call a C API from Python becomes more enjoyable with CFFI
Call a C API from Python becomes more enjoyable with CFFICall a C API from Python becomes more enjoyable with CFFI
Call a C API from Python becomes more enjoyable with CFFI
LINAGORA
 
[UDS] Cloud Computing "pour les nuls" (Exemple avec LinShare)
[UDS] Cloud Computing "pour les nuls" (Exemple avec LinShare)[UDS] Cloud Computing "pour les nuls" (Exemple avec LinShare)
[UDS] Cloud Computing "pour les nuls" (Exemple avec LinShare)
LINAGORA
 
Angular v2 et plus : le futur du développement d'applications en entreprise
Angular v2 et plus : le futur du développement d'applications en entrepriseAngular v2 et plus : le futur du développement d'applications en entreprise
Angular v2 et plus : le futur du développement d'applications en entreprise
LINAGORA
 
Angular (v2 and up) - Morning to understand - Linagora
Angular (v2 and up) - Morning to understand - LinagoraAngular (v2 and up) - Morning to understand - Linagora
Angular (v2 and up) - Morning to understand - Linagora
LINAGORA
 
Industrialisez le développement et la maintenance de vos sites avec Drupal
Industrialisez le développement et la maintenance de vos sites avec DrupalIndustrialisez le développement et la maintenance de vos sites avec Drupal
Industrialisez le développement et la maintenance de vos sites avec Drupal
LINAGORA
 
CapDémat Evolution plateforme de GRU pour collectivités
CapDémat Evolution plateforme de GRU pour collectivitésCapDémat Evolution plateforme de GRU pour collectivités
CapDémat Evolution plateforme de GRU pour collectivités
LINAGORA
 
Présentation du marché P2I UGAP « Support sur Logiciels Libres »
Présentation du marché P2I UGAP « Support sur Logiciels Libres »Présentation du marché P2I UGAP « Support sur Logiciels Libres »
Présentation du marché P2I UGAP « Support sur Logiciels Libres »
LINAGORA
 
Offre de demat d'Adullact projet
Offre de demat d'Adullact projet Offre de demat d'Adullact projet
Offre de demat d'Adullact projet
LINAGORA
 
La dématérialisation du conseil minicipal
La dématérialisation du conseil minicipalLa dématérialisation du conseil minicipal
La dématérialisation du conseil minicipal
LINAGORA
 
Open stack @ sierra wireless
Open stack @ sierra wirelessOpen stack @ sierra wireless
Open stack @ sierra wireless
LINAGORA
 
OpenStack - open source au service du Cloud
OpenStack - open source au service du CloudOpenStack - open source au service du Cloud
OpenStack - open source au service du Cloud
LINAGORA
 
Architecture d'annuaire hautement disponible avec OpenLDAP
Architecture d'annuaire hautement disponible avec OpenLDAPArchitecture d'annuaire hautement disponible avec OpenLDAP
Architecture d'annuaire hautement disponible avec OpenLDAP
LINAGORA
 
Présentation offre LINID
Présentation offre LINIDPrésentation offre LINID
Présentation offre LINID
LINAGORA
 
Matinée pour conmrendre consacrée à LinID.org, gestion, fédération et contrôl...
Matinée pour conmrendre consacrée à LinID.org, gestion, fédération et contrôl...Matinée pour conmrendre consacrée à LinID.org, gestion, fédération et contrôl...
Matinée pour conmrendre consacrée à LinID.org, gestion, fédération et contrôl...
LINAGORA
 
Matinée pour conmrendre consacrée à LinShare.org, application de partage de f...
Matinée pour conmrendre consacrée à LinShare.org, application de partage de f...Matinée pour conmrendre consacrée à LinShare.org, application de partage de f...
Matinée pour conmrendre consacrée à LinShare.org, application de partage de f...
LINAGORA
 
Personal branding : e-recrutement et réseaux sociaux professionnels
Personal branding : e-recrutement et réseaux sociaux professionnels Personal branding : e-recrutement et réseaux sociaux professionnels
Personal branding : e-recrutement et réseaux sociaux professionnels
LINAGORA
 
Construisons ensemble le chatbot bancaire dedemain !
Construisons ensemble le chatbot bancaire dedemain !Construisons ensemble le chatbot bancaire dedemain !
Construisons ensemble le chatbot bancaire dedemain !
LINAGORA
 
ChatBots et intelligence artificielle arrivent dans les banques
ChatBots et intelligence artificielle arrivent dans les banques ChatBots et intelligence artificielle arrivent dans les banques
ChatBots et intelligence artificielle arrivent dans les banques
LINAGORA
 
Deep Learning in practice : Speech recognition and beyond - Meetup
Deep Learning in practice : Speech recognition and beyond - MeetupDeep Learning in practice : Speech recognition and beyond - Meetup
Deep Learning in practice : Speech recognition and beyond - Meetup
LINAGORA
 
Advanced Node.JS Meetup
Advanced Node.JS MeetupAdvanced Node.JS Meetup
Advanced Node.JS Meetup
LINAGORA
 
Call a C API from Python becomes more enjoyable with CFFI
Call a C API from Python becomes more enjoyable with CFFICall a C API from Python becomes more enjoyable with CFFI
Call a C API from Python becomes more enjoyable with CFFI
LINAGORA
 
[UDS] Cloud Computing "pour les nuls" (Exemple avec LinShare)
[UDS] Cloud Computing "pour les nuls" (Exemple avec LinShare)[UDS] Cloud Computing "pour les nuls" (Exemple avec LinShare)
[UDS] Cloud Computing "pour les nuls" (Exemple avec LinShare)
LINAGORA
 
Angular v2 et plus : le futur du développement d'applications en entreprise
Angular v2 et plus : le futur du développement d'applications en entrepriseAngular v2 et plus : le futur du développement d'applications en entreprise
Angular v2 et plus : le futur du développement d'applications en entreprise
LINAGORA
 
Angular (v2 and up) - Morning to understand - Linagora
Angular (v2 and up) - Morning to understand - LinagoraAngular (v2 and up) - Morning to understand - Linagora
Angular (v2 and up) - Morning to understand - Linagora
LINAGORA
 
Industrialisez le développement et la maintenance de vos sites avec Drupal
Industrialisez le développement et la maintenance de vos sites avec DrupalIndustrialisez le développement et la maintenance de vos sites avec Drupal
Industrialisez le développement et la maintenance de vos sites avec Drupal
LINAGORA
 
CapDémat Evolution plateforme de GRU pour collectivités
CapDémat Evolution plateforme de GRU pour collectivitésCapDémat Evolution plateforme de GRU pour collectivités
CapDémat Evolution plateforme de GRU pour collectivités
LINAGORA
 
Présentation du marché P2I UGAP « Support sur Logiciels Libres »
Présentation du marché P2I UGAP « Support sur Logiciels Libres »Présentation du marché P2I UGAP « Support sur Logiciels Libres »
Présentation du marché P2I UGAP « Support sur Logiciels Libres »
LINAGORA
 
Offre de demat d'Adullact projet
Offre de demat d'Adullact projet Offre de demat d'Adullact projet
Offre de demat d'Adullact projet
LINAGORA
 
La dématérialisation du conseil minicipal
La dématérialisation du conseil minicipalLa dématérialisation du conseil minicipal
La dématérialisation du conseil minicipal
LINAGORA
 
Open stack @ sierra wireless
Open stack @ sierra wirelessOpen stack @ sierra wireless
Open stack @ sierra wireless
LINAGORA
 
OpenStack - open source au service du Cloud
OpenStack - open source au service du CloudOpenStack - open source au service du Cloud
OpenStack - open source au service du Cloud
LINAGORA
 
Architecture d'annuaire hautement disponible avec OpenLDAP
Architecture d'annuaire hautement disponible avec OpenLDAPArchitecture d'annuaire hautement disponible avec OpenLDAP
Architecture d'annuaire hautement disponible avec OpenLDAP
LINAGORA
 
Présentation offre LINID
Présentation offre LINIDPrésentation offre LINID
Présentation offre LINID
LINAGORA
 
Matinée pour conmrendre consacrée à LinID.org, gestion, fédération et contrôl...
Matinée pour conmrendre consacrée à LinID.org, gestion, fédération et contrôl...Matinée pour conmrendre consacrée à LinID.org, gestion, fédération et contrôl...
Matinée pour conmrendre consacrée à LinID.org, gestion, fédération et contrôl...
LINAGORA
 
Matinée pour conmrendre consacrée à LinShare.org, application de partage de f...
Matinée pour conmrendre consacrée à LinShare.org, application de partage de f...Matinée pour conmrendre consacrée à LinShare.org, application de partage de f...
Matinée pour conmrendre consacrée à LinShare.org, application de partage de f...
LINAGORA
 
Ad

Recently uploaded (20)

Sets theories and applications that can used to imporve knowledge
Sets theories and applications that can used to imporve knowledgeSets theories and applications that can used to imporve knowledge
Sets theories and applications that can used to imporve knowledge
saumyasl2020
 
Process Mining at Deutsche Bank - Journey
Process Mining at Deutsche Bank - JourneyProcess Mining at Deutsche Bank - Journey
Process Mining at Deutsche Bank - Journey
Process mining Evangelist
 
Automated Melanoma Detection via Image Processing.pptx
Automated Melanoma Detection via Image Processing.pptxAutomated Melanoma Detection via Image Processing.pptx
Automated Melanoma Detection via Image Processing.pptx
handrymaharjan23
 
Voice Control robotic arm hggyghghgjgjhgjg
Voice Control robotic arm hggyghghgjgjhgjgVoice Control robotic arm hggyghghgjgjhgjg
Voice Control robotic arm hggyghghgjgjhgjg
4mg22ec401
 
HershAggregator (2).pdf musicretaildistribution
HershAggregator (2).pdf musicretaildistributionHershAggregator (2).pdf musicretaildistribution
HershAggregator (2).pdf musicretaildistribution
hershtara1
 
Adopting Process Mining at the Rabobank - use case
Adopting Process Mining at the Rabobank - use caseAdopting Process Mining at the Rabobank - use case
Adopting Process Mining at the Rabobank - use case
Process mining Evangelist
 
real illuminati Uganda agent 0782561496/0756664682
real illuminati Uganda agent 0782561496/0756664682real illuminati Uganda agent 0782561496/0756664682
real illuminati Uganda agent 0782561496/0756664682
way to join real illuminati Agent In Kampala Call/WhatsApp+256782561496/0756664682
 
CS-404 COA COURSE FILE JAN JUN 2025.docx
CS-404 COA COURSE FILE JAN JUN 2025.docxCS-404 COA COURSE FILE JAN JUN 2025.docx
CS-404 COA COURSE FILE JAN JUN 2025.docx
nidarizvitit
 
Dynamics 365 Business Rules Dynamics Dynamics
Dynamics 365 Business Rules Dynamics DynamicsDynamics 365 Business Rules Dynamics Dynamics
Dynamics 365 Business Rules Dynamics Dynamics
heyoubro69
 
Improving Product Manufacturing Processes
Improving Product Manufacturing ProcessesImproving Product Manufacturing Processes
Improving Product Manufacturing Processes
Process mining Evangelist
 
Oral Malodor.pptx jsjshdhushehsidjjeiejdhfj
Oral Malodor.pptx jsjshdhushehsidjjeiejdhfjOral Malodor.pptx jsjshdhushehsidjjeiejdhfj
Oral Malodor.pptx jsjshdhushehsidjjeiejdhfj
maitripatel5301
 
L1_Slides_Foundational Concepts_508.pptx
L1_Slides_Foundational Concepts_508.pptxL1_Slides_Foundational Concepts_508.pptx
L1_Slides_Foundational Concepts_508.pptx
38NoopurPatel
 
AWS Certified Machine Learning Slides.pdf
AWS Certified Machine Learning Slides.pdfAWS Certified Machine Learning Slides.pdf
AWS Certified Machine Learning Slides.pdf
philsparkshome
 
Fundamentals of Data Analysis, its types, tools, algorithms
Fundamentals of Data Analysis, its types, tools, algorithmsFundamentals of Data Analysis, its types, tools, algorithms
Fundamentals of Data Analysis, its types, tools, algorithms
priyaiyerkbcsc
 
Process Mining Machine Recoveries to Reduce Downtime
Process Mining Machine Recoveries to Reduce DowntimeProcess Mining Machine Recoveries to Reduce Downtime
Process Mining Machine Recoveries to Reduce Downtime
Process mining Evangelist
 
What is ETL? Difference between ETL and ELT?.pdf
What is ETL? Difference between ETL and ELT?.pdfWhat is ETL? Difference between ETL and ELT?.pdf
What is ETL? Difference between ETL and ELT?.pdf
SaikatBasu37
 
Process Mining at Dimension Data - Jan vermeulen
Process Mining at Dimension Data - Jan vermeulenProcess Mining at Dimension Data - Jan vermeulen
Process Mining at Dimension Data - Jan vermeulen
Process mining Evangelist
 
hersh's midterm project.pdf music retail and distribution
hersh's midterm project.pdf music retail and distributionhersh's midterm project.pdf music retail and distribution
hersh's midterm project.pdf music retail and distribution
hershtara1
 
Process Mining and Official Statistics - CBS
Process Mining and Official Statistics - CBSProcess Mining and Official Statistics - CBS
Process Mining and Official Statistics - CBS
Process mining Evangelist
 
Transforming health care with ai powered
Transforming health care with ai poweredTransforming health care with ai powered
Transforming health care with ai powered
gowthamarvj
 
Sets theories and applications that can used to imporve knowledge
Sets theories and applications that can used to imporve knowledgeSets theories and applications that can used to imporve knowledge
Sets theories and applications that can used to imporve knowledge
saumyasl2020
 
Automated Melanoma Detection via Image Processing.pptx
Automated Melanoma Detection via Image Processing.pptxAutomated Melanoma Detection via Image Processing.pptx
Automated Melanoma Detection via Image Processing.pptx
handrymaharjan23
 
Voice Control robotic arm hggyghghgjgjhgjg
Voice Control robotic arm hggyghghgjgjhgjgVoice Control robotic arm hggyghghgjgjhgjg
Voice Control robotic arm hggyghghgjgjhgjg
4mg22ec401
 
HershAggregator (2).pdf musicretaildistribution
HershAggregator (2).pdf musicretaildistributionHershAggregator (2).pdf musicretaildistribution
HershAggregator (2).pdf musicretaildistribution
hershtara1
 
Adopting Process Mining at the Rabobank - use case
Adopting Process Mining at the Rabobank - use caseAdopting Process Mining at the Rabobank - use case
Adopting Process Mining at the Rabobank - use case
Process mining Evangelist
 
CS-404 COA COURSE FILE JAN JUN 2025.docx
CS-404 COA COURSE FILE JAN JUN 2025.docxCS-404 COA COURSE FILE JAN JUN 2025.docx
CS-404 COA COURSE FILE JAN JUN 2025.docx
nidarizvitit
 
Dynamics 365 Business Rules Dynamics Dynamics
Dynamics 365 Business Rules Dynamics DynamicsDynamics 365 Business Rules Dynamics Dynamics
Dynamics 365 Business Rules Dynamics Dynamics
heyoubro69
 
Oral Malodor.pptx jsjshdhushehsidjjeiejdhfj
Oral Malodor.pptx jsjshdhushehsidjjeiejdhfjOral Malodor.pptx jsjshdhushehsidjjeiejdhfj
Oral Malodor.pptx jsjshdhushehsidjjeiejdhfj
maitripatel5301
 
L1_Slides_Foundational Concepts_508.pptx
L1_Slides_Foundational Concepts_508.pptxL1_Slides_Foundational Concepts_508.pptx
L1_Slides_Foundational Concepts_508.pptx
38NoopurPatel
 
AWS Certified Machine Learning Slides.pdf
AWS Certified Machine Learning Slides.pdfAWS Certified Machine Learning Slides.pdf
AWS Certified Machine Learning Slides.pdf
philsparkshome
 
Fundamentals of Data Analysis, its types, tools, algorithms
Fundamentals of Data Analysis, its types, tools, algorithmsFundamentals of Data Analysis, its types, tools, algorithms
Fundamentals of Data Analysis, its types, tools, algorithms
priyaiyerkbcsc
 
Process Mining Machine Recoveries to Reduce Downtime
Process Mining Machine Recoveries to Reduce DowntimeProcess Mining Machine Recoveries to Reduce Downtime
Process Mining Machine Recoveries to Reduce Downtime
Process mining Evangelist
 
What is ETL? Difference between ETL and ELT?.pdf
What is ETL? Difference between ETL and ELT?.pdfWhat is ETL? Difference between ETL and ELT?.pdf
What is ETL? Difference between ETL and ELT?.pdf
SaikatBasu37
 
Process Mining at Dimension Data - Jan vermeulen
Process Mining at Dimension Data - Jan vermeulenProcess Mining at Dimension Data - Jan vermeulen
Process Mining at Dimension Data - Jan vermeulen
Process mining Evangelist
 
hersh's midterm project.pdf music retail and distribution
hersh's midterm project.pdf music retail and distributionhersh's midterm project.pdf music retail and distribution
hersh's midterm project.pdf music retail and distribution
hershtara1
 
Process Mining and Official Statistics - CBS
Process Mining and Official Statistics - CBSProcess Mining and Official Statistics - CBS
Process Mining and Official Statistics - CBS
Process mining Evangelist
 
Transforming health care with ai powered
Transforming health care with ai poweredTransforming health care with ai powered
Transforming health care with ai powered
gowthamarvj
 
Ad

How to Get Your ElasticSearch Mappings Just Right - LINAGORA

2. Indexing a restaurant directory
● Title
● Description
● Price
● Address
● Type

3. Creating an index without a mapping
PUT restaurant
{"settings": {"index": {"number_of_shards": 3, "number_of_replicas": 2}}}

4. Indexing without a mapping
PUT restaurant/restaurant/1
{"title": 42, "description": "Un restaurant gastronomique où tout plat coûte 42 euros", "price": 42, "adresse": "10 rue de l'industrie, 31000 TOULOUSE", "type": "gastronomie"}

5. The risk of indexing without a mapping
PUT restaurant/restaurant/2
{"title": "Pizza de l'ormeau", "description": "Dans cette pizzeria on trouve des pizzas très bonnes et très variés", "price": 10, "adresse": "1 place de l'ormeau, 31400 TOULOUSE", "type": "italien"}
The first document stored a number in "title", so Elasticsearch mapped that field as a number and rejects this one:
{"error": {"root_cause": [{"type": "mapper_parsing_exception", "reason": "failed to parse [title]"}], "type": "mapper_parsing_exception", "reason": "failed to parse [title]", "caused_by": {"type": "number_format_exception", "reason": "For input string: \"Pizza de l'ormeau\""}}, "status": 400}

6. The inferred mapping
Strings were mapped as "text" with a "keyword" sub-field, and numbers (including "title") as "long":
GET /restaurant/_mapping
{"restaurant": {"mappings": {"restaurant": {"properties": {"adresse": {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}, "description": {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}, "price": {"type": "long"}, "title": {"type": "long"}, "type": {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}}}}}}

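To make the failure on slides 5 and 6 concrete, here is a toy Python model of dynamic mapping. It assumes only the value types the slides use (integers and strings, plus booleans for safety), and the helpers infer_field_mapping and index_document are hypothetical illustrations, not Elasticsearch APIs: the first value seen for a field fixes its type, and a later value that cannot be parsed as that type is rejected.

```python
def infer_field_mapping(value):
    """Guess a mapping for a brand-new field, in the spirit of dynamic mapping.

    Toy model: only booleans, integers and strings are handled.
    """
    if isinstance(value, bool):  # bool is a subclass of int, so test it first
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, str):
        # Strings become full-text fields with a "keyword" sub-field.
        return {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        }
    raise TypeError(f"unsupported value: {value!r}")


def index_document(mapping, doc):
    """Extend `mapping` with unseen fields, then reject values that clash with it."""
    for field, value in doc.items():
        if field not in mapping:
            mapping[field] = infer_field_mapping(value)
        elif mapping[field]["type"] == "long" and not isinstance(value, int):
            try:
                int(value)  # this toy accepts integer strings such as "42"
            except ValueError:
                raise ValueError(f"failed to parse [{field}]") from None
    return mapping


mapping = {}
index_document(mapping, {"title": 42, "price": 42, "type": "gastronomie"})
print(mapping["title"])  # {'type': 'long'}
try:
    index_document(mapping, {"title": "Pizza de l'ormeau", "price": 10})
except ValueError as err:
    print(err)  # failed to parse [title]
```

The point of the model is the ordering dependency: whichever document arrives first decides the type, which is exactly why an explicit mapping is safer.
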
7. Creating a mapping
(":url" is a placeholder for the cluster address.)
PUT :url/restaurant
{"settings": {"index": {"number_of_shards": 3, "number_of_replicas": 2}}, "mappings": {"restaurant": {"properties": {"title": {"type": "text"}, "description": {"type": "text"}, "price": {"type": "integer"}, "adresse": {"type": "text"}, "type": {"type": "keyword"}}}}}

8. Indexing a few restaurants
The _bulk body is newline-delimited JSON and must end with a newline. With the explicit mapping, "title": 42 is now accepted and indexed as text.
POST :url/restaurant/restaurant/_bulk
{"index": {"_id": 1}}
{"title": 42, "description": "Un restaurant gastronomique où tout plat coûte 42 euros", "price": 42, "adresse": "10 rue de l'industrie, 31000 TOULOUSE", "type": "gastronomie"}
{"index": {"_id": 2}}
{"title": "Pizza de l'ormeau", "description": "Dans cette pizzeria on trouve des pizzas très bonnes et très variés", "price": 10, "adresse": "1 place de l'ormeau, 31400 TOULOUSE", "type": "italien"}
{"index": {"_id": 3}}
{"title": "Chez l'oncle chan", "description": "Restaurant asiatique très copieux pour un prix contenu", "price": 14, "adresse": "13 route de labège, 31400 TOULOUSE", "type": "asiatique"}

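Because the _bulk body alternates an action line and a document line, it is easy to get wrong by hand. A minimal Python sketch that builds such a body from a dict of documents (build_bulk_body is a hypothetical helper; it only builds the string, and sending it, for example with the Content-Type application/x-ndjson, is left out):

```python
import json


def build_bulk_body(docs_by_id):
    """Return an NDJSON string for the _bulk API.

    Each document produces two lines: an "index" action carrying its _id,
    then its source. The body must end with a newline.
    """
    lines = []
    for doc_id, source in docs_by_id.items():
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        # ensure_ascii=False keeps accented characters such as "è" readable
        lines.append(json.dumps(source, ensure_ascii=False))
    return "\n".join(lines) + "\n"


body = build_bulk_body({
    1: {"title": 42, "price": 42, "type": "gastronomie"},
    3: {"title": "Chez l'oncle chan", "price": 14, "type": "asiatique"},
})
print(body, end="")
```

Generating the body from data also guarantees that every action line is followed by exactly one source line.
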
9. A basic search
GET :url/restaurant/_search
{"query": {"match": {"description": "asiatique"}}}
Response:
{"hits": {"total": 1, "max_score": 0.6395861, "hits": [{"_source": {"title": "Chez l'oncle chan", "description": "Restaurant asiatique très copieux pour un prix contenu", "price": 14, "adresse": "13 route de labège, 31400 TOULOUSE", "type": "asiatique"}}]}}

10. Where our mapping falls short
The plural "asiatiques" finds nothing, because only the exact token "asiatique" was indexed:
GET :url/restaurant/_search
{"query": {"match": {"description": "asiatiques"}}}
Response:
{"hits": {"total": 0, "max_score": null, "hits": []}}

11. What is an analyzer?
● It turns a string of characters into tokens
○ e.g. "Le chat est rouge" ("The cat is red") -> ["le", "chat", "est", "rouge"]
● The tokens are used to build an inverted index

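12. What is an inverted index?
An inverted index maps each token to the documents that contain it, so a search looks up the query's tokens instead of scanning every document.
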
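A minimal Python sketch of an inverted index, assuming a crude stand-in for the standard analyzer (split on runs of word characters, then lowercase; the real tokenizer follows Unicode word-boundary rules). build_inverted_index and search are hypothetical names. It also reproduces slide 10: the plural "asiatiques" is simply not a key in the index.

```python
import re
from collections import defaultdict


def standard_like_tokens(text):
    """Rough stand-in for the standard analyzer: word runs, lowercased."""
    return [m.group().lower() for m in re.finditer(r"\w+", text)]


def build_inverted_index(docs):
    """Map each token to the sorted list of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in standard_like_tokens(text):
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}


def search(index, query):
    """Ids of documents containing at least one query token (OR semantics)."""
    hits = set()
    for token in standard_like_tokens(query):
        hits.update(index.get(token, []))
    return sorted(hits)


DOCS = {
    1: "Un restaurant gastronomique où tout plat coûte 42 euros",
    2: "Dans cette pizzeria on trouve des pizzas très bonnes et très variés",
    3: "Restaurant asiatique très copieux pour un prix contenu",
}
index = build_inverted_index(DOCS)
print(search(index, "asiatique"))   # [3]
print(search(index, "asiatiques"))  # []
```
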
13. Under the hood: the default analyzer
The standard analyzer only splits on word boundaries and lowercases, so "asiatique" and "asiatiques" remain different tokens.
GET /_analyze
{"analyzer": "standard", "text": "Un restaurant asiatique très copieux"}
Response:
{"tokens": [{"token": "un", "start_offset": 0, "end_offset": 2, "type": "<ALPHANUM>", "position": 0}, {"token": "restaurant", "start_offset": 3, "end_offset": 13, "type": "<ALPHANUM>", "position": 1}, {"token": "asiatique", "start_offset": 14, "end_offset": 23, "type": "<ALPHANUM>", "position": 2}, {"token": "très", "start_offset": 24, "end_offset": 28, "type": "<ALPHANUM>", "position": 3}, {"token": "copieux", "start_offset": 29, "end_offset": 36, "type": "<ALPHANUM>", "position": 4}]}

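The _analyze response for the standard analyzer can be approximated in a few lines of Python. This sketch assumes that splitting on runs of word characters is close enough to the standard tokenizer for this sentence, and it labels every token <ALPHANUM> (the real tokenizer emits other types too, for example for numbers). analyze_standard_like is a hypothetical name.

```python
import re


def analyze_standard_like(text):
    """Approximate the token list returned by _analyze with the standard analyzer."""
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text)):
        tokens.append({
            "token": match.group().lower(),  # the standard analyzer lowercases
            "start_offset": match.start(),   # character offsets in the input
            "end_offset": match.end(),
            "type": "<ALPHANUM>",            # simplification: one type for all
            "position": position,
        })
    return tokens


for token in analyze_standard_like("Un restaurant asiatique très copieux"):
    print(token["token"], token["start_offset"], token["end_offset"], token["position"])
```

The offsets are what lets Elasticsearch highlight matches in the original text, while the positions are what phrase queries rely on.
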
14. Under the hood: the "french" analyzer
The stop word "un" is dropped (position 0 is skipped) and the remaining words are reduced to their stems.
GET /_analyze
{"analyzer": "french", "text": "Un restaurant asiatique très copieux"}
Response:
{"tokens": [{"token": "restaurant", "start_offset": 3, "end_offset": 13, "type": "<ALPHANUM>", "position": 1}, {"token": "asiat", "start_offset": 14, "end_offset": 23, "type": "<ALPHANUM>", "position": 2}, {"token": "trè", "start_offset": 24, "end_offset": 28, "type": "<ALPHANUM>", "position": 3}, {"token": "copieu", "start_offset": 29, "end_offset": 36, "type": "<ALPHANUM>", "position": 4}]}

15. Anatomy of an analyzer
Elasticsearch splits analysis into three steps:
● Character filtering (e.g. stripping HTML tags)
● Tokenization
● Token filtering:
○ Removing tokens (stop words such as "un", "le", "la")
○ Transforming tokens (stemming...)
○ Adding tokens (synonyms)

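The three steps above compose as a simple pipeline. Here is a minimal Python sketch of that structure; all function names are hypothetical, the whitespace tokenizer is simpler than the standard one, and the synonym filter ignores token positions, which real synonym filters manage.

```python
import re
from typing import Callable, Dict, List, Set

TokenFilter = Callable[[List[str]], List[str]]


def strip_html(text: str) -> str:
    """Character filter: replace anything that looks like an HTML tag with a space."""
    return re.sub(r"<[^>]+>", " ", text)


def whitespace_tokenizer(text: str) -> List[str]:
    """Tokenizer: split on whitespace."""
    return text.split()


def lowercase(tokens: List[str]) -> List[str]:
    """Token filter that transforms tokens."""
    return [t.lower() for t in tokens]


def stop_filter(stopwords: Set[str]) -> TokenFilter:
    """Token filter that removes tokens."""
    return lambda tokens: [t for t in tokens if t not in stopwords]


def synonym_filter(synonyms: Dict[str, List[str]]) -> TokenFilter:
    """Token filter that adds tokens right after the token they expand."""
    def add(tokens: List[str]) -> List[str]:
        out: List[str] = []
        for t in tokens:
            out.append(t)
            out.extend(synonyms.get(t, []))
        return out
    return add


def analyze(text: str, char_filters, tokenizer, token_filters) -> List[str]:
    """Run the stages in order: char filters, tokenizer, token filters."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


tokens = analyze(
    "<p>Un restaurant asiatique</p>",
    char_filters=[strip_html],
    tokenizer=whitespace_tokenizer,
    token_filters=[lowercase, stop_filter({"un", "le", "la"}),
                   synonym_filter({"restaurant": ["resto"]})],
)
print(tokens)  # ['restaurant', 'resto', 'asiatique']
```

Order matters: lowercasing before the stop filter is what lets "Un" be recognized as the stop word "un".
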
16. Breaking down the french analyzer
GET /_analyze
{"tokenizer": "standard", "filter": [{"type": "elision", "articles_case": true, "articles": ["l", "m", "t", "qu", "n", "s", "j", "d", "c", "jusqu", "quoiqu", "lorsqu", "puisqu"]}, {"type": "stop", "stopwords": "_french_"}, {"type": "stemmer", "language": "french"}], "text": "ce n'est qu'un restaurant asiatique très copieux"}
Step by step:
"ce n'est qu'un restaurant asiatique très copieux"
standard tokenizer -> ["ce", "n'est", "qu'un", "restaurant", "asiatique", "très", "copieux"]
elision -> ["ce", "est", "un", "restaurant", "asiatique", "très", "copieux"]
french stop words -> ["restaurant", "asiatique", "très", "copieux"]
french stemming -> ["restaurant", "asiat", "trè", "copieu"]

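The same chain can be mimicked in Python to see what each filter does. This is a sketch under loud assumptions: str.split() stands in for the standard tokenizer, STOPWORDS_SUBSET is an assumed small subset of the _french_ list, and toy_stem is NOT the Snowball French stemmer, only a suffix stripper tuned so this one sentence matches the slide.

```python
ELISION_ARTICLES = {"l", "m", "t", "qu", "n", "s", "j", "d", "c",
                    "jusqu", "quoiqu", "lorsqu", "puisqu"}
# Assumed subset of the _french_ stop list, enough for this sentence.
STOPWORDS_SUBSET = {"ce", "est", "un", "une", "le", "la", "de", "et", "pas"}


def elision(tokens):
    """Drop a leading elided article such as "n'" or "qu'" (case-insensitive,
    like articles_case: true); the typographic apostrophe is accepted too."""
    out = []
    for token in tokens:
        head, sep, tail = token.replace("\u2019", "'").partition("'")
        out.append(tail if sep and head.lower() in ELISION_ARTICLES else token)
    return out


def remove_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS_SUBSET]


def toy_stem(token):
    """Hypothetical toy stemmer: strip one of a few suffixes (not Snowball)."""
    for suffix in ("ique", "s", "x"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


text = "ce n'est qu'un restaurant asiatique très copieux"
tokens = text.split()          # stand-in for the standard tokenizer
tokens = elision(tokens)       # ['ce', 'est', 'un', 'restaurant', ...]
tokens = remove_stopwords(tokens)
tokens = [toy_stem(t) for t in tokens]
print(tokens)  # ['restaurant', 'asiat', 'trè', 'copieu']
```
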
17. Specifying the analyzer in the mapping
PUT :url/restaurant
{"settings": {"index": {"number_of_shards": 3, "number_of_replicas": 2}}, "mappings": {"restaurant": {"properties": {"title": {"type": "text", "analyzer": "french"}, "description": {"type": "text", "analyzer": "french"}, "price": {"type": "integer"}, "adresse": {"type": "text", "analyzer": "french"}, "type": {"type": "keyword"}}}}}

18. Search that tolerates typos
Out of the box, the misspelled "asiatuques" finds nothing:
GET /restaurant/restaurant/_search
{"query": {"match": {"description": "asiatuques"}}}
Response:
{"hits": {"total": 0, "max_score": null, "hits": []}}

19. One solution: the ngram token filter
GET /_analyze
{"tokenizer": "standard", "filter": [{"type": "ngram", "min_gram": 3, "max_gram": 7}], "text": "asiatuque"}
Tokens:
["asi", "asia", "asiat", "asiatu", "asiatuq", "sia", "siat", "siatu", "siatuq", "siatuqu", "iat", "iatu", "iatuq", "iatuqu", "iatuque", "atu", "atuq", "atuqu", "atuque", "tuq", "tuqu", "tuque", "uqu", "uque", "que"]

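The token list above follows a simple rule: for each start offset, emit every substring of length min_gram to max_gram that fits. A minimal Python sketch (ngrams is a hypothetical name) that reproduces the slide's output in the same order. Note, as a version-dependent caveat: from Elasticsearch 7.0, a gap of 4 between min_gram and max_gram also requires raising the index.max_ngram_diff index setting, whose default is 1.

```python
def ngrams(token, min_gram=3, max_gram=7):
    """Emit n-grams in the order shown on the slide: for each start offset,
    grams of increasing length, stopping at the end of the token."""
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            if start + size > len(token):
                break
            grams.append(token[start:start + size])
    return grams


grams = ngrams("asiatuque")
print(len(grams))  # 25
print(grams[:5])   # ['asi', 'asia', 'asiat', 'asiatu', 'asiatuq']
```

A 9-letter word already yields 25 tokens, which is the index-size cost of this approach.
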
20. Creating a custom analyzer that uses the ngram filter
PUT /restaurant
{"settings": {"analysis": {"filter": {"custom_ngram": {"type": "ngram", "min_gram": 3, "max_gram": 7}}, "analyzer": {"ngram_analyzer": {"tokenizer": "standard", "filter": ["lowercase", "asciifolding", "custom_ngram"]}}}}, "mappings": {"restaurant": {"properties": {"title": {"type": "text", "analyzer": "ngram_analyzer"}, "description": {"type": "text", "analyzer": "ngram_analyzer"}, "price": {"type": "integer"}, "adresse": {"type": "text", "analyzer": "ngram_analyzer"}, "type": {"type": "keyword"}}}}}

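A rough Python sketch of what a custom analyzer like ngram_analyzer does to one word. It assumes a lowercasing step before folding (the standard tokenizer itself does not lowercase), and approximates asciifolding with Unicode NFKD decomposition, which strips accents but misses characters the real filter also handles, such as "œ". All names are hypothetical.

```python
import re
import unicodedata


def ascii_fold(token):
    """Rough asciifolding: decompose accented letters and drop the accent marks."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def ngrams(token, min_gram=3, max_gram=7):
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            if start + size > len(token):
                break
            grams.append(token[start:start + size])
    return grams


def ngram_analyzer(text):
    """Sketch of a tokenizer -> lowercase -> asciifolding -> ngram chain."""
    grams = []
    for word in re.findall(r"\w+", text):
        grams.extend(ngrams(ascii_fold(word.lower())))
    return grams


print(ngram_analyzer("Labège"))
# ['lab', 'labe', 'labeg', 'labege', 'abe', 'abeg', 'abege', 'beg', 'bege', 'ege']
```

Folding before n-gramming means a query typed without accents ("labege") produces exactly the same grams as the indexed "labège".
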
21. The misspelled query now finds results
GET /restaurant/restaurant/_search
{"query": {"match": {"description": "asiatuques"}}}
Response:
{"hits": {"hits": [{"_score": 0.60128295, "_source": {"title": "Chez l'oncle chan", "description": "Restaurant asiatique très copieux pour un prix contenu", "price": 14, "adresse": "13 route de labège, 31400 TOULOUSE", "type": "asiatique"}}, {"_score": 0.46237043, "_source": {"title": 42, "description": "Un restaurant gastronomique où tout plat coûte 42 euros", "price": 42, "adresse": "10 rue de l'industrie, 31000 TOULOUSE", "type": "gastronomie" …

22. Noise introduced by n-grams
The Asian restaurant also matches, because "asiatique" and "gastronomique" share n-grams such as "iqu", "ique" and "que":
GET /restaurant/restaurant/_search
{"query": {"match": {"description": "gastronomique"}}}
Response:
{"hits": {"hits": [{"_score": 0.6277555, "_source": {"title": 42, "description": "Un restaurant gastronomique où tout plat coûte 42 euros", "price": 42, "adresse": "10 rue de l'industrie, 31000 TOULOUSE", "type": "gastronomie"}}, {"_score": 0.56373334, "_source": {"title": "Chez l'oncle chan", "description": "Restaurant asiatique très copieux pour un prix contenu", "price": 14, "adresse": "13 route de labège, 31400 TOULOUSE", "type": "asiatique"}}, …

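Both the typo fix and the noise come from the same mechanism: query and document words match whenever their n-gram sets overlap. A minimal Python sketch comparing the sets (ngram_set and shared_grams are hypothetical names). Real relevance scoring also weighs each shared gram by how rare it is across the index; this only lists the overlap.

```python
def ngram_set(word, min_gram=3, max_gram=7):
    """All n-grams of `word` whose length is between min_gram and max_gram."""
    return {
        word[start:start + size]
        for size in range(min_gram, max_gram + 1)
        for start in range(len(word) - size + 1)
    }


def shared_grams(query_word, indexed_word):
    return ngram_set(query_word) & ngram_set(indexed_word)


# The typo still shares seven grams with the indexed word...
print(sorted(shared_grams("asiatuques", "asiatique")))
# ['asi', 'asia', 'asiat', 'iat', 'que', 'sia', 'siat']
# ...but an unrelated word shares a few common endings too.
print(sorted(shared_grams("gastronomique", "asiatique")))
# ['iqu', 'ique', 'que']
```
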
23. Specifying several analyzers for one field
PUT /restaurant
{"settings": {"analysis": {"filter": {"custom_ngram": {"type": "ngram", "min_gram": 3, "max_gram": 7}}, "analyzer": {"ngram_analyzer": {"tokenizer": "standard", "filter": ["lowercase", "asciifolding", "custom_ngram"]}}}}, "mappings": {"restaurant": {"properties": {"title": {"type": "text", "analyzer": "french"}, "description": {"type": "text", "analyzer": "french", "fields": {"ngram": {"type": "text", "analyzer": "ngram_analyzer"}}}, "price": {"type": "integer"}, …

24. Using several fields in one search
The "^4" boost favours matches on the french-analyzed field, so the exact match now ranks well ahead of the n-gram-only match:
GET /restaurant/restaurant/_search
{"query": {"multi_match": {"query": "gastronomique", "fields": ["description^4", "description.ngram"]}}}
Response:
{"hits": {"hits": [{"_score": 2.0649285, "_source": {"title": 42, "description": "Un restaurant gastronomique où tout plat coûte 42 euros", "price": 42, "adresse": "10 rue de l'industrie, 31000 TOULOUSE", "type": "gastronomie"}}, {"_score": 0.56373334, "_source": {"title": "Chez l'oncle chan", "description": "Restaurant asiatique très copieux pour un prix contenu", "price": 14, "adresse": "13 route de labège, 31400 TOULOUSE", "type": "asiatique"}}, {"_index": "restaurant", …

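The scores on slide 24 are consistent with the default multi_match type, best_fields, which keeps the best boosted field score and adds tie_breaker (default 0) times the other matching fields. A minimal Python sketch of that combination (best_fields_score is a hypothetical name). The french-field score 0.5162321 is an assumption back-computed as 2.0649285 / 4, and the n-gram scores are taken from slide 22, assuming they carry over to this index.

```python
def best_fields_score(field_scores, boosts, tie_breaker=0.0):
    """Combine per-field scores like a best_fields multi_match query:
    the highest boosted field score, plus tie_breaker times the others."""
    boosted = sorted(
        (score * boosts.get(field, 1.0) for field, score in field_scores.items()),
        reverse=True,
    )
    if not boosted:
        return 0.0
    return boosted[0] + tie_breaker * sum(boosted[1:])


boosts = {"description": 4.0}  # from "description^4"
# Hypothetical french-field score (2.0649285 / 4); n-gram score from slide 22.
doc_gastro = {"description": 0.5162321, "description.ngram": 0.6277555}
# The Asian restaurant only matches through its n-grams.
doc_asian = {"description.ngram": 0.56373334}

print(best_fields_score(doc_gastro, boosts))  # roughly 2.0649
print(best_fields_score(doc_asian, boosts))   # 0.56373334
```

With tie_breaker at 0, the n-gram field only decides the score of documents that do not match the french field at all, which is why the Asian restaurant keeps exactly its slide-22 score.
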
  • 25. To ignore stopwords or not to ignore them, that is the question POST :url/restaurant/restaurant/_bulk {"index": {"_id": 1}} {"title": 42, "description": "Un restaurant gastronomique donc cher ou tout plat coûte cher (42 euros)", "price": 42, "adresse": "10 rue de l'industrie, 31000 TOULOUSE", "type": "gastronomie"} {"index": {"_id": 2}} {"title": "Pizza de l'ormeau", "description": "Dans cette pizzeria on trouve des pizzas très bonnes et très variés", "price": 10, "adresse": "1 place de l'ormeau, 31400 TOULOUSE", "type": "italien"} {"index": {"_id": 3}} {"title": "Chez l'oncle chan", "description": "Restaurant asiatique très copieux et pas cher", "price": 14, "adresse": "13 route de labège, 31400 TOULOUSE", "type": "asiatique"}
  • 26. Stopwords are not necessarily meaningless GET /restaurant/restaurant/_search { "query": { "match_phrase": { "description": "pas cher" } } } { "hits": { "hits": [ { "_source": { "title": 42, "description": "Un restaurant gastronomique donc cher ou tout plat coûte cher (42 euros)", "price": 42, "adresse": "10 rue de l'industrie, 31000 TOULOUSE", "type": "gastronomie" } },{ "_source": { "title": "Chez l'oncle chan", "description": "Restaurant asiatique très copieux et pas cher", "price": 14, "adresse": "13 route de labège, 31400 TOULOUSE", "type": "asiatique" } }
  • 27. Modifying the french analyzer to keep stopwords PUT /restaurant { "settings": { "analysis": { "filter": { "french_elision": { "type": "elision", "articles_case": true, "articles": ["l", "m", "t", "qu", "n", "s","j", "d", "c", "jusqu", "quoiqu", "lorsqu", "puisqu"] }, "french_stemmer": {"type": "stemmer", "language": "light_french"} }, "analyzer": { "custom_french": { "tokenizer": "standard", "filter": [ "french_elision", "lowercase", "french_stemmer" ] }
  • 28. GET /restaurant/restaurant/_search { "query": { "match_phrase": { "description": "pas cher" } } } { "hits": { "hits": [ { "_source": { "title": "Chez l'oncle chan", "description": "Restaurant asiatique très copieux et pas cher", "price": 14, "adresse": "13 route de labège, 31400 TOULOUSE", "type": "asiatique" } } ] } }
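The difference between slides 26 and 28 comes down to positions. When the stop filter removes "pas", the phrase "pas cher" is reduced to the single term "cher", and any description containing "cher" satisfies it. When stopwords are kept, "pas" must sit immediately before "cher". The Python sketch below models this with a tiny, hypothetical stopword subset and a naive tokenizer. It does not reproduce the french analyzer's stemming.

```python
import re

# Tiny hypothetical subset of French stopwords, enough for this example.
FRENCH_STOPWORDS = {"un", "et", "ou", "pas", "donc"}


def tokens(text: str, stopwords: set) -> list:
    # Lowercase, split into words and keep each surviving token with its
    # original position: a removed stopword leaves a gap, as a stop filter does.
    words = re.findall(r"\w+", text.lower())
    return [(word, pos) for pos, word in enumerate(words) if word not in stopwords]


def phrase_match(query: str, doc: str, stopwords: set) -> bool:
    query_tokens = tokens(query, stopwords)
    if not query_tokens:
        return False
    positions = {}
    for word, pos in tokens(doc, stopwords):
        positions.setdefault(word, set()).add(pos)
    first_word, first_pos = query_tokens[0]
    # Every query token must appear at the same relative offset as in the query.
    for start in positions.get(first_word, set()):
        if all(start + (pos - first_pos) in positions.get(word, set())
               for word, pos in query_tokens[1:]):
            return True
    return False


GASTRO = "Un restaurant gastronomique donc cher ou tout plat coûte cher (42 euros)"
CHAN = "Restaurant asiatique très copieux et pas cher"

for label, stops in [("stopwords removed", FRENCH_STOPWORDS), ("stopwords kept", set())]:
    print(label, phrase_match("pas cher", GASTRO, stops), phrase_match("pas cher", CHAN, stops))
```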
  • 29. Searching with stopwords without hurting performance GET /restaurant/restaurant/_search { "query": { "match": { "description": { "query": "restaurant pas cher", "cutoff_frequency": 0.01 } } } } GET /restaurant/restaurant/_search { "query": { "bool": { "must": { "bool": { "should": [ {"term": {"description": "restaurant"}}, {"term": {"description": "cher"}}] } }, "should": [ {"match": { "description": "pas" }} ] }
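The second request in slide 29 spells out what `cutoff_frequency` does. Query terms whose document frequency exceeds the cutoff are treated as "high-frequency" and only contribute through an optional `should` clause, while the rarer terms drive the match. The Python sketch below shows that split with made-up document frequencies. It assumes a relative cutoff is turned into an absolute count with `ceil(cutoff * max_doc)`, which is our reading of Lucene's `CommonTermsQuery`. Note that `cutoff_frequency` was deprecated in later Elasticsearch releases.

```python
import math


def split_by_cutoff(terms, doc_freq, max_doc, cutoff):
    """Partition query terms into (low_freq, high_freq) lists.

    A cutoff >= 1 is read as an absolute document count; a cutoff below 1 is
    a fraction of max_doc, rounded up to a count (assumption, see above).
    """
    threshold = cutoff if cutoff >= 1 else math.ceil(cutoff * max_doc)
    low_freq, high_freq = [], []
    for term in terms:
        # A term strictly more frequent than the threshold counts as common.
        if doc_freq.get(term, 0) > threshold:
            high_freq.append(term)
        else:
            low_freq.append(term)
    return low_freq, high_freq


# Hypothetical statistics for an index of 1,000 restaurants.
doc_freq = {"restaurant": 8, "pas": 400, "cher": 6}
low, high = split_by_cutoff(["restaurant", "pas", "cher"], doc_freq, 1000, 0.01)
print("must match (rare):", low)
print("should match (common):", high)
```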
  • 30. Customizing the scoring GET /restaurant/restaurant/_search { "query": { "function_score": { "query": { "match": { "adresse": "toulouse" } }, "functions": [{ "filter": { "terms": { "type": ["asiatique", "italien"]}}, "weight": 2 }] } } }
  • 31. Customizing the scoring GET /restaurant/restaurant/_search { "query": { "function_score": { "query": { "match": { "adresse": "toulouse" } }, "script_score": { "script": { "lang": "painless", "inline": "_score * ( 1 + 10/doc['price'].value)" } } } } } { "hits": { "hits": [ { "_score": 0.53484553, "_source": { "title": "Pizza de l'ormeau", "price": 10, "adresse": "1 place de l'ormeau, 31400 TOULOUSE", "type": "italien" } }, { "_score": 0.26742277, "_source": { "title": 42, "price": 42, "adresse": "10 rue de l'industrie, 31000 TOULOUSE", "type": "gastronomie" } }, { "_score": 0.26742277, "_source": { "title": "Chez l'oncle chan", "price": 14, "adresse": "13 route de labège, 31400 TOULOUSE", "type": "asiatique" } } ] } }
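In slide 31 only the pizzeria's score changes, and it changes by exactly a factor of two. This is consistent with integer arithmetic in Painless. `10/doc['price'].value` divides two integral values, so the quotient truncates to 1 for a price of 10 and to 0 for 14 and 42. The Python sketch below replays that computation from the base score shown in the output. Writing `10.0/doc['price'].value` in the script would presumably give the continuous boost the formula suggests.

```python
BASE_SCORE = 0.26742277  # score of the "toulouse" match, read from the slide's output


def script_score(score: float, price: int, integer_division: bool = True) -> float:
    # _score * (1 + 10 / price). With integer operands, Painless (like Java)
    # truncates the quotient; Python's // floors, which is the same for
    # positive numbers.
    ratio = 10 // price if integer_division else 10 / price
    return score * (1 + ratio)


for title, price in [("Pizza de l'ormeau", 10), ("42", 42), ("Chez l'oncle chan", 14)]:
    print(title, script_score(BASE_SCORE, price), script_score(BASE_SCORE, price, False))
```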
  • 32. How to index multilingual documents. Three cases: ● A field containing several languages (e.g. {"message": "warning | attention | cuidado"}) ○ Ngrams ○ Analyze the same field several times, with one analyzer per language ● One field per language: ○ Easy, since a different analyzer can be set for each language ○ Beware of ending up with a sparse index ● One version of the document per language (preferred) ○ One index per language ○ Above all, do not use one type per language within the same index (the term statistics get mixed)
  • 33. Handling synonyms PUT /restaurant { "settings": { "analysis": { "filter": { "french_elision": { "type": "elision", "articles_case": true, "articles": ["l", "m", "t", "qu", "n", "s", "j", "d", "c", "jusqu", "quoiqu", "lorsqu", "puisqu"] }, "french_stemmer": {"type": "stemmer", "language": "light_french"}, "french_synonym": {"type": "synonym", "synonyms": ["sou marin => sandwitch", "formul, menu"]} }, "analyzer": { "french_with_synonym": { "tokenizer": "standard", "filter": ["french_elision", "lowercase", "french_stemmer", "french_synonym"] } } } }, "mappings": { "restaurant": { "properties": { "title": {"type": "text", "analyzer": "french"}, "description": { "type": "text", "analyzer": "french", "search_analyzer": "french_with_synonym"}, "price": {"type": "integer"}, "adresse": {"type": "text", "analyzer": "french"}, "coord": {"type": "geo_point"},
  • 34. Handling synonyms GET /restaurant/restaurant/_search { "query": { "match": {"description": "sous-marins"} } } { "hits": { "hits": [ { "_source": { "title": "Subway", "description": "service très rapide, rapport qualité/prix médiocre mais on peut choisir la composition de son sandwitch", "price": 8, "adresse": "211 route de narbonne, 31520 RAMONVILLE", "type": "fastfood", "coord": "43.5577519,1.4625753" } } ] } }
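Slide 34 finds the Subway document for "sous-marins" even though the rule is written `sou marin => sandwitch`. In `french_with_synonym` the synonym filter runs after `french_stemmer`, so the rule has to be written in stemmed form. The Python sketch below illustrates that ordering with a deliberately crude stemmer that only strips a trailing "s". This stemmer is a hypothetical stand-in for `light_french`, which does much more.

```python
import re

# The "sou marin => sandwitch" rule, keyed by the stemmed token sequence.
SYNONYMS = {("sou", "marin"): ["sandwitch"]}


def crude_stem(token: str) -> str:
    # Hypothetical stand-in for light_french: drop a plural "s" only.
    return token[:-1] if len(token) > 3 and token.endswith("s") else token


def analyze(text: str) -> list:
    # Tokenize and lowercase, stem, then apply synonyms: the same order as the
    # filter chain of french_with_synonym (elision is skipped here).
    stemmed = [crude_stem(t) for t in re.findall(r"\w+", text.lower())]
    output, i = [], 0
    while i < len(stemmed):
        for lhs, rhs in SYNONYMS.items():
            if tuple(stemmed[i:i + len(lhs)]) == lhs:
                output.extend(rhs)
                i += len(lhs)
                break
        else:
            output.append(stemmed[i])
            i += 1
    return output


print(analyze("sous-marins"))  # ['sandwitch']
```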
  • 35. Geolocated data PUT /restaurant { "mappings": { "restaurant": { "properties": { "title": {"type": "text", "analyzer": "french"}, "description": {"type": "text", "analyzer": "french" }, "price": {"type": "integer"}, "adresse": {"type": "text","analyzer": "french"}, "coord": {"type": "geo_point"}, "type": { "type": "keyword"} } } } }
  • 36. Geolocated data POST restaurant/restaurant/_bulk {"index": {"_id": 1}} {"title": "bistronomique", "description": "Un restaurant bon mais un petit peu cher, les desserts sont excellents", "price": 17, "adresse": "73 route de revel, 31400 TOULOUSE", "type": "français", "coord": "43.57417,1.4905748"} {"index": {"_id": 2}} {"title": "Pizza de l'ormeau", "description": "Dans cette pizzeria on trouve des pizzas très bonnes et très variés", "price": 10, "adresse": "1 place de l'ormeau, 31400 TOULOUSE", "type": "italien", "coord": "43.579225,1.4835248"} {"index": {"_id": 3}} {"title": "Chez l'oncle chan", "description": "Restaurant asiatique très copieux pour un prix contenu", "price": 14, "adresse": "18 rue des cosmonautetes, 31400 TOULOUSE", "type": "asiatique", "coord": "43.5612759,1.4936073"} {"index": {"_id": 4}} {"title": "Un fastfood très connu", "description": "service très rapide, rapport qualité/prix médiocre", "price": 8, "adresse": "210 route de narbonne, 31520 RAMONVILLE", "type": "fastfood", "coord": "43.5536343,1.476165"} {"index": {"_id": 5}} {"title": "Subway", "description": "service très rapide, rapport qualité/prix médiocre mais on peut choisir la composition de son sandwitch", "price": 8, "adresse": "211 route de narbonne, 31520 RAMONVILLE", "type": "fastfood", "coord": "43.5577519,1.4625753"} {"index": {"_id": 6}} {"title": "L'évidence", "description": "restaurant copieux et pas cher, cependant c'est pas bon", "price": 12, "adresse": "38 route de revel, 31400 TOULOUSE", "type": "français", "coord": "43.5770109,1.4846573"}
  • 37. Filtering and sorting on geolocated data GET /restaurant/restaurant/_search { "query": { "bool": { "filter": [ {"term": {"type":"français"}}, {"geo_distance": { "distance": "1km", "coord": {"lat": 43.5739329, "lon": 1.4893669} }} ] } }, "sort": [{ "geo_distance": { "coord": {"lat": 43.5739329, "lon": 1.4893669}, "unit": "km" } }] } { "hits": { "hits": [ { "_source": { "title": "bistronomique", "description": "Un restaurant bon mais un petit peu cher, les desserts sont excellents", "price": 17, "adresse": "73 route de revel, 31400 TOULOUSE", "type": "français", "coord": "43.57417,1.4905748" }, "sort": [0.10081529266640063] },{ "_source": { "title": "L'évidence", "description": "restaurant copieux et pas cher, cependant c'est pas bon", "price": 12, "adresse": "38 route de revel, 31400 TOULOUSE", "type": "français", "coord": "43.5770109,1.4846573" }, "sort": [0.510960087579506] },{ "_source": { "title": "Chez Ingalls", "description": "Contemporain et rustique, ce restaurant avec cheminée sert savoyardes et des grillades",
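The `sort` values in slide 37 are distances in kilometres from the search origin. A haversine computation on a spherical Earth reproduces them closely. The sketch below assumes a mean Earth radius of 6371.0088 km, which may differ slightly from the constant Elasticsearch uses internally. It applies the same `type` filter and 1 km radius as the query, then sorts by distance.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumption, see above)
ORIGIN = (43.5739329, 1.4893669)

RESTAURANTS = [
    ("bistronomique", "français", (43.57417, 1.4905748)),
    ("L'évidence", "français", (43.5770109, 1.4846573)),
    ("Pizza de l'ormeau", "italien", (43.579225, 1.4835248)),
]


def haversine_km(a, b):
    # Great-circle distance between two (lat, lon) pairs given in degrees.
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


# Filter (type + 1 km radius), then sort by distance, like the bool/sort request.
hits = sorted(
    (haversine_km(ORIGIN, coord), title)
    for title, kind, coord in RESTAURANTS
    if kind == "français" and haversine_km(ORIGIN, coord) <= 1.0
)
for distance, title in hits:
    print(f"{title}: {distance:.4f} km")
```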
  • 38. The bool query explained GET /restaurant/restaurant/_search { "query": { "bool": { "must": {"match": {"description": "sandwitch"}}, "should" : [ {"match": {"description": "bon"}}, {"match": {"description": "excellent"}} ], "must_not": [ {"match_phrase": { "description": "pas bon" }} ], "filter": [ {"range": {"price": { "lte": 20 }}} ] } } }
  • 39. The bool query explained GET /restaurant/restaurant/_search { "query": { "bool": { "should" : [ {"match": {"description": "bon"}}, {"match": {"description": "excellent"}}, {"match": {"description": "service rapide"}} ], "minimum_should_match": 2 } } }
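Slides 38 and 39 can be summarized as a matching rule with four parts:

1. Every `must` and `filter` clause has to match.
2. No `must_not` clause may match.
3. The number of matching `should` clauses must reach `minimum_should_match`.
4. That threshold defaults to 1 when the query has only `should` clauses, and to 0 otherwise.

`filter` and `must_not` do not affect the score, and scoring is left out here. The Python sketch below models only the matching rule. Plain substring checks stand in for analyzed `match` queries, and the `Bool` class and its helpers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

Doc = dict
Clause = Callable[[Doc], bool]


def contains(text: str) -> Clause:
    # Crude stand-in for a match / match_phrase clause on "description".
    return lambda doc: text in doc["description"]


@dataclass
class Bool:
    must: List[Clause] = field(default_factory=list)
    should: List[Clause] = field(default_factory=list)
    must_not: List[Clause] = field(default_factory=list)
    filter: List[Clause] = field(default_factory=list)
    minimum_should_match: Optional[int] = None

    def matches(self, doc: Doc) -> bool:
        if not all(clause(doc) for clause in self.must + self.filter):
            return False
        if any(clause(doc) for clause in self.must_not):
            return False
        required = self.minimum_should_match
        if required is None:
            # Only should clauses: at least one must match; otherwise optional.
            required = 0 if (self.must or self.filter or not self.should) else 1
        return sum(clause(doc) for clause in self.should) >= required


DOCS = [
    {"title": "bistronomique", "price": 17,
     "description": "Un restaurant bon mais un petit peu cher, les desserts sont excellents"},
    {"title": "Subway", "price": 8,
     "description": "service très rapide, rapport qualité/prix médiocre mais on peut choisir la composition de son sandwitch"},
    {"title": "L'évidence", "price": 12,
     "description": "restaurant copieux et pas cher, cependant c'est pas bon"},
]

slide_38 = Bool(
    must=[contains("sandwitch")],
    should=[contains("bon"), contains("excellent")],
    must_not=[contains("pas bon")],
    filter=[lambda doc: doc["price"] <= 20],
)
slide_39 = Bool(
    should=[contains("bon"), contains("excellent"), contains("service rapide")],
    minimum_should_match=2,
)
print([d["title"] for d in DOCS if slide_38.matches(d)])
print([d["title"] for d in DOCS if slide_39.matches(d)])
```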
  • 40. Offering advanced search to your users GET /restaurant/restaurant/_search { "query": { "simple_query_string": { "fields": ["description", "title^2", "adresse", "type"], "query": "-\"pas bon\" +(pizzi~2 | sandwitch)" } } } GET /restaurant/restaurant/_search { "query": { "bool": { "must_not": { "multi_match": { "fields": [ "description", "title^2", "adresse", "type"], "type": "phrase", "query": "pas bon" } }, "should": [ {"multi_match": { "fields": [ "description", "title^2", "adresse", "type"], "fuzziness": 2, "max_expansions": 50, "query": "pizzi" } }, {"multi_match": { "fields": [ "description", "title^2", "adresse", "type"], "query": "sandwitch" }
  • 41. Aliases: giving yourself room to maneuver PUT /restaurant_v1/ { "mappings": { "restaurant": { "properties": { "title": {"type": "text"}, "lat": {"type": "double"}, "lon": {"type": "double"} } } } } POST /_aliases { "actions": [ {"add": {"index": "restaurant_v1", "alias": "restaurant_search"}}, {"add": {"index": "restaurant_v1", "alias": "restaurant_write"}} ] }
  • 42. Aliases, pipelines and reindexing PUT /restaurant_v2 { "mappings": { "restaurant": { "properties": { "title": {"type": "text", "analyzer": "french"}, "position": {"type": "geo_point"} } } } } PUT /_ingest/pipeline/fixing_position { "description": "move lat lon into position parameter", "processors": [ {"rename": {"field": "lat", "target_field": "position.lat"}}, {"rename": {"field": "lon", "target_field": "position.lon"}} ] } POST /_aliases { "actions": [ {"remove": {"index": "restaurant_v1", "alias": "restaurant_search"}}, {"remove": {"index": "restaurant_v1", "alias": "restaurant_write"}}, {"add": {"index": "restaurant_v2", "alias": "restaurant_search"}}, {"add": {"index": "restaurant_v2", "alias": "restaurant_write"}} ] } POST /_reindex { "source": {"index": "restaurant_v1"}, "dest": {"index": "restaurant_v2", "pipeline": "fixing_position"} }
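The `fixing_position` pipeline relies on the rename processor accepting a dotted `target_field`. That path creates the nested `position` object expected by the `geo_point` mapping. The Python sketch below shows what the pipeline does to one `restaurant_v1` document and builds the alias-switch body of slide 42. The function names are hypothetical, and nothing here calls Elasticsearch. In Elasticsearch itself, the actions of a single `_aliases` request are applied atomically.

```python
import copy


def rename(doc: dict, field: str, target_field: str) -> None:
    # Move doc[field] to target_field, creating intermediate objects for a
    # dotted path such as "position.lat". Like the rename processor (without
    # ignore_missing), it fails if the source field is absent.
    value = doc.pop(field)
    *parents, leaf = target_field.split(".")
    node = doc
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def fixing_position(doc: dict) -> dict:
    # Same two processors, in the same order, as the fixing_position pipeline.
    result = copy.deepcopy(doc)
    rename(result, "lat", "position.lat")
    rename(result, "lon", "position.lon")
    return result


def swap_aliases(old_index: str, new_index: str, aliases: list) -> dict:
    # Body for POST /_aliases: detach every alias from the old index, then
    # attach it to the new one.
    removes = [{"remove": {"index": old_index, "alias": a}} for a in aliases]
    adds = [{"add": {"index": new_index, "alias": a}} for a in aliases]
    return {"actions": removes + adds}


print(fixing_position({"title": "bistronomique", "lat": 43.57417, "lon": 1.4905748}))
print(swap_aliases("restaurant_v1", "restaurant_v2", ["restaurant_search", "restaurant_write"]))
```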
  • 43. Analyzing fire department intervention data from 2005 to 2014 PUT /pompier { "mappings": { "interventions": { "properties": { "date": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss"}, "type_incident": { "type": "keyword" }, "description_groupe": { "type": "keyword" }, "caserne": { "type": "integer"}, "ville": { "type": "keyword"}, "arrondissement": { "type": "keyword"}, "division": {"type": "integer"}, "position": {"type": "geo_point"}, "nombre_unites": {"type": "integer"} } } } }
  • 44. Listing the different incident types GET /pompier/interventions/_search { "size": 0, "aggs": { "type_incident": { "terms": {"field": "type_incident", "size": 100} } } } { "aggregations": { "type_incident": { "buckets": [ {"key": "Premier répondant", "doc_count": 437891}, {"key": "Appel de Cie de détection", "doc_count": 76157}, {"key": "Alarme privé ou locale", "doc_count": 60879}, {"key": "Ac.véh./1R/s.v./ext/29B/D", "doc_count": 41734}, {"key": "10-22 sans feu", "doc_count": 29283}, {"key": "Acc. sans victime sfeu - ext.", "doc_count": 27663}, {"key": "Inondation", "doc_count": 26801}, {"key": "Problèmes électriques", "doc_count": 23495}, {"key": "Aliments surchauffés", "doc_count": 23428}, {"key": "Odeur suspecte - gaz", "doc_count": 21158}, {"key": "Déchets en feu", "doc_count": 18007}, {"key": "Ascenseur", "doc_count": 12703}, {"key": "Feu de champ *", "doc_count": 11518}, {"key": "Structure dangereuse", "doc_count": 9958}, {"key": "10-22 avec feu", "doc_count": 9876}, {"key": "Alarme vérification", "doc_count": 8328}, {"key": "Aide à un citoyen", "doc_count": 7722}, {"key": "Fuite ext.:hydrocar. liq. div.", "doc_count": 7351}, {"key": "Ac.véh./1R/s.v./V.R./29B/D", "doc_count": 6232}, {"key": "Feu de véhicule extérieur", "doc_count": 5943}, {"key": "Fausse alerte 10-19", "doc_count": 4680}, {"key": "Acc. sans victime sfeu - v.r", "doc_count": 3494}, {"key": "Assistance serv. muni.", "doc_count": 3431}, {"key": "Avertisseur de CO", "doc_count": 2542}, {"key": "Fuite gaz naturel 10-22", "doc_count": 1928}, {"key": "Matières dangereuses / 10-22", "doc_count": 1905}, {"key": "Feu de bâtiment", "doc_count": 1880}, {"key": "Senteur de feu à l'extérieur", "doc_count": 1566}, {"key": "Surchauffe - véhicule", "doc_count": 1499}, {"key": "Feu / Agravation possible", "doc_count": 1281}, {"key": "Fuite gaz naturel 10-09", "doc_count": 1257}, {"key": "Acc.véh/1rép/vict/ext 29D04", "doc_count": 1015}, {"key": "Acc. véh victime sfeu - (ext.)", "doc_count": 971},
  • 45. Nested aggregations GET /pompier/interventions/_search { "size": 0, "aggs": { "ville": { "terms": {"field": "ville"}, "aggs": { "arrondissement": { "terms": {"field": "arrondissement"} } } } } } { "aggregations": {"ville": {"buckets": [ { "key": "Montréal", "doc_count": 768955, "arrondissement": {"buckets": [ {"key": "Ville-Marie", "doc_count": 83010}, {"key": "Mercier / Hochelaga-Maisonneuve", "doc_count": 67272}, {"key": "Côte-des-Neiges / Notre-Dame-de-Grâce", "doc_count": 65933}, {"key": "Villeray / St-Michel / Parc Extension", "doc_count": 60951}, {"key": "Rosemont / Petite-Patrie", "doc_count": 59213}, {"key": "Ahuntsic / Cartierville", "doc_count": 57721}, {"key": "Plateau Mont-Royal", "doc_count": 53344}, {"key": "Montréal-Nord", "doc_count": 40757}, {"key": "Sud-Ouest", "doc_count": 39936}, {"key": "Rivière-des-Prairies / Pointe-aux-Trembles", "doc_count": 38139} ]} }, { "key": "Dollard-des-Ormeaux", "doc_count": 17961, "arrondissement": {"buckets": [ {"key": "Indéterminé", "doc_count": 13452}, {"key": "Dollard-des-Ormeaux / Roxboro", "doc_count": 4477}, {"key": "Pierrefonds / Senneville", "doc_count": 10}, {"key": "Dorval / Ile Dorval", "doc_count": 8}, {"key": "Pointe-Claire", "doc_count": 8}, {"key": "Ile-Bizard / Ste-Geneviève / Ste-A-de-B", "doc_count": 6} ]} }, { "key": "Pointe-Claire", "doc_count": 17925, "arrondissement": {"buckets": [ {"key": "Indéterminé", "doc_count": 13126}, {"key": "Pointe-Claire", "doc_count": 4766}, {"key": "Dorval / Ile Dorval", "doc_count": 12}, {"key": "Dollard-des-Ormeaux / Roxboro", "doc_count": 7}, {"key": "Kirkland", "doc_count": 7}, {"key": "Beaconsfield / Baie d'Urfé", "doc_count": 5}, {"key": "Ile-Bizard / Ste-Geneviève / Ste-A-de-B", "doc_count": 1}, {"key": "St-Laurent", "doc_count": 1}
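Conceptually, the nested aggregation of slide 45 is a group-by. It creates one `terms` bucket per `ville`, then one bucket per `arrondissement` inside each of them, and orders each level by descending count. The Python sketch below computes that exactly on a few hypothetical interventions. On a real multi-shard index, Elasticsearch's `terms` counts can be approximate. Ties are broken here by ascending key, which we believe matches Elasticsearch's default tie-break.

```python
from collections import Counter, defaultdict


def top_terms(counter: Counter, size: int) -> list:
    # Descending count, ties broken by ascending key.
    ordered = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered[:size]]


def nested_terms(docs, outer: str, inner: str, size: int = 10) -> list:
    outer_counts = Counter(doc[outer] for doc in docs)
    inner_counts = defaultdict(Counter)
    for doc in docs:
        inner_counts[doc[outer]][doc[inner]] += 1
    buckets = top_terms(outer_counts, size)
    for bucket in buckets:
        # Sub-aggregation computed only over the documents of this bucket.
        bucket[inner] = top_terms(inner_counts[bucket["key"]], size)
    return buckets


# Hypothetical interventions, not taken from the real data set.
interventions = [
    {"ville": "Montréal", "arrondissement": "Ville-Marie"},
    {"ville": "Montréal", "arrondissement": "Ville-Marie"},
    {"ville": "Montréal", "arrondissement": "Plateau Mont-Royal"},
    {"ville": "Pointe-Claire", "arrondissement": "Indéterminé"},
]
for bucket in nested_terms(interventions, "ville", "arrondissement"):
    print(bucket)
```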
  • 46. Computing averages and sorting aggregation buckets GET /pompier/interventions/_search { "size": 0, "aggs": { "avg_nombre_unites_general": { "avg": {"field": "nombre_unites"} }, "type_incident": { "terms": { "field": "type_incident", "size": 5, "order" : {"avg_nombre_unites": "desc"} }, "aggs": { "avg_nombre_unites": { "avg": {"field": "nombre_unites"} } } } } } { "aggregations": { "type_incident": { "buckets": [ { "key": "Feu / 5e Alerte", "doc_count": 162, "avg_nombre_unites": {"value": 70.9074074074074} }, { "key": "Feu / 4e Alerte", "doc_count": 100, "avg_nombre_unites": {"value": 49.36} }, { "key": "Troisième alerte/autre que BAT", "doc_count": 1, "avg_nombre_unites": {"value": 43.0} }, { "key": "Feu / 3e Alerte", "doc_count": 173, "avg_nombre_unites": {"value": 41.445086705202314} }, { "key": "Deuxième alerte/autre que BAT", "doc_count": 8, "avg_nombre_unites": {"value": 37.5} } ] }, "avg_nombre_unites_general": {"value": 2.1374461758713728} } }
  • 47. Percentiles GET /pompier/interventions/_search { "size": 0, "aggs": { "unites_percentile": { "percentiles": { "field": "nombre_unites", "percents": [25, 50, 75, 100] } } } } { "aggregations": { "unites_percentile": { "values": { "25.0": 1.0, "50.0": 1.0, "75.0": 3.0, "100.0": 275.0 } } } }
  • 48. Histogram GET /pompier/interventions/_search { "size": 0, "query": { "term": {"type_incident": "Inondation"} }, "aggs": { "unites_histogram": { "histogram": { "field": "nombre_unites", "order": {"_key": "asc"}, "interval": 1 }, "aggs": { "ville": { "terms": {"field": "ville", "size": 1} } } } } } { "aggregations": { "unites_histogram": { "buckets": [ { "key": 1.0, "doc_count": 23507, "ville": {"buckets": [{"key": "Montréal", "doc_count": 19417}]} },{ "key": 2.0, "doc_count": 1550, "ville": {"buckets": [{"key": "Montréal", "doc_count": 1229}]} },{ "key": 3.0, "doc_count": 563, "ville": {"buckets": [{"key": "Montréal", "doc_count": 404}]} },{ "key": 4.0, "doc_count": 449, "ville": {"buckets": [{"key": "Montréal", "doc_count": 334}]} },{ "key": 5.0, "doc_count": 310, "ville": {"buckets": [{"key": "Montréal", "doc_count": 253}]} },{ "key": 6.0, "doc_count": 215, "ville": {"buckets": [{"key": "Montréal", "doc_count": 173}]} },{ "key": 7.0, "doc_count": 136, "ville": {"buckets": [{"key": "Montréal", "doc_count": 112}]} },{ "key": 8.0, "doc_count": 35, "ville": {"buckets": [{"key": "Montréal", "doc_count": 30}]} },{ "key": 9.0, "doc_count": 10, "ville": {"buckets": [{"key": "Montréal", "doc_count": 8}]} },{ "key": 10.0, "doc_count": 11, "ville": {"buckets": [{"key": "Montréal", "doc_count": 8}]} },{ "key": 11.0, "doc_count": 2, "ville": {"buckets": [{"key": "Montréal", "doc_count": 2}]}
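A `histogram` aggregation assigns each value to the bucket whose key is `floor((value - offset) / interval) * interval + offset`, where `offset` defaults to 0. The Python sketch below applies that rule to a handful of hypothetical `nombre_unites` values. It differs from Elasticsearch in one respect: it only lists non-empty buckets. Elasticsearch's default `min_doc_count` of 0 also returns the empty buckets between the smallest and largest key.

```python
import math
from collections import Counter


def histogram(values, interval, offset=0.0):
    # Bucket key for each value, then count values per key, sorted by key
    # (like "order": {"_key": "asc"}).
    keys = (math.floor((v - offset) / interval) * interval + offset for v in values)
    return sorted(Counter(keys).items())


nombre_unites = [1, 1, 1, 2, 3, 3, 7]  # hypothetical values
print(histogram(nombre_unites, 1))
print(histogram(nombre_unites, 5))
```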
  • 49. "Significant terms" GET /pompier/interventions/_search { "size": 0, "query": { "term": {"type_incident": "Inondation"} }, "aggs": { "ville": { "significant_terms": {"field": "ville", "size": 5, "percentage": {}} } } } { "aggregations": { "ville": { "doc_count": 26801, "buckets": [ { "key": "Ile-Bizard", "score": 0.10029498525073746, "doc_count": 68, "bg_count": 678 }, { "key": "Montréal-Nord", "score": 0.0826544804291675, "doc_count": 416, "bg_count": 5033 }, { "key": "Roxboro", "score": 0.08181818181818182, "doc_count": 27, "bg_count": 330 }, { "key": "Côte St-Luc", "score": 0.07654825526563974, "doc_count": 487, "bg_count": 6362 }, { "key": "Saint-Laurent", "score": 0.07317073170731707, "doc_count": 465, "bg_count": 6355
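With the `percentage` heuristic, a term's score is simply `doc_count / bg_count`. For each town, `doc_count` counts the floods (the query's result set) and `bg_count` counts all interventions in the whole index. The Python sketch below recomputes the scores of slide 49 from the response's own `doc_count` and `bg_count` values and recovers the same ranking.

```python
# (doc_count, bg_count) per town, copied from the slide's response.
BUCKETS = {
    "Ile-Bizard": (68, 678),
    "Montréal-Nord": (416, 5033),
    "Roxboro": (27, 330),
    "Côte St-Luc": (487, 6362),
    "Saint-Laurent": (465, 6355),
}


def percentage_score(doc_count: int, bg_count: int) -> float:
    # Share of the term's background documents that are in the foreground set.
    return doc_count / bg_count


ranking = sorted(BUCKETS, key=lambda town: percentage_score(*BUCKETS[town]), reverse=True)
for town in ranking:
    print(town, percentage_score(*BUCKETS[town]))
```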
  • 50. Aggregations on geolocated data GET :url/pompier/interventions/_search { "size": 0, "query": { "regexp": {"type_incident": "Feu.*"} }, "aggs": { "distance_from_here": { "geo_distance": { "field": "position", "unit": "km", "origin": { "lat": 45.495902, "lon": -73.554263 }, "ranges": [ { "to": 2}, {"from":2, "to": 4}, {"from":4, "to": 6}, {"from": 6, "to": 8}, {"from": 8}] } } } } { "aggregations": { "distance_from_here": { "buckets": [ { "key": "*-2.0", "from": 0.0, "to": 2.0, "doc_count": 80 }, { "key": "2.0-4.0", "from": 2.0, "to": 4.0, "doc_count": 266 }, { "key": "4.0-6.0", "from": 4.0, "to": 6.0, "doc_count": 320 }, { "key": "6.0-8.0", "from": 6.0, "to": 8.0, "doc_count": 326 }, { "key": "8.0-*", "from": 8.0, "doc_count": 1720 } ] } } }
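A `geo_distance` aggregation first computes each document's distance from `origin`. It then counts the document in every range whose `from` is less than or equal to that distance and whose `to` is strictly greater (`from` inclusive, `to` exclusive). The Python sketch below reproduces that bucketing and the response's key format on hypothetical distances. Computing the distances themselves is a separate step that is not shown here.

```python
RANGES = [(None, 2), (2, 4), (4, 6), (6, 8), (8, None)]  # same ranges as the request (km)


def range_key(lower, upper) -> str:
    # "*-2.0", "2.0-4.0", ..., "8.0-*" as in the response.
    left = "*" if lower is None else str(float(lower))
    right = "*" if upper is None else str(float(upper))
    return f"{left}-{right}"


def bucket_distances(distances_km, ranges=RANGES) -> dict:
    counts = {range_key(lo, hi): 0 for lo, hi in ranges}
    for distance in distances_km:
        for lo, hi in ranges:
            # "from" is inclusive, "to" is exclusive; a value can fall in
            # several ranges if they overlap.
            if (lo is None or distance >= lo) and (hi is None or distance < hi):
                counts[range_key(lo, hi)] += 1
    return counts


print(bucket_distances([0.4, 2.0, 3.99, 8.0, 12.5]))  # hypothetical distances
```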
  • 51. Any questions?
  • 52. Offering advanced search to your users GET /restaurant/restaurant/_search { "query": { "simple_query_string": { "fields": ["description", "title^2", "adresse", "type"], "query": "\"service rapide\"~2" } } } { "hits": { "hits": [ { "_source": { "title": "Un fastfood très connu", "description": "service très rapide, rapport qualité/prix médiocre", "price": 8, "adresse": "210 route de narbonne, 31520 RAMONVILLE", "type": "fastfood", "coord": "43.5536343,1.476165" } },{ "_source": { "title": "Subway", "description": "service très rapide, rapport qualité/prix médiocre mais on peut choisir la composition de son sandwitch", "price": 8, "adresse": "211 route de narbonne, 31520 … GET /restaurant/restaurant/_search { "query": { "match_phrase": { "description": { "slop": 2, "query": "service rapide" } } } }