Best Practices in Elasticsearch
Best Indexing:
When searching across multiple fields for a single concept, we want to match
as many of the query words as possible within the same field.
We compared the following two approaches to indexing, and the second one
turned out to work best:
i. Search across many fields within the index, OR
ii. Consolidate all relevant keywords into a single keyword field
i) Search across many fields within the index:
We searched for the text across multiple fields and got the following
result.
Data Set:
"hits": [
  {
    "_index": "searchtest_28",
    "_type": "test28",
    "_id": "0",
    "_score": 1,
    "_source": {
      "Keyword": "Chad Holan 10070723 1000110679 40ZS West Des Moines PENDING 100001",
      "Module": "Contract",
      "Sub Moudule": "Contracts",
      "First Name": "Chad",
      "Last Name": "Holan",
      "Customer Id": "10070723",
      "Context": "ContractInfo",
      "Date": "10/17/2016",
      "Contract#": "1000110679",
      "Description": "40ZS",
      "Address": "West Des Moines",
      "Status": "PENDING",
      "PO#": "100001"
    }
  },
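The two approaches can be contrasted with a short Python sketch. It assumes the Keyword field is built by joining the other field values in the order they appear in the sample document; the helper names build_keyword, multi_field_query, and keyword_query are hypothetical, and the multi_match and match bodies are only one possible way to express the two searches.

```python
# Fields whose values are consolidated into "Keyword". The order is an
# assumption inferred from the sample document.
KEYWORD_FIELDS = [
    "First Name", "Last Name", "Customer Id", "Contract#",
    "Description", "Address", "Status", "PO#",
]


def build_keyword(doc, fields=KEYWORD_FIELDS):
    """Join the non-empty values of the given fields with single spaces."""
    return " ".join(str(doc[f]) for f in fields if doc.get(f))


def multi_field_query(text, fields=KEYWORD_FIELDS):
    """Approach (i): search the text across many separate fields."""
    return {"query": {"multi_match": {"query": text, "fields": list(fields)}}}


def keyword_query(text):
    """Approach (ii): search only the consolidated Keyword field."""
    return {"query": {"match": {"Keyword": {"query": text, "operator": "and"}}}}


doc = {
    "First Name": "Chad",
    "Last Name": "Holan",
    "Customer Id": "10070723",
    "Contract#": "1000110679",
    "Description": "40ZS",
    "Address": "West Des Moines",
    "Status": "PENDING",
    "PO#": "100001",
}
doc["Keyword"] = build_keyword(doc)
print(doc["Keyword"])
# prints: Chad Holan 10070723 1000110679 40ZS West Des Moines PENDING 100001
```

With the consolidated field, a query such as keyword_query("Chad PENDING") has to match all of its words inside one field, which is the behaviour the second approach is after.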
Set explicit mappings, even for primitive types such as float, boolean, and
decimal. For example, the following mapping copies the name fields into a
single "search" field:
"my_doctype": {
  "properties": {
    "search": {
      "type": "string",
      "analyzer": "my_fulltext_analyzer"
    },
    "first_name": {
      "type": "string",
      "copy_to": "search"
    },
    "last_name": {
      "type": "string",
      "copy_to": "search"
    }
  }
}
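When many fields feed the same search field, the mapping can be generated instead of written by hand. The following is a minimal sketch, assuming a hypothetical helper named build_copy_to_mapping; the "string" type is taken from the mapping above (newer Elasticsearch versions use "text" instead), and my_fulltext_analyzer is assumed to be defined in the index settings.

```python
import json


def build_copy_to_mapping(source_fields, target="search",
                          analyzer="my_fulltext_analyzer",
                          field_type="string"):
    """Build a properties block where every source field copies to target."""
    # The target field carries the analyzer used for full-text search.
    properties = {target: {"type": field_type, "analyzer": analyzer}}
    # Each source field keeps an explicit type and is copied into target.
    for name in source_fields:
        properties[name] = {"type": field_type, "copy_to": target}
    return {"properties": properties}


mapping = {"my_doctype": build_copy_to_mapping(["first_name", "last_name"])}
print(json.dumps(mapping, indent=2))
```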
We can specify which field to search with the default_field key. For
example, a query combined with a filter might have a body like:
{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "default_field": "search",
            "query": "foobar"
          }
        },
        {
          "term": {
            "foo": true
          }
        }
      ]
    }
  }
}
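A request body like this can also be assembled in code. The sketch below assumes a hypothetical helper named build_search_body; it places the query_string clause and each term clause in its own object inside the must array, since a bool clause holds one query type per object.

```python
import json


def build_search_body(text, filters, default_field="search"):
    """Combine a query_string search with exact-match term clauses."""
    # First clause: full-text search against the consolidated field.
    must = [{"query_string": {"default_field": default_field, "query": text}}]
    # One separate term clause per filter field.
    for field, value in filters.items():
        must.append({"term": {field: value}})
    return {"query": {"bool": {"must": must}}}


body = build_search_body("foobar", {"foo": True})
print(json.dumps(body, indent=2))
```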
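Routing: Route related documents to a particular shard by giving them the same routing value, so that a search using that value only has to query one shard. For an e-commerce site, for example, you can use the category name as the routing value.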
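The idea behind routing can be illustrated with a small Python sketch: the routing value is hashed and reduced modulo the number of primary shards, so equal values always land on the same shard. This is only a model of the principle; zlib.crc32 is an assumed stand-in for the hash Elasticsearch uses internally, and shard_for is a hypothetical name.

```python
import zlib


def shard_for(routing_value, num_primary_shards):
    """Map a routing value to a shard number in [0, num_primary_shards)."""
    # crc32 is a stand-in hash for illustration, not Elasticsearch's own.
    digest = zlib.crc32(routing_value.encode("utf-8"))
    return digest % num_primary_shards


# Documents that share a category share a shard.
for category in ["shoes", "shoes", "books"]:
    print(category, "-> shard", shard_for(category, 5))
```

Because the shard number depends on the number of primary shards, that number cannot be changed without reindexing.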
Heap size: Give the Elasticsearch heap around 50% of the available memory on
the machine, leaving the rest to the operating system's file system cache.
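The 50% rule can be turned into a small calculation. This sketch rests on two assumptions that are not in the text above: the heap is set with the standard JVM flags -Xms and -Xmx, and it is capped at 31 GB, following the commonly cited guidance to stay below roughly 32 GB. The function names are hypothetical.

```python
def recommended_heap_gb(total_ram_gb, fraction=0.5, cap_gb=31):
    """Return about half of RAM in whole GB, between 1 and cap_gb."""
    # cap_gb=31 is an assumption, not a value taken from the text.
    return max(1, min(int(total_ram_gb * fraction), cap_gb))


def jvm_heap_flags(total_ram_gb):
    """Format equal minimum and maximum heap sizes as JVM flags."""
    heap = recommended_heap_gb(total_ram_gb)
    return ["-Xms%dg" % heap, "-Xmx%dg" % heap]


print(jvm_heap_flags(16))   # prints ['-Xms8g', '-Xmx8g']
print(jvm_heap_flags(128))  # prints ['-Xms31g', '-Xmx31g']
```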
File Descriptors: Raise the number of file descriptors available to the user
running Elasticsearch to 65535.
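The current limit can be checked from Python on a Unix-like system. The sketch below uses the standard resource module and has to be run as the same user that runs Elasticsearch; fd_limit_ok is a hypothetical helper, and the threshold is the value given above.

```python
import resource

REQUIRED_FDS = 65535  # threshold recommended in the text


def fd_limit_ok(soft_limit, required=REQUIRED_FDS):
    """Return True if the soft limit is unlimited or at least required."""
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= required


# Soft and hard limits on open files for the current process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft=%s hard=%s ok=%s" % (soft, hard, fd_limit_ok(soft)))
```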