The document discusses various techniques for dimensionality reduction and analysis of text data, including latent semantic indexing (LSI), locality preserving indexing (LPI), and probabilistic latent semantic analysis (PLSA). LSI uses singular value decomposition to project documents into a lower-dimensional space while minimizing reconstruction error. LPI aims to preserve local neighborhood structures between similar documents. PLSA models documents as mixtures of underlying latent themes characterized by multinomial word distributions.
This document discusses various techniques used in web search engines for indexing and ranking documents. It covers topics like inverted indices, stopword removal, stemming, relevance feedback, vector space models, and Bayesian inference networks. Web search engines prepare an index of keywords for documents and return ranked lists in response to queries by measuring similarities between query and document vectors based on term frequencies and inverse document frequencies.
This document discusses web clustering engines, which group search results from a query into meaningful categories to help users better understand the topic. Conventional search engines return a flat list of results, which can include irrelevant items for ambiguous queries. Web clustering engines address this by applying clustering algorithms to search results to dynamically generate labeled categories. They acquire results from other search engines, preprocess the text, extract features, cluster the results using algorithms like agglomerative hierarchical clustering, and visualize the clusters in a hierarchical folder view or graph. This improves search by providing shortcuts to related results and allowing systematic exploration of topics.
This document provides a survey of web clustering engines. It discusses how web clustering engines organize search results by topic to complement conventional search engines, which return a flat list of ranked results. The document outlines the key stages in developing a web clustering engine, including acquiring search results, preprocessing, clustering, and visualization. It also reviews several existing commercial and open source web clustering systems and discusses evaluating the retrieval performance of these systems.
Coling 2014: Single Document Keyphrase Extraction Using Label Information (Ryuchi Tachibana)
This document proposes two extensions to the TextRank algorithm to extract label-specific keyphrases from multi-labeled documents: 1) Personalized TextRank (PTR), which replaces PageRank with personalized page rank using label-specific features to bias the random walk; and 2) TextRank using Ranking on Data Manifolds with Sinks (TRDMS), which models the problem as a random walk over the document graph with sink and query nodes to penalize irrelevant terms. The approaches are evaluated on a subset of the multi-label EUR-Lex corpus and compared to manually extracted keyphrases, showing their effectiveness increases with the size of the evidence set used to identify label-specific features.
Visualization approaches in text mining emphasize making large amounts of data easily accessible and identifying patterns within the data. Common visualization tools include simple concept graphs, histograms, line graphs, and circle graphs. These tools allow users to quickly explore relationships within text data and gain insights that may not be apparent from raw text alone. Architecturally, visualization tools are layered on top of text mining systems' core algorithms and allow for modular integration of different visualization front ends.
The document provides an overview of text mining, including:
1. Text mining analyzes unstructured text data through techniques like information extraction, text categorization, clustering, and summarization.
2. It differs from regular data mining as it works with natural language text rather than structured databases.
3. Text mining has various applications including security, biomedicine, software, media, business and more. It faces challenges in representing meaning and context from unstructured text.
An efficient approach for illustrating web data of user search result (Neha Singh)
This document proposes an efficient approach for annotating and summarizing user search results from web data. It involves extracting data from search engine results, aligning similar blocks of content, identifying line separators, integrating extracted data using wrappers, and applying annotators to label units of data with semantic information. The goal is to generate a new annotated search results page that presents the essential extracted data in a concise structured format.
This document describes a project to detect duplicate documents from the Hoaxy dataset using linguistic features and propagation dynamics on Twitter. It discusses collecting documents and diffusion networks from Hoaxy, preprocessing text, using LDA, LSI, and HDP for document clustering, extracting features on propagation dynamics, and training a random forest classifier on the clustered documents and features. The random forest achieves an F1-score of 0.72 for LDA, 0.75 for LSI, and 0.71 for HDP clusters in determining if document pairs are duplicates. The approach aims to predict topics of "dead" web pages using their diffusion networks on Twitter.
This is a short presentation that explains the famous TextRank papers that used graphs to produce summaries and document indices (keywords).
Link to paper: https://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf
Information retrieval (IR) is the process of searching for and retrieving relevant documents from a large collection based on a user's query. Key aspects of IR include:
- Representing documents and queries in a way that allows measuring their similarity, such as the vector space model.
- Ranking retrieved documents by relevance to the query using factors like term frequency and inverse document frequency.
- Allowing for similarity-based retrieval where documents similar to a given document are retrieved.
The document discusses the role of text mining in search engines. It describes how search engines work by crawling websites and indexing key terms. Text mining can help search engines provide more relevant and contextualized search results through techniques like clustering, categorization, and entity extraction. The document also discusses future trends in search engines leveraging more advanced text mining techniques like summarization and answering intelligent questions.
Information retrieval systems use indexes and inverted indexes to quickly search large document collections by mapping terms to their locations. Boolean retrieval uses an inverted index to process Boolean queries by intersecting postings lists to find documents that contain sets of terms. Key aspects of information retrieval systems include precision, recall, and ranking search results by relevance.
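To make the inverted-index and postings-intersection idea above concrete, here is a minimal sketch (toy corpus and helper names are my own, not taken from any of the documents summarized here) that maps terms to document IDs and answers a Boolean AND query by intersecting postings lists.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def boolean_and(index, terms):
    """Answer 'term1 AND term2 AND ...' by intersecting postings lists."""
    postings = [set(index.get(t, [])) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []

docs = {
    1: "information retrieval with an inverted index",
    2: "boolean retrieval intersects postings lists",
    3: "ranking results by relevance",
}
index = build_inverted_index(docs)
print(boolean_and(index, ["retrieval", "inverted"]))  # -> [1]
```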
1. The document describes a patent application for phrase-based indexing in information retrieval systems. It involves identifying phrases in documents, indexing documents based on these phrases, ranking documents based on phrase matching, and using phrases to generate document descriptions.
2. Phrases are identified based on their ability to predict other related phrases. Documents are indexed with lists of the phrases they contain. Ranking considers how well document phrases match query phrases.
3. The system can identify related phrases and extensions when searching, detect duplicate and spam documents, and generate snippets for search results using highly ranked sentences.
Designing of Semantic Nearest Neighbor Search: Survey (Editor IJCATR)
Conventional spatial queries, such as range search and nearest neighbor retrieval, involve only conditions on objects’
geometric properties. Today, many modern applications call for novel forms of queries that aim to find objects satisfying both a spatial
predicate, and a predicate on their associated texts. For example, instead of considering all the restaurants, a nearest neighbor query
would instead ask for the restaurant that is the closest among those whose menus contain "steak, spaghetti, brandy" all at the same
time. Currently the best solution to such queries is based on the IR2-tree, which, as shown in this paper, has a few deficiencies that
seriously impact its efficiency. Motivated by this, we develop a new access method called the spatial inverted index that extends the
conventional inverted index to cope with multidimensional data, and comes with algorithms that can answer nearest neighbor queries
with keywords in real time. As verified by experiments, the proposed techniques outperform the IR2-tree in query response time
significantly, often by a factor of orders of magnitude.
The document proposes a novel method for routing keyword queries to only relevant data sources to reduce the high cost of processing queries over all sources. It employs a compact keyword-element relationship summary to represent relationships between keywords and data elements. A multilevel scoring mechanism is used to compute the relevance of routing plans based on scores at different levels. Experiments on 150 publicly available sources showed the method can compute valid, highly relevant plans in 1 second on average and routing improves keyword search performance without compromising result quality.
PageRank and Tf-Idf are two important algorithms used for ranking web pages. PageRank ranks pages based on the number and quality of links to a page, considering links as votes. The more votes (links from other pages), the higher the page ranks. Tf-Idf measures how important a word is to a document based on how often it appears in the document and across all documents. It is commonly used by search engines to score documents based on a user query. While both aim to determine the most relevant pages, PageRank provides an overall ranking, while Tf-Idf scores pages based on a specific search query.
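As a rough illustration of the Tf-Idf weighting described above, the following sketch (toy corpus, invented function name) scores a term in a document as its term frequency multiplied by the inverse document frequency across the collection.

```python
import math
from collections import Counter

def tf_idf(term, doc_tokens, corpus):
    """Weight of `term` in one document: term frequency x inverse document frequency."""
    tf = Counter(doc_tokens)[term]
    df = sum(1 for tokens in corpus if term in tokens)   # documents containing the term
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf

corpus = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the dogs and the cats are pets".split(),
]
print(tf_idf("cat", corpus[0], corpus))  # appears in 2 of 3 docs, so idf > 0
print(tf_idf("the", corpus[0], corpus))  # appears in every doc, so idf = 0 and the score is 0
```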
The document discusses keyword query routing for keyword search over multiple structured data sources. It proposes computing top-k routing plans based on their potential to contain results for a given keyword query. A keyword-element relationship summary compactly represents keyword and data element relationships. A multilevel scoring mechanism computes routing plan relevance based on scores at different levels, from keywords to subgraphs. Experiments on 150 public sources showed relevant plans can be computed in 1 second on average on a desktop computer. Routing helps improve keyword search performance without compromising result quality.
This document provides an overview of latest trends in AI and information retrieval. It discusses how search engines work by crawling websites, indexing content, handling user queries, and ranking results. Open-source search solutions and real-world problems in information retrieval are also covered, such as extracting text from web pages and using machine learning for ranking. Emerging areas like learning to rank, query expansion, question answering, and neural information retrieval methods are also summarized. The document concludes by listing some common job roles in the software industry.
Unit I Data Structures, FYCS Mumbai University Sem II (Ajay Pashankar)
This document is a 92 page unit notes document for a Data Structures course prepared by Prof. Ajay Pashankar. It covers the following topics: abstract data types, arrays, sets and maps, algorithm analysis, and searching and sorting. It provides definitions and examples for abstract data types like the Date ADT and Bag ADT. It also gives details on implementing abstract data types in Python using classes. The document aims to teach students about fundamental data structures.
This document presents a system for mining product synonyms from web documents. The system extracts identifying terms for entities using web search results and context windows. It then searches for canonical names from a pre-crawled list and validates matches using additional documents. The algorithm iterates through subsets of entity terms, submitting them to a search engine and checking for correlations between search results and entities. Challenges include the unstructured nature of web documents and delays between automated queries. The system aims to bridge the gap between user queries and structured entity descriptions.
The document provides an overview of how search engines and the Lucene library work. It explains that search engines use web crawlers to index documents, which are then stored and searched. Lucene is an open source library for indexing and searching documents. It works by analyzing documents to extract terms, indexing the terms, and allowing searches to match indexed terms. The document details Lucene's indexing and searching process including analyzing text, creating an inverted index, different query types, and using the Luke tool.
Philly PHP: April '17 Elastic Search Introduction by Aditya Bhamidpati (Robert Calcavecchia)
Philly PHP April 2017 Meetup: Introduction to Elastic Search as presented by Aditya Bhamidpati on April 19, 2017.
These slides cover an introduction to using Elastic Search
Efficiently searching nearest neighbor in documents using keywords (eSAT Journals)
This document summarizes research on efficiently searching for the nearest neighbor in documents using keywords. It discusses how traditional spatial queries only consider objects' numerical properties, but modern applications require queries that find objects satisfying both spatial and text predicates. For example, finding the nearest restaurant matching keywords like "steak, spaghetti, brandy". Current solutions like the IR2-tree have deficiencies that impact efficiency. The document proposes using a spatial inverted index that extends conventional inverted indexes to handle multidimensional data and support real-time nearest neighbor queries with keywords. Experiments show it outperforms the IR2-tree significantly in query response time, often by orders of magnitude.
Elasticsearch is an open-source, distributed, real-time document indexer with support for online analytics. It has features like a powerful REST API, schema-less data model, full distribution and high availability, and advanced search capabilities. Documents are indexed into indexes which contain mappings and types. Queries retrieve matching documents from indexes. Analysis converts text into searchable terms using tokenizers, filters, and analyzers. Documents are distributed across shards and replicas for scalability and fault tolerance. The REST APIs can be used to index, search, and inspect the cluster.
Searching and Analyzing Qualitative Data on Personal Computer (IOSR Journals)
This document presents the design and implementation of a desktop search system using Lucene. It describes the key components of indexing, analyzing text, storing indexes, and searching. For indexing, it discusses how documents are preprocessed, tokenized, and stored in an inverted index. For searching, it explains how queries are analyzed and the index is searched to return results. The system allows users to search for files on their personal computer. It includes a user interface to input queries and view results. Lucene provides an open-source toolkit to add full-text search capabilities to applications.
This document provides an overview of Solr features including core concepts, query parsers, faceting, nested documents, and clustering results. It discusses key Solr concepts such as documents, schemas, dynamic fields, field properties, and analyzers. It explains the standard, dismax, and extended dismax query parsers. It also covers faceting, nested documents using block joins, and clustering of search results.
An information retrieval system provides search and browse capabilities to help users locate relevant information. Search capabilities allow Boolean logic, proximity, phrase matching, fuzzy searches, masking, numeric ranges, concept expansion, and natural language queries. Browse capabilities help users evaluate search results and focus on potentially relevant items through ranking, zoning of display fields, and highlighting of search terms.
Technical Whitepaper: A Knowledge Correlation Search Engine (s0P5a41b)
For the technically oriented reader, this brief paper describes the technical foundation of the Knowledge Correlation Search Engine - patented by Make Sence, Inc.
This document provides an overview of Lucene scoring and sorting algorithms. It describes how Lucene constructs a Hits object to handle scoring and caching of search results. It explains that Lucene scores documents by calling the getScore() method on a Scorer object, which depends on the type of query. For boolean queries, it typically uses a BooleanScorer2. The scoring process advances through documents matching the query terms. Sorting requires additional memory to cache fields used for sorting.
This slide deck talks about Elasticsearch and its features. "ELK stack" refers to Elasticsearch, Logstash, and Kibana, while "Elastic Stack" also includes other components such as Beats and X-Pack.
What is the ELK Stack?
ELK vs Elastic stack
What is Elasticsearch used for?
How does Elasticsearch work?
What is an Elasticsearch index?
Shards
Replicas
Nodes
Clusters
What programming languages does Elasticsearch support?
Amazon Elasticsearch, its use cases and benefits
This document provides an overview of information retrieval models, including vector space models, TF-IDF, Doc2Vec, and latent semantic analysis. It begins with basic concepts in information retrieval like document indexing and relevance scoring. Then it discusses vector space models and how documents and queries are represented as vectors. TF-IDF weighting is explained as assigning higher weight to rare terms. Doc2Vec is introduced as an extension of word2vec to learn document embeddings. Latent semantic analysis uses singular value decomposition to project documents to a latent semantic space. Implementation details and examples are provided for several models.
Elasticsearch is a search engine based on Apache Lucene that provides distributed, full-text search capabilities. It allows users to store and search documents of any structure in near real-time. Documents are organized into indexes, shards, and clusters to provide scalability and fault tolerance. Elasticsearch uses analysis and mapping to index documents for full-text search. Queries can be built using the Elasticsearch DSL for complex searches. While Elasticsearch provides fast search, it has disadvantages for transactional operations or large document churn. Elastic HQ is a web plugin that provides monitoring and management of Elasticsearch clusters through a browser-based interface.
This presentation covers the differences between Elasticsearch and relational databases, along with a glossary of Elasticsearch terms and its basic operations.
Oracle Text is a search technology built into Oracle Database that allows full-text searches of both structured and unstructured data. It provides features like Boolean search, stemming, thesaurus, and result ranking. The Oracle Text indexing process transforms documents into plain text, identifies sections, splits text into words or tokens, and builds an index mapping keywords to documents. Developers can customize the indexing process by defining their own data sources, filters, sectioners, and lexers.
Data Con LA 2022 - Pre-Recorded - OpenSearch: Everything You Need to Know Ab... (Data Con LA)
Seth Muthukaruppan, Consultant at Instacluster
Data Engineering
OpenSearch is an incredibly powerful search engine and analytics suite for ingesting, searching, visualizing, and analyzing your data, and it is fully open source. This Apache 2.0-licensed and community-driven collection of technologies harnesses an architecture that combines the powers of Elasticsearch 7.10.2, Kibana 7.10.2, and Apache Lucene. With OpenSearch, users gain a distributed framework featuring particularly powerful scalability, high availability, and database-like capabilities. Attendees at this DataCon LA presentation will come away understanding OpenSearch's architecture and its building-block technology components, including:
- Apache Lucene utilization. Learn how this high-performance Java-based search library utilizes Lucene's inverted search index to deliver incredibly fast search results (while supporting natural language, wildcard, fuzzy, and proximity searches).
- OpenSearch cluster architecture. An OpenSearch cluster is a distributed and horizontally scalable collection of nodes, which are differentiated based on the operations they perform. Attendees will learn the specific functions of master, master-eligible, data, client, and ingest nodes.
- Data organization. Understand how OpenSearch organizes data into indices (which contain documents, which contain fields).
- Internal data structures. Get an in-depth look at how OpenSearch achieves scalability and reliability by breaking up indices into shards and segments, and utilizes translogs.
- Aggregations. See how OpenSearch enables its advanced built-in analytics capabilities through the power of aggregations.
3. Implementation with NoSQL Databases: Document Databases (MongoDB).pptx (RushikeshChikane2)
This chapter gives information about document-based databases and graph-based databases, covering their basic structures, features, applications, limitations, and use cases.
IRJET - Review on Information Retrieval for Desktop Search Engine (IRJET Journal)
This document summarizes techniques for desktop search engines, including feature extraction using entity recognition, query understanding using part-of-speech tagging and segmentation, and similarity measures for scoring and ranking documents. It discusses using ontologies, concept graphs, semantic networks, and vector space models to represent knowledge in documents. Feature extraction identifies entities that can be mapped to knowledge bases to infer meanings. Query understanding aims to determine intent regardless of technique used. Similarity is measured using approaches like comparing maximum common subgraphs between a document and query graphs.
Basics of Solr and Solr Integration with AEM6 (Deepak Khetawat)
This document provides an introduction and overview of Solr and its integration with AEM. It discusses search statistics to motivate the need for search. It then defines Solr and describes its key features and architecture. It covers topics like indexing, analysis, searching, cores, configurations files and queries. It also discusses setting up Solr with Linux and Windows. Finally, it discusses integrating Solr with AEM, including configuring an embedded Solr server and external Solr integration using a custom replication agent. Exercises are provided to allow hands-on experience with Solr functionality.
2. What is a Search Engine?
A search engine is a set of applications designed to search for information; it is usually
part of a larger search system.
The main criteria for search engine quality are relevance (how closely the returned results
match the query), completeness of the index, and support for the morphology of the
language.
Popular search services: Sphinx, Solr, Elasticsearch, etc.
3. Elasticsearch
Elasticsearch is a search engine with a JSON REST API; it uses Lucene and is written in Java.
Apache Lucene is a free, open-source full-text search library. It is implemented in Java,
maintained by the Apache Software Foundation, and released under the Apache Software
License.
Client libraries: Java, C#, Python, JavaScript, PHP, Perl, Ruby
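Since Elasticsearch is driven entirely through its JSON REST API over HTTP, a quick sanity check is to query the root and cluster-health endpoints. The sketch below is illustrative only: it assumes a local single-node cluster on the default port 9200 and uses Python's requests library rather than any particular client.

```python
import requests

ES = "http://localhost:9200"  # assumed local single-node cluster

# The root endpoint returns basic node and cluster info, including the Lucene version in use.
print(requests.get(ES).json())

# Cluster health reports status (green/yellow/red), node count, and shard allocation.
print(requests.get(f"{ES}/_cluster/health").json())
```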
4. Requirements
When developing heavy websites or corporate systems, you often run into trouble building a
fast and simple search service. The following are, in my opinion, the most important
requirements for such a service:
◆ Speed
◆ Easy installation and configuration
◆ Price (preferably free and open source)
◆ JSON as the information exchange format (over HTTP)
◆ Real-time indexing
◆ Multi-tenancy (flexible settings for individual users)
5. Index
In familiar terms, an index is like a database and a document type is like a table in it.
A document is a JSON document stored in Elasticsearch. It is like a row in a relational
database. Each document is stored in an index and has a type and an ID. The document is a
JSON object (also known in other languages as a hash / HashMap / associative array) that
contains zero or more fields, or key-value pairs. The original JSON submitted for indexing
is stored in the _source field, which is returned by default when getting or searching for
a document.
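A minimal sketch of the index / type / ID / _source relationship described above, again assuming a local node; the index name `blog`, the document fields, and the use of the `_doc` type (the single type name on newer Elasticsearch versions) are illustrative assumptions.

```python
import requests

ES = "http://localhost:9200"

# Index (create or overwrite) a document under an explicit index, type, and ID.
doc = {"title": "Intro to Elasticsearch", "views": 42, "published": "2017-01-15"}
resp = requests.put(f"{ES}/blog/_doc/1", json=doc)
print(resp.json())  # includes _index, _id, and the result of the operation

# Retrieve it by ID: the original JSON comes back in the _source field.
got = requests.get(f"{ES}/blog/_doc/1").json()
print(got["_source"])  # {'title': 'Intro to Elasticsearch', ...}
```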
6. Analysis
Analysis is the process of converting text, like the body of any email, into tokens or
terms which are added to the inverted index for searching. Analysis is performed
by an analyzer which can be either a built-in analyzer or a custom analyzer
defined per index.
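The `_analyze` endpoint shows exactly which tokens an analyzer would add to the inverted index. A small sketch, assuming a local node and the built-in `standard` and `english` analyzers:

```python
import requests

ES = "http://localhost:9200"

body = {"analyzer": "standard", "text": "The QUICK Brown-Foxes jumped!"}
resp = requests.post(f"{ES}/_analyze", json=body).json()
print([t["token"] for t in resp["tokens"]])
# standard analyzer: lowercased word tokens such as ['the', 'quick', 'brown', 'foxes', 'jumped']

body["analyzer"] = "english"  # adds stop-word removal and stemming
resp = requests.post(f"{ES}/_analyze", json=body).json()
print([t["token"] for t in resp["tokens"]])
# english analyzer: roughly ['quick', 'brown', 'fox', 'jump']
```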
7. Elasticsearch Mapping
Mapping is the process of defining how a document, and the fields it contains, are
stored and indexed. For instance, use mappings to define:
◆which string fields should be treated as full text fields.
◆which fields contain numbers, dates, or geolocations.
◆whether the values of all fields in the document should be indexed into the catch-all _all field.
◆the format of date values.
◆custom rules to control the mapping for dynamically added fields.
8. Elasticsearch Mapping
Each field has a data type which can be:
◆a simple type like text, keyword, date, long, double, boolean or ip.
◆a type which supports the hierarchical nature of JSON such as object or nested.
◆or a specialised type like geo_point, geo_shape, or completion.
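As a sketch of what such a mapping looks like in practice, the request below creates an index with explicit field types (`text`, `keyword`, `date`, `geo_point`). The index and field names are invented, and the exact mapping syntax differs slightly across Elasticsearch versions (older releases wrap `properties` in a type name such as `_doc`).

```python
import requests

ES = "http://localhost:9200"

mapping = {
    "mappings": {
        "properties": {
            "title":     {"type": "text"},       # analyzed, full-text searchable
            "tags":      {"type": "keyword"},    # exact values, good for filters and aggregations
            "published": {"type": "date"},
            "location":  {"type": "geo_point"},  # lat/lon point for geo queries
        }
    }
}
print(requests.put(f"{ES}/articles", json=mapping).json())

# Inspect the stored mapping, including any dynamically added fields.
print(requests.get(f"{ES}/articles/_mapping").json())
```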
9. Documents CRUD
Often, we use the terms object and document interchangeably. However, there is a distinction. An object
is just a JSON object—similar to what is known as a hash, hashmap, dictionary, or associative array.
Objects may contain other objects. In Elasticsearch, the term document has a specific meaning. It refers
to the top-level, or root object that is serialized into JSON and stored in Elasticsearch under a unique ID.
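A hedged sketch of a full CRUD round trip over the REST API, under the same assumptions as before (local node, invented `blog` index, `_doc` type). Note that the partial-update path shown is the older typed form; newer versions use `POST /blog/_update/1`.

```python
import requests

ES = "http://localhost:9200"

# Create / replace (index) a document under an explicit ID.
requests.put(f"{ES}/blog/_doc/1", json={"title": "Hello", "views": 0})

# Read it back; the stored JSON is in _source.
print(requests.get(f"{ES}/blog/_doc/1").json()["_source"])

# Partial update: only the supplied fields change, the rest of _source is kept.
# (Older typed form; newer versions use POST /blog/_update/1.)
requests.post(f"{ES}/blog/_doc/1/_update", json={"doc": {"views": 1}})

# Delete the document by ID.
requests.delete(f"{ES}/blog/_doc/1")
```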
10. Query and filter context
The behaviour of a query clause depends on whether it is used in query context or in filter context:
1. Query context
A query clause used in query context answers the question “How well does this document match this query clause?”
Besides deciding whether or not the document matches, the query clause also calculates a _score representing how well
the document matches, relative to other documents.
2. Filter context
In filter context, a query clause answers the question “Does this document match this query clause?” The answer is a
simple Yes or No — no scores are calculated. Filter context is mostly used for filtering structured data, e.g. exact values
or date ranges.
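A sketch of the distinction inside a single `bool` search request: clauses under `must` run in query context and contribute to `_score`, while clauses under `filter` run in filter context and only include or exclude documents. Index and field names are invented.

```python
import requests

ES = "http://localhost:9200"

query = {
    "query": {
        "bool": {
            "must": [                    # query context: affects _score
                {"match": {"title": "elasticsearch tutorial"}}
            ],
            "filter": [                  # filter context: yes/no only, no scoring
                {"term": {"tags": "search"}},
                {"range": {"published": {"gte": "2016-01-01"}}}
            ]
        }
    }
}
hits = requests.post(f"{ES}/blog/_search", json=query).json()["hits"]["hits"]
for hit in hits:
    print(hit["_score"], hit["_source"].get("title"))
```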
13. Geolocation Filter
Elasticsearch offers two ways of representing geolocations: latitude-longitude points using the geo_point field type, and
complex shapes defined in GeoJSON, using the geo_shape field type.
Geo-points allow you to find points within a certain distance of another point, to calculate distances between two points for
sorting or relevance scoring, or to aggregate into a grid to display on a map. Geo-shapes, on the other hand, are used
purely for filtering. They can be used to decide whether two shapes overlap, or whether one shape completely contains
other shapes.
Four geo-point filters can be used to include or exclude documents by geolocation:
● geo_bounding_box
Find geo-points that fall within the specified rectangle.
● geo_distance
Find geo-points within the specified distance of a central point.
● geo_distance_range
Find geo-points within a specified minimum and maximum distance from a central point.
● geo_polygon
Find geo-points that fall within the specified polygon. This filter is very expensive. If you find yourself wanting to use
it, you should be looking at geo-shapes instead.
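A sketch of a `geo_distance` filter in the same style: documents need a `geo_point` field (here the invented `location`), and the filter keeps only points within the given radius of a central point, without affecting scoring.

```python
import requests

ES = "http://localhost:9200"

query = {
    "query": {
        "bool": {
            "filter": {
                "geo_distance": {
                    "distance": "5km",                            # radius around the central point
                    "location": {"lat": 40.7128, "lon": -74.006}  # field holding the geo_point
                }
            }
        }
    }
}
resp = requests.post(f"{ES}/places/_search", json=query).json()
print(resp["hits"]["total"])
```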