0% found this document useful (0 votes)

163 views8 pages

Subject: A Glance To Elasticsearch in The Era of Analytics and Machine Learning

Elasticsearch is a search and analytics engine that allows storing and searching of large amounts of data. It uses an inverted index to allow fast searching through documents. The ELK stack uses Elasticsearch for storage and indexing, Logstash for data processing, and Kibana for visualization and dashboards. This allows for log aggregation, transformation, and analysis across multiple systems and data sources.

Uploaded by

Suchismita Sahu

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

163 views8 pages

Subject: A Glance To Elasticsearch in The Era of Analytics and Machine Learning

Uploaded by

Suchismita Sahu

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

You are on page 1/ 8

Subject: A Glance to Elasticsearch in the era of Analytics and Machine

Learning:

What is Elasticsearch?

Think of a situation where I have huge amount of data having terabyte size and I need to search a
specific term in it.

Definitely, I have to use a tool for this. But unfortunately, most of the search engines available in
the market are not open source.

So, here Elasticsearch comes into picture.

 Elasticsearch is a full text search, readily-scalable, enterprise-grade, analytics engine for

all types of data such as textual, numerical, geospatial, structured, and unstructured. It is
accessible through RESTful web service interface and uses schema less JSON (JavaScript
Object Notation) documents to store data. It is platform independent, which enables users
to search from a very large amount of data in a very efficient way with a very high speed.
It supports a variety of use cases like allowing users to easily search through any portal,
collect and analyse log data, build business intelligence dashboards to quickly analyse and
visualize data.

Concept and Components:

Fig: 1: Elasticsearch Concept (Source: W3 school)

 Cluster: A cluster is a collection of one or more nodes (servers) that together holds the
entire data and provides federated indexing and search capabilities across all nodes, which
should be identified by an unique name. By default, it is ‘elasticsearch’. ‘Cluster Naming’ is
mandatory, because a node set up to join the cluster is possible through its name. We
should not reuse the same cluster names in different environments, otherwise we might
end up with nodes joining the wrong cluster. For instance we can name respective clusters
as logging-dev, logging-stage, and logging-prod for the development, staging, and
production clusters respectively.

 Node: A node is a single server that is part of our cluster where data can be stored and
this participates in the cluster’s indexing and search capabilities, which should be identified
by also a name. We can define any node name we want if we do not want the default.
Unique ‘Node Naming’ is important for administration purposes where we want to identify
which servers in our network correspond to which nodes in our Elasticsearch cluster. A
master node manages the entire cluster.

 Index: An index is a collection of documents that have somewhat similar characteristics. In

a single cluster, we can define as many indexes as we want, as per our requirement. An
index is an equivalent to the schema of a relational database. Elasticsearch, instead of
searching the text directly, it searches an index. So, it supports fast search responses.
Similar to retrieve, pages in a book related to a keyword by scanning the index at the back
of a book, as opposed to searching every word of every page of the book. This type of
index is called an ‘Inverted Index’, as it inverts a page-centric data structure (page-
>words) to a keyword-centric data structure (word->pages). Elasticsearch supports
Inverted Index for which it uses Apache Lucene to create and manage this inverted index.

 Type: Type is the Elasticsearch meta object where the mapping for an index is stored.

 Alias: Alias is a reference to an Elasticsearch index, which can be mapped to more than
one index.

 Document: A document is a basic unit of information that can be indexed, which can be
expressed in JavaScript Object Notation (JSON) format. Connected query returns the
parent and child rows.

 Shards and Replicas: Elasticsearch provides the ability to subdivide our index into multiple
pieces called shards. When we create an index, we have to define the number of shards
that we want. Each shard is a fully-functional and independent ‘index’, which can be
hosted on any node in the cluster. Elasticsearch allows us to make one or more copies of
our index’s shards into what are called replica shards, or replicas for short. After the index
is created, we can change the number of replicas dynamically anytime but we cannot
change the number of shards, once we configure it.

 REST API: Rest API are used by client to interact with the Elasticsearch through http
methods (GET, POST, PUT, DELETE).
 NRT(Near-Real-Time): Elasticsearch is one of the near-real-time search platforms. Once a
document is indexed, it becomes searchable within less than 1 second.

How Elasticsearch represents data?

In Elasticsearch, we search a document and an index consists of one or more Documents, and a

Document consists of one or more Fields, where we need to specify a schema before indexing
documents, it is necessary to add mapping declarations if we require anything but the most basic
fields and operations. In database terminology, a Document corresponds to a table row and a Field
corresponds to a table column.

Mapping is the process of defining how a document, and its fields are stored and indexed. When
mapping our data, we create a mapping definition, which contains a list of fields that are pertinent
to the document. In Elasticsearch, an index may store documents of different "mapping types". A
mapping type describes the way of separating the documents in an index into logical groups. To
create a mapping, we will need the ‘Put Mapping API’, or you can add multiple mappings when you
create an index. For more information, please visit:
https://elastic.co/guide/en/elasticsearch/reference/6.8/indices-put-mapping.html.

What is ELK Stack and its Components?

"ELK" is the acronym for three open source projects: Elasticsearch, Logstash, and Kibana.
Elasticsearch is a search and analytics engine. Logstash is a server-side data processing pipeline
that ingests data from multiple sources simultaneously, transforms it, and then sends it to a
"stash" like Elasticsearch. Kibana lets users to visualize data through charts and graphs in
Elasticsearch.

 Logs: Server logs that need to be analyzed.

 Logstash: Collect logs and events data. It even parses and transforms data
 ElasticSearch: The transformed data from Logstash is Stored, Searched, and Indexed.
 Kibana: Kibana uses Elasticsearch DB to Explore, Visualize, and Share the data stored.
Visualizations in Kibana can be categorized into following five different types:
o Basic Charts (Area, Heat Map, Horizontal Bar, Line, Pie, Vertical bar)
o Data (Date Table, Gauge, Goal, Metric)
o Maps (Coordinate Map, Region Map)
o Time series (Timelion, Visual Builder)
o Other (Controls, Markdown, Tag Cloud)
 Beats: The most important in modern architecture which are lightweight agents that are
installed on edge hosts to collect different types of data for forwarding into the stack. The
data collected by the different beats varies — log files in the case of Filebeat, system and
service metrics in case of metricbeat, network data in the case of Packetbeat, Windows
event logs in the case of Winlogbeat etc. Once data collected, we can configure our beat to
ship the data either directly into Elasticsearch or to Logstash for additional processing.

Together, these different components are most commonly used for monitoring,
troubleshooting and securing IT environments. Beats and Logstash take care of data
collection and processing, Elasticsearch indexes and stores the data, and Kibana provides a
user interface for querying the data and visualizing it.

 Kibana dashboards: Once we have a collection of visualizations ready, we can add them
all into one comprehensive visualization called a dashboard, which provides us the ability
to monitor an environment in easier event correlation and trend analysis. Dashboards are
highly dynamic, which can be edited, shared, played around with, opened in different
display modes and more.

Log management and analysis include the following key capabilities:

 Aggregation – To collect and ship logs from multiple data sources.
 Processing – To transform log messages into meaningful data for easier analysis.
 Storage – To store data for extended time periods and allow for monitoring, trend analysis,
and security use cases.
 Analysis – To dissect the data by querying it and creating visualizations and dashboards on
top of it.
 How to Use the ELK Stack for Log Analysis

Architecture of ELK Stack.

For small sized development environment, the ELK stack pipeline looks as follows:
Beats (Data Collection)->Redis, Kafka, Rabbit MQ(Buffering) -> Logstash (Data Aggregation and
Processing)->Elasticsearch (indexing and Storage)->Kibana (Analysis and Visualization)

But for complex scenarios, the pipeline looks as follows:

Beats (Data Collection)->Logstash (Data Aggregation and Processing)->Elasticsearch (indexing
and Storage)->Kibana (Analysis and Visualization)
Fig-2: ELK stack architecture with Kafka (Source: https://elastic-stack.readthedocs.io)

Generally, there exists a bottleneck for a production environment which scales out unlimitedly:

 Logstash needs to process logs with pipelines and filters which cost considerable time,
it may become a bottleneck if log bursts exist;
 Elasticsearch needs to index logs which cost time, and it becomes a bottleneck when
log bursts happen.

The above mentioned bottlenecks can be smoothed by adding more Logstash deployment and
scaling Elasticsearch cluster, which can also be smoothed by introducing a cache layer in the
middle like all other IT solutions. One of the most popular solutions to leverage a cache layer is
integrating Kafka into the ELK stack.
Process Flow:
Data gets collected through beats and processed to kafka, which can serve as a data hub where
Beats can persist to and Logstash nodes can consume, from where the logs get consumed by
Logstash, for log processing. The common way to feed data into Logstash is through HTTP, TCP
and UDP protocols. Logstash can expose endpoint listeners with the respective TCP, UDP,
and HTTP input plugins. After processing, the processed logs get stored in Elasticsearch and get
consumed by Kibana for metric visualisation.

Real time Uses:

Besides the Log analysis, following are few real time use cases, where Elasticsearch get used
hugely:

1. Text Mining and Natural Language Processing (NLP): Elasticsearch is widely used as
a search and analytics engine. Following are few use cases:

Most NLP tasks start with a standard preprocessing pipeline such as :

1. Gathering the data

2. Extracting raw text
3. Sentence Splitting
4. Tokenization
5. Normalizing (Stemming and Lemmantization etc..)
6. Stopword removal
7. Part of Speech (POS) tagging

A. PREPROCESSING (NORMALIZATION)
Have you ever used the ‘_analyze’ endpoint? ElasticSearch has over 20 language-analyzers
built in. What is an ‘Analyzer’ doing? Tokenization, stemming and stopword removal. That is
very often all we need for preprocessing for higher level tasks such as Machine Learning,
Language Modelling etc. We basically just need a running instance of ElasticSearch, without
any configuration or setup. Then ‘analyze-endpoint’ can be used as a Rest-API for NLP-
preprocessing. For more information, please visit:
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-analyze.html.

B. LANGUAGE DETECTION
‘Language Detection’ is a major challenge in NLP problem. This can be solved by installing
another plugin ‘langdetect’ from Elasticsearch. For more information, please visit:
https://github.com/jprante/elasticsearch-langdetect. It uses 3-gram character and a Bayesian
filter supporting various normalizations and feature sampling techniques. The precision is over
99% for 53 languages. Is it quite good? The plugin offers a ‘mapping type’ to specify the fields
where we want to enable language detection. The plugin offers a REST endpoint, where we can
post a short text which goes in UTF-8 format, and the plugin responds with a list of recognized
languages. What happens when that query is fired?

It will analyse our input text that comes either from the documents in the index or directly
from the like text. It will extract the most important keywords from that text and run a
‘Boolean’ query with all those keywords.
How does it know what a keyword is?
Keywords can be determined with a formula which shall be applied to a set of documents and
can be used to compare a subset of the documents to all documents based on word
probabilities. It is called Tf-Idf which is a very important formula for TextMining. It assigns a
score to each term in the subset compared to the entire corpus of documents. A high score
indicates that it is more likely that a term identifies or characterizes the current subset of
documents and distinguishes it clearly from all other documents.

C. RECOMMENDATION ENGINE
Basically, recommendation engines can be of 2 types: social and content based. A social
recommendation engine like Amazon e-commerce site, is referred to as “Collaborative
Filtering” where recommendation happens for People who bought this product also bought…
The other type of recommendation engine is called “Item based recommendation engine”,
which tries to group the datasets based on the properties of the entries, which is used to
answer “Any novel or scientific paper which is similar to the one which I read in recent past.”

With Elasticsearch we can easily build an item based recommendation engine.

We just configure the ‘MLT’ query template based on our data. We will use the actual item ID
as a starting point and recommend the most similar documents from your index. We can add
custom logic by running a bool query that combines a function score query to boost by
popularity or recency on top of the more like this query. The ‘More Like This’ (MLT) Query finds
documents that are "like" a given set of documents. In order to do so, MLT selects a set of
representative terms of these input documents, forms a query using these terms, executes the
query and returns the results. The user controls the input documents, how the terms should be
selected and how the query is formed. For more information, please visit:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-mlt-query.html.

D. DUPLICATE DETECTION
If we have data from several sources (news, affiliate ads, etc.) there might be a possibility
that we are running our model into a dataset having many duplicates, which is an unwanted
behaviour for most end user applications.

How does it work?

There is a challenge with duplicate detection:

We need to compare all documents pairwise. The objective is to retain the first inspected
element and discard all others. So we need a lot of custom logic to choose the first document
to look at. As the complexity is very high, it is quite difficult to detect the duplicates offline,
but yes, an online tool is much required for this. General algorithms for industry standard for
duplicate detection are Simhash and Minhash (used by Google and Twitter). They generate
hashes for all documents, store them in an extra datastore and use a similarity function and
the documents that exceed a certain threshold are considered duplicates. For very short
documents we can work with the Levenshtein distance or Minimum Edit Distance. But for
longer documents we might want to rely on a token based solution.
https://www.elastic.co/blog/how-to-find-and-remove-duplicate- documents-in-elasticsearch.

2. Image Processing:
Can you imagine how nice it will be if there is a tool with image search facility?

This can be addressed with Deepdetect (https://www.deepdetect.com/ ). We send images to

Deepdetect, images get annotated, then the annotations and the image URL get indexed into
Elasticsearch directly without any glue code. Deepdetect is a classification service that
distinguishes among 1000 different image categories, from 'ambulance' to 'padlock' to
'hedgehog', and indexes images with their categories into an instance of Elasticsearch. For
every image, the Deepdetect server directly indexes the predicted categories into
Elasticsearch, by avoiding the glue code in between the deep learning server and
Elasticsearch. DeepDetect supports output templates, which allows transforming the standard
output of the DeepDetect server into any custom format, which provides a functionality to
search images with text, even without having caption. It is also scalable as prediction works
over batches of images, and multiple prediction servers can be set to work in parallel.
Following are few uses cases which can be resolved by using deepdetect:
 Signatureless Malware detection from binaries
 Anomaly detection from raw traffic logs
 False positives filtering from SOC alerts
 Domain Generation Algorithm detection
 URL filtering and clustering on GPUs
3. Additional Applications:
The Elastic Stack, along with custom-built Elasticsearch plugins, helps to drive the
following content search experiences:

 Search based on computer vision and metadata

 Deep textual and hybrid content search

 Video and richer format search

 Enterprise search

 Discovery and recommendations

4. Crawling and Document Processing:

StormCrawler is a popular and mature open source web crawler, which gets used to
provide documents to index to a search engine and, with Elastic being an open source
tools for search and analytics, we needed a resource in StormCrawler to achieve
this. IndexBolt in the Elasticsearch module takes a web page fetched and parsed by
StormCrawler and sends it for indexing to Elasticsearch. It builds a representation of a
document containing its URL and the text extracted by the parser and any relevant
metadata extracted during the parsing, such as the title of the page, keywords, summary,
language, hostname, etc. StormCrawler comes with various resources for data extraction
which can be easily configured or extended.

What differentiates StormCrawler from other web crawlers is that it uses Elasticsearch as a
back end for storage as well. Elasticsearch is an excellent resource for doing this and
provides visibility into the data as well as great performance. The Elasticsearch module
contains a number of spout implementations, which query the status index to get the URLs
for StormCrawler to fetch. For more information, please visit:
https://www.elastic.co/blog/stormcrawler-open-source-web-crawler-strengthened-by-
elasticsearch-kibana.

5. Multitenancy:

Often, we have multiple customers or users with separate collections of documents, and a user
should never be able to search documents that do not belong to him. Then we end up with a
design where each user has his own index. More often than not, this leads to way too many
indexes. In almost every case we see index-per-user implemented, whereas we can have one
larger Elasticsearch index to address following downsides of having a huge number of small
indexes:

 The memory overhead can be controlled because thousands of small indexes consume
a lot of heap space.

 There can be a lot of duplication.

Conclusion:
There is a lot to learn with Elasticsearch, and sometimes it can be hard to know what you need to
learn. In this article, I have covered quite a few common use cases and some important things to
be aware of for all of them.

Percona Monitoring and Management Documentation: Date .Getfullyear )
No ratings yet
Percona Monitoring and Management Documentation: Date .Getfullyear )
589 pages
DataBase Administration
No ratings yet
DataBase Administration
50 pages
REACT.JS PPT
No ratings yet
REACT.JS PPT
61 pages
Student Guide - SQL-4402 MySQL Performace Tuning
No ratings yet
Student Guide - SQL-4402 MySQL Performace Tuning
372 pages
Paper Format Ifa
57% (7)
Paper Format Ifa
2 pages
Git and GitHub
No ratings yet
Git and GitHub
40 pages
Machine Learning Project
100% (1)
Machine Learning Project
17 pages
Git Gitlab Cheatsheet 1702839836
No ratings yet
Git Gitlab Cheatsheet 1702839836
9 pages
Lab14 - Understanding Blob storage -Azure
No ratings yet
Lab14 - Understanding Blob storage -Azure
35 pages
Java Challengers Master The Java Fundamentals With Fun Java Code Challenges Become A Java Challenger (2023), 2023 - ISBN - English
No ratings yet
Java Challengers Master The Java Fundamentals With Fun Java Code Challenges Become A Java Challenger (2023), 2023 - ISBN - English
548 pages
DWV Unit1
No ratings yet
DWV Unit1
102 pages
NLP All Units
No ratings yet
NLP All Units
81 pages
Kubernetes
No ratings yet
Kubernetes
27 pages
Amazon: Questions & Answers
No ratings yet
Amazon: Questions & Answers
279 pages
Centralized Logging: Implementation Guide
No ratings yet
Centralized Logging: Implementation Guide
40 pages
AWS Developer Associate Certification Exam: Description Priority Type Cost
No ratings yet
AWS Developer Associate Certification Exam: Description Priority Type Cost
2 pages
SQL
No ratings yet
SQL
77 pages
Interview-questions-Real Time PHP Project
No ratings yet
Interview-questions-Real Time PHP Project
95 pages
JSON Path Course Resource
No ratings yet
JSON Path Course Resource
57 pages
Dark Mode - 341 - AWS Certified Solutions Architect-PDF - 1574730494 PDF
100% (1)
Dark Mode - 341 - AWS Certified Solutions Architect-PDF - 1574730494 PDF
266 pages
Elastic Search
No ratings yet
Elastic Search
19 pages
Docker Cheat Sheet
No ratings yet
Docker Cheat Sheet
50 pages
Jenkins Declarative Pipeline
No ratings yet
Jenkins Declarative Pipeline
41 pages
100 Linux Best Practices
No ratings yet
100 Linux Best Practices
15 pages
Ec2 Ug PDF
No ratings yet
Ec2 Ug PDF
722 pages
Kibana Essentials - Sample Chapter
No ratings yet
Kibana Essentials - Sample Chapter
21 pages
AngularJS Tutorial
100% (1)
AngularJS Tutorial
41 pages
1 - Creating A Data Transformation Pipeline With Cloud Dataprep
0% (1)
1 - Creating A Data Transformation Pipeline With Cloud Dataprep
39 pages
Mongo DB
No ratings yet
Mongo DB
21 pages
Elasticsearch: Getting Started With Elasticsearch
No ratings yet
Elasticsearch: Getting Started With Elasticsearch
6 pages
Cloud Computing Day - 1
No ratings yet
Cloud Computing Day - 1
124 pages
Automatic Indexing
No ratings yet
Automatic Indexing
26 pages
Principles of Taxation For Business and Investment Planning 14th Edition Jones Solutions Manual 1
100% (50)
Principles of Taxation For Business and Investment Planning 14th Edition Jones Solutions Manual 1
36 pages
Hadoop Admin Course
No ratings yet
Hadoop Admin Course
8 pages
AWS Solutions Architect Lesson 3
No ratings yet
AWS Solutions Architect Lesson 3
85 pages
The Data Center As A Computer
100% (1)
The Data Center As A Computer
156 pages
RabbitMQ Master
No ratings yet
RabbitMQ Master
136 pages
Version Control Systems
No ratings yet
Version Control Systems
14 pages
IRCh 7 Slides
No ratings yet
IRCh 7 Slides
52 pages
244256-Exabeam Security Content in The Legacy Structure-Pdf-En
No ratings yet
244256-Exabeam Security Content in The Legacy Structure-Pdf-En
142 pages
PART-I: Multiple Choices: Jimma University
100% (1)
PART-I: Multiple Choices: Jimma University
6 pages
CB Queryoptimization 01
No ratings yet
CB Queryoptimization 01
78 pages
UIJAVAKIT
100% (1)
UIJAVAKIT
33 pages
Machine Learning - Manual
No ratings yet
Machine Learning - Manual
32 pages
Learn Tech Skills From Scratch!: The College Companion To Give You An Edge
No ratings yet
Learn Tech Skills From Scratch!: The College Companion To Give You An Edge
12 pages
Implementation Guide For Electronic Transmission of Individual Case Safety Reports (Icsrs)
No ratings yet
Implementation Guide For Electronic Transmission of Individual Case Safety Reports (Icsrs)
166 pages
Elasticsearch Optimization
No ratings yet
Elasticsearch Optimization
25 pages
Kubernetes Setup Notes
No ratings yet
Kubernetes Setup Notes
5 pages
Xlri Apm Brochure
No ratings yet
Xlri Apm Brochure
14 pages
04 - Google BigQuery Pricing
No ratings yet
04 - Google BigQuery Pricing
18 pages
1-Descriptive Statistics
No ratings yet
1-Descriptive Statistics
44 pages
Js Cheatsheet: Basics Loops
No ratings yet
Js Cheatsheet: Basics Loops
23 pages
Search Accuracy Analytics White Paper
No ratings yet
Search Accuracy Analytics White Paper
35 pages
Create A Simple HTML Website With Postgres Databas
No ratings yet
Create A Simple HTML Website With Postgres Databas
8 pages
Understanding The Top 5 Redis Performance Metrics
No ratings yet
Understanding The Top 5 Redis Performance Metrics
22 pages
Fundamentals of AI - Visual Map (AIF-C01)
No ratings yet
Fundamentals of AI - Visual Map (AIF-C01)
1 page
Franz Thalmair Curating Medianetart 1
No ratings yet
Franz Thalmair Curating Medianetart 1
160 pages
Wall Product Feature Enhancement Short Form BRD 4.10.20
No ratings yet
Wall Product Feature Enhancement Short Form BRD 4.10.20
38 pages
Getting Started With Laserfiche Guide
No ratings yet
Getting Started With Laserfiche Guide
78 pages
Internet of Spatial Things: A New Reference Model With Insight Analysis
No ratings yet
Internet of Spatial Things: A New Reference Model With Insight Analysis
18 pages
Methodology: Berlin Principles of Higher Education Institutions
No ratings yet
Methodology: Berlin Principles of Higher Education Institutions
10 pages
Top Kubernetes Interview Questions and Answers
No ratings yet
Top Kubernetes Interview Questions and Answers
26 pages
UCL Library Services, Gower ST., London WC1E 6BT
No ratings yet
UCL Library Services, Gower ST., London WC1E 6BT
4 pages
R Note
No ratings yet
R Note
56 pages
IR_MOD4_NOTES
No ratings yet
IR_MOD4_NOTES
19 pages
Meta Tags
No ratings yet
Meta Tags
4 pages
Lecture3 Tolerant Retrieval Handout 6 Per
No ratings yet
Lecture3 Tolerant Retrieval Handout 6 Per
8 pages
Doc-IT Features and Benefits
No ratings yet
Doc-IT Features and Benefits
8 pages
T5 Moleculardescriptors Models PDF
No ratings yet
T5 Moleculardescriptors Models PDF
9 pages
Apache Kafka
No ratings yet
Apache Kafka
6 pages
Andargachew Mekonnen Gezmu
No ratings yet
Andargachew Mekonnen Gezmu
113 pages
Angular Material Tutorial PDF
0% (1)
Angular Material Tutorial PDF
17 pages
Characteristics of The Invisible Web
No ratings yet
Characteristics of The Invisible Web
14 pages
Principal Components Analysis: Mathematical Development
No ratings yet
Principal Components Analysis: Mathematical Development
23 pages
Rahul Sharma
100% (1)
Rahul Sharma
2 pages
Angular 6
No ratings yet
Angular 6
54 pages
SEO Strategy Template
100% (8)
SEO Strategy Template
14 pages
Frontend Cheatsheet
No ratings yet
Frontend Cheatsheet
2 pages
Cloud-Based Design and Manufacturing Status and Promise
No ratings yet
Cloud-Based Design and Manufacturing Status and Promise
24 pages
1-Descriptive Statistics
No ratings yet
1-Descriptive Statistics
44 pages
DevOps Training in Kukatpally
100% (1)
DevOps Training in Kukatpally
4 pages
AAR P1 Incident MOL Outage V0.2
No ratings yet
AAR P1 Incident MOL Outage V0.2
5 pages
Tips For Working With PDF Documents
No ratings yet
Tips For Working With PDF Documents
17 pages
MySql NDB Clustering
No ratings yet
MySql NDB Clustering
6 pages
An Effective Query System Using Llms and Langchain IJERTV12IS060161
No ratings yet
An Effective Query System Using Llms and Langchain IJERTV12IS060161
3 pages
Hud Hud Dabang Dabang Dabang Hud Hud Dabang Dabang Dabang Dabang ..'
No ratings yet
Hud Hud Dabang Dabang Dabang Hud Hud Dabang Dabang Dabang Dabang ..'
13 pages
Apache Spark Architecture
No ratings yet
Apache Spark Architecture
7 pages
Zaheer Ahmad, Presentation Information Literacy Skills
No ratings yet
Zaheer Ahmad, Presentation Information Literacy Skills
29 pages
T5 Moleculardescriptors Models PDF
No ratings yet
T5 Moleculardescriptors Models PDF
9 pages
Design High - Speed Search by Classifying Binary Code Book Based On Edge Information in Block
No ratings yet
Design High - Speed Search by Classifying Binary Code Book Based On Edge Information in Block
3 pages
10 User Guide Dose Forms and Routes of Administration v1 0
No ratings yet
10 User Guide Dose Forms and Routes of Administration v1 0
8 pages
Applied Coding Track
No ratings yet
Applied Coding Track
10 pages
Library Software Packages Available in India
No ratings yet
Library Software Packages Available in India
9 pages
Google Hacking Mini
No ratings yet
Google Hacking Mini
8 pages
List Vs Tuples
No ratings yet
List Vs Tuples
10 pages
Handling Imbalanced Data
No ratings yet
Handling Imbalanced Data
21 pages
Filebeat To Graylog
No ratings yet
Filebeat To Graylog
4 pages
RDBMS To MongoDB Migration
No ratings yet
RDBMS To MongoDB Migration
19 pages
Search Engines: Submitted To: Submitted by
No ratings yet
Search Engines: Submitted To: Submitted by
16 pages
0 - Summary of Document History - v1 - 6
No ratings yet
0 - Summary of Document History - v1 - 6
1 page
Node Cheatsheet PDF
No ratings yet
Node Cheatsheet PDF
6 pages
Mastering Apache Cassandra - Second Edition
From Everand
Mastering Apache Cassandra - Second Edition
Nishant Neeraj
No ratings yet
The Datadog Handbook: A Guide to Monitoring, Metrics, and Tracing
From Everand
The Datadog Handbook: A Guide to Monitoring, Metrics, and Tracing
Robert Johnson
No ratings yet
Java servlet Second Edition
From Everand
Java servlet Second Edition
Gerardus Blokdyk
No ratings yet

Subject: A Glance To Elasticsearch in The Era of Analytics and Machine Learning

Uploaded by

Subject: A Glance To Elasticsearch in The Era of Analytics and Machine Learning

Uploaded by

Subject: A Glance to Elasticsearch in the era of Analytics and Machine

So, here Elasticsearch comes into picture.

 Elasticsearch is a full text search, readily-scalable, enterprise-grade, analytics engine for

Concept and Components:

Fig: 1: Elasticsearch Concept (Source: W3 school)

 Index: An index is a collection of documents that have somewhat similar characteristics. In

How Elasticsearch represents data?

In Elasticsearch, we search a document and an index consists of one or more Documents, and a

What is ELK Stack and its Components?

 Logs: Server logs that need to be analyzed.

Log management and analysis include the following key capabilities:

Architecture of ELK Stack.

But for complex scenarios, the pipeline looks as follows:

Real time Uses:

Most NLP tasks start with a standard preprocessing pipeline such as :

1. Gathering the data

With Elasticsearch we can easily build an item based recommendation engine.

How does it work?

There is a challenge with duplicate detection:

This can be addressed with Deepdetect (https://www.deepdetect.com/ ). We send images to

 Search based on computer vision and metadata

 Deep textual and hybrid content search

 Video and richer format search

 Discovery and recommendations

4. Crawling and Document Processing:

 There can be a lot of duplication.

You might also like