UI Developer - Elasticsearch - 20200421

This document summarizes the use of Elasticsearch to improve searching of message history in Dispatch. An Oracle query against 70 million rows took 7.79 minutes, while even an inefficiently written Elasticsearch query returned in about one second (1083 ms). A managed Elasticsearch service on AWS is proposed, which would provide automatic updates, backups, and basic monitoring for about $335 per month. Message metadata would be loaded into Elasticsearch after each batch completes and exported daily to S3, from which an SQS queue and a Lambda load function can rapidly reload the indices. Indices would be partitioned monthly, aliased to a single index, and old indices deleted by Curator.


Managed Elasticsearch for Dispatch
Michael Alberhasky
AIS-Architecture Team

April 22, 2020

ITS - Administrative Information Services


What is Elasticsearch?
• Distributed search and analytics engine
• Built upon Apache Lucene
• RESTful interface
• Elasticsearch + Logstash + Kibana = ELK stack
• AIS also uses it for log searching and course searching in MyUI.
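Because the interface is RESTful, a search is just an HTTP request carrying a JSON body. A minimal sketch using only the Python standard library, assuming a hypothetical domain endpoint and index name (`dispatch-messages`); it builds a `POST /<index>/_search` request and only sends it when `search()` is called. Request signing (SigV4) for an AWS domain protected by an IAM access policy is not shown.

```python
import json
import urllib.request


def build_search_request(endpoint: str, index: str, body: dict) -> urllib.request.Request:
    """Build a POST <endpoint>/<index>/_search request carrying a Query DSL body."""
    url = f"{endpoint.rstrip('/')}/{index}/_search"
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(endpoint: str, index: str, body: dict, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response (needs a reachable cluster)."""
    req = build_search_request(endpoint, index, body)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical endpoint and index; substitute your own domain's values.
    req = build_search_request(
        "https://search-example.us-east-1.es.amazonaws.com",
        "dispatch-messages",
        {"query": {"match_all": {}}, "size": 5},
    )
    print(req.get_method(), req.full_url)
```

Kibana and the SQL interface ultimately go through the same HTTP layer, so this pattern is the lowest common denominator.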



Dispatch’s Use Case
• Feature request to search through message history
• Oracle query against 70 million rows = 7.79 minutes
• Elasticsearch query, written in the least efficient way = 1083 ms
• Can’t just add more indices to a table when there are 150 million rows
• No downtime available, and an index that big would take ages to build
• Storage considerations: database is already 2 TB
• Need the ability to quickly retrieve metadata for a new app
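To put the two timings on this slide side by side, the arithmetic below converts both to seconds and divides. It assumes the two measurements are comparable, which the slide does not fully establish (the Oracle figure is for 70 million rows), so treat the ratio as an order of magnitude rather than a benchmark.

```python
# Timings quoted on the slide.
oracle_seconds = 7.79 * 60             # Oracle query against 70 million rows: 7.79 minutes
elasticsearch_seconds = 1083 / 1000    # inefficient Elasticsearch query: 1083 ms

speedup = oracle_seconds / elasticsearch_seconds
print(f"Oracle: {oracle_seconds:.1f} s")
print(f"Elasticsearch: {elasticsearch_seconds:.3f} s")
print(f"Speedup: about {speedup:.0f}x")
```

Even with an unoptimized query, the gap is a few hundred times.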

Managed Services
• Paying to use it (hopefully cheaper than it would cost to run it yourself)
• Updates/patches are done for you
• Backups happen automatically
• Self-service tools for configuration
• Basic monitoring provided



Hosting our own or Managed on AWS?
PRO
• Elasticity: if I need a cluster with more CPU for ingesting lots of data, I can change the instance type easily
• New feature called UltraWarm Storage: extend your storage into low-cost storage so you can search through gobs more data
CON
• Blue/Green deployments take time, as the entire cluster must be replaced and data copied to the new cluster. Data can still be read from and written to the old cluster while that is happening
• More expensive than just trying to run your own; however, time is money: $335/month



Architecture



Interfaces
• RESTful API offers the ability to query indices via:
  • Query DSL
  • SQL
  • Kibana

{
  "query": {
    "bool": {
      "filter": [{
        "bool": {
          "minimum_should_match": 1,
          "should": [{
            "match_phrase": {
              "member_id": "foobar-1234-abcd-5678"
            }
          }]
        }
      }],
      "must": [{
        "range": {
          "index_date": {
            "format": "strict_date_optional_time",
            "gte": "2020-04-15T20:25:16.707Z",
            "lte": "2020-04-15T20:55:16.707Z"
          }
        }
      }],
      "must_not": [],
      "should": []
    }
  }
}
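The Query DSL body on this slide (one member's messages within a 30-minute `index_date` window) can be generated instead of hand-written. A minimal sketch, assuming timezone-aware datetimes; the function names are hypothetical, and only the `member_id` and `index_date` fields come from the slide. The empty `must_not`/`should` arrays are omitted because they do not change the result.

```python
import json
from datetime import datetime, timedelta, timezone


def iso_millis(ts: datetime) -> str:
    """Format an aware datetime as UTC with milliseconds, e.g. 2020-04-15T20:25:16.707Z."""
    utc = ts.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"


def member_history_query(member_id: str, start: datetime, end: datetime) -> dict:
    """Query DSL body: messages for one member with index_date in [start, end]."""
    return {
        "query": {
            "bool": {
                # Filter context: exact phrase match on member_id, not scored.
                "filter": [{
                    "bool": {
                        "minimum_should_match": 1,
                        "should": [{"match_phrase": {"member_id": member_id}}],
                    }
                }],
                # Time window on index_date, as in the slide's example.
                "must": [{
                    "range": {
                        "index_date": {
                            "format": "strict_date_optional_time",
                            "gte": iso_millis(start),
                            "lte": iso_millis(end),
                        }
                    }
                }],
            }
        }
    }


if __name__ == "__main__":
    end = datetime(2020, 4, 15, 20, 55, 16, 707000, tzinfo=timezone.utc)
    body = member_history_query("foobar-1234-abcd-5678", end - timedelta(minutes=30), end)
    print(json.dumps(body, indent=2))
```

Since the range clause does not need relevance scoring, one option worth testing is moving it into `filter` as well; filter-context clauses skip scoring and are eligible for caching.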



It’s a bird, it’s a plane, it’s a cache
• Treat indices like a cache; they could go poof at any time.
• Message metadata loaded to Elasticsearch after batch is completed.
• Daily exports of metadata to S3.
• Load Lambda function to rapidly reload the indices.
[Diagram: Dispatch, S3 Bucket, SQS Queue, Load Function, Elasticsearch]
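The load function's core job is turning exported metadata into a bulk indexing request. A minimal sketch of an SQS-triggered Lambda handler, under loudly stated assumptions: each SQS message body is taken to be a JSON array of metadata records (the real pipeline may instead point at an S3 object), and the `message_id` field, the `dispatch-messages-` prefix, and the function names are hypothetical. It renders the `_bulk` NDJSON payload routed to monthly indices; sending it (with request signing) is left out.

```python
import json
from typing import Iterable

PREFIX = "dispatch-messages-"  # hypothetical monthly index prefix


def monthly_index(index_date: str) -> str:
    """Map an ISO timestamp like '2020-04-15T20:25:16.707Z' to 'dispatch-messages-2020.04'."""
    return f"{PREFIX}{index_date[:4]}.{index_date[5:7]}"


def to_bulk_body(records: Iterable[dict]) -> str:
    """Render records as a _bulk NDJSON payload: one action line, then one source line."""
    lines = []
    for rec in records:
        # A stable _id (hypothetical message_id field) means a reload overwrites, not duplicates.
        action = {"index": {"_index": monthly_index(rec["index_date"]), "_id": rec["message_id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(rec))
    if not lines:
        return ""
    return "\n".join(lines) + "\n"  # the _bulk API requires a trailing newline


def handler(event: dict, context=None) -> str:
    """Hypothetical Lambda entry point; SQS delivers messages under event["Records"]."""
    records = []
    for message in event.get("Records", []):
        records.extend(json.loads(message["body"]))
    return to_bulk_body(records)


if __name__ == "__main__":
    sample = [{"message_id": "m-1", "member_id": "foobar-1234-abcd-5678",
               "index_date": "2020-04-15T20:25:16.707Z"}]
    print(handler({"Records": [{"body": json.dumps(sample)}]}), end="")
```

Keying documents by a stable ID is what makes "treat it like a cache" safe: replaying the same S3 export any number of times converges on the same index contents.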



Index Design
• Index for each day = bad idea
• Index for each month = good idea
• Aliased to a super index
• Curator to manage indices and delete old indices
• AWS now offers index management as a feature
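The monthly-index-plus-alias design comes down to two small pieces of logic: naming each month's index and attaching it to the alias, and deciding which months have aged out. A minimal sketch, assuming a hypothetical `dispatch-messages-YYYY.MM` naming scheme and a `dispatch-messages` alias; `alias_actions` produces a body for the `_aliases` endpoint, and `expired_indices` only illustrates the kind of age-based selection a Curator delete action performs. It is not Curator itself.

```python
from datetime import date
from typing import List

PREFIX = "dispatch-messages-"  # hypothetical prefix for the monthly indices
ALIAS = "dispatch-messages"    # hypothetical alias acting as the "super index"


def index_for(day: date) -> str:
    """Name of the monthly index that documents dated `day` belong to."""
    return f"{PREFIX}{day.year:04d}.{day.month:02d}"


def alias_actions(index_name: str) -> dict:
    """Request body for POST /_aliases that adds one monthly index to the alias."""
    return {"actions": [{"add": {"index": index_name, "alias": ALIAS}}]}


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months from `earlier` to `later`, ignoring the day of month."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def expired_indices(names: List[str], today: date, keep_months: int) -> List[str]:
    """Indices to delete so only the newest `keep_months` months remain
    (the current month counts as one). Unrelated index names are ignored."""
    expired = []
    for name in names:
        if not name.startswith(PREFIX):
            continue
        parts = name[len(PREFIX):].split(".")
        if len(parts) != 2 or not all(p.isdigit() for p in parts):
            continue
        month_start = date(int(parts[0]), int(parts[1]), 1)
        if months_between(month_start, today) >= keep_months:
            expired.append(name)
    return sorted(expired)
```

For example, on 2020-04-22 with `keep_months=12`, `dispatch-messages-2019.04` is selected for deletion while `dispatch-messages-2019.05` is kept. Queries go to the alias, so dropping an old month never requires a change in Dispatch.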



It ain’t free
• 3 x m5.large.elasticsearch instances with 70 GB of storage each
• On-demand pricing:
• Compute: $0.142/hour per instance x 3 instances x 720 hours = $306.72/month
• Storage: $9.45/month x 3 = $28.35/month
• Total: $335.07/month
• Elected not to use dedicated master nodes
• Opportunities for reduced cost:
• Reserved instances would save over $100/month
• Reduce number of instances and accept higher risk?
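The monthly figures on this slide can be reproduced from the unit prices. A quick check, assuming a 720-hour (30-day) month, which is the assumption that yields the slide's $306.72 compute figure; prices are the on-demand rates quoted above and will differ by region and over time.

```python
HOURS_PER_MONTH = 720    # 30-day month; reproduces the slide's compute total
INSTANCES = 3            # 3 x m5.large.elasticsearch
HOURLY_RATE = 0.142      # on-demand $/hour per instance, from the slide
STORAGE_PER_NODE = 9.45  # $/month for 70 GB on each node, from the slide

compute = HOURLY_RATE * INSTANCES * HOURS_PER_MONTH
storage = STORAGE_PER_NODE * INSTANCES
total = compute + storage

print(f"Compute: ${compute:,.2f}")  # Compute: $306.72
print(f"Storage: ${storage:,.2f}")  # Storage: $28.35
print(f"Total:   ${total:,.2f}")    # Total:   $335.07
```

Dropping to two instances would scale both lines by two thirds, which is the trade-off the last bullet is weighing against the higher risk.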



Demo
• Search function in Dispatch
• Kibana interface
• AWS Console



Michael Alberhasky

319-353-4484

[email protected]

