Introduction to ELK stack
– An introduction to tools for big data processing, search, and analysis –
Network Division, Computer and Information Networking Center, 邵喻美
madeline@ntu.edu.tw
1
Topics
• Why use big data tools for network traffic and log analysis
• What is the ELK stack, and why choose it
• ELK stack intro
• ELK use cases
• Implementing ELK for network & account anomaly detection
2
Network operation and security management issues
• Lots of users
• Faculty, staff & students → more than 40,000 users on campus
• Lots of systems
• Routers, firewalls, servers….
• Lots of logs
• Netflow, syslogs, access logs, service logs, audit logs….
• Nobody cares until something goes wrong….
3
Log & event analysis for network management
• Log & event collection from multiple sources
• Accept and parse different log formats
• Large volumes of data in various formats
• Scalable architecture
• Expert knowledge requirement
4
How we “traditional” system managers treat logs
• Set up one or more log servers for receiving logs from
servers/routers/appliances
• Unix commands -- grep + awk + sed + sort + uniq + perl + shell script ….
• Cronjobs executed periodically
• Compute stats and send out reports/alerts
• Detect possible abnormal behavior and react accordingly
• Plain-text reports or stats-trend web pages
5
Amount of data….
• Router
• Netflow – 43GB daily
• Wifi
• NAT log – 4.8TB daily
• Auth log
• WAF/Firewall
• Server access logs & events
• Mail server logs ~18 GB daily
• POP3 – avg. 7 GB daily
• SMTP – avg. 1.75 GB daily
• Exchange – avg. 140 MB daily
• OWA – avg. 8.4 GB daily
• MessageTrackingLog – avg. 100 MB daily
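The per-service mail averages above add up to roughly the quoted ~18 GB per day, and daily volumes are easier to size a pipeline against once they are turned into a sustained ingest rate. A quick back-of-envelope check in Python, using the slide's numbers and assuming decimal units (1 TB = 1000 GB, 1 GB = 1000 MB):

```python
# Daily averages from the slide, in GB (decimal units assumed).
MAIL_LOGS_GB_PER_DAY = {
    "POP3": 7.0,
    "SMTP": 1.75,
    "Exchange": 0.14,
    "OWA": 8.4,
    "MessageTrackingLog": 0.1,
}

SECONDS_PER_DAY = 24 * 60 * 60


def total_gb(volumes: dict) -> float:
    """Sum a {source: GB/day} mapping."""
    return sum(volumes.values())


def sustained_mb_per_s(gb_per_day: float) -> float:
    """Average ingest rate (MB/s) needed just to keep up with a daily volume."""
    return gb_per_day * 1000 / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"mail logs: {total_gb(MAIL_LOGS_GB_PER_DAY):.2f} GB/day")
    print(f"NAT log (4.8 TB/day): {sustained_mb_per_s(4800):.1f} MB/s")
    print(f"Netflow (43 GB/day): {sustained_mb_per_s(43):.2f} MB/s")
```

The mail services total about 17.4 GB/day, and the NAT log alone averages about 55.6 MB/s around the clock, before accounting for daytime peaks.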
6
What is ELK, and why choose it
7
Splunk vs. ELK on Google Trends
One of the leaders in the security information and event management (SIEM) market
How do Netflix, Facebook, Microsoft, LinkedIn, and Cisco monitor their logs? With ELK.
The ELK Stack is now downloaded 500,000 times every month, making it the world’s most popular log management platform
8
Why ELK?
• Rapid on-premises (or cloud) installation; easy to deploy
• Scales vertically and horizontally
• Easy-to-use and varied APIs
• Queries are much easier to write than a MapReduce job
• Libraries available for most programming/scripting languages
  • Elastic offers a host of language clients for Elasticsearch, including Ruby, Python, PHP, Perl, .NET, Java, JavaScript, and more
• Many tools available
• It’s free (open source), and it’s quick
9
Logstash – Data From Any Source
Elasticsearch – Instantly Search & Analyze
Kibana – Actionable Insight
Elasticsearch is a NoSQL database that is based on the Lucene search engine → indexes and stores the information
Kibana is a visualization layer that works on top of Elasticsearch → presents the data in visualizations that provide actionable insights
Logstash is a log pipeline tool that accepts inputs from various sources, executes different transformations, and exports the data to various targets → collects and parses logs
10
11
ELK modules
Open source:
• Elasticsearch
• Logstash
• Kibana
• Beats
  • Data shippers: collect, parse & ship
Extension plugins:
• Alerting (Watcher)
  • Proactive monitoring and alerting based on Elasticsearch queries or conditions
• Security (Shield)
  • Protects and secures the Elastic Stack
• Monitoring (Marvel)
  • Monitors and diagnoses the health and performance of the Elasticsearch cluster
• Graph
  • Discover and explore the relationships that live in your data by adding relevance to your exploration
12
Connect Speedy Search with Big Data Analytics –
Elasticsearch for Apache Hadoop
ES-Hadoop -- a two-way connector
• Read and write data to ES and query it in real time
13
Let’s look into the ELK stack
14
The ELK stack
15
Elasticsearch-Logstash-Kibana
16
Logstash
17
Logstash architecture
[Figure: an event with Ip: 140.1.1.1 passes through the Logstash pipeline and comes out enriched with City: Zurich, Country: CH]
18
How Logstash works
19
Logstash Input plugins
• Stdin – Reads events from standard input
• File – Streams events from files (similar to “tail -0F”)
• Syslog – Reads syslog messages as events
• Eventlog – Pulls events from the Windows Event Log
• Imap – Reads mail from an IMAP server
• Rss – Reads entries from an RSS/Atom feed as events
• Snmptrap – creates events based on SNMP trap messages
• Twitter – Reads events from the Twitter Streaming API
• Irc – reads events from an IRC server
• Exec – Captures the output of a shell command as an event
• Elasticsearch – Reads query results from an Elasticsearch cluster
• ….
20
Logstash Filter plugins
• grok – parses unstructured event data into fields
• Mutate – performs mutations on fields
• Geoip – adds geographical information about an IP address
• Date – parses dates from fields to use as the Logstash timestamp for an event
• Cidr – checks IP addresses against a list of network blocks
• Drop – drops events (typically used inside a conditional)
• …
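The cidr filter's job (checking an IP address against a list of network blocks) is easy to picture with Python's standard `ipaddress` module. A minimal sketch; the network list and the `src_ip` field name are hypothetical placeholders, and the tagging only loosely imitates the filter's `add_tag` behavior:

```python
import ipaddress

# Hypothetical "internal" network blocks: documentation/private ranges used as
# placeholders, not real campus allocations.
INTERNAL_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def in_any_network(addr: str, networks=INTERNAL_NETS) -> bool:
    """True if addr falls inside any of the given CIDR blocks."""
    ip = ipaddress.ip_address(addr)
    # Mixing IPv4 and IPv6 is safe: the membership test just returns False.
    return any(ip in net for net in networks)


def tag_event(event: dict, field: str = "src_ip") -> dict:
    """Return a copy of the event with an 'internal' or 'external' tag appended."""
    tags = list(event.get("tags", []))
    tags.append("internal" if in_any_network(event[field]) else "external")
    return {**event, "tags": tags}
```

For example, `tag_event({"src_ip": "10.1.2.3"})` gets the tag `internal`, while `8.8.8.8` would be tagged `external`.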
21
Logstash Output plugins
• Stdout – prints events to the standard output
• Csv – write events to disk in a delimited format
• Email – sends email to a specified address when output is received
• Elasticsearch – stores logs in Elasticsearch
• Exec – runs a command for a matching event
• File – writes events to files on disk
• mongoDB – writes events to MongoDB
• Redmine – creates tickets using the Redmine API
• ….
22
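filter {
  if [type] == "syslog" {
    grok {
      # Split a classic syslog line into timestamp, host, program, optional [pid], and message
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
      # Record when and from where Logstash received the event
      add_field => [ "received_at", "%{@timestamp}" ]
      add_field => [ "received_from", "%{host}" ]
    }
    # Derive syslog_facility/syslog_severity and their numeric codes from the PRI value
    syslog_pri { }
    date {
      # Use the syslog timestamp as @timestamp; syslog pads single-digit days with a space
      match => [ "syslog_timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  }
}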
Dec 23 14:30:01 louis CRON[619]: (www-data) CMD (php /usr/share/cacti/site/poller.php >/dev/null 2>/var/log/cacti/poller-error.log)
23
{
  "message" => "Dec 23 14:30:01 louis CRON[619]: (www-data) CMD (php /usr/share/cacti/site/poller.php >/dev/null 2>/var/log/cacti/poller-error.log)",
  "@timestamp" => "2013-12-23T22:30:01.000Z",
  "@version" => "1",
  "type" => "syslog",
  "host" => "0:0:0:0:0:0:0:1:52617",
  "syslog_timestamp" => "Dec 23 14:30:01",
  "syslog_hostname" => "louis",
  "syslog_program" => "CRON",
  "syslog_pid" => "619",
  "syslog_message" => "(www-data) CMD (php /usr/share/cacti/site/poller.php >/dev/null 2>/var/log/cacti/poller-error.log)",
  "received_at" => "2013-12-23 22:49:22 UTC",
  "received_from" => "0:0:0:0:0:0:0:1:52617",
  "syslog_severity_code" => 5,
  "syslog_facility_code" => 1,
  "syslog_facility" => "user-level",
  "syslog_severity" => "notice"
}
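The last four fields of the output above follow the standard syslog PRI encoding (RFC 5424): PRI = facility × 8 + severity. The sample line carries no `<PRI>` prefix of its own; the values shown (facility 1, severity 5) correspond to PRI 13 = 1 × 8 + 5. A small decoder sketch; the short facility labels are an assumption chosen to match the style of the output, not copied from any plugin:

```python
# Facility and severity names from the syslog standard (RFC 5424, section 6.2.1),
# shortened to labels in the style of the example output.
FACILITIES = [
    "kernel", "user-level", "mail", "system daemons",
    "security/authorization", "syslogd", "line printer", "network news",
    "UUCP", "clock", "security/authorization", "FTP",
    "NTP", "log audit", "log alert", "clock",
] + [f"local{i}" for i in range(8)]

SEVERITIES = [
    "emergency", "alert", "critical", "error",
    "warning", "notice", "informational", "debug",
]


def decode_pri(pri: int) -> dict:
    """Split a syslog PRI value (facility * 8 + severity) into its parts."""
    if not 0 <= pri <= 191:  # 24 facilities x 8 severities
        raise ValueError(f"PRI out of range: {pri}")
    facility, severity = divmod(pri, 8)
    return {
        "syslog_facility_code": facility,
        "syslog_severity_code": severity,
        "syslog_facility": FACILITIES[facility],
        "syslog_severity": SEVERITIES[severity],
    }
```

`decode_pri(13)` yields facility code 1 ("user-level") and severity code 5 ("notice"), the same values as in the output above.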
24
date
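The date filter turns the text timestamp into the event's `@timestamp`. In the example output, `syslog_timestamp` "Dec 23 14:30:01" became `@timestamp` "2013-12-23T22:30:01.000Z": the year had to be supplied and the local time converted to UTC. A minimal Python sketch of that conversion, assuming the year (2013) and the source host's UTC-8 offset are known; Logstash itself uses Joda-style patterns such as "MMM dd HH:mm:ss", not strptime codes:

```python
from datetime import datetime, timedelta, timezone

# Assumption: the logging host was on UTC-8. The example's @timestamp
# (22:30:01Z for a local 14:30:01) is consistent with that offset.
SOURCE_TZ = timezone(timedelta(hours=-8))


def syslog_ts_to_utc(ts: str, year: int, tz: timezone = SOURCE_TZ) -> str:
    """Convert a year-less syslog timestamp like 'Dec 23 14:30:01' to UTC ISO 8601."""
    # Collapse the extra space syslog uses before single-digit days ("Dec  3").
    normalized = " ".join(ts.split())
    # Syslog timestamps carry no year or zone, so the caller supplies both.
    # %b assumes English month abbreviations (C/English locale).
    local = datetime.strptime(f"{year} {normalized}", "%Y %b %d %H:%M:%S")
    utc = local.replace(tzinfo=tz).astimezone(timezone.utc)
    # Syslog has no sub-second precision, so milliseconds are always .000.
    return utc.strftime("%Y-%m-%dT%H:%M:%S.000Z")
```

`syslog_ts_to_utc("Dec 23 14:30:01", 2013)` reproduces the `@timestamp` value from the example output.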
25
26
https://github.com/elastic/logstash/blob/v1.4.2/patterns/grok-patterns
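The grok-patterns file defines named regular expressions; in a grok expression, `%{NAME:field}` is replaced by the NAME regex and its match is stored under `field`. A toy Python expander showing that idea on the syslog example. The `PATTERNS` table is a hypothetical, simplified stand-in for a few entries of the real file, whose definitions are more permissive:

```python
import re
from typing import Optional

# Hypothetical, simplified stand-ins for a few grok-patterns entries.
PATTERNS = {
    "SYSLOGTIMESTAMP": r"[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}",
    "SYSLOGHOST": r"[\w.\-]+",
    "DATA": r".*?",
    "POSINT": r"[1-9]\d*",
    "GREEDYDATA": r".*",
}

# Matches %{NAME} or %{NAME:field}.
GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def grok_to_regex(expr: str) -> str:
    """Expand each %{NAME:field} into a named group using PATTERNS[NAME]."""
    def expand(m) -> str:
        name, field = m.group(1), m.group(2)
        body = PATTERNS[name]
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return GROK_REF.sub(expand, expr)


def grok(expr: str, text: str) -> Optional[dict]:
    """Return the captured fields, or None if the line does not match."""
    m = re.match(grok_to_regex(expr), text)
    return m.groupdict() if m else None


SYSLOG_GROK = (r"%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} "
               r"%{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: "
               r"%{GREEDYDATA:syslog_message}")
SAMPLE = ("Dec 23 14:30:01 louis CRON[619]: (www-data) CMD (php "
          "/usr/share/cacti/site/poller.php >/dev/null 2>/var/log/cacti/poller-error.log)")
```

`grok(SYSLOG_GROK, SAMPLE)` captures `syslog_program` "CRON" and `syslog_pid` "619", like the Logstash output. Unlike real grok, this sketch does not resolve patterns that are defined in terms of other patterns.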
27
Deploying and scaling Logstash
Minimal installation
Using Filters
28
Deploying and scaling Logstash
Using a log shipper to minimize the resource demands on Logstash
Scaling to a larger Elasticsearch cluster
29
Deploying and scaling Logstash
Managing throughput spikes with message queuing
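The message-queuing layout puts a broker (Redis and Kafka are common choices in Logstash deployments) between shippers and indexers, so a burst waits in the queue instead of overwhelming Logstash or Elasticsearch. A toy tick-based model of that buffering in Python; the numbers are made up and this is an illustration, not Logstash code:

```python
def simulate_buffer(arrivals: list, capacity_per_tick: int) -> list:
    """Backlog left in a FIFO queue after each tick, drained at a fixed rate.

    arrivals[i] is the number of events shipped during tick i; the indexer
    side can process at most capacity_per_tick events per tick.
    """
    backlog = 0
    history = []
    for arriving in arrivals:
        backlog += arriving                          # shippers enqueue events
        backlog -= min(backlog, capacity_per_tick)   # indexers drain what they can
        history.append(backlog)
    return history
```

With `simulate_buffer([100, 100, 1000, 100, 100, 0], capacity_per_tick=300)`, the 1000-event spike leaves a backlog of 700 that drains over the next three ticks, and no event is dropped along the way.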
30
Multiple connections for Logstash high availability
31
Elasticsearch-Logstash-Kibana
32
ElasticSearch
• Built on top of Apache Lucene™, a full-text search-engine library
• A schema-free, REST & JSON based distributed search engine with real-time analytics
• Capable of scaling to hundreds of servers and petabytes of structured and unstructured data
• Open source: Apache License 2.0
• Wikipedia uses Elasticsearch to provide full-text search with highlighted search snippets, and search-as-you-type and did-you-mean suggestions
• The Guardian uses Elasticsearch to combine visitor logs with social-network data to provide real-time feedback to its editors about the public’s response to new articles
• Stack Overflow combines full-text search with geolocation queries and uses more-like-this to find related questions and answers
• GitHub uses Elasticsearch to query 130 billion lines of code
Real scalability comes from horizontal scale
Schema-flexible
33
Elasticsearch vs. Relational DB
Elasticsearch                           Relational DB
Index                                   Database
Type                                    Table
Document                                Row
Field                                   Column
Shard                                   Partition
Mapping                                 Schema
- (everything is indexed)               Index
Query DSL (domain-specific language)    SQL
Shards are how Elasticsearch distributes data around your cluster
34
What is a shard
• A shard is a single instance of Lucene, and is a complete search engine in its own right
• Documents are stored and indexed in shards → shards are allocated to nodes in your cluster
• As your cluster grows or shrinks, Elasticsearch automatically migrates shards between nodes so that the cluster remains balanced
• A shard can be either a primary shard or a replica shard
  • Each document in your index belongs to a single primary shard
  • A replica shard is just a copy of a primary shard
35
ElasticSearch clustering – single node cluster
• Node = running instance of ES
• Cluster = 1+ nodes with the same cluster.name
• Every cluster has one elected master node
• One cluster can hold any number of indices
36
ElasticSearch clustering – adding a second node
A cluster consists of one or more nodes with the same cluster.name
• All primary and replica shards are allocated
• Each primary shard (P) has one replica shard (R)
• Clients can talk to any node in the cluster
37
ElasticSearch clustering – adding a third node
• More primary shards:
• faster indexing
• more scale
• More replicas:
• faster searching
• more failover
Increase the number of replicas
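The clustering slides show shard copies spreading out as nodes join, with a replica never placed on the same node as its primary. A simplified round-robin placement in Python illustrates that constraint; it is a hypothetical sketch, not Elasticsearch's allocator, which weighs more factors (such as disk usage and balance across indices):

```python
from collections import defaultdict


def allocate(num_primaries: int, num_replicas: int, nodes: list) -> dict:
    """Place each shard's primary and replicas on distinct nodes, round-robin."""
    if num_replicas + 1 > len(nodes):
        # Elasticsearch would leave such replicas unassigned rather than put two
        # copies of a shard on one node; this sketch simply refuses.
        raise ValueError("need at least num_replicas + 1 nodes")
    placement = defaultdict(list)
    for shard in range(num_primaries):
        for copy in range(num_replicas + 1):
            # Offsetting by `copy` puts every copy of a shard on a different node.
            node = nodes[(shard + copy) % len(nodes)]
            placement[node].append(f"P{shard}" if copy == 0 else f"R{shard}")
    return dict(placement)
```

`allocate(3, 1, ["node1", "node2", "node3"])` gives each node one primary and one replica, and no node holds both copies of the same shard, so losing any single node loses no data.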
38
Creating, Indexing, and Deleting a document
1. The client sends a create, index, or delete request to Node 1
2. The node uses the document’s _id to determine that the document belongs to shard 0. It forwards
the request to Node 3, where the primary copy of shard 0 is currently allocated
3. Node 3 executes the request on the primary shard. If it is successful, it forwards the request in
parallel to the replica shards on Node 1 and Node 2. Once all of the replica shards report
success, Node 3 reports success to the coordinating node, which reports success to the client.
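Step 2 relies on a deterministic routing rule. The formula commonly documented for Elasticsearch is shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document's _id (later versions refine this to support index splitting). A Python sketch of that rule, with `zlib.crc32` swapped in for Elasticsearch's Murmur3 hash, so the actual shard numbers will differ:

```python
import zlib
from typing import Optional


def shard_for(doc_id: str, num_primary_shards: int, routing: Optional[str] = None) -> int:
    """shard = hash(routing) % number_of_primary_shards, routing defaulting to _id.

    Stand-in hash: Elasticsearch uses Murmur3; crc32 is used here only because
    it is deterministic and in the standard library.
    """
    key = routing if routing is not None else doc_id
    return zlib.crc32(key.encode("utf-8")) % num_primary_shards
```

Because the modulus is the number of primary shards, changing that number would send existing ids to different shards, which is why it is set when the index is created.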
39
Retrieving a Document
1. The client sends a get request to node 1
2. The node uses the document’s _id to determine that the document belongs to shard 0. Copies of shard 0 exist on all three nodes. On this occasion, it forwards the request to node 2.
3. Node 2 returns the document to node 1, which returns the document to the client.
For read requests, the coordinating node will choose a different shard copy on every request in order to balance the load
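The load-balancing rule above can be sketched as a per-shard round-robin over the nodes holding a copy. This follows the slide's description; newer Elasticsearch releases default to adaptive replica selection, which favors faster-responding copies instead of strict rotation:

```python
import itertools


class RoundRobinReader:
    """Pick a different copy of a shard for each read request."""

    def __init__(self, copies_by_shard: dict):
        # copies_by_shard maps shard number -> nodes holding a copy (primary or replica)
        self._cycles = {
            shard: itertools.cycle(nodes) for shard, nodes in copies_by_shard.items()
        }

    def pick(self, shard: int) -> str:
        """Return the node that should serve the next read of this shard."""
        return next(self._cycles[shard])
```

With `RoundRobinReader({0: ["node1", "node2", "node3"]})`, four successive reads of shard 0 go to node1, node2, node3, and node1 again.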
40
Partial update to a document
When a primary shard forwards changes to its replica shards, it doesn’t
forward the update request. Instead it forwards the new version of the
full document.
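A toy model of the point above: the primary applies the partial change, and the replicas simply receive the resulting full document with its new version. The recursive merge roughly mirrors how a partial `doc` is merged into a stored document; this is a simplification, not Elasticsearch's implementation:

```python
import copy


def deep_merge(base: dict, changes: dict) -> dict:
    """Merge a partial document into a full one, recursing into nested objects."""
    merged = dict(base)
    for key, value in changes.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def partial_update(primary: dict, replicas: list, doc_id: str, changes: dict) -> int:
    """Apply an update on the primary, then ship the FULL new document to replicas.

    Each shard copy is modeled as a dict: doc_id -> (version, document).
    """
    version, doc = primary[doc_id]
    new_version, new_doc = version + 1, deep_merge(doc, changes)
    primary[doc_id] = (new_version, new_doc)
    for replica in replicas:
        # Replicas never see `changes`; each gets its own copy of the whole result.
        replica[doc_id] = (new_version, copy.deepcopy(new_doc))
    return new_version
```

Shipping the full document keeps replicas consistent even if they would have applied the partial request in a different order.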
41
Multidocument Patterns (mget, bulk)
• The coordinating node knows in which shard each document lives.
• It breaks up the multidocument request into a multidocument request per shard, and forwards these in parallel to each participating node.
• Once it receives answers from each node, it collates their responses into a single response.
42
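The split-and-collate pattern for mget/bulk can be sketched in a few lines of Python. Shard assignment uses `zlib.crc32` as a stand-in for Elasticsearch's Murmur3 routing hash, and the per-shard "responses" are plain dicts rather than real node replies:

```python
import zlib
from collections import defaultdict


def shard_for(doc_id: str, num_primary_shards: int) -> int:
    # Stand-in for Elasticsearch's Murmur3-based routing hash.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def split_by_shard(doc_ids: list, num_primary_shards: int) -> dict:
    """Group requested ids into one sub-request per shard."""
    groups = defaultdict(list)
    for doc_id in doc_ids:
        groups[shard_for(doc_id, num_primary_shards)].append(doc_id)
    return dict(groups)


def collate(doc_ids: list, responses_by_shard: dict) -> list:
    """Merge per-shard answers back into one response, in request order."""
    found = {}
    for per_shard in responses_by_shard.values():
        found.update(per_shard)  # each per_shard maps doc_id -> document
    return [found.get(doc_id) for doc_id in doc_ids]
```

Ids missing from every shard's answer come back as `None`, so the combined response still lines up one-to-one with the original request.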
Talking to Elasticsearch
• RESTful API with JSON over HTTP
• Over port 9200 (the default HTTP port)
• Access via a web client, or from the command line with curl
• JSON (JavaScript Object Notation) → a common document format in NoSQL systems
• Elasticsearch clients
  • Java API, Java REST client, JavaScript API, PHP API, Python API, Perl API…
HTTP method or verb: GET, POST, PUT, HEAD, or DELETE
43
Indexing a document
• Store a document in an index so that it can be retrieved and queried
• Like the INSERT keyword in SQL
44
Retrieving documents
• Using GET method to retrieve document
• We can retrieve a specific document if we happen to know its id
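Indexing and retrieving a document are plain HTTP calls, so they can be built with Python's standard library alone. A minimal sketch, assuming an unsecured local test node on port 9200 and the 7.x-and-later endpoint layout `/<index>/_doc/<id>` (older releases used `/<index>/<type>/<id>`); the index name `syslog-demo` is hypothetical:

```python
import json
import urllib.request

# Assumption: a local test node without security, on the default port.
ES_URL = "http://localhost:9200"


def index_request(index: str, doc_id: str, doc: dict) -> urllib.request.Request:
    """PUT a JSON document under an explicit id (roughly SQL's INSERT)."""
    return urllib.request.Request(
        f"{ES_URL}/{index}/_doc/{doc_id}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def get_request(index: str, doc_id: str) -> urllib.request.Request:
    """GET a single document by id."""
    return urllib.request.Request(f"{ES_URL}/{index}/_doc/{doc_id}", method="GET")


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON reply (needs a running node)."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Against a running node, `send(index_request("syslog-demo", "1", {"syslog_program": "CRON"}))` stores the document, and `send(get_request("syslog-demo", "1"))["_source"]` returns it; the curl equivalent of the first call is a `PUT` to `localhost:9200/syslog-demo/_doc/1` with a JSON body and a `Content-Type: application/json` header.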
45
Performing Queries
• Using the q=<query> form performs a
full-text search by parsing the query
string value
• Query with query DSL, which is specified
using a JSON request body
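The two query styles differ only in where the query travels: in the URL's `q=` parameter, or as a JSON body under a `"query"` key. A minimal sketch that builds both requests, again assuming a local test node and a hypothetical `syslog-demo` index:

```python
import json
import urllib.parse
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local test node


def uri_search(index: str, q: str) -> urllib.request.Request:
    """Query-string search: GET /<index>/_search?q=<query>."""
    return urllib.request.Request(
        f"{ES_URL}/{index}/_search?{urllib.parse.urlencode({'q': q})}", method="GET"
    )


def dsl_search(index: str, query: dict) -> urllib.request.Request:
    """Query DSL search: the query is sent as a JSON request body."""
    return urllib.request.Request(
        f"{ES_URL}/{index}/_search",
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",  # POST is often used because some HTTP tools drop GET bodies
    )
```

`uri_search("syslog-demo", "syslog_program:CRON")` and `dsl_search("syslog-demo", {"match": {"syslog_program": "CRON"}})` express the same search; the DSL form scales better once queries need several clauses.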
46
Query DSL – Combining Filters
Bool Filter
47
Query DSL – Nesting Boolean Queries
48
Elasticsearch-Logstash-Kibana
49
Kibana
• Search, view, and interact with data stored in Elasticsearch indices
• Execute queries on data & visualize results in charts, tables, and maps
• Add/remove widgets
• Share/Save/Load dashboards
• Open source: Apache License 2.0
50
51
52
ELK use cases
53
54
Use cases
55
Cisco Talos Security Intelligence and Research Group:
Hunting for Hackers
• Focus -- Creating leading threat intelligence
• Aggregation and analysis of unrivaled telemetry data at Cisco, encompassing:
• Billions of web requests and emails
• Millions of malware samples
• Open source data sets (snort, clamAV…)
• Millions of network intrusions
56
Cisco Talos uses ELK to analyze…
• Sandbox data cluster
• Dynamic malware analysis reports
• Search for related patterns and malware
• ES stats
• 10 nodes
• 3 TB
• 100k reports/day
• ~8 months of data
• Honeypot cluster
• Collects attackers’ attempts
• { Account, password } pairs
• Executed commands
• URLs of downloaded files
• Suspicious command-and-control servers that malware reports back to
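A back-of-envelope reading of the sandbox-cluster stats, assuming 30-day months and that the 3 TB covers only these reports (replicas and index overhead would make the true per-report size smaller):

```python
def archive_estimate(reports_per_day: int, months: float, total_bytes: float,
                     days_per_month: int = 30) -> tuple:
    """Rough report count and average stored bytes per report."""
    reports = reports_per_day * days_per_month * months
    return reports, total_bytes / reports


if __name__ == "__main__":
    reports, per_report = archive_estimate(100_000, 8, 3e12)
    print(f"{reports:,.0f} reports, ~{per_report / 1e3:.0f} kB each")
```

That works out to about 24 million reports at roughly 125 kB each, a modest size per document for a 10-node cluster.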
57
Yale’s {elastic}SEARCH –
The Search for Cancer’s Causes and Cures
https://www.elastic.co/elasticon/2015/sf/videos/search-for-cancer-causes-and-cures
• With next-generation sequencing technology, the lab can process 8 million patient specimens yearly
• How to interpret this amount of data → what software can be used?
58
59
NYC restaurants inspection @ELK
• Real data from the NYC Open Data project
• Restaurants inspection data
• Restaurants info
• Inspection info
• Violation codes and description
60
(Hosting PHising Sites) for Cryptography and network security(Hosting PHising Sites) for Cryptography and network security
(Hosting PHising Sites) for Cryptography and network security
aluacharya169
 
Grade 7 Google_Sites_Lesson creating website.pptx
Grade 7 Google_Sites_Lesson creating website.pptxGrade 7 Google_Sites_Lesson creating website.pptx
Grade 7 Google_Sites_Lesson creating website.pptx
AllanGuevarra1
 
Determining Glass is mechanical textile
Determining  Glass is mechanical textileDetermining  Glass is mechanical textile
Determining Glass is mechanical textile
Azizul Hakim
 
Breaching The Perimeter - Our Most Impactful Bug Bounty Findings.pdf
Breaching The Perimeter - Our Most Impactful Bug Bounty Findings.pdfBreaching The Perimeter - Our Most Impactful Bug Bounty Findings.pdf
Breaching The Perimeter - Our Most Impactful Bug Bounty Findings.pdf
Nirmalthapa24
 
Cyber Safety: security measure about navegating on internet.
Cyber Safety: security measure about navegating on internet.Cyber Safety: security measure about navegating on internet.
Cyber Safety: security measure about navegating on internet.
manugodinhogentil
 
AI Days 2025_GM1 : Interface in theage of AI
AI Days 2025_GM1 : Interface in theage of AIAI Days 2025_GM1 : Interface in theage of AI
AI Days 2025_GM1 : Interface in theage of AI
Prashant Singh
 
Best web hosting Vancouver 2025 for you business
Best web hosting Vancouver 2025 for you businessBest web hosting Vancouver 2025 for you business
Best web hosting Vancouver 2025 for you business
steve198109
 
Organizing_Data_Grade4 how to organize.pptx
Organizing_Data_Grade4 how to organize.pptxOrganizing_Data_Grade4 how to organize.pptx
Organizing_Data_Grade4 how to organize.pptx
AllanGuevarra1
 

ELK stack introduction

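  • 1. Introduction to ELK stack (an introduction to tools for big-data processing, search, and analysis). 邵喻美, Network Division, Computer and Information Networking Center, madeline@ntu.edu.tw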
  • 2. Topics
    • Why use big-data tools for network traffic and log analysis
    • What the ELK stack is, and why choose it
    • ELK stack introduction
    • ELK use cases
    • Implementation of ELK for network and account anomaly detection
  • 3. Network operation and security management issues
    • Lots of users
      • Faculty, staff, and students: more than 40,000 users on campus
    • Lots of systems
      • Routers, firewalls, servers, …
    • Lots of logs
      • NetFlow, syslogs, access logs, service logs, audit logs, …
    • Nobody cares until something goes wrong…
  • 4. Log and event analysis for network management
    • Collecting logs and events from multiple sources
    • Accepting and parsing different log formats
    • Large volumes of data in various formats
    • A scalable architecture
    • Expert knowledge required
  • 5. How we "traditional" system managers handle logs
    • Set up one or more log servers to receive logs from servers, routers, and appliances
    • Process them with Unix tools: grep + awk + sed + sort + uniq + perl + shell scripts, …
    • Run cron jobs periodically to
      • compute statistics and send out reports/alerts
      • detect possible abnormal behavior and react accordingly
    • Publish plain-text reports or a statistics-trend webpage
  • 6. Amount of data…
    • Router
      • NetFlow: 43 GB daily
    • Wi-Fi
      • NAT log: 4.8 TB daily
      • Auth log
    • WAF/Firewall
    • Server access logs & events
      • Mail server log: ~18 GB daily
        • POP3: avg. 7 GB daily
        • SMTP: avg. 1.75 GB daily
        • Exchange: avg. 140 MB daily
        • OWA: avg. 8.4 GB daily
        • MessageTrackingLog: avg. 100 MB daily
  • 7. What is ELK, and why choose it
  • 8. Splunk vs. ELK on Google Trends
    • "One of the leaders in the security information and event management (SIEM) market"
    • "How do Netflix, Facebook, Microsoft, LinkedIn, and Cisco monitor their logs? With ELK."
    • "The ELK Stack is now downloaded 500,000 times every month, making it the world's most popular log management platform."
  • 9. Why ELK?
    • Rapid on-premises (or cloud) installation; easy to deploy
    • Scales vertically and horizontally
    • Easy-to-use and varied APIs
    • Queries are much easier to write than a MapReduce job
    • Client libraries are available for most programming/scripting languages
      • Elastic offers language clients for Elasticsearch in Ruby, Python, PHP, Perl, .NET, Java, JavaScript, and more
    • Tool availability
    • It's free (open source), and it's quick
  • 10. Logstash, Elasticsearch, Kibana
    • Logstash (data from any source): a log pipeline tool that accepts inputs from various sources, executes different transformations, and exports the data to various targets → collects and parses logs
    • Elasticsearch (instantly search & analyze): a NoSQL database based on the Lucene search engine → indexes and stores the information
    • Kibana (actionable insight): a visualization layer that works on top of Elasticsearch → presents the data in visualizations that provide actionable insights
  • 12. ELK modules
    • Open source:
      • Elasticsearch
      • Logstash
      • Kibana
      • Beats: data shippers that collect, parse, and ship
    • Extension plugins:
      • Alerting (Watcher): proactive monitoring and alerting based on Elasticsearch queries or conditions
      • Security (Shield): protects and secures the Elastic Stack
      • Monitoring (Marvel): monitors and diagnoses the health and performance of an Elasticsearch cluster
      • Graph: discovers and explores the relationships that live in your data, using relevance to guide the exploration
  • 13. Connecting speedy search with big-data analytics: Elasticsearch for Apache Hadoop
    • ES-Hadoop, a two-way connector
    • Reads and writes data to ES and queries it in real time
  • 14. Let's look into the ELK stack
  • 18. Logstash architecture (figure: an input event with Ip: 140.1.1.1 passes through the filter stage and is output enriched as Ip: 140.1.1.1, City: Zurich, Country: CH)
  • 20. Logstash input plugins
    • stdin: reads events from standard input
    • file: streams events from files (similar to "tail -0F")
    • syslog: reads syslog messages as events
    • eventlog: pulls events from the Windows Event Log
    • imap: reads mail from an IMAP server
    • rss: creates events from RSS/Atom feeds
    • snmptrap: creates events based on SNMP trap messages
    • twitter: reads events from the Twitter Streaming API
    • irc: reads events from an IRC server
    • exec: captures the output of a shell command as an event
    • elasticsearch: reads query results from an Elasticsearch cluster
    • …
  • 21. Logstash filter plugins
    • grok: parses unstructured event data into fields
    • mutate: performs mutations on fields
    • geoip: adds geographical information about an IP address
    • date: parses dates from fields to use as the Logstash timestamp for an event
    • cidr: checks IP addresses against a list of network blocks
    • drop: drops every event that reaches it (usually placed inside a conditional)
    • …
  • 22. Logstash output plugins
    • stdout: prints events to standard output
    • csv: writes events to disk in a delimited format
    • email: sends email to a specified address when output is received
    • elasticsearch: stores logs in Elasticsearch
    • exec: runs a command for a matching event
    • file: writes events to files on disk
    • mongodb: writes events to MongoDB
    • redmine: creates tickets using the Redmine API
    • …
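The input → filter → output flow described above can be sketched as a toy Python pipeline. This is not Logstash code: the event shape, the `src_ip` field, and the helper names (`cidr_tag`, `drop_if`, `run_pipeline`) are hypothetical, chosen only to mirror what the cidr and drop filters are for. The addresses come from the 192.0.2.0/24 and 198.51.100.0/24 documentation ranges.

```python
import ipaddress
from typing import Callable, Iterable, Iterator, List, Optional

Event = dict
# A filter returns the (possibly modified) event, or None to drop it.
Filter = Callable[[Event], Optional[Event]]


def cidr_tag(networks: List[str], tag: str) -> Filter:
    """Tag events whose hypothetical 'src_ip' field lies in one of the network blocks."""
    nets = [ipaddress.ip_network(n) for n in networks]

    def apply(event: Event) -> Event:
        ip = ipaddress.ip_address(event["src_ip"])
        if any(ip in net for net in nets):
            event.setdefault("tags", []).append(tag)
        return event

    return apply


def drop_if(predicate: Callable[[Event], bool]) -> Filter:
    """Drop matching events, like wrapping `drop {}` in a conditional."""
    def apply(event: Event) -> Optional[Event]:
        return None if predicate(event) else event

    return apply


def run_pipeline(inputs: Iterable[Event], filters: List[Filter]) -> Iterator[Event]:
    """Pass each input event through every filter; surviving events reach the output."""
    for event in inputs:
        for f in filters:
            event = f(event)
            if event is None:
                break  # dropped: skip the remaining filters and the output stage
        else:
            yield event


# Usage: an in-memory list stands in for the input stage, another for the output stage.
events = [
    {"src_ip": "192.0.2.15", "message": "login ok"},
    {"src_ip": "198.51.100.7", "message": "healthcheck"},
    {"src_ip": "198.51.100.8", "message": "login failed"},
]
pipeline = [
    drop_if(lambda e: e["message"] == "healthcheck"),
    cidr_tag(["192.0.2.0/24"], "campus"),
]
results = list(run_pipeline(events, pipeline))
for e in results:
    print(e)
```

As in Logstash, the order of filters matters: dropping first means the CIDR check never runs on discarded events.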
  • 23. Logstash filter example (syslog)

    filter {
      if [type] == "syslog" {
        grok {
          match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
          add_field => [ "received_at", "%{@timestamp}" ]
          add_field => [ "received_from", "%{host}" ]
        }
        # syslog_pri produces the syslog_facility/syslog_severity fields in the output on slide 24
        syslog_pri { }
        date {
          # "MMM  d" (two spaces) matches syslog's space-padded single-digit days
          match => [ "syslog_timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
        }
      }
    }

    Sample input line:
    Dec 23 14:30:01 louis CRON[619]: (www-data) CMD (php /usr/share/cacti/site/poller.php >/dev/null 2>/var/log/cacti/poller-error.log)
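To see what the grok pattern extracts, here is a rough Python equivalent using the standard `re` module. The regular expression is a hand-written, simplified stand-in for the grok patterns SYSLOGTIMESTAMP, SYSLOGHOST, DATA, POSINT, and GREEDYDATA (an assumption; the real grok definitions are more permissive, e.g. SYSLOGHOST also accepts IP addresses), and `parse_syslog` is a hypothetical helper name.

```python
import re
from typing import Dict, Optional

# Simplified stand-ins for the grok patterns used in the Logstash filter.
SYSLOG_LINE = re.compile(
    r"(?P<syslog_timestamp>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) "  # SYSLOGTIMESTAMP
    r"(?P<syslog_hostname>[\w.\-]+) "                                  # SYSLOGHOST
    r"(?P<syslog_program>.*?)"                                         # DATA (lazy)
    r"(?:\[(?P<syslog_pid>[1-9]\d*)\])?: "                             # optional [POSINT]
    r"(?P<syslog_message>.*)"                                          # GREEDYDATA
)


def parse_syslog(line: str) -> Optional[Dict[str, str]]:
    """Return the named fields, or None when the line does not match
    (grok itself would instead tag the event with _grokparsefailure)."""
    m = SYSLOG_LINE.match(line)
    if m is None:
        return None
    # Leave out optional groups that did not participate (e.g. a missing pid).
    return {k: v for k, v in m.groupdict().items() if v is not None}


sample = ("Dec 23 14:30:01 louis CRON[619]: (www-data) CMD "
          "(php /usr/share/cacti/site/poller.php >/dev/null "
          "2>/var/log/cacti/poller-error.log)")
fields = parse_syslog(sample)
print(fields["syslog_program"], fields["syslog_pid"])
```

The lazy DATA group is what lets the optional `[pid]` part be captured separately instead of being swallowed into the program name.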
  • 24. Resulting event

    {
      "message" => "Dec 23 14:30:01 louis CRON[619]: (www-data) CMD (php /usr/share/cacti/site/poller.php >/dev/null 2>/var/log/cacti/poller-error.log)",
      "@timestamp" => "2013-12-23T22:30:01.000Z",
      "@version" => "1",
      "type" => "syslog",
      "host" => "0:0:0:0:0:0:0:1:52617",
      "syslog_timestamp" => "Dec 23 14:30:01",
      "syslog_hostname" => "louis",
      "syslog_program" => "CRON",
      "syslog_pid" => "619",
      "syslog_message" => "(www-data) CMD (php /usr/share/cacti/site/poller.php >/dev/null 2>/var/log/cacti/poller-error.log)",
      "received_at" => "2013-12-23 22:49:22 UTC",
      "received_from" => "0:0:0:0:0:0:0:1:52617",
      "syslog_severity_code" => 5,
      "syslog_facility_code" => 1,
      "syslog_facility" => "user-level",
      "syslog_severity" => "notice"
    }
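The syslog_facility_code and syslog_severity_code fields come from the syslog priority (PRI) value, which the syslog RFCs (3164, 5424) define as facility × 8 + severity. The sample line carries no `<PRI>` prefix, and Logstash's syslog_pri filter appears to fall back to a default of 13 in that case, which is consistent with facility 1 (user-level) and severity 5 (notice) in the event above. A minimal decoding sketch; `decode_pri` is a hypothetical helper, and only the first four facility names are listed for brevity.

```python
from typing import Dict, Union

SEVERITY_LABELS = ["emergency", "alert", "critical", "error",
                   "warning", "notice", "informational", "debug"]
# Only the first few of the 24 syslog facilities, for brevity.
FACILITY_LABELS = ["kernel", "user-level", "mail", "daemon"]


def decode_pri(pri: int = 13) -> Dict[str, Union[int, str]]:
    """Split a syslog PRI value into facility and severity (PRI = facility * 8 + severity)."""
    if not 0 <= pri <= 191:  # 24 facilities x 8 severities
        raise ValueError(f"PRI out of range: {pri}")
    facility, severity = divmod(pri, 8)
    facility_name = (FACILITY_LABELS[facility] if facility < len(FACILITY_LABELS)
                     else f"facility {facility}")
    return {
        "syslog_facility_code": facility,
        "syslog_severity_code": severity,
        "syslog_facility": facility_name,
        "syslog_severity": SEVERITY_LABELS[severity],
    }


print(decode_pri())    # the default PRI of 13
print(decode_pri(34))  # facility 4, severity 2
```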
  • 28. Deploying and scaling Logstash: minimal installation; using filters
  • 29. Deploying and scaling Logstash: using a log shipper to minimize the resource demands on Logstash; scaling to a larger Elasticsearch cluster
  • 30. Deploying and scaling Logstash: managing throughput spikes with message queuing
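Why a message queue helps with throughput spikes can be shown with a toy, tick-based model: shippers push bursts of events, the indexing tier drains a fixed number per tick, and events that find no room are lost. This is an illustrative simulation, not a model of any particular broker (Redis and Kafka are common choices in front of Logstash); `simulate`, the rates, and the burst sizes are made up.

```python
from typing import List, Tuple


def simulate(arrivals: List[int], indexer_rate: int,
             queue_limit: int) -> Tuple[int, int, List[int]]:
    """Return (indexed, dropped, backlog after each tick).

    arrivals[t]  -- events shipped during tick t
    indexer_rate -- events the indexing tier can process per tick
    queue_limit  -- how many events may wait; setting it to indexer_rate
                    approximates having no buffer at all
    """
    backlog = indexed = dropped = 0
    history = []
    for n in arrivals:
        accepted = min(n, queue_limit - backlog)  # room left in the buffer
        dropped += n - accepted
        backlog += accepted
        done = min(indexer_rate, backlog)         # indexer drains at a fixed rate
        backlog -= done
        indexed += done
        history.append(backlog)
    return indexed, dropped, history


burst = [5, 50, 5, 5, 5, 5, 5, 5, 5, 5]  # one spike at tick 1
no_buffer = simulate(burst, indexer_rate=10, queue_limit=10)
with_buffer = simulate(burst, indexer_rate=10, queue_limit=1000)
print(no_buffer)
print(with_buffer)
```

In this model the unbuffered run loses 40 of the 95 events during the spike, while the buffered run indexes all 95, just later, as the backlog drains.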
  • 31. Multiple connections for Logstash high availability
  • 33. Elasticsearch
    • Built on top of Apache Lucene™, a full-text search-engine library
    • A schema-free, REST- and JSON-based distributed search engine with real-time analytics
    • Capable of scaling to hundreds of servers and petabytes of structured and unstructured data
    • Open source: Apache License 2.0 (at the time of this talk; releases since 7.11 are no longer licensed under Apache 2.0)
    • Wikipedia uses Elasticsearch to provide full-text search with highlighted search snippets, and search-as-you-type and did-you-mean suggestions
    • The Guardian uses Elasticsearch to combine visitor logs with social-network data to provide real-time feedback to its editors about the public's response to new articles
    • Stack Overflow combines full-text search with geolocation queries and uses more-like-this to find related questions and answers
    • GitHub uses Elasticsearch to query 130 billion lines of code
    • Real scalability comes from horizontal scale
    • Schema-flexible
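Full-text search in Lucene rests on an inverted index, which maps each term to the documents that contain it. A toy version in Python, assuming a crude lowercase-and-split tokenizer in place of Elasticsearch's analyzers and ignoring relevance scoring entirely; `build_index` and `search_all` are hypothetical names.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    # Lowercase and split on non-word characters: a crude stand-in for an analyzer.
    return re.findall(r"\w+", text.lower())


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map every term to the set of document ids containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_all(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return ids of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))  # copy, so the index is not mutated
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    "1": "Failed password for root from 192.0.2.10",
    "2": "Accepted password for alice",
    "3": "Failed login for alice",
}
index = build_index(docs)
print(sorted(search_all(index, "failed alice")))
```

A lookup touches only the posting sets for the query terms instead of scanning every document, which is what makes this structure fast.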
  • 34. Elasticsearch vs. relational DB (Elasticsearch ↔ relational DB)
    • Index ↔ Database
    • Type ↔ Table (mapping types were later deprecated and are removed in Elasticsearch 8)
    • Document ↔ Row
    • Field ↔ Column
    • Shard ↔ Partition
    • Mapping ↔ Schema
    • (everything is indexed) ↔ Index
    • Query DSL (domain-specific language) ↔ SQL
    • Shards are how Elasticsearch distributes data around your cluster
  • 35. What is a shard
    • A shard is a single instance of Lucene, and is a complete search engine in its own right
    • Documents are stored and indexed in shards → shards are allocated to nodes in your cluster
    • As your cluster grows or shrinks, Elasticsearch automatically migrates shards between nodes so that the cluster remains balanced
    • A shard can be either a primary shard or a replica shard
      • Each document in your index belongs to a single primary shard
      • A replica shard is just a copy of a primary shard
  • 36. Elasticsearch clustering: single-node cluster
    • Node = a running instance of ES
    • Cluster = one or more nodes with the same cluster.name
    • Every cluster has one master node
    • A cluster can have any number of indexes
  • 37. Elasticsearch clustering: adding a second node
    • A cluster consists of one or more nodes with the same cluster.name
    • All primary and replica shards are allocated
    • Each index has one primary (P) and one replica (R) shard
    • Clients can talk to any node in the cluster
  • 38. Elasticsearch clustering: adding a third node
    • More primary shards:
      • faster indexing
      • more scale
    • More replicas:
      • faster searching
      • more failover
    • Increase the number of replicas
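The rule behind these clustering diagrams is that a replica is never placed on the same node as its primary, so losing one node never loses every copy of a shard. A simplified round-robin sketch, assuming shard copies are labelled P0, R0, … as in the figures; Elasticsearch's real allocator also weighs disk usage, balance, and awareness settings, and `allocate` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def allocate(num_primaries: int, num_replicas: int,
             nodes: List[str]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Return (node -> shard copies, unassigned copies)."""
    layout: Dict[str, List[str]] = {node: [] for node in nodes}
    unassigned: List[str] = []
    for shard in range(num_primaries):
        for copy_no in range(num_replicas + 1):
            label = f"P{shard}" if copy_no == 0 else f"R{shard}"
            if copy_no < len(nodes):
                # Offsetting by copy_no puts each copy of this shard on a different node.
                layout[nodes[(shard + copy_no) % len(nodes)]].append(label)
            else:
                # Every node already holds a copy of this shard: leave it unassigned,
                # as happens to replicas on a single-node cluster (yellow health).
                unassigned.append(label)
    return layout, unassigned


for cluster in (["node1"], ["node1", "node2"], ["node1", "node2", "node3"]):
    print(allocate(3, 1, cluster))
```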
  • 39. Creating, indexing, and deleting a document
    1. The client sends a create, index, or delete request to Node 1.
    2. The node uses the document's _id to determine that the document belongs to shard 0. It forwards the request to Node 3, where the primary copy of shard 0 is currently allocated.
    3. Node 3 executes the request on the primary shard. If it is successful, it forwards the request in parallel to the replica shards on Node 1 and Node 2. Once all of the replica shards report success, Node 3 reports success to the coordinating node, which reports success to the client.
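Step 2 relies on a routing formula, which the Elasticsearch documentation of that era describes as shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document's _id (newer versions refine the formula). The sketch below uses `zlib.crc32` as a stand-in for Elasticsearch's actual Murmur3-based hash, an assumption made so the example needs only the standard library, so its shard numbers will not match a real cluster. It also shows why the number of primary shards is fixed once an index is created: changing it changes where existing ids route.

```python
import zlib
from typing import Optional


def shard_for(doc_id: str, num_primary_shards: int, routing: Optional[str] = None) -> int:
    """Pick the primary shard for a document: hash(routing or _id) % primaries.

    zlib.crc32 is a stand-in; Elasticsearch itself uses a Murmur3 hash.
    """
    key = routing if routing is not None else doc_id
    return zlib.crc32(key.encode("utf-8")) % num_primary_shards


ids = [f"doc-{i}" for i in range(100)]
with_3 = [shard_for(i, 3) for i in ids]
with_4 = [shard_for(i, 4) for i in ids]
moved = sum(a != b for a, b in zip(with_3, with_4))
print(f"{moved} of {len(ids)} documents would route to a different shard")
```

Passing the same custom `routing` value for related documents sends them all to one shard, whatever their ids.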
  • 40. Retrieving a document
    1. The client sends a get request to Node 1.
    2. The node uses the document's _id to determine that the document belongs to shard 0. Copies of shard 0 exist on all three nodes. On this occasion, it forwards the request to Node 2.
    3. Node 2 returns the document to Node 1, which returns the document to the client.
    For read requests, the coordinating node chooses a different shard copy on every request in order to balance the load.
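The load balancing described for read requests can be sketched as a per-shard round-robin over the nodes that hold a copy. `CoordinatingNode` and the node names are hypothetical; note also that newer Elasticsearch versions default to adaptive replica selection, which prefers copies on less busy nodes rather than strictly rotating.

```python
from itertools import cycle
from typing import Dict, List


class CoordinatingNode:
    """Rotate read requests for each shard across the nodes holding a copy of it."""

    def __init__(self, copies: Dict[int, List[str]]):
        # shard number -> nodes holding the primary or a replica of that shard
        self._rotations = {shard: cycle(nodes) for shard, nodes in copies.items()}

    def pick_copy(self, shard: int) -> str:
        return next(self._rotations[shard])


coordinator = CoordinatingNode({0: ["node3", "node1", "node2"]})
picks = [coordinator.pick_copy(0) for _ in range(4)]
print(picks)
```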
  • 41. Partial update to a document: when a primary shard forwards changes to its replica shards, it doesn't forward the update request. Instead, it forwards the new version of the full document.
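A sketch of the idea, assuming a simplified document store: the primary merges the partial change into its stored copy, bumps the version, and hands the complete result to the replicas. The reason usually given is that replication is asynchronous and messages can arrive out of order, so replaying a full, versioned document is safe where replaying partial changes is not. The function names and the recursive merge of nested objects are illustrative assumptions, not Elasticsearch internals.

```python
import copy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], partial: Dict[str, Any]) -> Dict[str, Any]:
    """Merge `partial` into a copy of `base`, recursing into nested objects."""
    merged = copy.deepcopy(base)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def primary_update(stored: Dict[str, Any], partial: Dict[str, Any]) -> Dict[str, Any]:
    """Apply a partial update on the primary; return the full new document."""
    return {"_version": stored["_version"] + 1,
            "_source": deep_merge(stored["_source"], partial)}


def replica_apply(replica: Dict[str, Any], full_doc: Dict[str, Any]) -> None:
    """A replica replaces its copy wholesale, ignoring anything older than what it has."""
    if full_doc["_version"] > replica["_version"]:
        replica.clear()
        replica.update(copy.deepcopy(full_doc))


stored = {"_version": 1,
          "_source": {"user": "alice", "login_count": 4, "geo": {"country": "CH"}}}
replica = copy.deepcopy(stored)
new_doc = primary_update(stored, {"login_count": 5, "geo": {"city": "Zurich"}})
replica_apply(replica, new_doc)
print(replica)
```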
  • 42. Multidocument patterns (mget, bulk)
    • The coordinating node knows in which shard each document lives.
    • It breaks up the multidocument request into one multidocument request per shard, and forwards these in parallel to each participating node.
    • Once it receives answers from each node, it collates their responses into a single response.
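The split-and-collate step can be sketched as grouping the requested ids by shard, then reassembling the per-shard answers in the original request order. This uses the same kind of `zlib.crc32` stand-in for Elasticsearch's Murmur3 routing hash (an assumption), and the helper names are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Optional


def shard_for(doc_id: str, num_primary_shards: int) -> int:
    # Stand-in routing hash; Elasticsearch uses Murmur3.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def split_by_shard(doc_ids: List[str], num_primary_shards: int) -> Dict[int, List[str]]:
    """One sub-request per shard, preserving the requested order within each."""
    per_shard: Dict[int, List[str]] = defaultdict(list)
    for doc_id in doc_ids:
        per_shard[shard_for(doc_id, num_primary_shards)].append(doc_id)
    return dict(per_shard)


def collate(doc_ids: List[str],
            per_shard_results: Dict[int, Dict[str, dict]]) -> List[Optional[dict]]:
    """Merge per-shard answers into one response in the original order (None = not found)."""
    found: Dict[str, dict] = {}
    for results in per_shard_results.values():
        found.update(results)
    return [found.get(doc_id) for doc_id in doc_ids]


requested = ["1", "2", "3", "4", "5"]
sub_requests = split_by_shard(requested, 3)
# Pretend each shard found every id it was asked for, except "4".
answers = {shard: {i: {"_id": i} for i in ids if i != "4"}
           for shard, ids in sub_requests.items()}
response = collate(requested, answers)
print(sub_requests)
print(response)
```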
  • 43. Talking to Elasticsearch
    • RESTful API with JSON over HTTP
      • Over port 9200
      • Accessed via a web client, or from the command line with curl
      • HTTP method or verb: GET, POST, PUT, HEAD, or DELETE
    • JSON (JavaScript Object Notation): a format widely used by NoSQL databases
    • Elasticsearch clients
      • Java API, Java REST client, JavaScript API, PHP API, Python API, Perl API, …
  • 44. Indexing a document
    • Store a document in an index so that it can be retrieved and queried
    • Like the INSERT keyword in SQL
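In current Elasticsearch versions, indexing a document with a known id is an HTTP PUT of a JSON body to /<index>/_doc/<id> (releases from the era of these slides used a type name where `_doc` appears). The sketch only builds the request with Python's standard `urllib`, so it runs without a cluster; the host, the index name logstash-2013.12.23, and the document fields are illustrative assumptions, and a secured cluster would also need credentials.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a local, unsecured development node


def index_request(index: str, doc_id: str, doc: dict) -> urllib.request.Request:
    """Build (but do not send) a PUT that stores `doc` under the given id."""
    return urllib.request.Request(
        f"{ES_URL}/{index}/_doc/{doc_id}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = index_request("logstash-2013.12.23", "1",
                    {"syslog_hostname": "louis", "syslog_program": "CRON", "syslog_pid": "619"})
print(req.get_method(), req.full_url)
# To actually send it against a running node: urllib.request.urlopen(req)
```

The same request with curl would be `curl -X PUT "localhost:9200/logstash-2013.12.23/_doc/1" -H 'Content-Type: application/json' -d '{...}'`.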
  • 45. Retrieving documents
    • Use the GET method to retrieve a document
    • We can retrieve a specific document if we know its id
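Retrieving by id is a GET on the same /<index>/_doc/<id> path, and the stored JSON comes back under `_source` together with a `found` flag. A sketch that builds the request and extracts the document from a response body; the host and index name are illustrative assumptions, and the response bodies are hand-written in the shape of the get API's reply, not real cluster output. With a real cluster, a missing document comes back as HTTP 404, which `urlopen` raises as `HTTPError`.

```python
import json
import urllib.request
from typing import Optional

ES_URL = "http://localhost:9200"  # assumption: a local, unsecured development node


def get_request(index: str, doc_id: str) -> urllib.request.Request:
    """Build (but do not send) a GET for a single document by id."""
    return urllib.request.Request(f"{ES_URL}/{index}/_doc/{doc_id}", method="GET")


def source_of(response_body: bytes) -> Optional[dict]:
    """Extract the stored document from a get-API response, or None if not found."""
    body = json.loads(response_body)
    return body.get("_source") if body.get("found") else None


req = get_request("logstash-2013.12.23", "1")
# Hand-written bodies in the shape of the get API's response (not real cluster output).
hit = (b'{"_index": "logstash-2013.12.23", "_id": "1", "_version": 1, '
       b'"found": true, "_source": {"syslog_program": "CRON"}}')
miss = b'{"_index": "logstash-2013.12.23", "_id": "2", "found": false}'
print(source_of(hit), source_of(miss))
```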
  • 46. Performing queries
    • The q=<query> form performs a full-text search by parsing the query-string value
    • Queries can also be written in the query DSL, specified as a JSON request body
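The two query styles differ only in where the query travels: in the URL's `q` parameter, or as a JSON body sent to /<index>/_search. A sketch that builds both forms with the standard library; the host, the `logstash-*` index pattern, and the field names (taken from the syslog example) are illustrative assumptions.

```python
import json
from urllib.parse import urlencode

ES_URL = "http://localhost:9200"  # assumption: a local, unsecured development node


def uri_search_url(index: str, query: str) -> str:
    """URI search: the query string goes in the q parameter, URL-encoded."""
    return f"{ES_URL}/{index}/_search?{urlencode({'q': query})}"


def match_query_body(field: str, text: str) -> str:
    """Query-DSL search: the same kind of intent expressed as a JSON request body."""
    return json.dumps({"query": {"match": {field: text}}})


print(uri_search_url("logstash-*", "syslog_program:CRON"))
print(match_query_body("syslog_message", "poller error"))
```

The URI form is handy for quick curl checks; the DSL form scales to the combined and nested queries of the next slides.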
  • 47. Query DSL: combining filters (bool filter)
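In current Elasticsearch versions, combining filters is written as the `filter` (and `must_not`) clauses of a `bool` query; the older `filtered` query from the slides' era was removed in 5.0. A sketch that builds such a body as a Python dict, roughly WHERE program = ? AND host = ? AND @timestamp >= ? AND severity <> 'debug' in SQL terms. The field names follow the syslog event shown earlier, and the `term` clauses assume those fields are mapped as exact-value keyword fields, which is an assumption about the index mapping.

```python
import json
from typing import Any, Dict


def syslog_filter(program: str, host: str, since: str) -> Dict[str, Any]:
    """Filter-context clauses are yes/no matches: they are not scored and can be cached."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"syslog_program": program}},
                    {"term": {"syslog_hostname": host}},
                    {"range": {"@timestamp": {"gte": since}}},
                ],
                "must_not": [
                    {"term": {"syslog_severity": "debug"}},
                ],
            }
        }
    }


body = syslog_filter("CRON", "louis", "2013-12-23T00:00:00Z")
print(json.dumps(body, indent=2))
```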
  • 48. Query DSL: nesting Boolean queries
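Nesting means a `bool` query can appear as a clause inside another `bool` query, which is how OR groups combine with AND conditions. A sketch of (program = 'CRON' OR program = 'sshd') AND NOT host = 'louis'; the field names and values are illustrative, and `minimum_should_match` is set explicitly so that at least one `should` clause must match regardless of query context.

```python
import json
from typing import Any, Dict, List


def any_of(field: str, values: List[str]) -> Dict[str, Any]:
    """An inner bool query that matches if the field equals any one of the values."""
    return {"bool": {"should": [{"term": {field: v}} for v in values],
                     "minimum_should_match": 1}}


def programs_not_on_host(programs: List[str], host: str) -> Dict[str, Any]:
    """Outer bool query: the OR group is nested as a single filter clause."""
    return {
        "query": {
            "bool": {
                "filter": [any_of("syslog_program", programs)],
                "must_not": [{"term": {"syslog_hostname": host}}],
            }
        }
    }


body = programs_not_on_host(["CRON", "sshd"], "louis")
print(json.dumps(body, indent=2))
```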
  • 50. Kibana
    • Search, view, and interact with data stored in Elasticsearch indices
    • Execute queries on data and visualize the results in charts, tables, and maps
    • Add/remove widgets
    • Share/save/load dashboards
    • Open source: Apache License 2.0
  • 56. Cisco Talos Security Intelligence and Research Group: hunting for hackers
    • Focus: creating leading threat intelligence
    • Aggregation and analysis of unrivaled telemetry data at Cisco, encompassing:
      • Billions of web requests and emails
      • Millions of malware samples
      • Open-source data sets (Snort, ClamAV, …)
      • Millions of network intrusions
  • 57. Cisco Talos uses ELK to analyze…
    • Sandbox data cluster
      • Dynamic malware analysis reports
      • Search for related patterns and malware
      • ES stats:
        • 10 nodes
        • 3 TB
        • 100k reports/day
        • ~8 months of data
    • Honeypot cluster
      • Collects attackers' attempts:
        • {account, password} pairs
        • Executed commands
        • URLs of downloaded files
        • Suspicious command centers that the malware reports back to
  • 58. Yale's {elastic}SEARCH: the search for cancer's causes and cures (https://www.elastic.co/elasticon/2015/sf/videos/search-for-cancer-causes-and-cures)
    • With next-generation sequencing technology, the lab can process 8 million patient specimens yearly
    • How to interpret this amount of data → what software can be used?
  • 60. NYC restaurant inspections @ELK
    • Real data from the NYC Open Data project
    • Restaurant inspection data
      • Restaurant info
      • Inspection info
      • Violation codes and descriptions