11/23/2014 Elastic Search integration with Hadoop | leveragebigdata 
Elastic Search integration with Hadoop 
Posted on Saturday, 28 June 2014 by leveragebigdata in Uncategorized
Tags: Elastic Search, Hadoop, Hive, MapReduce
Elasticsearch is an open-source, distributed search engine built on Apache Lucene and accessed through a REST API. You can download it from http://www.elasticsearch.org/overview/elkdownloads/. Unzip the downloaded zip or tar file, then start a single instance (node) of Elasticsearch by running the script 'elasticsearch-1.2.1/bin/elasticsearch'.
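Before moving on, it can help to confirm that the node answers on its HTTP port. The following is a minimal JDK-only sketch, assuming the default address http://localhost:9200/ (an assumption; change it if you set a different http.port). The class name NodeCheck is hypothetical.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

class NodeCheck {
    // Returns the HTTP status of GET <url>, or -1 if nothing answers in time.
    static int status(String url) {
        try {
            HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(2000);
            conn.setReadTimeout(2000);
            conn.setRequestMethod("GET");
            int code = conn.getResponseCode();
            conn.disconnect();
            return code;
        } catch (IOException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        // Default HTTP address of a freshly started node (assumption: http.port unchanged)
        String url = args.length > 0 ? args[0] : "http://localhost:9200/";
        int code = status(url);
        if (code == 200) {
            System.out.println("Elasticsearch node is up at " + url);
        } else {
            System.out.println("No healthy node at " + url + " (status " + code + ")");
        }
    }
}
```

A 200 status means the node responded. The root endpoint also returns a small JSON document containing the node name and version, which you can read from the connection's input stream if you need it.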
Installing a plugin:
Plugins add extra features. For example, elasticsearch-head provides a web interface for interacting with the cluster. Install it with the command 'elasticsearch-1.2.1/bin/plugin -install mobz/elasticsearch-head'.
http://leveragebigdata.wordpress.com/2014/06/28/elasticsearch-integration-with-hadoop/ 1/9
After installing the plugin, you can open the Elasticsearch web interface at http://localhost:9200/_plugin/head/.
Creating the index: 
(You can skip this step.) In Elasticsearch, an index is roughly analogous to a database in a relational system. By default, a new index gets 5 shards and 1 replica. You can set both values when you create the index, depending on your requirements. You can change the number of replicas later, but not the number of primary shards.
Create Elastic Search Index:
curl -XPUT "http://localhost:9200/movies/" -d '{"settings" : {"number_of_shards" …
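The create-index command above is cut off in the original, so its shard and replica values are unknown. The following minimal JDK-only sketch sends the same kind of request from Java. It assumes 3 shards and 1 replica purely as example values, plus the default http://localhost:9200 address. CreateMoviesIndex and its helpers are hypothetical names.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

class CreateMoviesIndex {
    // JSON body for the create-index request; both counts are example values.
    static String settingsBody(int shards, int replicas) {
        return "{\"settings\":{\"number_of_shards\":" + shards
                + ",\"number_of_replicas\":" + replicas + "}}";
    }

    // Sends the body with the given HTTP method and returns the response status code.
    static int send(String method, String url, String body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod(method);
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }

    public static void main(String[] args) {
        String body = settingsBody(3, 1);
        System.out.println("PUT /movies/ " + body);
        try {
            System.out.println("HTTP " + send("PUT", "http://localhost:9200/movies/", body));
        } catch (IOException e) {
            System.err.println("Could not reach Elasticsearch: " + e.getMessage());
        }
    }
}
```

Only the replica count can be adjusted after this call, so choose the shard count up front.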
Loading data to Elastic Search: 
If you index a document into an index that does not exist yet, Elasticsearch creates the index automatically.
Load data using -XPUT
With PUT, you must specify the document id (here, 1):
curl -XPUT "http://localhost:9200/movies/movie/1" -d '{"title": "Men with Wings", …
[Screenshot: Elastic Search -XPUT]
Note: movies -> index, movie -> index type, 1 -> id.
Load data using -XPOST
With POST, the id is generated automatically:
curl -XPOST "http://localhost:9200/movies/movie" -d' { "title": "Lawrence of Arabia", …
[Screenshot: Elastic Search -XPOST]
Note: the _id U2oQjN5LRQCW8PWBF9vipA was generated automatically.
The _search endpoint
You can search the indexed documents with the following query:
curl -XPOST "http://localhost:9200/_search" -d' { "query": { "query_string": { …
[Screenshot: ES Search Result]
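The indexing and search requests above differ only in HTTP method, path and body. The following hypothetical Java helper, MovieRequests (not part of any library), builds these requests and prints the equivalent curl commands. The original document bodies are truncated, so only the title field is used here, and the search term "kill" is an arbitrary example.

```java
class MovieRequests {
    static final String BASE = "http://localhost:9200";

    // With an explicit id the request is a PUT to /index/type/id;
    // without one it is a POST to /index/type and Elasticsearch generates the _id.
    static String method(String id) {
        return id == null ? "POST" : "PUT";
    }

    static String documentUrl(String index, String type, String id) {
        String url = BASE + "/" + index + "/" + type;
        return id == null ? url : url + "/" + id;
    }

    // Quotes a string for JSON, escaping only backslashes and double quotes
    // (enough for these examples, not a general JSON encoder).
    static String json(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // A query_string query, as sent to the _search endpoint.
    static String searchBody(String query) {
        return "{\"query\":{\"query_string\":{\"query\":" + json(query) + "}}}";
    }

    // Formats a curl command; a single quote inside the body would need shell escaping.
    static String curl(String method, String url, String body) {
        return "curl -X" + method + " \"" + url + "\" -d '" + body + "'";
    }

    public static void main(String[] args) {
        System.out.println(curl(method("1"), documentUrl("movies", "movie", "1"),
                "{\"title\":" + json("Men with Wings") + "}"));
        System.out.println(curl(method(null), documentUrl("movies", "movie", null),
                "{\"title\":" + json("Lawrence of Arabia") + "}"));
        System.out.println(curl("POST", BASE + "/_search", searchBody("kill")));
    }
}
```

Choosing the method from whether an id is present mirrors the PUT/POST distinction described above. Use PUT when you need stable, repeatable ids (re-running it overwrites the same document). Use POST when duplicates are acceptable and you want Elasticsearch to assign ids.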
Integrating with Map Reduce (Hadoop 1.2.1) 
To integrate Elasticsearch with MapReduce, follow these steps:
Add a dependency to pom.xml:
<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-hadoop</artifactId>
  <version>2.0.0</version>
</dependency>
Alternatively, download the elasticsearch-hadoop jar and add it to the classpath.
Elastic Search as source & HDFS as sink: 
In the MapReduce job, specify the Elasticsearch index/type to read from, and write the results to HDFS. In the org.apache.hadoop.conf.Configuration, set the index/type with the property 'es.resource' and, optionally, a search query with 'es.query'. Then set the job's input format class to 'EsInputFormat' (defined in the elasticsearch-hadoop jar), as shown below:
ElasticSourceHadoopSinkJob.java 
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.elasticsearch.hadoop.mr.EsInputFormat;

public class ElasticSourceHadoopSinkJob {
    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        // Index/type to read from (es.nodes defaults to localhost:9200)
        conf.set("es.resource", "movies/movie");
        // Optional query; without it, every document in movies/movie is returned
        //conf.set("es.query", "?q=kill");
        final Job job = new Job(conf, "Get information from elasticSearch.");
        job.setJarByClass(ElasticSourceHadoopSinkJob.class);
        job.setMapperClass(ElasticSourceHadoopSinkMapper.class);
        job.setInputFormatClass(EsInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        // Map-only job: mapper output is written straight to HDFS
        job.setNumReduceTasks(0);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(MapWritable.class);
        // HDFS output directory, passed as the first command-line argument
        FileOutputFormat.setOutputPath(job, new Path(args[0]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
ElasticSourceHadoopSinkMapper.java 
import java.io.IOException;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// EsInputFormat supplies the document id as the key and the document's fields as a MapWritable
public class ElasticSourceHadoopSinkMapper extends Mapper<Object, MapWritable, Text, MapWritable> {
    @Override
    protected void map(Object key, MapWritable value, Context context)
            throws IOException, InterruptedException {
        // Emit the document id together with the unchanged document
        context.write(new Text(key.toString()), value);
    }
}
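The two settings used in the job above take different forms. es.resource names the index and type as index/type. According to the elasticsearch-hadoop configuration docs (check them for your version), es.query accepts a URI query starting with ?, a query-DSL object wrapped in {...}, or otherwise a path to an external resource that contains the query. The following hypothetical, JDK-only helper mirrors those rules. You could use it to validate job arguments before submitting.

```java
class EsReadSettings {
    // Splits an es.resource value in the index/type form used in this post, e.g. "movies/movie".
    static String[] parseResource(String resource) {
        int slash = resource.indexOf('/');
        if (slash <= 0 || slash == resource.length() - 1) {
            throw new IllegalArgumentException("expected index/type, got: " + resource);
        }
        return new String[] {resource.substring(0, slash), resource.substring(slash + 1)};
    }

    // Classifies an es.query value: a leading '?' is a URI query, {...} is query DSL,
    // anything else is treated as the path of an external resource holding the query.
    static String queryKind(String query) {
        String q = query.trim();
        if (q.startsWith("?")) {
            return "uri";
        }
        if (q.startsWith("{") && q.endsWith("}")) {
            return "dsl";
        }
        return "resource";
    }

    public static void main(String[] args) {
        String[] r = parseResource("movies/movie");
        System.out.println("index=" + r[0] + ", type=" + r[1]);
        String[] queries = {"?q=kill", "{\"query\":{\"match_all\":{}}}", "queries/movies.json"};
        for (String q : queries) {
            System.out.println(q + " -> " + queryKind(q));
        }
    }
}
```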
HDFS as source & Elastic Search as sink: 
In the MapReduce job, specify the Elasticsearch index/type into which the HDFS data should be loaded, and set the output format to 'EsOutputFormat' (defined in the elasticsearch-hadoop jar).
ElasticSinkHadoopSourceJob.java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.elasticsearch.hadoop.mr.EsOutputFormat;

public class ElasticSinkHadoopSourceJob {
    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        // Index/type to write to
        conf.set("es.resource", "movies/movie");
        // Speculative duplicates of a map task would index the same documents twice
        conf.setBoolean("mapred.map.tasks.speculative.execution", false);
        final Job job = new Job(conf, "Load data into elasticSearch.");
        job.setJarByClass(ElasticSinkHadoopSourceJob.class);
        job.setMapperClass(ElasticSinkHadoopSourceMapper.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(EsOutputFormat.class);
        // Map-only job: each map output record becomes one Elasticsearch document
        job.setNumReduceTasks(0);
        job.setMapOutputKeyClass(NullWritable.class);
        job.setMapOutputValueClass(MapWritable.class);
        FileInputFormat.setInputPaths(job, new Path("data/ElasticSearchData"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
ElasticSinkHadoopSourceMapper.java 
import java.io.IOException;
import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Turns each CSV line (year,title,director,genres) into a document for EsOutputFormat
public class ElasticSinkHadoopSourceMapper extends Mapper<LongWritable, Text, NullWritable, MapWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] splitValue = value.toString().split(",");
        MapWritable doc = new MapWritable();
        doc.put(new Text("year"), new IntWritable(Integer.parseInt(splitValue[0])));
        doc.put(new Text("title"), new Text(splitValue[1]));
        doc.put(new Text("director"), new Text(splitValue[2]));
        // Genres are optional; a missing column shortens the array rather than yielding null
        if (splitValue.length > 3) {
            // split() takes a regex in which "$" means end of input, so escape it
            String[] splitGenres = splitValue[3].split("\\$");
            ArrayWritable genresList = new ArrayWritable(splitGenres);
            doc.put(new Text("genres"), genresList);
        }
        // The key is ignored; EsOutputFormat indexes the value
        context.write(NullWritable.get(), doc);
    }
}
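You can try the mapper's parsing logic outside Hadoop. The following plain-Java sketch (MovieLineParser is a hypothetical name) builds the same fields with ordinary collections instead of Writables. It assumes the input layout year,title,director,genres used by the mapper. It also shows why an unescaped split("$") leaves the genres unsplit.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

class MovieLineParser {
    // Mirrors the mapper: columns are year,title,director and optional '$'-separated genres.
    static Map<String, Object> parse(String line) {
        String[] fields = line.split(",");
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("year", Integer.parseInt(fields[0].trim()));
        doc.put("title", fields[1]);
        doc.put("director", fields[2]);
        if (fields.length > 3) {
            // Pattern.quote("$") matches a literal dollar sign; a bare "$" is the end-of-input anchor.
            doc.put("genres", Arrays.asList(fields[3].split(Pattern.quote("$"))));
        }
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(parse("2003,Title1,dire1,Action$Crime$Thriller"));
        // The unescaped regex never matches a literal '$', so the string comes back whole.
        System.out.println(Arrays.toString("Action$Crime$Thriller".split("$")));
    }
}
```

Pattern.quote and the "\\$" escape used in the mapper are equivalent here. Pattern.quote is handy when the delimiter comes from configuration and might contain other regex metacharacters.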
Integrate with Hive: 
Download the elasticsearch-hadoop jar and make it visible to Hive through hive.aux.jars.path:
bin/hive --hiveconf hive.aux.jars.path=<path-of-jar>/elasticsearch-hadoop-2.0.0.jar
Alternatively, add elasticsearch-hadoop-2.0.0.jar to <hive-home>/lib and <hadoop-home>/lib.
Elastic Search as source & Hive as sink: 
Now create an external table that reads data from Elasticsearch:
CREATE EXTERNAL TABLE movie (id BIGINT, title STRING, director STRING, year BIGINT, …
Specify the Elasticsearch index/type with 'es.resource'. You can optionally specify a query with 'es.query'.
[Screenshot: Load data from Elastic Search to Hive]
Elastic Search as sink & Hive as source:
Create an internal Hive table such as 'movie_internal' and load data into it. Then load the data from the internal table into Elasticsearch as shown below:
Create internal table:
CREATE TABLE movie_internal (title STRING, id BIGINT, director STRING, year BIGINT, …
Load data into the internal table:
LOAD DATA LOCAL INPATH '<path>/hiveElastic.txt' OVERWRITE INTO TABLE movie_internal;
hiveElastic.txt 
Title1,1,dire1,2003,Action$Crime$Thriller 
Title2,2,dire2,2007,Biography$Crime$Drama 
Load data from the Hive internal table into Elasticsearch:
INSERT OVERWRITE TABLE movie SELECT NULL, m.title, m.director, m.year, m.genres …
[Screenshot: Verify inserted data from Elastic Search query]
Ad

More Related Content

What's hot (20)

2014 spark with elastic search
2014   spark with elastic search2014   spark with elastic search
2014 spark with elastic search
Henry Saputra
 
Solr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by CaseSolr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by Case
Alexandre Rafalovitch
 
Debugging and Testing ES Systems
Debugging and Testing ES SystemsDebugging and Testing ES Systems
Debugging and Testing ES Systems
Chris Birchall
 
Building a CRM on top of ElasticSearch
Building a CRM on top of ElasticSearchBuilding a CRM on top of ElasticSearch
Building a CRM on top of ElasticSearch
Mark Greene
 
Distributed in memory data grid
Distributed in memory data gridDistributed in memory data grid
Distributed in memory data grid
Alexander Albul
 
Elasticsearch first-steps
Elasticsearch first-stepsElasticsearch first-steps
Elasticsearch first-steps
Matteo Moci
 
Developing and Deploying Apps with the Postgres FDW
Developing and Deploying Apps with the Postgres FDWDeveloping and Deploying Apps with the Postgres FDW
Developing and Deploying Apps with the Postgres FDW
Jonathan Katz
 
Spark with Elasticsearch
Spark with ElasticsearchSpark with Elasticsearch
Spark with Elasticsearch
Holden Karau
 
Powershell for Log Analysis and Data Crunching
 Powershell for Log Analysis and Data Crunching Powershell for Log Analysis and Data Crunching
Powershell for Log Analysis and Data Crunching
Michelle D'israeli
 
Side by Side with Elasticsearch & Solr, Part 2
Side by Side with Elasticsearch & Solr, Part 2Side by Side with Elasticsearch & Solr, Part 2
Side by Side with Elasticsearch & Solr, Part 2
Sematext Group, Inc.
 
Hive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReadingHive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReading
Mitsuharu Hamba
 
ElasticSearch
ElasticSearchElasticSearch
ElasticSearch
Luiz Rocha
 
Distributed percolator in elasticsearch
Distributed percolator in elasticsearchDistributed percolator in elasticsearch
Distributed percolator in elasticsearch
martijnvg
 
Web scraping with nutch solr part 2
Web scraping with nutch solr part 2Web scraping with nutch solr part 2
Web scraping with nutch solr part 2
Mike Frampton
 
Understanding OpenStack Deployments - PuppetConf 2014
Understanding OpenStack Deployments - PuppetConf 2014Understanding OpenStack Deployments - PuppetConf 2014
Understanding OpenStack Deployments - PuppetConf 2014
Puppet
 
Elasticsearch 설치 및 기본 활용
Elasticsearch 설치 및 기본 활용Elasticsearch 설치 및 기본 활용
Elasticsearch 설치 및 기본 활용
종민 김
 
[2D1]Elasticsearch 성능 최적화
[2D1]Elasticsearch 성능 최적화[2D1]Elasticsearch 성능 최적화
[2D1]Elasticsearch 성능 최적화
NAVER D2
 
Using elasticsearch with rails
Using elasticsearch with railsUsing elasticsearch with rails
Using elasticsearch with rails
Tom Z Zeng
 
Scaling Analytics with elasticsearch
Scaling Analytics with elasticsearchScaling Analytics with elasticsearch
Scaling Analytics with elasticsearch
dnoble00
 
Spark with Elasticsearch - umd version 2014
Spark with Elasticsearch - umd version 2014Spark with Elasticsearch - umd version 2014
Spark with Elasticsearch - umd version 2014
Holden Karau
 
2014 spark with elastic search
2014   spark with elastic search2014   spark with elastic search
2014 spark with elastic search
Henry Saputra
 
Solr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by CaseSolr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by Case
Alexandre Rafalovitch
 
Debugging and Testing ES Systems
Debugging and Testing ES SystemsDebugging and Testing ES Systems
Debugging and Testing ES Systems
Chris Birchall
 
Building a CRM on top of ElasticSearch
Building a CRM on top of ElasticSearchBuilding a CRM on top of ElasticSearch
Building a CRM on top of ElasticSearch
Mark Greene
 
Distributed in memory data grid
Distributed in memory data gridDistributed in memory data grid
Distributed in memory data grid
Alexander Albul
 
Elasticsearch first-steps
Elasticsearch first-stepsElasticsearch first-steps
Elasticsearch first-steps
Matteo Moci
 
Developing and Deploying Apps with the Postgres FDW
Developing and Deploying Apps with the Postgres FDWDeveloping and Deploying Apps with the Postgres FDW
Developing and Deploying Apps with the Postgres FDW
Jonathan Katz
 
Spark with Elasticsearch
Spark with ElasticsearchSpark with Elasticsearch
Spark with Elasticsearch
Holden Karau
 
Powershell for Log Analysis and Data Crunching
 Powershell for Log Analysis and Data Crunching Powershell for Log Analysis and Data Crunching
Powershell for Log Analysis and Data Crunching
Michelle D'israeli
 
Side by Side with Elasticsearch & Solr, Part 2
Side by Side with Elasticsearch & Solr, Part 2Side by Side with Elasticsearch & Solr, Part 2
Side by Side with Elasticsearch & Solr, Part 2
Sematext Group, Inc.
 
Hive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReadingHive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReading
Mitsuharu Hamba
 
Distributed percolator in elasticsearch
Distributed percolator in elasticsearchDistributed percolator in elasticsearch
Distributed percolator in elasticsearch
martijnvg
 
Web scraping with nutch solr part 2
Web scraping with nutch solr part 2Web scraping with nutch solr part 2
Web scraping with nutch solr part 2
Mike Frampton
 
Understanding OpenStack Deployments - PuppetConf 2014
Understanding OpenStack Deployments - PuppetConf 2014Understanding OpenStack Deployments - PuppetConf 2014
Understanding OpenStack Deployments - PuppetConf 2014
Puppet
 
Elasticsearch 설치 및 기본 활용
Elasticsearch 설치 및 기본 활용Elasticsearch 설치 및 기본 활용
Elasticsearch 설치 및 기본 활용
종민 김
 
[2D1]Elasticsearch 성능 최적화
[2D1]Elasticsearch 성능 최적화[2D1]Elasticsearch 성능 최적화
[2D1]Elasticsearch 성능 최적화
NAVER D2
 
Using elasticsearch with rails
Using elasticsearch with railsUsing elasticsearch with rails
Using elasticsearch with rails
Tom Z Zeng
 
Scaling Analytics with elasticsearch
Scaling Analytics with elasticsearchScaling Analytics with elasticsearch
Scaling Analytics with elasticsearch
dnoble00
 
Spark with Elasticsearch - umd version 2014
Spark with Elasticsearch - umd version 2014Spark with Elasticsearch - umd version 2014
Spark with Elasticsearch - umd version 2014
Holden Karau
 

Similar to Elastic search integration with hadoop leveragebigdata (20)

Amazon elastic map reduce
Amazon elastic map reduceAmazon elastic map reduce
Amazon elastic map reduce
Olga Lavrentieva
 
Testing multi outputformat based mapreduce
Testing multi outputformat based mapreduceTesting multi outputformat based mapreduce
Testing multi outputformat based mapreduce
Ashok Agarwal
 
Compass Framework
Compass FrameworkCompass Framework
Compass Framework
Lukas Vlcek
 
Workshop: Learning Elasticsearch
Workshop: Learning ElasticsearchWorkshop: Learning Elasticsearch
Workshop: Learning Elasticsearch
Anurag Patel
 
Neo4J and Weka 2
Neo4J and Weka 2 Neo4J and Weka 2
Neo4J and Weka 2
Vasko Yordanov
 
Cascading Through Hadoop for the Boulder JUG
Cascading Through Hadoop for the Boulder JUGCascading Through Hadoop for the Boulder JUG
Cascading Through Hadoop for the Boulder JUG
Matthew McCullough
 
Streaming using Kafka Flink & Elasticsearch
Streaming using Kafka Flink & ElasticsearchStreaming using Kafka Flink & Elasticsearch
Streaming using Kafka Flink & Elasticsearch
Keira Zhou
 
Speed up your GWT coding with gQuery
Speed up your GWT coding with gQuerySpeed up your GWT coding with gQuery
Speed up your GWT coding with gQuery
Manuel Carrasco Moñino
 
Elastic Search
Elastic SearchElastic Search
Elastic Search
NexThoughts Technologies
 
Hadoop Integration in Cassandra
Hadoop Integration in CassandraHadoop Integration in Cassandra
Hadoop Integration in Cassandra
Jairam Chandar
 
Javascript Continues Integration in Jenkins with AngularJS
Javascript Continues Integration in Jenkins with AngularJSJavascript Continues Integration in Jenkins with AngularJS
Javascript Continues Integration in Jenkins with AngularJS
Ladislav Prskavec
 
Vocanic Map Reduce Lite
Vocanic Map Reduce LiteVocanic Map Reduce Lite
Vocanic Map Reduce Lite
Shreeniwas Iyer
 
PigSPARQL: A SPARQL Query Processing Baseline for Big Data
PigSPARQL: A SPARQL Query Processing Baseline for Big DataPigSPARQL: A SPARQL Query Processing Baseline for Big Data
PigSPARQL: A SPARQL Query Processing Baseline for Big Data
Alexander Schätzle
 
Cloud native java script apps
Cloud native java script appsCloud native java script apps
Cloud native java script apps
Gary Sieling
 
Django deployment with PaaS
Django deployment with PaaSDjango deployment with PaaS
Django deployment with PaaS
Appsembler
 
Full stack analytics with Hadoop 2
Full stack analytics with Hadoop 2Full stack analytics with Hadoop 2
Full stack analytics with Hadoop 2
Gabriele Modena
 
Release 8.1 - Breakfast Paris
Release 8.1 - Breakfast ParisRelease 8.1 - Breakfast Paris
Release 8.1 - Breakfast Paris
Nuxeo
 
CouchDB Mobile - From Couch to 5K in 1 Hour
CouchDB Mobile - From Couch to 5K in 1 HourCouchDB Mobile - From Couch to 5K in 1 Hour
CouchDB Mobile - From Couch to 5K in 1 Hour
Peter Friese
 
Overview of Android Infrastructure
Overview of Android InfrastructureOverview of Android Infrastructure
Overview of Android Infrastructure
Alexey Buzdin
 
Overview of Android Infrastructure
Overview of Android InfrastructureOverview of Android Infrastructure
Overview of Android Infrastructure
C.T.Co
 
Testing multi outputformat based mapreduce
Testing multi outputformat based mapreduceTesting multi outputformat based mapreduce
Testing multi outputformat based mapreduce
Ashok Agarwal
 
Compass Framework
Compass FrameworkCompass Framework
Compass Framework
Lukas Vlcek
 
Workshop: Learning Elasticsearch
Workshop: Learning ElasticsearchWorkshop: Learning Elasticsearch
Workshop: Learning Elasticsearch
Anurag Patel
 
Cascading Through Hadoop for the Boulder JUG
Cascading Through Hadoop for the Boulder JUGCascading Through Hadoop for the Boulder JUG
Cascading Through Hadoop for the Boulder JUG
Matthew McCullough
 
Streaming using Kafka Flink & Elasticsearch
Streaming using Kafka Flink & ElasticsearchStreaming using Kafka Flink & Elasticsearch
Streaming using Kafka Flink & Elasticsearch
Keira Zhou
 
Hadoop Integration in Cassandra
Hadoop Integration in CassandraHadoop Integration in Cassandra
Hadoop Integration in Cassandra
Jairam Chandar
 
Javascript Continues Integration in Jenkins with AngularJS
Javascript Continues Integration in Jenkins with AngularJSJavascript Continues Integration in Jenkins with AngularJS
Javascript Continues Integration in Jenkins with AngularJS
Ladislav Prskavec
 
PigSPARQL: A SPARQL Query Processing Baseline for Big Data
PigSPARQL: A SPARQL Query Processing Baseline for Big DataPigSPARQL: A SPARQL Query Processing Baseline for Big Data
PigSPARQL: A SPARQL Query Processing Baseline for Big Data
Alexander Schätzle
 
Cloud native java script apps
Cloud native java script appsCloud native java script apps
Cloud native java script apps
Gary Sieling
 
Django deployment with PaaS
Django deployment with PaaSDjango deployment with PaaS
Django deployment with PaaS
Appsembler
 
Full stack analytics with Hadoop 2
Full stack analytics with Hadoop 2Full stack analytics with Hadoop 2
Full stack analytics with Hadoop 2
Gabriele Modena
 
Release 8.1 - Breakfast Paris
Release 8.1 - Breakfast ParisRelease 8.1 - Breakfast Paris
Release 8.1 - Breakfast Paris
Nuxeo
 
CouchDB Mobile - From Couch to 5K in 1 Hour
CouchDB Mobile - From Couch to 5K in 1 HourCouchDB Mobile - From Couch to 5K in 1 Hour
CouchDB Mobile - From Couch to 5K in 1 Hour
Peter Friese
 
Overview of Android Infrastructure
Overview of Android InfrastructureOverview of Android Infrastructure
Overview of Android Infrastructure
Alexey Buzdin
 
Overview of Android Infrastructure
Overview of Android InfrastructureOverview of Android Infrastructure
Overview of Android Infrastructure
C.T.Co
 
Ad

Recently uploaded (20)

Societal challenges of AI: biases, multilinguism and sustainability
Societal challenges of AI: biases, multilinguism and sustainabilitySocietal challenges of AI: biases, multilinguism and sustainability
Societal challenges of AI: biases, multilinguism and sustainability
Jordi Cabot
 
Why Orangescrum Is a Game Changer for Construction Companies in 2025
Why Orangescrum Is a Game Changer for Construction Companies in 2025Why Orangescrum Is a Game Changer for Construction Companies in 2025
Why Orangescrum Is a Game Changer for Construction Companies in 2025
Orangescrum
 
Expand your AI adoption with AgentExchange
Expand your AI adoption with AgentExchangeExpand your AI adoption with AgentExchange
Expand your AI adoption with AgentExchange
Fexle Services Pvt. Ltd.
 
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
AxisTechnolabs
 
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Eric D. Schabell
 
Exploring Wayland: A Modern Display Server for the Future
Exploring Wayland: A Modern Display Server for the FutureExploring Wayland: A Modern Display Server for the Future
Exploring Wayland: A Modern Display Server for the Future
ICS
 
Who Watches the Watchmen (SciFiDevCon 2025)
Who Watches the Watchmen (SciFiDevCon 2025)Who Watches the Watchmen (SciFiDevCon 2025)
Who Watches the Watchmen (SciFiDevCon 2025)
Allon Mureinik
 
Solidworks Crack 2025 latest new + license code
Solidworks Crack 2025 latest new + license codeSolidworks Crack 2025 latest new + license code
Solidworks Crack 2025 latest new + license code
aneelaramzan63
 
Not So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java WebinarNot So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java Webinar
Tier1 app
 
Adobe After Effects Crack FREE FRESH version 2025
Adobe After Effects Crack FREE FRESH version 2025Adobe After Effects Crack FREE FRESH version 2025
Adobe After Effects Crack FREE FRESH version 2025
kashifyounis067
 
How to Optimize Your AWS Environment for Improved Cloud Performance
How to Optimize Your AWS Environment for Improved Cloud PerformanceHow to Optimize Your AWS Environment for Improved Cloud Performance
How to Optimize Your AWS Environment for Improved Cloud Performance
ThousandEyes
 
The Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdfThe Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdf
drewplanas10
 
Designing AI-Powered APIs on Azure: Best Practices& Considerations
Designing AI-Powered APIs on Azure: Best Practices& ConsiderationsDesigning AI-Powered APIs on Azure: Best Practices& Considerations
Designing AI-Powered APIs on Azure: Best Practices& Considerations
Dinusha Kumarasiri
 
Download Wondershare Filmora Crack [2025] With Latest
Download Wondershare Filmora Crack [2025] With LatestDownload Wondershare Filmora Crack [2025] With Latest
Download Wondershare Filmora Crack [2025] With Latest
tahirabibi60507
 
Get & Download Wondershare Filmora Crack Latest [2025]
Get & Download Wondershare Filmora Crack Latest [2025]Get & Download Wondershare Filmora Crack Latest [2025]
Get & Download Wondershare Filmora Crack Latest [2025]
saniaaftab72555
 
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Andre Hora
 
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
Egor Kaleynik
 
Revolutionizing Residential Wi-Fi PPT.pptx
Revolutionizing Residential Wi-Fi PPT.pptxRevolutionizing Residential Wi-Fi PPT.pptx
Revolutionizing Residential Wi-Fi PPT.pptx
nidhisingh691197
 
WinRAR Crack for Windows (100% Working 2025)
WinRAR Crack for Windows (100% Working 2025)WinRAR Crack for Windows (100% Working 2025)
WinRAR Crack for Windows (100% Working 2025)
sh607827
 
Adobe Illustrator Crack FREE Download 2025 Latest Version
Adobe Illustrator Crack FREE Download 2025 Latest VersionAdobe Illustrator Crack FREE Download 2025 Latest Version
Adobe Illustrator Crack FREE Download 2025 Latest Version
kashifyounis067
 
Societal challenges of AI: biases, multilinguism and sustainability
Societal challenges of AI: biases, multilinguism and sustainabilitySocietal challenges of AI: biases, multilinguism and sustainability
Societal challenges of AI: biases, multilinguism and sustainability
Jordi Cabot
 
Why Orangescrum Is a Game Changer for Construction Companies in 2025
Why Orangescrum Is a Game Changer for Construction Companies in 2025Why Orangescrum Is a Game Changer for Construction Companies in 2025
Why Orangescrum Is a Game Changer for Construction Companies in 2025
Orangescrum
 
Expand your AI adoption with AgentExchange
Expand your AI adoption with AgentExchangeExpand your AI adoption with AgentExchange
Expand your AI adoption with AgentExchange
Fexle Services Pvt. Ltd.
 
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
AxisTechnolabs
 
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Eric D. Schabell
 
Exploring Wayland: A Modern Display Server for the Future
Exploring Wayland: A Modern Display Server for the FutureExploring Wayland: A Modern Display Server for the Future
Exploring Wayland: A Modern Display Server for the Future
ICS
 
Who Watches the Watchmen (SciFiDevCon 2025)
Who Watches the Watchmen (SciFiDevCon 2025)Who Watches the Watchmen (SciFiDevCon 2025)
Who Watches the Watchmen (SciFiDevCon 2025)
Allon Mureinik
 
Solidworks Crack 2025 latest new + license code
Solidworks Crack 2025 latest new + license codeSolidworks Crack 2025 latest new + license code

Elastic search integration with hadoop leveragebigdata

Elastic Search integration with Hadoop
Posted by leveragebigdata on Saturday, 28 June 2014, in Uncategorized. Tags: Elastic Search, Hadoop, Hive, MapReduce.

Elastic Search is an open-source, distributed search engine built on the Apache Lucene library and accessed through a REST API. You can download Elastic Search from http://www.elasticsearch.org/overview/elkdownloads/. Unzip the downloaded zip or tar file, then start a single instance (node) of Elastic Search by running the script elasticsearch-1.2.1/bin/elasticsearch.

Installing a plugin:
Plugins add extra features. For example, elasticsearch-head provides a web interface for interacting with the cluster. Install it with:

    elasticsearch-1.2.1/bin/plugin -install mobz/elasticsearch-head
The Elastic Search web interface is then available at http://localhost:9200/_plugin/head/.

Creating the index:
(You can skip this step.) In Elastic Search, an index plays roughly the role that a database plays in a relational system. By default, a new index is created with 5 shards and 1 replica; both values can be set at creation time to suit your requirements. The number of replicas can be changed later, but the number of shards cannot.

    curl -XPUT "http://localhost:9200/movies/" -d '{"settings" : {"number_of_shards" …

[Create Elastic Search Index]

Loading data to Elastic Search:
If you put data into an index that does not exist yet, Elastic Search creates the index automatically.

Load data using -XPUT:
With PUT you must specify the document id (here, 1):

    curl -XPUT "http://localhost:9200/movies/movie/1" -d '{"title": "Men with Wings", …

Note: movies -> index, movie -> index type, 1 -> id.
[Elastic Search -XPUT]

Load data using -XPOST:
With POST the id is generated automatically:

    curl -XPOST "http://localhost:9200/movies/movie" -d' { "title": "Lawrence of Arabia", …

[Elastic Search -XPOST]
Note: the _id U2oQjN5LRQCW8PWBF9vipA was generated automatically.

The _search endpoint:
Indexed documents can be searched with the following query:

    curl -XPOST "http://localhost:9200/_search" -d' { "query": { "query_string": { …

[ES Search Result]
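The REST calls in this part of the post (creating the index, loading a document with PUT or POST, and searching) can also be built from Java. The sketch below is a hypothetical helper, not part of the original post or of any Elastic Search client library; it assumes Java 11+ (java.net.http), a node on localhost:9200, and made-up names (EsRestRequests, createIndex, indexDocument, search, totalShardCopies). It only builds the requests, mirroring the rule that PUT needs an explicit id while POST lets Elastic Search generate one.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/**
 * Hypothetical helper (not from the original post): builds java.net.http
 * requests equivalent to the curl examples. Assumes a node on localhost:9200.
 */
public class EsRestRequests {
    static final String BASE = "http://localhost:9200";

    // PUT /{index}/ with explicit shard and replica settings.
    static HttpRequest createIndex(String index, int shards, int replicas) {
        String body = "{\"settings\": {\"number_of_shards\": " + shards
                + ", \"number_of_replicas\": " + replicas + "}}";
        return json("PUT", "/" + index + "/", body);
    }

    // PUT /{index}/{type}/{id} when the caller supplies an id,
    // otherwise POST /{index}/{type} and let Elastic Search generate one.
    static HttpRequest indexDocument(String index, String type, String id, String docJson) {
        if (id != null) {
            return json("PUT", "/" + index + "/" + type + "/" + id, docJson);
        }
        return json("POST", "/" + index + "/" + type, docJson);
    }

    // POST /_search with a query_string query. The text is inserted as-is,
    // so it must not contain characters that need JSON escaping.
    static HttpRequest search(String queryText) {
        String body = "{\"query\": {\"query_string\": {\"query\": \"" + queryText + "\"}}}";
        return json("POST", "/_search", body);
    }

    // Every primary shard plus each of its replicas: 5 shards, 1 replica -> 10 copies.
    static int totalShardCopies(int shards, int replicas) {
        return shards * (1 + replicas);
    }

    // Shared builder: JSON body, given HTTP method, path relative to BASE.
    private static HttpRequest json(String method, String path, String body) {
        return HttpRequest.newBuilder(URI.create(BASE + path))
                .header("Content-Type", "application/json")
                .method(method, HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest[] requests = {
            createIndex("movies", 5, 1),
            indexDocument("movies", "movie", "1", "{\"title\": \"Men with Wings\"}"),
            indexDocument("movies", "movie", null, "{\"title\": \"Lawrence of Arabia\"}"),
            search("kill")
        };
        // Print method and URI only; nothing is sent.
        for (HttpRequest r : requests) {
            System.out.println(r.method() + " " + r.uri());
        }
    }
}
```

To actually send a request while a node is running, pass it to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. The Content-Type header is ignored by Elastic Search 1.x but is required by later versions (6.x and up), so setting it keeps the sketch portable.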
Integrating with Map Reduce (Hadoop 1.2.1)
To integrate Elastic Search with Map Reduce, follow the steps below.

Add a dependency to pom.xml:

    <dependency>
      <groupId>org.elasticsearch</groupId>
      <artifactId>elasticsearch-hadoop</artifactId>
      <version>2.0.0</version>
    </dependency>

or download the elasticsearch-hadoop jar file and add it to the classpath.

Elastic Search as source & HDFS as sink:
In the Map Reduce job, specify the Elastic Search index/type from which data should be read into HDFS, and set the input format to EsInputFormat (defined in the elasticsearch-hadoop jar). In the org.apache.hadoop.conf.Configuration, set the index/type with the property es.resource and, optionally, a search query with es.query; then set the InputFormatClass to EsInputFormat:

ElasticSourceHadoopSinkJob.java

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.MapWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
    import org.elasticsearch.hadoop.mr.EsInputFormat;

    public class ElasticSourceHadoopSinkJob {
        public static void main(String[] arg)
                throws IOException, ClassNotFoundException, InterruptedException {
            Configuration conf = new Configuration();
            // Index/type to read documents from.
            conf.set("es.resource", "movies/movie");
            // Optional query restricting which documents are read:
            //conf.set("es.query", "?q=kill");
            final Job job = new Job(conf, "Get information from elasticSearch.");
            job.setJarByClass(ElasticSourceHadoopSinkJob.class);
            job.setMapperClass(ElasticSourceHadoopSinkMapper.class);
            job.setInputFormatClass(EsInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);
            // Map-only job: mapper output goes straight to HDFS.
            job.setNumReduceTasks(0);
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(MapWritable.class);
            // HDFS output directory is the first command-line argument.
            FileOutputFormat.setOutputPath(job, new Path(arg[0]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

ElasticSourceHadoopSinkMapper.java

    import java.io.IOException;

    import org.apache.hadoop.io.MapWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Receives (document id, document fields) pairs from EsInputFormat
    // and passes them through unchanged.
    public class ElasticSourceHadoopSinkMapper
            extends Mapper<Object, MapWritable, Text, MapWritable> {
        @Override
        protected void map(Object key, MapWritable value, Context context)
                throws IOException, InterruptedException {
            context.write(new Text(key.toString()), value);
        }
    }

HDFS as source & Elastic Search as sink:
In the Map Reduce job, specify the Elastic Search index/type into which the data read from HDFS should be loaded, and set the output format to EsOutputFormat (defined in the elasticsearch-hadoop jar).

ElasticSinkHadoopSourceJob.java

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.MapWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.elasticsearch.hadoop.mr.EsOutputFormat;

    public class ElasticSinkHadoopSourceJob {
        public static void main(String[] str)
                throws IOException, ClassNotFoundException, InterruptedException {
            Configuration conf = new Configuration();
            // Index/type to write documents to.
            conf.set("es.resource", "movies/movie");
            // Speculative tasks could write the same documents twice.
            conf.setBoolean("mapred.map.tasks.speculative.execution", false);
            final Job job = new Job(conf, "Load information into elasticSearch.");
            job.setJarByClass(ElasticSinkHadoopSourceJob.class);
            job.setMapperClass(ElasticSinkHadoopSourceMapper.class);
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(EsOutputFormat.class);
            // Map-only job: each MapWritable is sent directly to Elastic Search.
            job.setNumReduceTasks(0);
            job.setMapOutputKeyClass(NullWritable.class);
            job.setMapOutputValueClass(MapWritable.class);
            FileInputFormat.setInputPaths(job, new Path("data/ElasticSearchData"));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

ElasticSinkHadoopSourceMapper.java

    import java.io.IOException;
    import java.util.regex.Pattern;

    import org.apache.hadoop.io.ArrayWritable;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.MapWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Turns each CSV line (year,title,director,genres) into one document.
    public class ElasticSinkHadoopSourceMapper
            extends Mapper<LongWritable, Text, NullWritable, MapWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] splitValue = value.toString().split(",");
            MapWritable doc = new MapWritable();
            doc.put(new Text("year"), new IntWritable(Integer.parseInt(splitValue[0])));
            doc.put(new Text("title"), new Text(splitValue[1]));
            doc.put(new Text("director"), new Text(splitValue[2]));
            if (splitValue.length > 3) {
                // split() takes a regex, so "$" must be quoted to match the literal character.
                String[] splitGenres = splitValue[3].split(Pattern.quote("$"));
                Text[] genres = new Text[splitGenres.length];
                for (int i = 0; i < splitGenres.length; i++) {
                    genres[i] = new Text(splitGenres[i]);
                }
                doc.put(new Text("genres"), new ArrayWritable(Text.class, genres));
            }
            context.write(NullWritable.get(), doc);
        }
    }

Integrate with Hive:
Download the elasticsearch-hadoop jar file and add it to Hive's path using hive.aux.jars.path:

    bin/hive --hiveconf hive.aux.jars.path=<path-of-jar>/elasticsearch-hadoop-2.0.0.jar

or copy elasticsearch-hadoop-2.0.0.jar into <hive-home>/lib and <hadoop-home>/lib.

Elastic Search as source & Hive as sink:
Create an external table that reads its data from Elastic Search:

    CREATE EXTERNAL TABLE movie (id BIGINT, title STRING, director STRING, year BIGINT, …

Specify the Elastic Search index/type with es.resource and, optionally, a query with es.query.
[Load data from Elastic Search to Hive]

Elastic Search as sink & Hive as source:
Create an internal table in Hive, such as movie_internal, and load data into it. Then load the data from the internal table into Elastic Search.

Create internal table:

    CREATE TABLE movie_internal (title STRING, id BIGINT, director STRING, year BIGINT, …
Load data to internal table:

    LOAD DATA LOCAL INPATH '<path>/hiveElastic.txt' OVERWRITE INTO TABLE movie_internal;

hiveElastic.txt

    Title1,1,dire1,2003,Action$Crime$Thriller
    Title2,2,dire2,2007,Biography$Crime$Drama

Load data from the Hive internal table into Elastic Search:

    INSERT OVERWRITE TABLE movie SELECT NULL, m.title, m.director, m.year, m.genres …

[Load data from Hive to Elastic Search]

Verify the inserted data with an Elastic Search query.
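The genre lists in hiveElastic.txt are `$`-separated, and the ElasticSinkHadoopSourceMapper also splits a genres column on `$`. Because Java's `String.split` interprets its argument as a regular expression, a bare `"$"` is the end-of-input anchor and splits nothing. The plain-Java sketch below (hypothetical class `MovieRowParser`, no Hadoop dependency) assumes the title,id,director,year,genres column order of hiveElastic.txt and parses one row into a map roughly shaped like the expected document. It is a sanity check for the input file, not what elasticsearch-hadoop does internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: parse one row of hiveElastic.txt
 * (title,id,director,year,genres) into a map of document fields.
 */
public class MovieRowParser {

    // Quote "$" so it is matched literally instead of as the end-of-input anchor.
    private static final Pattern GENRE_SEPARATOR = Pattern.compile(Pattern.quote("$"));

    static Map<String, Object> parse(String line) {
        String[] cols = line.split(",");
        if (cols.length < 4) {
            throw new IllegalArgumentException("expected at least 4 columns: " + line);
        }
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("title", cols[0]);
        doc.put("id", Long.parseLong(cols[1]));      // BIGINT in movie_internal
        doc.put("director", cols[2]);
        doc.put("year", Long.parseLong(cols[3]));    // BIGINT in movie_internal
        List<String> genres = new ArrayList<>();
        if (cols.length > 4 && !cols[4].isEmpty()) {
            genres.addAll(Arrays.asList(GENRE_SEPARATOR.split(cols[4])));
        }
        doc.put("genres", genres);
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(parse("Title1,1,dire1,2003,Action$Crime$Thriller"));
        System.out.println(parse("Title2,2,dire2,2007,Biography$Crime$Drama"));
        // An unescaped "$" leaves the string whole: prints 1
        System.out.println("Action$Crime$Thriller".split("$").length);
    }
}
```

Running it over the sample file before the LOAD DATA step makes a wrong delimiter obvious: each row should yield three genres, not one.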