Danairat T., 2013, danairat@gmail.comBig Data Hadoop – Hands On Workshop 1
Hadoop Workshop using
Cloudera on Amazon EC2
May 2015
Dr.Thanachart Numnonda
IMC Institute
thanachart@imcinstitute.com
Modified from the original version by Danairat T.
Certified Java Programmer, TOGAF – Silver
danairat@gmail.com
Danairat T., , danairat@gmail.com: Thanachart Numnonda, thanachart@imcinstitute.com May 2015Hadoop Workshop using Cloudera on Amazon EC2
Hands-On: Launch a virtual server
on EC2 Amazon Web Services
Virtual Server
This lab uses an EC2 virtual server to install a
Hadoop server with the following features:
1. Ubuntu Server 14.04 LTS
2. m3.xlarge: 4 vCPU, 15 GB memory, 80 GB SSD
3. Security group: create new
4. Keypair: imchadoop
Select the EC2 service and click Launch Instance
Select an Amazon Machine Image (AMI):
Ubuntu Server 14.04 LTS (PV)
Choose the m3.xlarge instance type for the virtual server
Leave configuration details as default
Add Storage: 30 GB
Name the instance
Select Create a new security group > Add Rule as
follows
Click Launch and choose imchadoop as a key pair
Review the instance, then click Connect for
instructions on connecting to it
Connect to an instance from Mac/Linux
Connect to an instance from Windows using PuTTY
Connect to the instance
Hands-On: Installing Cloudera on EC2
Download Cloudera Manager
1) Type command > wget http://archive.cloudera.com/cm5/installer/latest/cloudera-manager-installer.bin
2) Type command > chmod u+x cloudera-manager-installer.bin
3) Type command > sudo ./cloudera-manager-installer.bin
Log in to Cloudera Manager
Wait several minutes for the Cloudera Manager Server to complete its startup.
Then open a web browser at http://<public-ip>:7180
Select Cloudera Express Edition
Provide the <public ip> address of your instance
as the host for the cluster
Browse to the private key file (imchadoop.pem) that we downloaded in the
previous part. Leave the passphrase blank.
If you see the error above, do not worry; it is a known issue. You can find
the list of known issues at Cloudera Issue List.
Click the “Back” button until you reach the home screen, then click the “Continue” button.
You will now find a “Currently Managed Hosts” tab listing the hosts with their private DNS names and
private IP addresses. Select all and click “Continue”
Finish
Running Hue
Sign in to Hue
Starting Hue on Cloudera
Viewing HDFS
Hands-On: Importing/Exporting
Data to HDFS
Importing Data to Hadoop
Download War and Peace Full Text
www.gutenberg.org/ebooks/2600
$ hadoop fs -mkdir input
$ hadoop fs -mkdir output
$ hadoop fs -copyFromLocal Downloads/pg2600.txt input
Review file in Hadoop HDFS
List HDFS File
Read HDFS File
[hdadmin@localhost bin]$ hadoop fs -cat input/pg2600.txt
Retrieve HDFS File to Local File System
[hdadmin@localhost bin]$ hadoop fs -copyToLocal input/pg2600.txt tmp/file.txt
Please see also http://hadoop.apache.org/docs/r1.0.4/commands_manual.html
Review file in Hadoop HDFS using
File Browse
Review file in Hadoop HDFS using Hue
Hadoop Port Numbers
        Daemon              Default Port   Configuration Parameter in conf/*-site.xml
HDFS    Namenode            50070          dfs.http.address
        Datanodes           50075          dfs.datanode.http.address
        Secondarynamenode   50090          dfs.secondary.http.address
MR      JobTracker          50030          mapred.job.tracker.http.address
        Tasktrackers        50060          mapred.task.tracker.http.address
Removing data from HDFS using
Shell Command
[hdadmin@localhost detach]$ hadoop fs -rm input/pg2600.txt
Deleted hdfs://localhost:54310/input/pg2600.txt
[hdadmin@localhost detach]$
Hands-On: Writing Map/Reduce
Program on Eclipse
Starting Eclipse in Cloudera VM
Create a Java Project
Let's name it HadoopWordCount
Add dependencies to the project
● Add the following two JARs to your build path:
hadoop-common.jar and hadoop-mapreduce-client-core.jar. Both can be
found at /usr/lib/hadoop/client
● Perform the following steps:
– Add a folder named lib to the project
– Copy the two JARs into this folder
– Right-click on the project name >> select Build Path >> then
Configure Build Path
– Click on Add JARs and select these two JARs from the lib folder
Writing the source code
● Right-click the project, then select New >> Package
● Name the package org.myorg
● Right-click org.myorg, then select New >> Class
● Name the class WordCount
● Write the source code as shown in the previous slides
Building a JAR file
● Right-click the project, then select Export
● Select Java and then JAR file
● Provide the JAR name: wordcount.jar
● Leave the JAR package options as default
● In the JAR Manifest Specification section, at the bottom, specify the Main
class
● In this case, select WordCount
● Click on Finish
● The JAR file will be built and will be located at cloudera/workspace
Note: you may need to resize the dialog font by selecting
Windows >> Preferences >> Appearance >> Colors and Fonts
Hands-On: Running Map Reduce and
Deploying to Hadoop Runtime
Environment
Running Map Reduce Program
Reviewing MapReduce Job in Hue
Reviewing MapReduce Output Result
Hands-On: Running Map Reduce
using Oozie workflow
Using Hue: select WorkFlow >> Editor
Create a new workflow
● Click the Create button; the following screen will be displayed
● Name the workflow WordCountWorkflow
Select a Java job for the workflow
● From the Oozie editor, drag Java and drop it between start and end
Edit the Java Job
● Assign the following values
– Name: WordCount
– Jar name: wordcount.jar (select … choose upload from local machine)
– Main Class: org.myorg.WordCount
– Arguments: input/* output/wordcount_output2
Submit the workflow
● Click Done, followed by Save
● Then click Submit
Hands-On: Working with CSV data
Sample CSV data
● The input data is access logs in the following form:
Date, Requesting-IP-Address
● We will write a MapReduce program to count the number of hits to the
website per country.
HitsByCountryMapper.java
package learning.bigdata.mapreduce;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class HitsByCountryMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final static String[] COUNTRIES = { "India", "UK", "US", "China" };

    // Reused across map() calls to avoid allocating new objects per record
    private Text outputKey = new Text();
    private IntWritable outputValue = new IntWritable();

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Split the value string to get Date and ipAddress
        // row[0] = Date and row[1] = ipAddress
        String[] row = value.toString().split(",");
        // Get the country name to which the ipAddress belongs
        String countryName = row.length < 2 ? null : getCountryNameFromIpAddress(row[1].trim());
        if (countryName == null) {
            // Malformed line or empty IP field: count it and skip the record
            context.getCounter("Custom counters", "MAPPER_EXCEPTION_COUNTER").increment(1);
            return;
        }
        outputKey.set(countryName);
        outputValue.set(1);
        context.write(outputKey, outputValue);
    }

    // Placeholder lookup: picks a country from the hash of the IP address
    private static String getCountryNameFromIpAddress(String ipAddress) {
        if (ipAddress != null && !ipAddress.isEmpty()) {
            // Mask the sign bit so the index is never negative;
            // Math.abs(Integer.MIN_VALUE) would still be negative
            int randomIndex = (ipAddress.hashCode() & Integer.MAX_VALUE) % COUNTRIES.length;
            return COUNTRIES[randomIndex];
        }
        return null;
    }
}
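The lookup in getCountryNameFromIpAddress reduces a hash code to an array index. The following is a minimal standalone sketch of that step in plain Java, with no Hadoop dependency. The class name CountryLookupSketch, the helper names, and the sample line are illustrative assumptions, and the hash-based "country" is only a placeholder for a real geo-IP lookup, as in the slide code. The sketch masks the sign bit because Math.abs(Integer.MIN_VALUE) is still negative, which could give a negative index when the array length is not a power of two.

```java
// Illustrative stand-alone sketch; not part of the workshop source.
class CountryLookupSketch {
    static final String[] COUNTRIES = { "India", "UK", "US", "China" };

    // Maps any int hash onto a valid index in [0, buckets).
    static int bucket(int hash, int buckets) {
        return (hash & Integer.MAX_VALUE) % buckets;
    }

    // Extracts the IP field from a "Date, Requesting-IP-Address" line,
    // or returns null when the line has fewer than two fields.
    static String parseIp(String line) {
        String[] row = line.split(",");
        return row.length < 2 ? null : row[1].trim();
    }

    // Placeholder lookup: picks a country from the hash of the IP string.
    static String countryFor(String ip) {
        return COUNTRIES[bucket(ip.hashCode(), COUNTRIES.length)];
    }

    public static void main(String[] args) {
        String ip = parseIp("2015-05-01, 10.0.0.1");
        System.out.println(ip + " -> " + countryFor(ip));
    }
}
```

Masking with Integer.MAX_VALUE also works on Java 7, which is why it is used here instead of Math.floorMod.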
HitsByCountryReducer.java
package learning.bigdata.mapreduce;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class HitsByCountryReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    // Reused across reduce() calls to avoid allocating a new object per key
    private IntWritable outputValue = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException,
            InterruptedException {
        // Sum the hit counts emitted by the mappers for this country
        int count = 0;
        for (IntWritable value : values) {
            count += value.get();
        }
        outputValue.set(count);
        context.write(key, outputValue);
    }
}
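Between the map and reduce phases, the framework groups all values emitted for the same key, and the reducer then sums each group. The following plain-Java sketch mimics that grouping and summing in memory, without any Hadoop classes. ReduceSketch, its method names, and the sample keys are illustrative assumptions, not part of the workshop code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative in-memory stand-in for the shuffle and reduce steps.
class ReduceSketch {
    // Sums the values of one key group, as the reduce() method does.
    static int reduce(Iterable<Integer> values) {
        int count = 0;
        for (int value : values) {
            count += value;
        }
        return count;
    }

    // Each input key stands for one (key, 1) pair emitted by a mapper.
    // Pairs are grouped by key first, then every group is reduced.
    static Map<String, Integer> countByKey(List<String> mapperKeys) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (String key : mapperKeys) {
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(1);
        }
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : groups.entrySet()) {
            totals.put(group.getKey(), reduce(group.getValue()));
        }
        return totals;
    }

    public static void main(String[] args) {
        // prints {India=1, UK=1, US=3}
        System.out.println(countByKey(Arrays.asList("US", "UK", "US", "India", "US")));
    }
}
```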
HitsByCountry.java
package learning.bigdata.main;

import learning.bigdata.mapreduce.HitsByCountryMapper;
import learning.bigdata.mapreduce.HitsByCountryReducer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class HitsByCountry extends Configured implements Tool {

    private static final String JOB_NAME = "Calculating hits by country";

    public static void main(String[] args) throws Exception {
        // ToolRunner parses the generic Hadoop options before calling run()
        int result = ToolRunner.run(new HitsByCountry(), args);
        System.exit(result);
    }

    @Override
    public int run(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("Usage: HitsByCountry <comma separated input directories> <output dir>");
            return -1;
        }
        try {
            Configuration conf = getConf();
            Job job = Job.getInstance(conf);
            job.setJarByClass(HitsByCountry.class);
            job.setJobName(JOB_NAME);
            // Mapper and its intermediate (map output) types
            job.setMapperClass(HitsByCountryMapper.class);
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(IntWritable.class);
            // Reducer and the final output types
            job.setReducerClass(HitsByCountryReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);
            // args[0] may be a comma-separated list of input paths
            FileInputFormat.setInputPaths(job, args[0]);
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            boolean success = job.waitForCompletion(true);
            return success ? 0 : 1;
        } catch (Exception e) {
            e.printStackTrace();
            return 1;
        }
    }
}
Lecture: Developing Complex
Hadoop MapReduce
Applications
Choosing appropriate Hadoop data types
● Hadoop uses classes based on the Writable interface as
the data types for MapReduce computations.
● Choosing the appropriate Writable data types for your
input, intermediate, and output data can have a large
effect on the performance and the programmability of
your MapReduce programs.
● To be used as a value data type, a data type
must implement the org.apache.hadoop.io.Writable
interface.
● To be used as a key data type, a data type must
implement the
org.apache.hadoop.io.WritableComparable<T> interface.
Hadoop built-in data types
● Text: This stores UTF-8 text
● BytesWritable: This stores a sequence of bytes
● VIntWritable and VLongWritable: These store variable-length
integer and long values
● NullWritable: This is a zero-length Writable type that can
be used when you don't want to use a key or value type
● The following Hadoop built-in collection data types can
only be used as value types.
– ArrayWritable: This stores an array of values belonging to a
Writable type.
– TwoDArrayWritable: This stores a matrix of values belonging to
the same Writable type.
– MapWritable: This stores a map of key-value pairs. Keys and
values should be of the Writable data types.
– SortedMapWritable: This stores a sorted map of key-value
pairs. Keys should implement the WritableComparable
interface.
Implementing a custom Hadoop Writable
data type
● We can write a custom Writable data type by
implementing the org.apache.hadoop.io.Writable
interface.
● Types based on the Writable interface can be used as
value types in Hadoop MapReduce computations.
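As a rough illustration of what a custom Writable involves, the sketch below serializes a two-field record with java.io.DataOutput and reads it back with java.io.DataInput. To stay runnable without Hadoop on the classpath, it declares a local WritableSketch interface that mirrors the write and readFields methods of org.apache.hadoop.io.Writable; in a real job the class would implement the Hadoop interface instead. LogEntryWritable, its fields, and the roundTrip helper are hypothetical names chosen for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in mirroring the two methods of org.apache.hadoop.io.Writable.
interface WritableSketch {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical value type holding one access-log record.
class LogEntryWritable implements WritableSketch, Comparable<LogEntryWritable> {
    String date = "";
    String ipAddress = "";

    // Serializes the fields in a fixed order.
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(date);
        out.writeUTF(ipAddress);
    }

    // Reads the fields back in the same order they were written.
    @Override
    public void readFields(DataInput in) throws IOException {
        date = in.readUTF();
        ipAddress = in.readUTF();
    }

    // An ordering is needed only when the type is used as a key
    // (WritableComparable in Hadoop).
    @Override
    public int compareTo(LogEntryWritable other) {
        int byDate = date.compareTo(other.date);
        return byDate != 0 ? byDate : ipAddress.compareTo(other.ipAddress);
    }

    // Round-trips an entry through a byte array to show both directions.
    static LogEntryWritable roundTrip(LogEntryWritable entry) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            entry.write(new DataOutputStream(bytes));
            LogEntryWritable copy = new LogEntryWritable();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The key design point is that readFields must consume the fields in exactly the order write produced them, since the byte stream carries no field names.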
Choosing a suitable Hadoop InputFormat
for your input data format
● Hadoop supports processing of many different formats
and types of data through InputFormat.
● The InputFormat of a Hadoop MapReduce computation
generates the key-value pair inputs for the mappers by
parsing the input data.
● InputFormat also performs the splitting of the input data
into logical partitions.
InputFormats that Hadoop provides
● TextInputFormat: This is used for plain text files.
TextInputFormat generates a key-value record for each
line of the input text files.
● NLineInputFormat: This is used for plain text files.
NLineInputFormat splits the input files into logical splits
of a fixed number of lines.
● SequenceFileInputFormat: For Hadoop SequenceFile
input data
● DBInputFormat: This supports reading the input data for
a MapReduce computation from a SQL table.
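To make the first two formats more concrete, here is a small plain-Java sketch of the records they conceptually produce. With TextInputFormat, each mapper input is a pair of the line's byte offset in the file and the line text; with NLineInputFormat, lines are grouped into splits of N lines each. The class and method names (TextRecordsSketch, records, nLineSplits) are illustrative assumptions, the code only handles "\n" line endings, and it does not use or reproduce Hadoop's own implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

class TextRecordsSketch {
    // Returns one (byte offset, line) record per line of the content.
    static List<Map.Entry<Long, String>> records(String content) {
        byte[] data = content.getBytes(StandardCharsets.UTF_8);
        List<Map.Entry<Long, String>> out = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= data.length; i++) {
            boolean atEnd = (i == data.length);
            // A line ends at a newline byte, or at end of data if text remains.
            boolean lineEnds = atEnd ? start < i : data[i] == '\n';
            if (lineEnds) {
                String line = new String(data, start, i - start, StandardCharsets.UTF_8);
                out.add(new SimpleEntry<Long, String>(Long.valueOf(start), line));
                start = i + 1;
            }
        }
        return out;
    }

    // Groups lines into chunks of n lines; the last chunk may be shorter.
    static List<List<String>> nLineSplits(List<String> lines, int n) {
        List<List<String>> splits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += n) {
            splits.add(new ArrayList<>(lines.subList(i, Math.min(i + n, lines.size()))));
        }
        return splits;
    }

    public static void main(String[] args) {
        System.out.println(records("2015-05-01, 10.0.0.1\n2015-05-01, 10.0.0.2\n"));
        System.out.println(nLineSplits(Arrays.asList("a", "b", "c", "d", "e"), 2));
    }
}
```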
Implementing new input data formats
● Hadoop enables us to implement and specify custom
InputFormat implementations for our MapReduce
computations.
● An InputFormat implementation should extend the
org.apache.hadoop.mapreduce.InputFormat<K,V>
abstract class, overriding the createRecordReader() and getSplits()
methods.
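The idea behind getSplits() can be sketched without Hadoop as plain arithmetic: divide a file of a given length into consecutive (start, length) ranges. The SplitSketch class below is a simplified, hypothetical illustration of that idea only; real InputFormat implementations return InputSplit objects and also take block sizes and data locations into account, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.List;

class SplitSketch {
    // Returns {start, length} pairs that together cover fileLength bytes.
    static List<long[]> splits(long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<long[]> result = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            // The final split is shorter when the length is not a multiple.
            result.add(new long[] { start, Math.min(splitSize, fileLength - start) });
        }
        return result;
    }

    public static void main(String[] args) {
        for (long[] split : splits(250, 100)) {
            System.out.println("start=" + split[0] + " length=" + split[1]);
        }
    }
}
```

Each split would then be handed to its own record reader, which is the role of createRecordReader() in a custom InputFormat.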
Formatting the results of MapReduce
computations – using Hadoop
OutputFormats
● It is important to store the result of a MapReduce
computation in a format that can be consumed
efficiently by the target application.
● We can use the Hadoop OutputFormat interface to define
the data storage format.
● An OutputFormat prepares the output location and
provides a RecordWriter implementation to perform the
actual serialization and storage of the data.
● Hadoop uses
org.apache.hadoop.mapreduce.lib.output.TextOutputFormat<K,V>
as the default OutputFormat.
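By default, TextOutputFormat writes each record as one line containing the key, a tab character, and the value. The sketch below imitates that line format with a small writer class in plain Java. TabSeparatedWriterSketch and its render helper are hypothetical names for this illustration; the class is a stand-in for the role a RecordWriter plays and does not use the Hadoop API.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Stand-in for a RecordWriter: writes "key<TAB>value" lines to a Writer.
class TabSeparatedWriterSketch {
    private final Writer out;

    TabSeparatedWriterSketch(Writer out) {
        this.out = out;
    }

    // Serializes one record using the string forms of key and value.
    void write(Object key, Object value) throws IOException {
        out.write(String.valueOf(key));
        out.write('\t');
        out.write(String.valueOf(value));
        out.write('\n');
    }

    // Renders parallel arrays of keys and counts into the output text.
    static String render(String[] keys, int[] values) {
        StringWriter buffer = new StringWriter();
        TabSeparatedWriterSketch writer = new TabSeparatedWriterSketch(buffer);
        try {
            for (int i = 0; i < keys.length; i++) {
                writer.write(keys[i], values[i]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(new String[] { "UK", "US" }, new int[] { 1, 3 }));
    }
}
```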
Hands-On: Analytics Using
MapReduce
Lecture
Understanding HBase
Introduction
An open source, non-relational, distributed database
HBase is an open source, non-relational, distributed database
modeled after Google's BigTable and is written in Java. It is
developed as part of Apache Software Foundation's Apache
Hadoop project and runs on top of HDFS (Hadoop Distributed File System), providing
BigTable-like capabilities for Hadoop. That is, it provides a
fault-tolerant way of storing large quantities of sparse data.
HBase Features
● Hadoop database modelled after Google's Bigtable
● Column-oriented data store, known as the Hadoop Database
● Supports random, real-time CRUD operations (unlike
HDFS)
● NoSQL database
● Open source, written in Java
● Runs on a cluster of commodity hardware
Hive.apache.org
When to use HBase?
● When you need high volume data to be stored
● Unstructured data
● Sparse data
● Column-oriented data
● Versioned data (same data template, captured at various
times, time-elapsed data)
● When you need high scalability
Which one to use?
● HDFS
– Append-only dataset (no random write)
– Read the whole dataset (no random read)
● HBase
– Need random write and/or read
– Has thousands of operations per second on TB+ of data
● RDBMS
– Data fits on one big node
– Need full transaction support
– Need real-time query capabilities
HBase Components
● Region
– Where the rows of a table are stored
● Region Server
– Hosts the tables
● Master
– Coordinates the Region Servers
● ZooKeeper
● HDFS
● API
– The Java Client API
HBase Shell Commands
Hands-On: Running HBase
Starting HBase shell
[hdadmin@localhost ~]$
[hdadmin@localhost ~]$ hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.94.10, r1504995, Fri Jul 19 20:24:16 UTC 2013
hbase(main):001:0>
Create a table and insert data in HBase
hbase(main):009:0> create 'test', 'cf'
0 row(s) in 1.0830 seconds
hbase(main):010:0> put 'test', 'row1', 'cf:a', 'val1'
0 row(s) in 0.0750 seconds
hbase(main):011:0> scan 'test'
ROW COLUMN+CELL
row1 column=cf:a, timestamp=1375363287644, value=val1
1 row(s) in 0.0640 seconds
hbase(main):002:0> get 'test', 'row1'
COLUMN CELL
cf:a timestamp=1375363287644, value=val1
1 row(s) in 0.0370 seconds
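To make the shell session above easier to read, here is a minimal in-memory sketch of the data model it exercises: a row key maps to columns named `family:qualifier`, and each cell keeps timestamped versions. This is plain Python for illustration only, not the HBase client API; the `ToyTable` class and its methods are hypothetical names.

```python
import time
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for one HBase table (illustration only)."""

    def __init__(self, *column_families):
        # Column families are fixed when the table is created,
        # as in: create 'test', 'cf'
        self.families = set(column_families)
        # row key -> "family:qualifier" -> list of (timestamp, value)
        self.rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, value, timestamp=None):
        """Store one cell version, as in: put 'test', 'row1', 'cf:a', 'val1'"""
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        ts = int(time.time() * 1000) if timestamp is None else timestamp
        self.rows[row][column].append((ts, value))

    def get(self, row):
        """Return the newest value of every column in one row."""
        cells = self.rows.get(row, {})
        return {
            column: max(versions, key=lambda v: v[0])[1]
            for column, versions in cells.items()
        }

    def scan(self):
        """Yield (row key, newest cells) in sorted row-key order."""
        for row in sorted(self.rows):
            yield row, self.get(row)


table = ToyTable("cf")
table.put("row1", "cf:a", "val1", timestamp=1375363287644)
print(table.get("row1"))
print(list(table.scan()))
```

Qualifiers such as `a` are created on first write, while writing to a family that was not declared at creation time is rejected, which mirrors why `create` names only `cf`.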
Using Data Browsers in Hue for HBase
Recommendations for Further Study
Thank you
www.imcinstitute.com
www.facebook.com/imcinstitute
Hadoop Workshop using Cloudera on Amazon EC2

  • 1. Hadoop Workshop using Cloudera on Amazon EC2, May 2015. Dr. Thanachart Numnonda, IMC Institute, thanachart@imcinstitute.com. Modified from the original version ("Big Data Hadoop – Hands On Workshop", 2013) by Danairat T., Certified Java Programmer, TOGAF – Silver, danairat@gmail.com
  • 2. Hands-On: Launch a virtual server on EC2 Amazon Web Services
  • 4. Virtual Server. This lab uses an EC2 virtual server to install a Hadoop server, with the following features: 1. Ubuntu Server 14.04 LTS 2. m3.xlarge: 4 vCPU, 15 GB memory, 80 GB SSD 3. Security group: create new 4. Key pair: imchadoop
  • 5. Select the EC2 service and click Launch Instance
  • 6. Select an Amazon Machine Image (AMI): Ubuntu Server 14.04 LTS (PV)
  • 7. Choose the m3.xlarge instance type
  • 8. Leave the configuration details as default
  • 9. Add Storage: 30 GB
  • 10. Name the instance
  • 11. Select Create a new security group > Add Rule as follows
  • 12. Click Launch and choose imchadoop as the key pair
  • 13. Review the instance, then click Connect for instructions on connecting to it
  • 14. Connect to the instance from Mac/Linux
  • 15. Connect to the instance from Windows using PuTTY
  • 16. Connect to the instance
  • 17. Hands-On: Installing Cloudera on EC2
  • 18. Download Cloudera Manager. Type the following commands:
      1) wget http://archive.cloudera.com/cm5/installer/latest/cloudera-manager-installer.bin
      2) chmod u+x cloudera-manager-installer.bin
      3) sudo ./cloudera-manager-installer.bin
  • 21. Log in to Cloudera Manager. Wait several minutes for the Cloudera Manager Server to complete its startup, then open a web browser at http://<public-ip>:7180
  • 22. Select Cloudera Express Edition
  • 24. Provide your instance's <public ip> address in the cluster
  • 29. Browse to the private key file (imchadoop.pem) downloaded in the previous part. Leave the passphrase blank
  • 31. If you see the above error, do not worry; it is a known issue. You can find the list of known issues at Cloudera Issue List. Click the “Back” button until you reach the home screen, then click the “Continue” button
  • 35. You will now find a tab “Currently Managed Hosts” listing the hosts with their private DNS names and private IP addresses. Select all and click “Continue”
  • 45. Finish
  • 47. Running Hue
  • 49. Sign in to Hue
  • 50. Starting Hue on Cloudera
  • 52. Viewing HDFS
  • 54. Hands-On: Importing/Exporting Data to HDFS
  • 55. Importing Data to Hadoop. Download the full text of War and Peace from www.gutenberg.org/ebooks/2600, then run:
      $ hadoop fs -mkdir input
      $ hadoop fs -mkdir output
      $ hadoop fs -copyFromLocal Downloads/pg2600.txt input
  • 56. Review the file in Hadoop HDFS (list, read, and retrieve an HDFS file to the local file system)
      Read HDFS file:
      [hdadmin@localhost bin]$ hadoop fs -cat input/pg2600.txt
      Retrieve HDFS file to the local file system:
      [hdadmin@localhost bin]$ hadoop fs -copyToLocal input/pg2600.txt tmp/file.txt
      See also http://hadoop.apache.org/docs/r1.0.4/commands_manual.html
  • 57. Review the file in Hadoop HDFS using the File Browser
  • 58. Review the file in Hadoop HDFS using Hue
  • 59. Hadoop Port Numbers
      Daemon                     Default Port   Configuration Parameter in conf/*-site.xml
      HDFS  Namenode             50070          dfs.http.address
      HDFS  Datanodes            50075          dfs.datanode.http.address
      HDFS  Secondarynamenode    50090          dfs.secondary.http.address
      MR    JobTracker           50030          mapred.job.tracker.http.address
      MR    Tasktrackers         50060          mapred.task.tracker.http.address
  • 60. Removing data from HDFS using a shell command
      [hdadmin@localhost detach]$ hadoop fs -rm input/pg2600.txt
      Deleted hdfs://localhost:54310/input/pg2600.txt
      [hdadmin@localhost detach]$
  • 61. Hands-On: Writing Map/Reduce Program on Eclipse
  • 62. Starting Eclipse in Cloudera VM
  • 63. Create a Java Project. Let's name it HadoopWordCount
  • 64. Add dependencies to the project
      ● Add the following two JARs to your build path: hadoop-common.jar and hadoop-mapreduce-client-core.jar. Both can be found at /usr/lib/hadoop/client
      ● Perform the following steps:
        – Add a folder named lib to the project
        – Copy the two JARs into this folder
        – Right-click the project name >> select Build Path >> then Configure Build Path
        – Click Add JARs and select the two JARs from the lib folder
  • 66. Writing the source code
      ● Right-click the project, then select New >> Package
      ● Name the package org.myorg
      ● Right-click org.myorg, then select New >> Class
      ● Name the class WordCount
      ● Write the source code as shown in the previous slides
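The WordCount source itself is not part of this transcript. As a rough orientation only, the logic such a job performs can be sketched in plain Java, with no Hadoop classes involved. The class and method names below (WordCountSketch, map, reduce, wordCount) are hypothetical and are not the workshop's org.myorg.WordCount; a real job would put the same two steps into a Mapper and a Reducer.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Plain-Java illustration of the word-count logic; not Hadoop code.
class WordCountSketch {

    // "Map" step: emit one (word, 1) pair per whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<>(tokens.nextToken(), 1));
        }
        return pairs;
    }

    // "Shuffle" and "reduce" steps: group the pairs by word and sum the counts.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    // Runs the map step on every line, then reduces all emitted pairs.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> all = new ArrayList<>();
        for (String line : lines) {
            all.addAll(map(line));
        }
        return reduce(all);
    }

    public static void main(String[] args) {
        // prints {and=2, peace=2, quiet=1, war=1}
        System.out.println(wordCount(Arrays.asList("war and peace", "peace and quiet")));
    }
}
```

In Hadoop the grouping by key happens in the framework between the two phases, so the reducer only sees one word together with all of its counts.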
  • 68. Building a JAR file
      ● Right-click the project, then select Export
      ● Select Java and then JAR file
      ● Provide the JAR name, wordcount.jar
      ● Leave the JAR package options as default
      ● In the JAR Manifest Specification section, at the bottom, specify the Main class
      ● In this case, select WordCount
      ● Click Finish
      ● The JAR file will be built and placed in cloudera/workspace
      Note: you may need to resize the dialog font by selecting Window >> Preferences >> Appearance >> Colors and Fonts
  • 70. Hands-On: Running Map Reduce and Deploying to Hadoop Runtime Environment
  • 71. Running Map Reduce Program
  • 72. Reviewing MapReduce Job in Hue
  • 74. Reviewing MapReduce Output Result
  • 77. Hands-On: Running Map Reduce using Oozie workflow
  • 78. Using Hue: select WorkFlow >> Editor
  • 79. Create a new workflow
      ● Click the Create button; the following screen will be displayed
      ● Name the workflow WordCountWorkflow
  • 80. Select a Java job for the workflow
      ● From the Oozie editor, drag Java and drop it between start and end
  • 81. Edit the Java job
      ● Assign the following values:
        – Name: WordCount
        – Jar name: wordcount.jar (select …, then choose upload from local machine)
        – Main Class: org.myorg.WordCount
        – Arguments: input/* output/wordcount_output2
  • 82. Submit the workflow
      ● Click Done, followed by Save
      ● Then click Submit
  • 83. Hands-On: Working with CSV data
  • 84. A sample CSV data set
      ● The input data is access logs of the following form: Date, Requesting-IP-Address
      ● We will write a MapReduce program to count the number of hits to the website per country.
  • 85. HitsByCountryMapper.java (slides 85-86)

      package learning.bigdata.mapreduce;

      import java.io.IOException;

      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Mapper;

      public class HitsByCountryMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

          private final static String[] COUNTRIES = { "India", "UK", "US", "China" };

          private Text outputKey = new Text();
          private IntWritable outputValue = new IntWritable();

          @Override
          protected void setup(Context context) throws IOException, InterruptedException {
              super.setup(context);
          }

          @Override
          public void map(LongWritable key, Text value, Context context)
                  throws IOException, InterruptedException {
              try {
                  String valueString = value.toString();
                  // Split the value string to get Date and ipAddress
                  String[] row = valueString.split(",");
                  // row[0] = Date and row[1] = ipAddress
                  String ipAddress = row[1];
                  // Get the country name to which the ipAddress belongs
                  String countryName = getCountryNameFromIpAddress(ipAddress);
                  if (countryName == null) {
                      // Empty IP field: count the record as bad instead of passing null to Text.set()
                      context.getCounter("Custom counters", "MAPPER_EXCEPTION_COUNTER").increment(1);
                      return;
                  }
                  outputKey.set(countryName);
                  outputValue.set(1);
                  context.write(outputKey, outputValue);
              } catch (ArrayIndexOutOfBoundsException ex) {
                  // Malformed line with no IP field: count it and continue
                  context.getCounter("Custom counters", "MAPPER_EXCEPTION_COUNTER").increment(1);
                  ex.printStackTrace();
              }
          }

          // Placeholder lookup: picks a country from the hash of the IP string
          // rather than consulting a real IP-to-country database.
          private static String getCountryNameFromIpAddress(String ipAddress) {
              if (ipAddress != null && !ipAddress.isEmpty()) {
                  int randomIndex = Math.abs(ipAddress.hashCode()) % COUNTRIES.length;
                  return COUNTRIES[randomIndex];
              }
              return null;
          }
      }
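The country lookup on this slide derives an array index with Math.abs(hash) % length. With four countries that expression stays in range, because the one hash value that Math.abs cannot make positive (Integer.MIN_VALUE) is divisible by 4. With a list whose length is not a power of two it could produce a negative index. The stand-alone sketch below (plain Java, hypothetical class and method names, not part of the workshop code) compares it with Math.floorMod, which needs Java 8 or later.

```java
// Plain-Java illustration; HashIndexSketch and its methods are hypothetical names.
class HashIndexSketch {

    // Index as computed on the slide: absolute value, then remainder.
    static int absIndex(int hash, int size) {
        return Math.abs(hash) % size;
    }

    // Alternative: floorMod always lands in 0..size-1 for size > 0.
    static int floorModIndex(int hash, int size) {
        return Math.floorMod(hash, size);
    }

    public static void main(String[] args) {
        int worstCase = Integer.MIN_VALUE; // Math.abs leaves this value negative
        System.out.println(absIndex(worstCase, 3));      // prints -2
        System.out.println(floorModIndex(worstCase, 3)); // prints 1
        System.out.println(absIndex(worstCase, 4));      // prints 0
        // Any real string hash gives an index between 0 and 3 here.
        System.out.println(floorModIndex("10.0.0.1".hashCode(), 4));
    }
}
```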
  • 87. HitsByCountryReducer.java

      package learning.bigdata.mapreduce;

      import java.io.IOException;
      import java.util.Iterator;

      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Reducer;

      public class HitsByCountryReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

          private Text outputKey = new Text();
          private IntWritable outputValue = new IntWritable();
          private int count = 0;

          // Called once per country with all the 1s emitted by the mappers for it
          @Override
          protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                  throws IOException, InterruptedException {
              count = 0;
              Iterator<IntWritable> iterator = values.iterator();
              while (iterator.hasNext()) {
                  IntWritable value = iterator.next();
                  count += value.get();
              }
              outputKey.set(key);
              outputValue.set(count);
              context.write(outputKey, outputValue);
          }
      }
  • 88. HitsByCountry.java (slides 88-89)

      package learning.bigdata.main;

      import learning.bigdata.mapreduce.HitsByCountryMapper;
      import learning.bigdata.mapreduce.HitsByCountryReducer;

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.conf.Configured;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
      import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
      import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
      import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
      import org.apache.hadoop.util.Tool;
      import org.apache.hadoop.util.ToolRunner;

      public class HitsByCountry extends Configured implements Tool {

          private static final String JOB_NAME = "Calculating hits by country";

          public static void main(String[] args) throws Exception {
              if (args.length < 2) {
                  System.out.println("Usage: HitsByCountry <comma separated input directories> <output dir>");
                  System.exit(-1);
              }
              int result = ToolRunner.run(new HitsByCountry(), args);
              System.exit(result);
          }

          @Override
          public int run(String[] args) throws Exception {
              try {
                  Configuration conf = getConf();
                  Job job = Job.getInstance(conf);
                  job.setJarByClass(HitsByCountry.class);
                  job.setJobName(JOB_NAME);
                  job.setMapperClass(HitsByCountryMapper.class);
                  job.setMapOutputKeyClass(Text.class);
                  job.setMapOutputValueClass(IntWritable.class);
                  job.setReducerClass(HitsByCountryReducer.class);
                  job.setOutputKeyClass(Text.class);
                  job.setOutputValueClass(IntWritable.class);
                  job.setInputFormatClass(TextInputFormat.class);
                  job.setOutputFormatClass(TextOutputFormat.class);
                  FileInputFormat.setInputPaths(job, args[0]);
                  FileOutputFormat.setOutputPath(job, new Path(args[1]));
                  boolean success = job.waitForCompletion(true);
                  return success ? 0 : 1;
              } catch (Exception e) {
                  e.printStackTrace();
                  return 1;
              }
          }
      }
  • 91. Lecture: Developing Complex Hadoop MapReduce Applications
  • 92. Choosing appropriate Hadoop data types
      ● Hadoop uses classes based on the Writable interface as the data types for MapReduce computations.
      ● Choosing the appropriate Writable data types for your input, intermediate, and output data can have a large effect on the performance and the programmability of your MapReduce programs.
      ● To be used as a value data type, a data type must implement the org.apache.hadoop.io.Writable interface.
      ● To be used as a key data type, a data type must implement the org.apache.hadoop.io.WritableComparable<T> interface.
  • 94. Hadoop built-in data types
      ● Text: stores UTF-8 text
      ● BytesWritable: stores a sequence of bytes
      ● VIntWritable and VLongWritable: store variable-length integer and long values
      ● NullWritable: a zero-length Writable type that can be used when you don't want to use a key or value type
  • 95. Hadoop built-in data types
      ● The following Hadoop built-in collection data types can only be used as value types.
        – ArrayWritable: stores an array of values belonging to a Writable type.
        – TwoDArrayWritable: stores a matrix of values belonging to the same Writable type.
        – MapWritable: stores a map of key-value pairs. Keys and values should be of Writable data types.
        – SortedMapWritable: stores a sorted map of key-value pairs. Keys should implement the WritableComparable interface.
  • 96. Implementing a custom Hadoop Writable data type
      ● We can easily write a custom Writable data type by implementing the org.apache.hadoop.io.Writable interface.
      ● Writable interface-based types can be used as value types in Hadoop MapReduce computations.
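The example slides for custom Writable types are not preserved in this transcript. As a minimal sketch of the idea: the Writable interface consists of two methods, write(DataOutput) and readFields(DataInput), which must serialize and restore the fields in the same order. The class below is a hypothetical value type (LogEntrySketch is not from the workshop) that carries those two methods but deliberately omits "implements org.apache.hadoop.io.Writable" so it compiles without the Hadoop JARs. The roundTrip helper is likewise only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical value type; in a real job it would implement
// org.apache.hadoop.io.Writable (assumption: omitted here to stay stdlib-only).
class LogEntrySketch {
    String ipAddress = "";
    long hits;

    // Same shape as Writable.write: serialize the fields in a fixed order.
    void write(DataOutput out) throws IOException {
        out.writeUTF(ipAddress);
        out.writeLong(hits);
    }

    // Same shape as Writable.readFields: read the fields back in that order.
    void readFields(DataInput in) throws IOException {
        ipAddress = in.readUTF();
        hits = in.readLong();
    }

    // Illustration helper: serialize to bytes and deserialize into a new object.
    static LogEntrySketch roundTrip(LogEntrySketch original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            original.write(new DataOutputStream(buffer));
            LogEntrySketch copy = new LogEntrySketch();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        LogEntrySketch entry = new LogEntrySketch();
        entry.ipAddress = "10.0.0.1";
        entry.hits = 42;
        LogEntrySketch copy = roundTrip(entry);
        System.out.println(copy.ipAddress + " " + copy.hits); // prints "10.0.0.1 42"
    }
}
```

To use such a type as a key rather than a value, it would also need a compareTo method, as required by WritableComparable.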
  • 100. Choosing a suitable Hadoop InputFormat for your input data format
      ● Hadoop supports processing of many different formats and types of data through InputFormat.
      ● The InputFormat of a Hadoop MapReduce computation generates the key-value pair inputs for the mappers by parsing the input data.
      ● InputFormat also splits the input data into logical partitions.
  • 101. InputFormats that Hadoop provides
      ● TextInputFormat: used for plain text files. TextInputFormat generates a key-value record for each line of the input text files.
      ● NLineInputFormat: used for plain text files. NLineInputFormat splits the input files into logical splits of a fixed number of lines.
      ● SequenceFileInputFormat: for Hadoop Sequence file input data.
      ● DBInputFormat: supports reading the input data for a MapReduce computation from a SQL table.
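To make the TextInputFormat description concrete: for each line, the mapper receives the byte offset of the line within the file as the key and the line's text as the value. The plain-Java sketch below mimics that record generation for an in-memory string. It is an illustration only (LineRecordSketch and Record are hypothetical names, and input splits and other line terminators are ignored), not how Hadoop's own record reader is written.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of the (offset, line) records a mapper would see.
class LineRecordSketch {

    static final class Record {
        final long offset;  // plays the role of the LongWritable key
        final String line;  // plays the role of the Text value

        Record(long offset, String line) {
            this.offset = offset;
            this.line = line;
        }
    }

    static List<Record> records(String fileContents) {
        List<Record> result = new ArrayList<>();
        long offset = 0;
        for (String line : fileContents.split("\n")) {
            result.add(new Record(offset, line));
            // Advance by the line's UTF-8 byte length plus one byte for '\n'.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return result;
    }

    public static void main(String[] args) {
        // Two access-log lines in the Date,IP form used earlier in the deck.
        for (Record r : records("2015-05-01,10.0.0.1\n2015-05-01,10.0.0.2\n")) {
            System.out.println(r.offset + "\t" + r.line);
        }
        // prints offsets 0 and 20, each followed by a tab and the line text
    }
}
```

This is why the mappers in this deck declare LongWritable as their input key type and Text as their input value type.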
  • 102. Implementing new input data formats
      ● Hadoop enables us to implement and specify custom InputFormat implementations for our MapReduce computations.
      ● An InputFormat implementation should extend the org.apache.hadoop.mapreduce.InputFormat<K,V> abstract class, overriding the createRecordReader() and getSplits() methods.
  • 103. Formatting the results of MapReduce computations using Hadoop OutputFormats
      ● It is important to store the result of a MapReduce computation in a format that can be consumed efficiently by the target application.
      ● We can use the Hadoop OutputFormat interface to define the data storage format.
      ● An OutputFormat prepares the output location and provides a RecordWriter implementation to perform the actual serialization and storage of the data.
      ● Hadoop uses org.apache.hadoop.mapreduce.lib.output.TextOutputFormat<K,V> as the default OutputFormat.
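With the default TextOutputFormat, each output record becomes one text line: the key, a tab character, then the value. The plain-Java sketch below reproduces that layout for a small result map so the shape of a part file is easier to picture. TextOutputSketch and format are hypothetical names, and the counts are made-up sample values, not results from the workshop.

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the key<TAB>value line layout; not Hadoop code.
class TextOutputSketch {

    static String format(Map<String, Integer> results) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Integer> entry : results.entrySet()) {
            out.append(entry.getKey()).append('\t').append(entry.getValue()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Sample values for illustration only.
        Map<String, Integer> hits = new TreeMap<>();
        hits.put("UK", 1);
        hits.put("India", 3);
        System.out.print(format(hits)); // one "country<TAB>count" line per entry
    }
}
```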
  • 104. Hands-On: Analytics Using MapReduce
  • 118. Lecture: Understanding HBase
  • 119. Introduction: an open source, non-relational, distributed database. HBase is an open source, non-relational, distributed database modeled after Google's BigTable and written in Java. It is developed as part of the Apache Software Foundation's Apache Hadoop project and runs on top of HDFS (Hadoop Distributed File System), providing BigTable-like capabilities for Hadoop. That is, it provides a fault-tolerant way of storing large quantities of sparse data.
  • 120. HBase Features
      ● Hadoop database modelled after Google's Bigtable
      ● Column-oriented data store, known as the Hadoop Database
      ● Supports random real-time CRUD operations (unlike HDFS)
      ● NoSQL database
      ● Open source, written in Java
      ● Runs on a cluster of commodity hardware
  • 121. When to use HBase?
      ● When you need to store a high volume of data
      ● Unstructured data
      ● Sparse data
      ● Column-oriented data
      ● Versioned data (same data template, captured at various times; time-lapse data)
      ● When you need high scalability
  • 122. Which one to use?
      ● HDFS
        – Append-only dataset (no random write)
        – Read the whole dataset (no random read)
      ● HBase
        – Need random write and/or read
        – Has thousands of operations per second on TB+ of data
      ● RDBMS
        – Data fits on one big node
        – Need full transaction support
        – Need real-time query capabilities
  • 125. HBase Components
      ● Region: where the rows of a table are stored
      ● Region Server: hosts the tables
      ● Master: coordinates the Region Servers
      ● ZooKeeper
      ● HDFS
      ● API: the Java Client API
  • 126. HBase Shell Commands
  • 127. Hands-On: Running HBase
  • 128. Starting HBase shell
      [hdadmin@localhost ~]$ hbase shell
      HBase Shell; enter 'help<RETURN>' for list of supported commands.
      Type "exit<RETURN>" to leave the HBase Shell
      Version 0.94.10, r1504995, Fri Jul 19 20:24:16 UTC 2013
      hbase(main):001:0>
  • 129. Create a table and insert data in HBase
      hbase(main):009:0> create 'test', 'cf'
      0 row(s) in 1.0830 seconds
      hbase(main):010:0> put 'test', 'row1', 'cf:a', 'val1'
      0 row(s) in 0.0750 seconds
      hbase(main):011:0> scan 'test'
      ROW        COLUMN+CELL
      row1       column=cf:a, timestamp=1375363287644, value=val1
      1 row(s) in 0.0640 seconds
      hbase(main):002:0> get 'test', 'row1'
      COLUMN     CELL
      cf:a       timestamp=1375363287644, value=val1
      1 row(s) in 0.0370 seconds
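The put, get, and scan output above shows the three coordinates of an HBase cell: row key, column (family:qualifier), and timestamp. One way to picture this model, in the spirit of the BigTable design HBase follows, is a sorted map of maps. The plain-Java sketch below is a mental model only, with hypothetical names (HBaseTableSketch); it does not use the HBase client API and says nothing about how HBase stores data on disk.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory mental model of an HBase table; not the HBase client API.
class HBaseTableSketch {

    // row key -> "family:qualifier" -> timestamp (newest first) -> value
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    // Like the shell's put, but with an explicit timestamp for each version.
    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<String, NavigableMap<Long, String>>())
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Comparator.<Long>reverseOrder()))
            .put(timestamp, value);
    }

    // Like the shell's get for one cell: returns the newest version, or null if absent.
    String get(String row, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        if (columns == null || !columns.containsKey(column)) {
            return null;
        }
        return columns.get(column).firstEntry().getValue();
    }

    // Like a scan: row keys come back in sorted order.
    Iterable<String> scanRowKeys() {
        return rows.keySet();
    }

    public static void main(String[] args) {
        HBaseTableSketch test = new HBaseTableSketch();
        test.put("row1", "cf:a", 1L, "val1");
        test.put("row1", "cf:a", 2L, "val2"); // a newer version of the same cell
        System.out.println(test.get("row1", "cf:a")); // prints val2
    }
}
```

Rows that never receive a value for a column simply have no entry for it, which is what makes the model a good fit for the sparse and versioned data mentioned earlier.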
  • 130. Using Data Browsers in Hue for HBase
  • 133. Recommendations for Further Study
  • 134. Thank you. www.imcinstitute.com www.facebook.com/imcinstitute