Running a Pig Program on the CDH Single-Node Cluster on an AWS EC2 Instance
IMPORTANT INSTRUCTIONS
● Ensure that you have installed the WinSCP tool on your Windows machine.
● The following notations have been used for the commands in this document:
[ec2-user@ip-10-0-0-14 ~]$ hadoop command
Output of the command
As shown above, the command to be run is written in bold, and its output is written in
italics. The prompt [ec2-user@ip-10-0-0-14 ~] indicates the user as which the command
is to be executed.
● Whenever you want to access a file, whether to execute the code or to push it onto the
HDFS, you need to either be in the file’s directory or specify the relative path to the
file.
● Be careful with the spaces in the commands.
● If a series of commands is given in a particular order, make sure that you run them in
the same order.
NOTE: Before starting with the document below, it is necessary that you create the EC2
instance with Cloudera installed on your machine. Go through Video 1 before getting
started with this document.
In this document, we are running the wordcount program on the ‘dropbox-policy.txt’ file
using PIG.
Steps to Run a PIG Program on the CDH Single-Node Cluster on an
AWS EC2 Instance
1. Start CDH EC2 instance from the AWS console and wait until the instance state changes
to ‘running’.
3. Go to your browser and open Cloudera Manager. To access the Cloudera Manager page,
enter your public IP address followed by ‘:7180’, as shown below:
<your public IP>:7180
Username: admin
Password: admin
4. After logging in to the Cloudera Manager, click on ‘Cloudera Management Service’
followed by ‘Restart’.
5. After the restart, click on “Close”.
6. Wait until all the services turn green, as shown in the image below.
7. Whenever a MapReduce or PIG program is to be run on the AWS EC2 instance, we need
to visit the YARN Configuration page and edit the following properties:
● yarn.scheduler.maximum-allocation-mb
● yarn.nodemanager.resource.memory-mb
c. Now we have to increase the memory allocation. To do so, enter each of the following
properties in the ‘Search’ field and increase its value to the number mentioned
below:
i. yarn.scheduler.maximum-allocation-mb
The default value is 1 GB. Change it to 8 GB and then click on ‘Save
Changes’.
ii. yarn.nodemanager.resource.memory-mb
The default value is 1 GB. Change it to 10 GB and then click on ‘Save
Changes’.
d. Restart the YARN service by clicking on the ‘Stale Configuration: Restart needed’
icon, as shown in the image below:
g. It generally takes a few minutes to restart the service. Wait until both the
green ticks appear. Now, click on ‘Finish’.
h. Wait until the YARN service turns green.
8. Next, we have to connect to the EC2 instance using PuTTY. Once connected, we log
in as the ec2-user. We then need to switch to the root user using the ‘sudo -i’
command.
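For example, the switch from the ec2-user to the root user looks like this (the IP address in your prompt will differ):
[ec2-user@ip-10-0-0-105 ~]$ sudo -i
[root@ip-10-0-0-105 ~]#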
9. PIG is by default installed in Cloudera. To verify, type ‘pig’ as shown below:
[root@ip-10-0-0-105 ~]# pig
10. Now, the grunt shell opens in the terminal, where you can run PIG code.
Use the ‘quit’ command to exit from the grunt shell.
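For example:
grunt> quit
[root@ip-10-0-0-105 ~]#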
11. Now, we have to run a program as per the instructions in the video.
First, we have to create a directory named ‘Pig’. We will use the ‘mkdir’ command to
do so:
[root@ip-10-0-0-105 ~]# mkdir Pig
13. Now, we need to copy the downloaded files (‘count-words.pig’ and ‘dropbox-policy.txt’)
from our local machine to the EC2 instance.
a. Mac/Linux users:
Use the following command to copy a file from your local system to the EC2
instance.
scp -i <path of .pem file> <path of the file on your local system>
ec2-user@<public IP>:<destination path on the EC2 instance>
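For example, assuming the private key is ‘mykey.pem’, the public IP is 54.12.34.56 (both hypothetical values), and the downloaded files are in the current directory:
scp -i mykey.pem count-words.pig dropbox-policy.txt ec2-user@54.12.34.56:/home/ec2-user/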
b. Windows users:
Move the downloaded files to a folder named ‘Pig_Data’ on the desktop of
your Windows machine. This is not strictly necessary; you only need to
remember where you have stored the downloaded files.
WinSCP is a tool to transfer a file from a Windows machine to a Linux
machine (EC2 instance).
i. Open WinSCP.
ii. Enter the public IP of your EC2 instance in the ‘Host name’ field and the following in the ‘User name’ field:
Username: ec2-user
iii. Then, click on ‘Advanced’.
iv. After clicking on ‘Authentication’, enter the path of your PPK file.
v. Click ‘OK’ followed by ‘Login’. Click ‘Yes’ on the pop-up that appears.
vi. Now, the following screen appears.
vii. Browse to the folder where you have stored the downloaded files. In
our case, it was a folder named ‘Pig_Data’ on the desktop.
14. Now, go back to the EC2 instance. We need to copy the files ‘count-words.pig’ and
‘dropbox-policy.txt’ from /home/ec2-user/ to /root/Pig/ using the following
commands:
[root@ip-10-0-0-105 ~]# cp /home/ec2-user/count-words.pig /root/Pig/
[root@ip-10-0-0-105 ~]# cp /home/ec2-user/dropbox-policy.txt /root/Pig/
15. Verify whether the files have been copied to the Pig directory or not. To do so,
change the working directory to ‘Pig’ using the ‘cd’ command followed by ‘ls’, and
check whether the files ‘count-words.pig’ and ‘dropbox-policy.txt’ exist.
[root@ip-10-0-0-105 ~]# cd Pig
[root@ip-10-0-0-105 Pig]# ls
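If the files have been copied correctly, both of them appear in the listing:
count-words.pig  dropbox-policy.txt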
16. Come out of the Pig directory using the ‘cd’ command.
[root@ip-10-0-0-105 Pig]# cd
[root@ip-10-0-0-105 ~]#
Creating a Directory Inside the HDFS and Changing its Owner
17. The commands used below demonstrate how to create a directory in the HDFS.
Note: A directory under /user in the HDFS can be created only by the hdfs user. So now, switch
to the hdfs user. Note that there is a space between ‘-’ and ‘hdfs’ in the command
used below:
[root@ip-10-0-0-105 ~]# su - hdfs
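If it does not already exist, the directory used in the rest of this document, /user/root, can be created as the hdfs user with the following command:
[hdfs@ip-10-0-0-105 ~]$ hadoop fs -mkdir /user/root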
18. You can verify that the directory has been created by listing the /user/ directory, as shown below:
[hdfs@ip-10-0-0-105 ~]$ hadoop fs -ls /user/
Found 6 items
As seen in the listing, the owner of the newly created directory is ‘hdfs’. To send
a file from any other user into an HDFS directory, the owner of that directory should be changed
to the user sending the file. For example, if you have to send a file from the root user
to a directory inside the HDFS, the owner of that particular directory should be
changed to root.
19. To change the owner of the directory created from hdfs to root, run the following
command:
[hdfs@ip-10-0-0-105 ~]$ hadoop fs -chown root /user/root
20. You can verify whether or not the owner has been changed using the command
shown below:
[hdfs@ip-10-0-0-105 ~]$ hadoop fs -ls /user/
Found 6 items
drwxrwxrwx - mapred hadoop 0 2018-02-18 07:16 /user/history
drwxrwxr-t - hive hive 0 2018-02-18 07:17 /user/hive
drwxrwxr-x - hue hue 0 2018-02-18 07:18 /user/hue
drwxrwxr-x - oozie oozie 0 2018-02-18 07:18 /user/oozie
drwxr-xr-x - root supergroup 0 2018-02-18 09:49 /user/root
drwxr-x--x - spark spark 0 2018-02-18 07:17 /user/spark
You can see that the owner has changed from hdfs to root.
21. Now, use the ‘exit’ command to switch back from the hdfs user to the root user.
[hdfs@ip-10-0-0-105 ~]$ exit;
logout
[root@ip-10-0-0-105 ~]#
22. Now, use the ‘put’ command to send the ‘dropbox-policy.txt’ file into the hdfs
/user/root directory.
Syntax: hadoop fs -put <source> <destination>
[root@ip-10-0-0-105 ~]# hadoop fs -put /root/Pig/dropbox-policy.txt /user/root
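You can confirm that the file has reached the HDFS by listing the destination directory:
[root@ip-10-0-0-105 ~]# hadoop fs -ls /user/root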
25. Now, we need to edit the ‘count-words.pig’ file using the vi editor. We need to change the LOAD and
STORE paths so that they point to the correct locations in the HDFS.
After making the changes, press Esc to switch to command mode and then use :wq!
to save and exit the vi editor.
26. Verify the changes using the ‘cat’ command.
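For reference, the contents of ‘count-words.pig’ after editing should resemble a standard PIG word-count script with HDFS paths. The sketch below is only illustrative; the exact script provided with the course may differ, and the output directory name (/user/root/wordcount_output) is an assumption:
-- Load the text file from HDFS, one line per record
lines = LOAD '/user/root/dropbox-policy.txt' AS (line:chararray);
-- Split each line into words, one word per record
words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
-- Group identical words together
grouped = GROUP words BY word;
-- Count the occurrences of each word
counts = FOREACH grouped GENERATE group AS word, COUNT(words) AS total;
-- Write the results back to HDFS (output directory name is an assumption)
STORE counts INTO '/user/root/wordcount_output';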
27. Now, we will run the code using the ‘pig count-words.pig’ command and check the
output at the location specified in the STORE statement in the script.
[root@ip-10-0-0-105 Pig]# pig count-words.pig
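Once the job completes, the word counts can be read from the output directory named in the STORE statement. For example, assuming the output directory used in the sketch above:
[root@ip-10-0-0-105 Pig]# hadoop fs -cat /user/root/wordcount_output/part*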
28. To verify whether the code has run successfully or not, go to the Resource Manager.
Access the Resource Manager using your public IP followed by ‘:8088’, as shown
below:
<Public IP>:8088
Username: admin
Password: admin
Now, check whether the PIG Program shows ‘SUCCEEDED’ in the FinalStatus column
or not.