Pig Setup and Test Run: by Kannan Kalidasan

This document provides instructions for setting up and running a test of Pig Latin on Hadoop. It explains that Pig Latin is a data flow language that allows processing of data on Hadoop without Java code. It describes installing Pig, setting environment variables, and running a sample Pig Latin script to load data from a file, group the data by year, count the records, and store the results. The script demonstrates basic Pig Latin keywords like LOAD, GROUP, FOREACH, and STORE.

Pig Setup and Test run

By Kannan Kalidasan

Pig Introduction
Pig provides a data flow language (Pig Latin) for writing Hadoop operations without MapReduce Java code.
Pig is a layer of abstraction on top of Hadoop that simplifies its use by giving a SQL-like interface for processing data on Hadoop.
It helps increase productivity because you do not have to write many lines of Java code.
It supports a variety of data types and also supports user-defined functions (UDFs), so custom operations can be written in Java, Python and JavaScript.
To learn more, I recommend the book Programming Pig by Alan Gates.
The author explains the concepts in a clear and simple way.

Pig Prompt is GRUNT

Running the pig command opens Pig's interactive shell, Grunt:
$ pig
grunt>

Pig session has two modes

Local Mode : Access to a single machine. All files are installed and run using your local host and file system. This mode helps to debug a Pig script before it is processed on a cluster. The -x flag is used to specify the mode.
pig -x local
MapReduce Mode : Access to a Hadoop cluster and HDFS installation. MapReduce mode is the default mode.
To add the Hadoop configuration directory to the Pig classpath:
export PIG_CLASSPATH=$HADOOP_HOME/conf/
Both commands below are equivalent and start the Pig session in MapReduce mode:
pig
pig -x mapreduce

Note to Remember ...

The Hadoop services must be running before Pig can start in MapReduce mode, connect to HDFS and proceed with our work.

Pig translates Pig Latin scripts into MapReduce jobs internally and runs them on the Hadoop cluster.

In MapReduce mode, Pig reads input files from HDFS only and stores the results back to HDFS.

Pig Installation
1. Download the stable version of the tarball.
http://mirror.nexcess.net/apache/pig/pig-0.12.0/
pig-0.12.0.tar.gz
Release notes link
http://pig.apache.org/releases.html#Download
Note: the listings in the remaining steps show pig-0.11.1.tar.gz; adjust the file names to the version you download.

2. Copy the downloaded package to /usr/local
kannan@kannandreams:/usr/local$ ls -ltr
total 119460
-rwxr-xr-x 1 root root 63851630 Nov 11 02:11 hadoop-1.2.1.tar.gz
drwxr-xr-x 16 hduser hadoop 4096 Nov 11 23:47 hadoop
-rwxrwxrwx 1 root root 58433159 Dec 3 00:55 pig-0.11.1.tar.gz
kannan@kannandreams:/usr/local$

3. Extract the archive and change the owner
sudo tar xzf pig-0.11.1.tar.gz
sudo mv pig-0.11.1 pig
sudo chown -R hduser:hadoop pig
The chown command changes the owner of the pig directory from root to the Hadoop user hduser.

4. Log in as the Hadoop user hduser and set the environment variables.

kannan@kannandreams:/usr/local$ su hduser
Add the two lines below to the ~/.bashrc file.
export PIG_HOME=/usr/local/pig
export PATH=$PATH:$PIG_HOME/bin
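
If this step might be repeated, the two lines can be appended only when they are missing. The following is a minimal sketch, not part of the original steps: the helper name add_pig_env is hypothetical, and it assumes the profile file is passed as its first argument.

```shell
# Hypothetical helper: append the two Pig lines to a profile file only once.
add_pig_env() {
    profile="$1"
    # Skip the append when PIG_HOME is already exported in that file.
    if ! grep -q 'export PIG_HOME=' "$profile" 2>/dev/null; then
        # Single quotes keep $PATH and $PIG_HOME literal in the profile file.
        printf '%s\n' \
            'export PIG_HOME=/usr/local/pig' \
            'export PATH=$PATH:$PIG_HOME/bin' >> "$profile"
    fi
}

# Usage (as hduser): add_pig_env "$HOME/.bashrc"
```

Calling the helper a second time leaves the file unchanged, so PATH does not collect duplicate entries.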

5. Source the profile file to apply the changes
hduser@kannandreams:~$ . .bashrc
hduser@kannandreams:~$
6. Check the pig command
The output shown below is truncated.
hduser@kannandreams:~$ pig -help
Warning: $HADOOP_HOME is deprecated.
Apache Pig version 0.11.1 (r1459641)
compiled Mar 22 2013, 02:13:53

Test Run

7. Create a sample file for processing (file name: pigcsv)
The file extension does not matter; Pig parses the file with the load function named in the script, not by its name or type.
Create the sample file in an HDFS directory with the contents below:
2006;
2007;
2008;
2008;
2008;
2008;
2007;
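
One way to create this file and copy it to HDFS is sketched below. It assumes the file is first written locally as pigcsv and that the target directory is /user/hduser/piginput, the path the Pig script loads from. The guard around the hadoop commands is an addition for illustration, so that the snippet does nothing harmful on a machine without Hadoop.

```shell
# Write the seven sample rows to a local file named pigcsv.
printf '%s\n' '2006;' '2007;' '2008;' '2008;' '2008;' '2008;' '2007;' > pigcsv

# Copy the file to HDFS only when the hadoop command is available.
if command -v hadoop >/dev/null 2>&1; then
    # Create the input directory (this fails harmlessly if it already exists).
    hadoop fs -mkdir /user/hduser/piginput
    hadoop fs -put pigcsv /user/hduser/piginput/pigcsv
else
    echo "hadoop not found on PATH; skipping the HDFS upload"
fi
```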

Test Run ...

8. Pig Scripts
Method 1 to run the pig script : Save the pig scripts as <<filename>>.pig ( In my case, it is pig_test.
pig ) and run as $ pig -x mapreduce pig_test.pig OR $ pig pig_test.pig
SampleRecord = LOAD /user/hduser/piginput/pigcsv
USING PigStorage(;) AS (Year:chararray);
GroupByYear = GROUP SampleRecord BY Year;
CountByYear = FOREACH GroupByYear
GENERATE CONCAT((chararray)$0,CONCAT(:,(chararray)COUNT($1)));
STORE CountByYear
INTO /user/hduser/pigoutput USING PigStorage(t);
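
Local mode, described earlier, is a convenient way to try this script before running it on the cluster. The sketch below is an illustration only: the file names pigcsv_local, pig_test_local.pig and pigoutput_local are made up, and the script body is the one above with paths relative to the local file system. The pig command is run only if it is installed.

```shell
# Local copy of the sample input.
printf '%s\n' '2006;' '2007;' '2008;' '2008;' '2008;' '2008;' '2007;' > pigcsv_local

# Same statements as the script above, with local relative paths.
# The quoted 'EOF' keeps $0, $1 and \t literal inside the heredoc.
cat > pig_test_local.pig <<'EOF'
SampleRecord = LOAD 'pigcsv_local' USING PigStorage(';') AS (Year:chararray);
GroupByYear = GROUP SampleRecord BY Year;
CountByYear = FOREACH GroupByYear
    GENERATE CONCAT((chararray)$0, CONCAT(':', (chararray)COUNT($1)));
STORE CountByYear INTO 'pigoutput_local' USING PigStorage('\t');
EOF

# Run in local mode when Pig is installed. STORE fails if the output
# directory already exists, so remove pigoutput_local before a rerun.
if command -v pig >/dev/null 2>&1; then
    pig -x local pig_test_local.pig
else
    echo "pig not found on PATH; script written to pig_test_local.pig"
fi
```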

Method 2 to run the Pig script : Type the statements at the grunt prompt. A statement ends at the ; so it can span several lines.
grunt> SampleRecord = LOAD '/user/hduser/piginput/pigcsv'
>> USING PigStorage(';') AS (Year:chararray);
grunt> GroupByYear = GROUP SampleRecord BY Year;
grunt> CountByYear = FOREACH GroupByYear
>> GENERATE CONCAT((chararray)$0,CONCAT(':',(chararray)COUNT($1)));
grunt> STORE CountByYear
>> INTO '/user/hduser/pigoutput' USING PigStorage('\t');

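9. Output :
hduser@kannandreams:/usr/local/hadoop/bin$ hadoop fs -cat /user/hduser/pigoutput/part-r-00000
Warning: $HADOOP_HOME is deprecated.
2006:1
2007:2
2008:4
Year:1
hduser@kannandreams:/usr/local/hadoop/bin$
Note: the Year:1 row is not among the sample rows listed in step 7. It suggests that the input file used for this run also contained a header line (Year;), which Pig counted as a data row.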
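
The per-year counts can be cross-checked without Pig using standard shell tools. This is a rough sketch, assuming the seven sample rows from step 7; the file name pigcsv_check is made up for illustration.

```shell
# Recreate the seven sample rows from step 7.
printf '%s\n' '2006;' '2007;' '2008;' '2008;' '2008;' '2008;' '2007;' > pigcsv_check

# cut keeps the first ';'-separated field, sort brings equal years together,
# uniq -c counts each run, and awk prints year:count like the Pig script.
local_counts=$(cut -d';' -f1 pigcsv_check | sort | uniq -c | awk '{print $2 ":" $1}')
echo "$local_counts"
```

For these seven rows the pipeline prints 2006:1, 2007:2 and 2008:4, one per line.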

Script Explanation
Load the file into a relation, specifying the delimiter (;) and the field name and its type.
Use commas to declare more than one field when the file has several columns. By default, Pig loads tab-delimited files, so any other delimiter character must be given explicitly.
SampleRecord = LOAD '/user/hduser/piginput/pigcsv'
USING PigStorage(';') AS (Year:chararray);
Group the loaded records by year
GroupByYear = GROUP SampleRecord BY Year;

Count the records in each group and generate the output as Key:Value. How the output is formatted is up to you. $0 is the group key and $1 is the bag of grouped records that COUNT is applied to.
CountByYear = FOREACH GroupByYear
GENERATE CONCAT((chararray)$0,CONCAT(':',(chararray)COUNT($1)));
Store the relation in an HDFS output directory
STORE CountByYear
INTO '/user/hduser/pigoutput' USING PigStorage('\t');
For the complete set of script commands, refer to
http://pig.apache.org/docs/r0.10.0/start.html#data-results

Pig in Cloudera
The Pig Editor in Cloudera is explained in my blog.
http://kannandreams.wordpress.com/2013/12/03/pig-editor-in-cloudera/#!

Thank You !!!

mail : [email protected]
@kannanpoem on twitter
Blog: http://kannandreams.wordpress.com/about/
FB Community: www.facebook.com/groups/huge360/
HUGE - Hadoop User Group & Enthusiasts
HUGE: yes, it's all about "BIG" data.
The group was created to bring together expertise and experts in Hadoop and Big Data.
