
Apache Pig

Apache Pig is a high-level platform for analyzing large datasets, primarily used with Hadoop, utilizing a language called Pig Latin for data manipulation. It simplifies MapReduce tasks, allowing programmers to write less code and handle both structured and unstructured data efficiently. Developed initially at Yahoo in 2006 and open-sourced in 2007, Apache Pig has become a key tool for data scientists in ad-hoc processing and quick prototyping.

Uploaded by

kajalyadav102703


Apache Pig - Overview https://www.tutorialspoint.com/apache_pig/apache_pig_overview.htm

Apache Pig is an abstraction over MapReduce. It is a tool/platform used to analyze large sets of data by representing them as data flows. Pig is generally used with Hadoop; we can perform all the data manipulation operations in Hadoop using Apache Pig.

To write data analysis programs, Pig provides a high-level language known as Pig
Latin. This language provides various operators with which programmers can develop
their own functions for reading, writing, and processing data.

To analyze data using Apache Pig, programmers write scripts in the Pig Latin
language. All these scripts are internally converted into Map and Reduce tasks: Apache
Pig has a component known as the Pig Engine that accepts Pig Latin scripts as input
and converts them into MapReduce jobs.
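As a rough illustration of what "data flow" means here, the following plain-Python sketch walks a small relation through the kind of steps a Pig Latin script chains together (load, filter, group, aggregate), with the equivalent Pig Latin statements shown as comments. It is an analogy only, not how Pig executes: the file name `people.csv`, the field names, and the sample records are hypothetical, and Pig itself would compile these steps into MapReduce jobs instead of running in-memory loops.

```python
from collections import defaultdict

# Hypothetical sample data standing in for the input file of:
#   people = LOAD 'people.csv' USING PigStorage(',') AS (name:chararray, age:int);
people = [("alice", 34), ("bob", 17), ("carol", 34), ("dave", 52)]

def run_pipeline(records):
    # adults = FILTER people BY age >= 18;
    adults = [r for r in records if r[1] >= 18]

    # by_age = GROUP adults BY age;
    # Each group pairs the key with a bag (here, a list) of matching tuples.
    by_age = defaultdict(list)
    for r in adults:
        by_age[r[1]].append(r)

    # counts = FOREACH by_age GENERATE group, COUNT(adults);
    counts = [(age, len(bag)) for age, bag in by_age.items()]

    # sorted() plays the role of: ordered = ORDER counts BY $0;
    return sorted(counts)

print(run_pipeline(people))
```

Each step consumes the relation produced by the previous one. In a compiled MapReduce plan, the FILTER can run on the map side, while the GROUP is what introduces a shuffle and a reduce phase.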

Programmers who are less comfortable with Java have often struggled to work with
Hadoop, especially when writing MapReduce tasks. Apache Pig is a boon for such
programmers.

Using Pig Latin, programmers can perform MapReduce tasks easily without
having to write complex code in Java.

Apache Pig uses a multi-query approach, thereby reducing the length of the code.
For example, an operation that would require about 200 lines of code (LoC)
in Java can often be done in as few as 10 LoC in Apache Pig. Overall, Apache
Pig can reduce development time by a factor of roughly 16.

Pig Latin is an SQL-like language, so Apache Pig is easy to learn if you are
familiar with SQL.

Apache Pig provides many built-in operators to support data operations like
joins, filters, ordering, etc. In addition, it provides nested data types like
tuples, bags, and maps that are missing from MapReduce.
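To show what "nested" means in practice, the sketch below models Pig's tuple, bag, and map types with ordinary Python tuples, lists, and dicts (a simplifying assumption; Pig's own types have their own semantics) and mimics how FLATTEN turns a grouped, nested record back into flat rows. The record contents and the field name `people` are hypothetical.

```python
# Pig's complex types modeled with plain Python types (an illustrative assumption):
#   tuple -> Python tuple, e.g. ('alice', 34)
#   bag   -> list of tuples (Pig bags allow duplicates and have no fixed order)
#   map   -> dict with string keys, e.g. Pig's ['city'#'Pune']

# One record as produced by GROUP: (group, bag), i.e. a tuple whose
# second field is a whole bag of tuples. A flat relational row cannot hold this.
grouped_record = (34, [("alice", 34, {"city": "Pune"}),
                       ("carol", 34, {"city": "Delhi"})])

def flatten(record):
    """Mimic FOREACH g GENERATE group, FLATTEN(people): one flat tuple per bag element."""
    key, bag = record
    return [(key,) + inner for inner in bag]

for row in flatten(grouped_record):
    key, name, _age, props = row
    # props["city"] mirrors Pig's map dereference props#'city'
    print(key, name, props["city"])
```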


1 of 4 10/14/2024, 10:00 AM

Apache Pig comes with the following features −

Rich set of operators − It provides many operators to perform operations like
join, sort, filter, etc.

Ease of programming − Pig Latin is similar to SQL and it is easy to write a Pig
script if you are good at SQL.

Optimization opportunities − Apache Pig optimizes the execution of tasks
automatically, so programmers need to focus only on the semantics of the
language.

Extensibility − Using the existing operators, users can develop their own
functions to read, process, and write data.

UDFs − Pig provides the facility to create User-defined Functions in other
programming languages such as Java and to invoke or embed them in Pig scripts.

Handles all kinds of data − Apache Pig analyzes all kinds of data, both
structured and unstructured. It stores the results in HDFS.

Listed below are the major differences between Apache Pig and MapReduce.

| Apache Pig | MapReduce |
|---|---|
| Apache Pig is a data flow language. | MapReduce is a data processing paradigm. |
| It is a high-level language. | MapReduce is low level and rigid. |
| Performing a Join operation in Apache Pig is pretty simple. | It is quite difficult in MapReduce to perform a Join operation between datasets. |
| Any novice programmer with a basic knowledge of SQL can work conveniently with Apache Pig. | Exposure to Java is a must to work with MapReduce. |
| Apache Pig uses a multi-query approach, thereby reducing the length of the code to a great extent. | MapReduce requires almost 20 times as many lines of code to perform the same task. |
| There is no separate compilation step. On execution, Pig internally compiles the script's operators into one or more MapReduce jobs. | MapReduce jobs have a long compilation process. |
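The join comparison in the table is easier to appreciate once you see what a hand-written MapReduce join involves. The following sketch simulates, in plain Python, a reduce-side join (one common MapReduce join strategy, chosen here for illustration): the map phase tags each record with its source, the shuffle groups records by key, and the reduce phase pairs matching records. The relations `users` and `orders` and their fields are hypothetical. In Pig Latin, the equivalent inner join is a single statement: `joined = JOIN users BY user_id, orders BY user_id;`.

```python
from collections import defaultdict
from itertools import product

# Hypothetical inputs: users(user_id, name) and orders(user_id, amount).
users = [(1, "alice"), (2, "bob")]
orders = [(1, 250), (1, 40), (3, 99)]

def map_phase(users, orders):
    """Emit (join_key, (source_tag, record)) pairs, as a join mapper would."""
    for u in users:
        yield u[0], ("U", u)
    for o in orders:
        yield o[0], ("O", o)

def shuffle(pairs):
    """Group mapper output by key, standing in for Hadoop's shuffle/sort."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Inner join: pair every user with every order that shares its key."""
    out = []
    for key in sorted(groups):
        left = [r for tag, r in groups[key] if tag == "U"]
        right = [r for tag, r in groups[key] if tag == "O"]
        for u, o in product(left, right):
            out.append(u + o[1:])  # drop the duplicate key column from the order
    return out

print(reduce_phase(shuffle(map_phase(users, orders))))
```

Pig's own JOIN output would keep both key columns (users::user_id and orders::user_id); the sketch drops the duplicate for brevity.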


Listed below are the major differences between Apache Pig and SQL.

| Pig | SQL |
|---|---|
| Pig Latin is a procedural language. | SQL is a declarative language. |
| In Apache Pig, a schema is optional. Data can be stored without designing a schema; fields are then referenced by position as $0, $1, etc. | A schema is mandatory in SQL. |
| The data model in Apache Pig is nested relational. | The data model used in SQL is flat relational. |
| Apache Pig provides limited opportunity for query optimization. | There is more opportunity for query optimization in SQL. |
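The schema row can be illustrated with a small plain-Python analogy: without a schema, each loaded line is just an untyped tuple addressed by position ($0, $1, ...), whereas a schema attaches names and types at load time. The file `people.csv`, its contents, and the field names are hypothetical; the Pig Latin statements in the comments show the corresponding syntax.

```python
# Hypothetical lines of people.csv, loaded WITHOUT a schema:
#   raw = LOAD 'people.csv' USING PigStorage(',');
lines = ["alice,34,Pune", "bob,17,Delhi"]

def load_without_schema(lines, sep=","):
    """Split each line into an untyped tuple; fields are known only by position."""
    return [tuple(line.split(sep)) for line in lines]

raw = load_without_schema(lines)

# Positional access with an explicit cast, as in:
#   pairs = FOREACH raw GENERATE $0, (int)$1;
pairs = [(t[0], int(t[1])) for t in raw]

# With a schema declared up front, the same fields have names and types:
#   people = LOAD 'people.csv' USING PigStorage(',')
#            AS (name:chararray, age:int, city:chararray);
people = [{"name": t[0], "age": int(t[1]), "city": t[2]} for t in raw]

print(pairs, people[0]["city"])
```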

In addition to the above differences, Pig Latin −

Allows splits in the pipeline.

Allows developers to store data anywhere in the pipeline.

Allows developers to declare execution plans.

Provides operators to perform ETL (Extract, Transform, and Load) functions.
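Splitting a pipeline can be sketched in plain Python as follows. The helper `split` is hypothetical and only mimics the behaviour of Pig's SPLIT operator, under which a tuple goes to every output whose condition it satisfies (possibly several, possibly none); the relation, conditions, and output paths are illustrative assumptions.

```python
# Hypothetical relation of (name, age) tuples.
people = [("alice", 34), ("bob", 17), ("carol", 29), ("dave", 52)]

def split(relation, **conditions):
    """Route each tuple to every named output whose predicate it satisfies,
    mirroring: SPLIT people INTO young IF age < 30, older IF age >= 30;"""
    outputs = {name: [] for name in conditions}
    for t in relation:
        for name, pred in conditions.items():
            if pred(t):
                outputs[name].append(t)
    return outputs

parts = split(people, young=lambda t: t[1] < 30, older=lambda t: t[1] >= 30)

# Each branch could then be stored independently at this point in the pipeline:
#   STORE young INTO 'out/young';
#   STORE older INTO 'out/older';
print(parts["young"], parts["older"])
```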

Both Apache Pig and Hive are used to create MapReduce jobs, and in some cases Hive
operates on HDFS in much the same way as Apache Pig. The following table lists a few
significant points that set Apache Pig apart from Hive.

| Apache Pig | Hive |
|---|---|
| Apache Pig uses a language called Pig Latin. It was originally created at Yahoo. | Hive uses a language called HiveQL. It was originally created at Facebook. |
| Pig Latin is a data flow language. | HiveQL is a query processing language. |
| Pig Latin is a procedural language and fits the pipeline paradigm. | HiveQL is a declarative language. |
| Apache Pig can handle structured, unstructured, and semi-structured data. | Hive is mostly for structured data. |

Apache Pig is generally used by data scientists for performing tasks involving ad-hoc
processing and quick prototyping. Apache Pig is used −

To process huge data sources such as web logs.

To perform data processing for search platforms.

To process time-sensitive data loads.

In 2006, Apache Pig was developed as a research project at Yahoo, especially to
create and execute MapReduce jobs on any dataset. In 2007, Apache Pig was
open-sourced via the Apache Incubator. In 2008, the first release of Apache Pig came
out. In 2010, Apache Pig graduated as an Apache top-level project.
