BDA - Unit-4 Part 1
DUMP:
• The DUMP operator is used to run Pig Latin
statements and display the results on the screen.
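A minimal sketch of DUMP, assuming a hypothetical comma-separated file students.txt with the three fields shown (the file name and field names are illustrative, not from the slides):

```pig
-- students.txt and its schema are assumptions made for this example.
students = LOAD 'students.txt' USING PigStorage(',')
           AS (id:int, name:chararray, city:chararray);

-- LOAD only defines the relation; DUMP triggers execution
-- and prints every tuple of the relation to the screen.
DUMP students;
```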
DESCRIBE:
• Use the DESCRIBE operator to review the schema of a
particular relation.
• The DESCRIBE operator is best used for debugging a
script.
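As a small illustration, assuming the same hypothetical students.txt file, DESCRIBE can be applied to a loaded relation and to a grouped one. For the grouped relation the schema shows a field called group and an inner bag named after the original relation:

```pig
-- students.txt and its schema are assumptions made for this example.
students = LOAD 'students.txt' USING PigStorage(',')
           AS (id:int, name:chararray, city:chararray);
by_city  = GROUP students BY city;

-- DESCRIBE prints the field names and types of each relation.
DESCRIBE students;
DESCRIBE by_city;
```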
Data processing Operators in Pig
ILLUSTRATE:
• The ILLUSTRATE operator is used to review how data is
transformed through a sequence of Pig Latin
statements.
• Because it shows sample data at each step, ILLUSTRATE
is especially useful when debugging a script.
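A possible use of ILLUSTRATE on a short pipeline is sketched here. The input file, the field names, and the city value are assumptions chosen only for illustration:

```pig
-- students.txt and its schema are assumptions made for this example.
students = LOAD 'students.txt' USING PigStorage(',')
           AS (id:int, name:chararray, city:chararray);
local    = FILTER students BY city == 'Hyderabad';
names    = FOREACH local GENERATE id, name;

-- Shows a few sample tuples as they pass through LOAD, FILTER and FOREACH.
ILLUSTRATE names;
```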
EXPLAIN:
• The EXPLAIN operator is used to display the logical,
physical, and MapReduce execution plans of a relation.
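A short sketch of EXPLAIN, again assuming a hypothetical students.txt input. A grouping step is included so that the MapReduce plan contains both a map and a reduce phase:

```pig
-- students.txt and its schema are assumptions made for this example.
students = LOAD 'students.txt' USING PigStorage(',')
           AS (id:int, name:chararray, city:chararray);
by_city  = GROUP students BY city;
counts   = FOREACH by_city GENERATE group AS city, COUNT(students) AS total;

-- Prints the logical, physical and MapReduce plans used to compute counts.
EXPLAIN counts;
```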
Types of UDFs in Java
• When writing UDFs in Java, you can create and
use the following three types of functions −
• Filter Functions −
The filter functions are used as conditions in
filter statements. These functions accept a Pig
value as input and return a Boolean value.
• Eval Functions −
The Eval functions are used in FOREACH-GENERATE
statements. These functions accept a
Pig value as input and return a Pig result.
• Algebraic Functions −
The Algebraic functions act on inner bags in a
FOREACH-GENERATE statement. These functions
are used to perform full MapReduce operations
on an inner bag.
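Pig's built-in functions fall into the same three categories, so they can be used to sketch where each type appears in a script: IsEmpty is a filter function, UPPER is an eval function, and COUNT is an algebraic function. The input files and field names below are assumptions made for illustration; a custom Java UDF would be called in the same positions.

```pig
-- students.txt, marks.txt and their schemas are assumptions made for this example.
students = LOAD 'students.txt' USING PigStorage(',')
           AS (id:int, name:chararray, city:chararray);
marks    = LOAD 'marks.txt' USING PigStorage(',')
           AS (id:int, score:int);

-- Filter function: returns a Boolean and is used as a FILTER condition.
grouped     = COGROUP students BY id, marks BY id;
with_marks  = FILTER grouped BY NOT IsEmpty(marks);

-- Eval function: takes a Pig value and returns a Pig value inside FOREACH-GENERATE.
upper_names = FOREACH students GENERATE id, UPPER(name) AS name;

-- Algebraic function: aggregates the inner bag produced by GROUP.
by_city     = GROUP students BY city;
city_counts = FOREACH by_city GENERATE group AS city, COUNT(students) AS total;

DUMP with_marks;
DUMP upper_names;
DUMP city_counts;
```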
Supporting languages
• UDF support is provided in six programming
languages: Java, Jython, Python, JavaScript,
Ruby, and Groovy.
• For writing UDFs, complete support is provided in
Java and limited support is provided in all the
remaining languages.
• Using Java, we can write UDFs involving all parts of
the processing, such as data load/store, column
transformation, and aggregation.
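The way UDFs written in different languages are made available to a script can be sketched as follows. The jar name myudfs.jar, the class myudfs.ToUpper, the script udfs.py and its function to_upper are all hypothetical names used only for illustration:

```pig
-- Java UDFs are packaged in a jar (myudfs.jar and myudfs.ToUpper are hypothetical).
REGISTER myudfs.jar;
DEFINE ToUpper myudfs.ToUpper();

-- Jython UDFs are registered from a script file under a namespace
-- (udfs.py and its to_upper function are hypothetical).
REGISTER 'udfs.py' USING jython AS pyfuncs;

-- students.txt and its schema are assumptions made for this example.
students = LOAD 'students.txt' USING PigStorage(',')
           AS (id:int, name:chararray, city:chararray);

java_names   = FOREACH students GENERATE ToUpper(name);
jython_names = FOREACH students GENERATE pyfuncs.to_upper(name);
```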