PIG
Relational Operators - II
Foreach, Filter, Join, Co-Group, Union
Relational operator: foreach
• foreach, as the name suggests, does something for each record. It is
similar to a for loop: the same expression is evaluated once per record.
• Example: select a few columns
grunt> a = FOREACH dataTransaction GENERATE $0, $1, $2;
It can also be used for arithmetic operations, such as
grunt> A = FOREACH dataTransaction GENERATE $0, ($3+$4) AS S;
or
grunt> a = FOREACH dataTransaction GENERATE $0, (TransAmt1+TransAmt2) AS S;
Rupak Roy
grunt> B = FOREACH A GENERATE $1/100;
or
grunt> b = FOREACH A GENERATE ($1/100) AS D;
grunt> C = FOREACH b GENERATE ((D > 50) ? 'above' : 'below');
or
grunt> C = FOREACH b GENERATE ((D == 50) ? 'Equal' : ((D > 50) ? 'above' : 'down'));
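The per-record projection and the nested conditional can be mimicked in plain Python. This is only a rough sketch of the semantics, not Pig itself. The sample rows and the helper name `label` are hypothetical.

```python
# Hypothetical sample rows; tuple positions mirror $0, $1 in Pig.
A = [
    ("T001", 4000.0),
    ("T002", 5000.0),
    ("T003", 9000.0),
]


def label(d):
    """Mirror of ((D == 50) ? 'Equal' : ((D > 50) ? 'above' : 'down'))."""
    return "Equal" if d == 50 else ("above" if d > 50 else "down")


# b = FOREACH A GENERATE ($1/100) AS D;
b = [(row[1] / 100,) for row in A]

# C = FOREACH b GENERATE <nested conditional on D>;
C = [(label(d),) for (d,) in b]

print(C)
```

Each input record yields exactly one output record, which is the defining property of a simple FOREACH ... GENERATE.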
Relational Operators: filter
• It is used to select the required tuples based on conditions.
• Put simply, filter removes unwanted records based on requirements.
Examples:
grunt> F = FILTER dataTransaction BY TransAmt1 > 500;
or
grunt> F1 = FILTER dataTransaction BY (($4+$5)/100) > 2;
or
grunt> F2 = FILTER dataTransaction BY $6 == 'Nunavut';
or
grunt> F3 = FILTER dataTransaction BY $1 MATCHES 'Car.*';
-- returns all the names that start with 'Car'
or
grunt> F4 = FILTER dataTransaction BY NOT ($1 MATCHES 'Car.*');
-- returns all the names that do not start with 'Car'
Relational Operators: filter
or
grunt> F5 = FILTER dataTransaction BY CustomerName MATCHES 'Ca.*s';
-- keeps the records whose names start with 'Ca' and end with 's'.
-- The dot-star ( .* ) stands for any number of characters, here
-- between 'Ca' and the final 's'.
or
grunt> F6 = FILTER dataTransaction BY CustomerName MATCHES '.*(nica|los).*';
-- here the leading and trailing dot-star ( .* ) allow any number of
-- characters before and after 'nica' or 'los'. The match is case-sensitive.
nica = Monica Federle
los = Carlos Daly
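A point worth checking by hand is that MATCHES tests the pattern against the whole field, not a substring. The sketch below approximates this with Python's `re.fullmatch`, used as a stand-in for the Java regular expressions that Pig relies on. The names in the list and the helper `matches` are hypothetical.

```python
import re

# Hypothetical customer names, not taken from the original dataset.
names = ["Carlos Daly", "Carl Jones", "Monica Federle", "Oscar Wilde"]


def matches(value, pattern):
    """Approximate Pig's MATCHES: the pattern must cover the entire string."""
    return re.fullmatch(pattern, value) is not None


starts_with_car = [n for n in names if matches(n, r"Car.*")]
not_car = [n for n in names if not matches(n, r"Car.*")]
ca_then_s = [n for n in names if matches(n, r"Ca.*s")]
nica_or_los = [n for n in names if matches(n, r".*(nica|los).*")]

print(starts_with_car)
print(not_car)
print(ca_then_s)
print(nica_or_los)
```

Note that "Oscar Wilde" contains "car" but is not selected by 'Car.*', because the match is anchored at both ends and is case-sensitive.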
Relational operators: Join
• The JOIN operator is used to combine two or more datasets.
• The datasets are joined on a common key.
• Joins can be of 3 types:
1. Self-join
2. Inner join
3. Outer join: left, right and full
Self-join
• A self-join joins a table with itself.
Let's understand this with the help of an example:
-- Load the same dataset under two different aliases:
grunt> join1 = LOAD '/home/hduser/datasets/join1.csv'
USING PigStorage(',') AS (CustomerName:chararray,
Transaction_ID:bytearray, ProductName:chararray);
grunt> join11 = LOAD '/home/hduser/datasets/join1.csv'
USING PigStorage(',') AS (CustomerName:chararray,
Transaction_ID:bytearray, ProductName:chararray);
-- Perform the self-join using the JOIN operator
grunt> selfjoin = JOIN join1 BY Transaction_ID, join11 BY Transaction_ID;
grunt> DUMP selfjoin;
Inner join
• Also known as an equijoin.
• An inner join returns rows only when the common key has a match in
both tables.
-- Load the second dataset
grunt> join2 = LOAD '/home/hduser/datasets/join2.csv'
USING PigStorage(',') AS (CustomerName:chararray,
Transaction_ID:bytearray, Department:chararray);
grunt> innerjoin = JOIN join1 BY Transaction_ID, join2 BY Transaction_ID;
grunt> DUMP innerjoin;
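To see which rows survive an inner join, here is a small Python sketch of the idea, not of Pig's actual implementation. The rows and the helper `inner_join` are hypothetical. Each output row concatenates the fields of the left and right records that share a key.

```python
# Hypothetical rows: (CustomerName, Transaction_ID, ProductName or Department)
join1 = [("Ann", "T1", "Laptop"), ("Bob", "T2", "Phone"), ("Cid", "T3", "Desk")]
join2 = [("Ann", "T1", "IT"), ("Cid", "T3", "Office"), ("Dee", "T4", "Sales")]


def inner_join(left, right, key=1):
    """Keep only key values present on both sides (hash-join style)."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    # Concatenate the fields of every matching left/right pair.
    return [l + r for l in left for r in index.get(l[key], [])]


innerjoin = inner_join(join1, join2)
for row in innerjoin:
    print(row)
```

Bob (T2) and Dee (T4) drop out because their keys appear in only one of the two inputs.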
Outer Join
• Left Outer Join: returns all rows from the left table, even if there is
no match in the right table. From the right table it takes only the values
that match the left table; where nothing matches, the right-hand fields
are null.
grunt> leftouter = JOIN join1 BY Transaction_ID LEFT OUTER, join2 BY Transaction_ID;
• Right Outer Join: the opposite of Left Outer Join. It returns all the
rows from the right table, even if there are no matches in the left table.
From the left table it takes only the values that match the right table;
where nothing matches, the left-hand fields are null.
grunt> rightouter = JOIN join1 BY Transaction_ID RIGHT OUTER, join2 BY Transaction_ID;
Outer Join
• Full Outer Join: returns all the rows from both tables, whether or not
they have a match in the other relation. Fields from the side without a
match are null.
grunt> fullouter = JOIN join1 BY Transaction_ID FULL OUTER, join2 BY Transaction_ID;
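The three outer joins differ only in which unmatched rows they keep. The following Python sketch illustrates that under stated assumptions: the rows and the helper `outer_join` are hypothetical, `None` stands in for Pig's null, and the output order is not meant to reflect what Pig would produce.

```python
# Hypothetical rows: (CustomerName, Transaction_ID, ProductName or Department)
join1 = [("Ann", "T1", "Laptop"), ("Bob", "T2", "Phone"), ("Cid", "T3", "Desk")]
join2 = [("Ann", "T1", "IT"), ("Cid", "T3", "Office"), ("Dee", "T4", "Sales")]


def outer_join(left, right, how, key=1, width=3):
    """how is 'left', 'right' or 'full'; the unmatched side is padded with None."""
    pad = (None,) * width
    out = []
    matched_right = set()
    for l in left:
        hits = [i for i, r in enumerate(right) if r[key] == l[key]]
        matched_right.update(hits)
        if hits:
            out.extend(l + right[i] for i in hits)
        elif how in ("left", "full"):
            out.append(l + pad)  # left row with no partner
    if how in ("right", "full"):
        # right rows that never found a partner
        out.extend(pad + r for i, r in enumerate(right) if i not in matched_right)
    return out


leftouter = outer_join(join1, join2, "left")
rightouter = outer_join(join1, join2, "right")
fullouter = outer_join(join1, join2, "full")
print(len(leftouter), len(rightouter), len(fullouter))
```

With this sample data the left and right joins each return three rows, while the full join returns four, because it keeps both Bob (left only) and Dee (right only).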
Joins are one of the important operators
COGROUP: essentially performs a join and a group at the same time.
COGROUP on multiple datasets produces one record per key, holding the
key and, for each dataset, a bag of the tuples that share that key.
To perform a COGROUP, type:
grunt> cogrouped = COGROUP join1 BY Transaction_ID, join2 BY Transaction_ID;
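The shape of a COGROUP result is easier to picture with a small example. This Python sketch is only an illustration: the rows and the helper `cogroup` are hypothetical, and Python lists play the role of Pig bags.

```python
# Hypothetical rows: (CustomerName, Transaction_ID, ProductName or Department)
join1 = [("Ann", "T1", "Laptop"), ("Bob", "T2", "Phone"), ("Cid", "T3", "Desk")]
join2 = [("Ann", "T1", "IT"), ("Cid", "T3", "Office"), ("Dee", "T4", "Sales")]


def cogroup(left, right, key=1):
    """One record per key: (key, bag of left rows, bag of right rows)."""
    keys = sorted({row[key] for row in left} | {row[key] for row in right})
    return [
        (k, [r for r in left if r[key] == k], [r for r in right if r[key] == k])
        for k in keys
    ]


cogrouped = cogroup(join1, join2)
for record in cogrouped:
    print(record)
```

Unlike an inner join, keys found in only one input (T2 and T4 here) still appear, paired with an empty bag for the other input.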
Relational Operator: UNION
• Merges the contents of two or more datasets.
grunt> U = UNION join1, join2;
grunt> DUMP U;
What if we want to merge two datasets that have different schemas? For example:
grunt> join1 = LOAD '/home/hduser/datasets/join1.csv' USING PigStorage(',') AS
(CustomerName:chararray, Transaction_ID:chararray, Department:chararray);
grunt> join1u = LOAD '/home/hduser/datasets/join1.csv' USING PigStorage(',') AS
(CustomerName:chararray, Transaction_ID:int, Department:chararray);
grunt> join2 = LOAD '/home/hduser/datasets/join2.csv' USING PigStorage(',') AS
(CustomerName:chararray, Transaction_ID:chararray, Department:chararray);
grunt> unioned = UNION join1u, join2;
grunt> DESCRIBE unioned;
-- this throws an error ('cannot cast to bytearray') because Transaction_ID
-- has a different data type in each relation.
• Going back and reloading the data just to change the schema is tedious
and time-consuming. Instead, we can cast fields explicitly inside a
relational query, without disturbing the original schema.
grunt> joinM = FOREACH join2 GENERATE $0, (int)$1, $2;
grunt> unioned = UNION joinM, join1u;
grunt> DESCRIBE unioned;
Alternatively, use UNION ONSCHEMA, which aligns the fields of the two
relations by name rather than by position:
grunt> U = UNION ONSCHEMA join1u, join2;
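The explicit cast done with FOREACH before the UNION can be pictured with this minimal Python sketch. The rows are hypothetical and Python's `int()` stands in for Pig's `(int)` cast. The point is that both inputs carry the same type in the second field before they are merged.

```python
# Hypothetical rows; Transaction_ID is an int in join1u but a string in join2.
join1u = [("Ann", 101, "IT"), ("Bob", 102, "Sales")]
join2 = [("Cid", "103", "Office")]

# joinM = FOREACH join2 GENERATE $0, (int)$1, $2;
joinM = [(row[0], int(row[1]), row[2]) for row in join2]

# unioned = UNION joinM, join1u;  (UNION does not guarantee any row order)
unioned = joinM + join1u

print(unioned)
```

The original `join2` is left untouched; only the derived relation `joinM` carries the new type.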
Relational Operator: RANK
• Returns each tuple of a relation with its rank prepended.
Example (create the input file from the shell, not from grunt):
$ vi names
Zara,1,F
David,2,F
David,2,T
Alan,2,M
Calvin,3,M
Alan,5,M
Chris,8,M
Ellie,7,F
Bob,8,M
Carlos,2,M
Then press the Esc key and type :wq! to save.
grunt> names = LOAD '/home/hduser/datasets/names' USING PigStorage(',') AS
(n1:chararray, n2:int, n3:chararray);
grunt> DUMP names;
grunt> ranked = RANK names;
grunt> DUMP ranked;
(1,Zara,1,F)
(2,David,2,F)
(3,David,2,T)
(4,Alan,2,M)
(5,Calvin,3,M)
(6,Alan,5,M)
(7,Chris,8,M)
(8,Ellie,7,F)
(9,Bob,8,M)
(10,Carlos,2,M)
We can also rank on two fields, each with its own sort order.
grunt> ranked2 = RANK names BY n1 ASC, n2 DESC;
grunt> DUMP ranked2;
• When two or more records tie on the ranking fields, RANK assigns them
the same rank and, by default, skips the following rank values, leaving
gaps in the sequence.
• To avoid these gaps we can use the DENSE option: tied records still
share a rank, but the next rank follows on directly.
grunt> rankedG = RANK names BY n1 DESC, n2 ASC DENSE;
grunt> DUMP rankedG;
(1,Zara,1,F)
(2,Ellie,7,F)
(3,David,2,F)
(3,David,2,T)
(4,Chris,8,M)
(5,Carlos,2,M)
(6,Calvin,3,M)
(7,Bob,8,M)
(8,Alan,2,M)
(9,Alan,5,M)
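The difference between the default ranking and DENSE can be checked with a short Python sketch. It reuses the names data from the example; the helper `rank_by` is hypothetical and only imitates the ranking rule, not Pig's execution.

```python
names = [
    ("Zara", 1, "F"), ("David", 2, "F"), ("David", 2, "T"), ("Alan", 2, "M"),
    ("Calvin", 3, "M"), ("Alan", 5, "M"), ("Chris", 8, "M"), ("Ellie", 7, "F"),
    ("Bob", 8, "M"), ("Carlos", 2, "M"),
]


def rank_by(rows, dense):
    """Rank by n1 DESC, n2 ASC; ties share a rank, DENSE leaves no gaps."""
    # Two stable sorts: secondary key first, then primary key.
    ordered = sorted(rows, key=lambda r: r[1])
    ordered = sorted(ordered, key=lambda r: r[0], reverse=True)
    ranked, prev_key, rank = [], None, 0
    for position, row in enumerate(ordered, start=1):
        key = (row[0], row[1])
        if key != prev_key:
            # Dense: next integer. Default: position in the sorted order.
            rank = rank + 1 if dense else position
            prev_key = key
        ranked.append((rank,) + row)
    return ranked


standard = rank_by(names, dense=False)
dense = rank_by(names, dense=True)
print([r[0] for r in standard])
print([r[0] for r in dense])
```

In both variants the two David records share rank 3. Without DENSE the next record gets rank 5, while with DENSE it gets rank 4.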
Next
• We will learn about UDFs (User Defined Functions).
