0% found this document useful (0 votes)
8 views

Pig

Uploaded by

sumeetmkhetan
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as TXT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
8 views

Pig

Uploaded by

sumeetmkhetan
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as TXT, PDF, TXT or read online on Scribd
You are on page 1/ 6

=================================================================

PIG Architecture
=================================================================
Two modes :

1) Local - used for files on the local system; processing is done on the local machine.
2) MapReduce - used for HDFS, i.e. a shared file system; processing is done in a
distributed manner, i.e. on multiple machines/nodes.
3) HCatalog - start with: pig -useHCatalog

=================================================================
DATA INGESTION (SCHEMALESS LOADING)
=================================================================

To go to pig :

pig -x local

-- Schemaless load: no AS clause, so every field defaults to type bytearray
-- and columns are addressed positionally ($0, $1, ...).
EMP = LOAD '/home/itelligence/Dataset/CSV/EMP.csv' USING PigStorage(',');

DUMP EMP;   -- every Pig statement needs a terminating semicolon (missing in original)

-- SQL-to-Pig mapping: SELECT -> GENERATE, FROM -> FOREACH
-- (these were bare notes in the original; as comments they no longer break parsing)

A = FOREACH EMP GENERATE $1,$2;     -- columns 1 and 2 (zero-based positions)

A = FOREACH EMP GENERATE $2..$5;    -- range projection: columns 2 through 5
A = FOREACH EMP GENERATE *;         -- all columns

=================================================================
DATA INGESTION (SCHEMA BASED LOADING)
=================================================================

DEFAULT DATATYPE IN PIG IS 'BYTEARRAY'


i.e if you describe any variable it will show its datatype as 'BYTEARRAY'

-- Schema-based load: the AS clause assigns names and types.
-- (The default type without AS would be bytearray.)
EMP = LOAD '/home/itelligence/Dataset/csv/EMP.csv' USING PigStorage(',') AS
    (EMPLOYEE_ID:INT, FIRST_NAME:CHARARRAY, LAST_NAME:CHARARRAY, EMAIL:CHARARRAY,
     PHONE_NUMBER:CHARARRAY, HIRE_DATE:CHARARRAY, JOB_ID:CHARARRAY,
     SALARY:INT, MANAGER_ID:INT, DEPARTMENT_ID:INT);

DESCRIBE EMP;   -- to see the schema; original said DESCRIBE B, but the loaded alias is EMP

=================================================================
FILTERING
=================================================================

-- Employees earning more than 5000.
B = FILTER EMP BY SALARY > 5000;

-- First names starting with 'A' in either case.
-- UPPER() avoids the duplicated ('A' OR 'a') comparison of the original.
B = FILTER EMP BY UPPER(SUBSTRING(FIRST_NAME,0,1)) == 'A';

-- Alternative: emit the boolean itself instead of filtering on it.
B = FOREACH EMP GENERATE FIRST_NAME, (SUBSTRING(FIRST_NAME,0,1) == 'A');

PIG FUNCTION LIBRARY :


/home/itelligence/pig-0.11.1/docs/func.html

Task: find rows whose ORDER AMOUNT is between 200 and 400 in file ORDER.CSV.
The ORDER AMOUNT value may appear in column $17 (or $18/$19, see below).

-- The amount column shifts position between rows ($17, $18 or $19) and is
-- prefixed with a literal '$'.  Strategy: project each candidate column, keep
-- only values starting with '$', union the three, strip the prefix, cast and
-- filter by range.
A = Load '/home/itelligence/orders.csv' using PigStorage(',');

B = FOREACH A generate $0,$17;


B1= FILTER B BY SUBSTRING($1,0,1) == '$';

C= FOREACH A generate $0,$18;


C1= FILTER C BY SUBSTRING($1,0,1) == '$';

D = FOREACH A generate $0,$19;


D1= FILTER D BY SUBSTRING($1,0,1) == '$';

-- UNION keeps duplicates; that is fine here since each row matched exactly one branch.
E = UNION B1,C1,D1;

-- Drop the leading '$' (substring from index 1) and cast to INT for the range test.
F = FOREACH E GENERATE $0, (INT)SUBSTRING(TRIM($1),1,20);

-- NOTE(review): the task statement says between 200 and 400, but this filter
-- uses 4000 — confirm which bound is intended.
G = FILTER F BY $1 >200 AND $1<4000;

Dump G ;
=================================================================
LOAD DATA WITH MULTIPLE DELIMITER
=================================================================

When there are multiple delimiters present in a file (e.g. a mix of spaces, commas and
colons), we first load the file into Pig without PigStorage, and then use the
REGEX_EXTRACT_ALL function to split the line into comma-separated fields.

-- Load each line as a single chararray field (no PigStorage delimiter).
A = LOAD '/home/itelligence/Untitled Document 1' ;

-- REGEX_EXTRACT_ALL returns one tuple of the four captured groups;
-- FLATTEN unnests that tuple into four separate fields.
B = FOREACH A GENERATE FLATTEN(REGEX_EXTRACT_ALL($0,'(.*) (.*),(.*):(.*)'));

-- Write the result back out comma-delimited.
STORE B INTO '/home/itelligence/sumeet' USING PigStorage(',');


=================================================================
Group by
=================================================================

EMP = LOAD '/home/itelligence/Dataset/CSV/EMP.csv' USING PigStorage(',');

-- Group by department id ($9): each group carries the key plus a bag of EMP tuples.
A = GROUP EMP BY $9;

-- Number of employees per department: $0 is the group key, $1 is the bag.
B = FOREACH A GENERATE $0, COUNT($1);

DUMP B;

-- Average salary ($7 inside the bag) per department.
-- The original had the typo FROEACH, which Pig rejects at parse time.
C = FOREACH A GENERATE $0, AVG($1.$7);

DUMP C;

==============================================================
SCRIPT TO FILTER ONLY ERROR CODE AND ERROR MESSAGE FROM THE LOG FILE :
-- Load the raw log, keep only lines beginning with 'ERROR',
-- then split each into (error code, error message) via regex capture groups.
LOG = LOAD '/home/itelligence/pig_1491722865605.log';

A = FILTER LOG BY SUBSTRING($0,0,5) == 'ERROR';

B = FOREACH A GENERATE FLATTEN(REGEX_EXTRACT_ALL($0,'ERROR (.*):(.*)'));

DUMP B;
=================================================================
JOINS
=================================================================

FOR LEFT OUTER AND RIGHT OUTER JOINS YOU NEED TO SPECIFY THE SCHEMA OF THE FILES
WHILE LOADING THEM.

FOR A LEFT OUTER JOIN, SPECIFY THE SCHEMA OF THE LEFT FILE, AND VICE VERSA.

-- NOTE(review): AS () is a placeholder — an empty schema is not valid Pig.
-- Fill in the real column list before running (EMP's schema appears earlier
-- in these notes); outer joins require a schema on the outer side.
EMP = LOAD '/home/itelligence/Dataset/CSV/EMP.csv' USING PigStorage (',') AS ();


DEP = LOAD '/home/itelligence/DEPT' USING PigStorage (',') AS ();
-- Join key: EMP.$9 (DEPARTMENT_ID) against DEP.$0 (department key).
JOIN_INNER = JOIN EMP BY $9,DEP BY $0;
JOIN_LO = JOIN EMP BY $9 LEFT OUTER,DEP BY $0;
JOIN_RO = JOIN EMP BY $9 RIGHT OUTER,DEP BY $0;
JOIN_FO = JOIN EMP BY $9 FULL OUTER,DEP BY $0;

REPLICATED JOIN :

IN THIS CASE, WHEN WE JOIN THE TWO TABLES, THE SMALLER TABLE IS COPIED TO EVERY NODE
WHERE THE BLOCKS OF THE LARGER TABLE ARE PLACED, SO THAT THE LOOK-UP CAN BE
PERFORMED LOCALLY. USED FOR PERFORMANCE OPTIMIZATION.

SKEWED JOIN :

MERGE JOIN :

============================SCHEMA ON READ=======================

=================================================================
COGROUP
=================================================================

-- COGROUP groups both relations by their respective keys side by side.
-- The original referenced alias DEPT, but the relation was loaded as DEP.
CG = COGROUP EMP BY $9, DEP BY $0;

-- Per key: the group value, the count of matching EMP tuples, and of DEP tuples.
A = FOREACH CG GENERATE $0, COUNT($1), COUNT($2);


=================================================================
JSONLOADER
=================================================================

IN ORDER TO LOAD THE JSON FILE WE USE JSONLOADER INSTEAD OF PigStorage


WHILE USING JSONLOADER SCHEMA IS SPECIFIED IN SINGLE QUOTES ('')

-- JsonLoader replaces PigStorage for JSON input.  An optional schema goes in
-- single quotes, e.g. USING JsonLoader('name:chararray, age:int').
A = LOAD '<JSONFILE PATH>' USING JsonLoader;


DUMP A;
=================================================================
UDF
=================================================================

CREATE PROJECT ==> ADD LIBRARIES(HADOOP LIBRARIES,PIG LIBRARIES) ==> CREATE PACKAGE
==> CREATE CLASS ==> PASTE PROGRAM ==> EXPORT TO JAR

now to tell pig about udf location :


REGISTER <LOCATION OF JAR FILE>

package myudfs;

import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

/**
 * Pig EvalFunc UDF that upper-cases its first argument.
 *
 * Usage after building the jar:
 *   REGISTER /path/to/myudfs.jar;
 *   B = FOREACH A GENERATE myudfs.ToUpper($0);
 */
public class ToUpper extends EvalFunc<String> {

    /**
     * @param input tuple whose first field is the value to convert
     * @return the upper-cased string, or null for a null/empty input tuple
     * @throws IOException per the EvalFunc contract (not thrown directly here)
     */
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0) {
            return null;
        }
        String str = input.get(0).toString();
        return str.toUpperCase();
    }
} // the original was missing this closing brace for the class
=================================================================
PIGGYBANK
=================================================================

contains predefined UDFs. First register the piggybank jar in Pig, and then we can
use any UDF from piggybank.

=================================================================
XML FILE LOADING TO PIG
=================================================================

-- XMLLoader (from piggybank; register it first) emits one record per
-- <property>...</property> element of the XML file.
A = LOAD '/home/itelligence/hadoop-1.2.0/conf/hdfs-site.xml' USING


org.apache.pig.piggybank.storage.XMLLoader('property');

-- Extract the name/value pair; \\s* tolerates arbitrary whitespace between tags.
B = FOREACH A GENERATE
FLATTEN(REGEX_EXTRACT_ALL($0,'<property>\\s*<name>(.*)</name>\\s*<value>(.*)</
value>\\s*</property>'));

here \\s* matches any run of whitespace (zero or more whitespace characters)


use \\s to match exactly one whitespace character
(the backslash must be doubled because the pattern is written inside a Pig string)
=================================================================
DYNAMIC PATH
=================================================================

(writing Pig statements in a script file and then executing the file directly, either
from the hadoop ($) prompt or the grunt prompt)

from the $ prompt (note: the parameter flag is -param, with a single dash):

pig -x local -param PATH=/home/itelligence/Desktop/yatra.txt


/home/itelligence/Desktop/script_ns.pig

FROM THE GRUNT SHELL:

exec /home/itelligence/Desktop/script_ns.pig
=================================================================
WORD COUNT IN FLAT FILE
=================================================================

A = LOAD 'FILE PATH';

-- One word per output tuple.
B = FOREACH A GENERATE FLATTEN(TOKENIZE($0));

-- Generalised expression to strip special characters: replace every
-- non-alphanumeric character with the empty string.
B1 = FOREACH B GENERATE REPLACE($0,'[^a-zA-Z0-9]','');

-- The original grouped B, silently discarding the cleaned relation B1.
C = GROUP B1 BY $0;

D = FOREACH C GENERATE $0, COUNT($1);

=======================================================
START-UP

employees = LOAD '/home/itelligence/Dataset/CSV/EMP.csv' USING PigStorage(',') AS
    (EMPLOYEE_ID:int, FIRST_NAME:chararray, LAST_NAME:chararray,
     EMAIL:chararray, PHONE_NUMBER:chararray, HIRE_DATE:chararray,
     JOB_ID:chararray, SALARY:int, MANAGER_ID:int, DEPARTMENT_ID:int);

DESCRIBE employees;

EMP_FILTERED = FILTER employees BY SALARY > 5000;

-- Keep SALARY in the projection: the aggregate below must reference the bag
-- produced by GROUP NESTED (the original read EMP_FILTERED there, which is
-- not in scope after the GROUP, and NESTED had no SALARY field).
NESTED = FOREACH EMP_FILTERED GENERATE SALARY, HIRE_DATE,
    SUBSTRING(HIRE_DATE,0,2) AS DAY,
    SUBSTRING(HIRE_DATE,3,6) AS MONTH;

GRP = GROUP NESTED BY (MONTH, DAY);

-- Per (MONTH, DAY): average salary and number of hires.
M = FOREACH GRP GENERATE group, AVG(NESTED.SALARY), COUNT(NESTED);

-- '-schema' makes PigStorage also write a .pig_schema file next to the data.
-- (This statement was split across lines by the document extraction.)
STORE employees INTO '/home/itelligence/Dataset/PigEmitted' USING PigStorage('~','-schema');

==============================================================
JOIN

TRANSACTIONS = load 'data/transactions' using PigStorage('\t') as (id:int,
    product:int, user:int, purchase_amount:double, description:chararray);

-- The original never loaded USERS, which the join below requires.
-- Path/schema assumed symmetric with TRANSACTIONS — confirm against the dataset.
USERS = load 'data/users' using PigStorage('\t') as (id:int, location:chararray);

-- LEFT OUTER keeps every transaction even when the user record is missing.
A = JOIN TRANSACTIONS by user LEFT OUTER, USERS by id;

B = GROUP A by product;

-- Per product: the number of distinct user locations that bought it.
C = FOREACH B {
    LOCS = DISTINCT A.location;
    GENERATE group, COUNT(LOCS) as location_count;
};

DUMP C;
==============================================================
WORDCOUNT

-- Classic word count: tokenize each line, group by word, count per group.
lines = load '------/input.txt';
words = foreach lines generate flatten(TOKENIZE((chararray)$0)) as word;
by_word = group words by word;
counts = foreach by_word generate COUNT(words), group;
store counts into '------/wordcount';

==============================================================
-- Problem statement: for each customer, find the total number of items bought
-- and which item he/she bought the highest number of times.
-- Load the input data :: schema ( customerId , itemId , orderDate , deliveryDate ).
-- (Several comment continuation lines below had lost their '--' markers, which
-- would be parse errors; restored as comments, code unchanged.)

orders = load '/testData100k' using PigStorage(',') as (cstrId:int, itmId:int,


orderDate: long, deliveryDate: long );

-- group by (customer-id, item-id) and count orders per pair

grpd_cstr_itm = group orders by (cstrId,itmId);


grpd_cstr_itm_cnt = foreach grpd_cstr_itm generate group.cstrId as cstrId,
group.itmId as itmId, COUNT(orders) as itmCnt;

-- regroup by cstrId alone

grpd_cstr = group grpd_cstr_itm_cnt by cstrId ;


describe grpd_cstr;

-- grpd_cstr: {group: int,grpd_cstr_itm_cnt: {cstrId: int,itmId: int,itmCnt: long}}


-- Iterate over each customer's group: sum the per-item counts for the total,
-- and take the top row by count for the item bought the highest number of times.
result = foreach grpd_cstr{
total_orders = SUM(grpd_cstr_itm_cnt.itmCnt);
srtd_orders = order grpd_cstr_itm_cnt by itmCnt desc;
higest_bought = limit srtd_orders 1;
generate FLATTEN(higest_bought),total_orders as totalCnt;
};
-- result contains ( customer_id , item_id_bought_highest_times,
-- number_of_times_it_was_bought, total_items );
describe result;
-- result: {higest_bought::cstrId: int,higest_bought::itmId:
-- int,higest_bought::itmCnt: long,totalCnt: long}

==============================================================

You might also like