
BDH_practical_08_29

The document outlines the practical lab work for Semester 7 students at Yeshwantrao Chavan College of Engineering, focusing on Apache Pig operations for big data analytics. Key operations include loading and storing data, aggregation, filtering, and joining datasets, which facilitate effective data processing in Hadoop environments. The conclusion emphasizes the efficiency of Apache Pig in managing large-scale data workflows and enhancing data analysis capabilities.

Uploaded by: Badass Aman
Copyright © All Rights Reserved

Yeshwantrao Chavan College of Engineering

Department of Artificial Intelligence & Data Science


Semester: 7    Section: I    Session: Odd 2024-25
Course: Big Data Hadoop Lab
Practical No: 08

Aim: Pig Operations: Load & Store Data, Aggregation Operations, Filtering Data and
Joining Datasets.

Theory:
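Apache Pig provides a high-level data-flow language, Pig Latin, for transforming and
analyzing large datasets. Pig compiles each Pig Latin script into a series of MapReduce
jobs (or Tez jobs in newer releases) that run on Hadoop, so structured and unstructured
data can be processed without hand-writing Java MapReduce code. Key operations include: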
1. Loading and Storing Data: The LOAD operator reads data from HDFS, the local file
system (in local mode), or other storage systems through a load function such as
PigStorage, and the STORE operator writes results back to HDFS or another destination.
2. Aggregation Operations: The GROUP operator collects records that share a key, and
built-in functions such as COUNT, SUM, and AVG then summarize each group.
3. Filtering Data: The FILTER operator extracts the subset of records that satisfy a
boolean condition or expression.
4. Joining Datasets: The JOIN operator combines two or more datasets on a common field or
key (an inner join by default, with LEFT, RIGHT, and FULL OUTER variants), enabling
analysis that spans several data sources.
These operations provide flexibility in managing big data pipelines, making Apache Pig a
versatile tool for big data analytics.
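Filtering (item 3) is not demonstrated in the Implementation steps, so the following is a minimal sketch of the FILTER operator. It assumes a hypothetical comma-separated file employees.csv with no header row and the schema (id, name, department, salary); the file name, schema, the department value 'AI', and the 50000 threshold are illustrative assumptions, not part of the original practical.

```pig
-- Hypothetical input: employees.csv, no header row, e.g. 1,Asha,AI,55000
employee_data = LOAD 'employees.csv' USING PigStorage(',')
    AS (id:int, name:chararray, department:chararray, salary:double);

-- Keep only records whose condition is true; records where the
-- condition evaluates to null (e.g. a missing salary) are dropped as well
high_earners = FILTER employee_data BY salary > 50000.0;

-- Conditions can be combined with AND, OR and NOT
ai_high_earners = FILTER employee_data BY (department == 'AI') AND (salary > 50000.0);

-- Remove rows whose salary could not be parsed as a double (null after the cast)
valid_salaries = FILTER employee_data BY salary IS NOT NULL;

-- MATCHES tests the whole string against a Java regular expression
names_starting_with_a = FILTER employee_data BY name MATCHES 'A.*';

DUMP ai_high_earners;
```

In local mode such a script could be run with `pig -x local filter.pig` (the script name is hypothetical). Only the relations reached by DUMP or STORE are actually computed.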

Implementation:
• Load Data into Pig:
data = LOAD 'hdfs://path/to/data.csv' USING PigStorage(',') AS (field1:datatype, field2:datatype, ...);
• Store Data in HDFS:
STORE data INTO 'hdfs://path/to/output' USING PigStorage(',');
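The placeholder LOAD and STORE lines above can be tried end to end in Pig's local mode before moving to HDFS. This is a sketch under stated assumptions: the input file employees.csv (no header row), its schema (id, name, department, salary), and the output directory employees_out are all hypothetical.

```pig
-- Hypothetical input: employees.csv (no header row) in the working directory.
-- Run in local mode for testing: pig -x local load_store.pig
employee_data = LOAD 'employees.csv' USING PigStorage(',')
    AS (id:int, name:chararray, department:chararray, salary:double);

-- Print the schema Pig attached to the relation
DESCRIBE employee_data;

-- Show a handful of records on the console
preview = LIMIT employee_data 5;
DUMP preview;

-- STORE fails if the output directory already exists, so remove it first
rmf employees_out;
STORE employee_data INTO 'employees_out' USING PigStorage(',');
```

Note that employees_out is written as a directory of part files rather than a single CSV file. Switching to a cluster means replacing the local paths with HDFS paths and dropping `-x local`.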

Aggregation Operations:
• GROUP: Group records based on a common field for aggregation.
grouped_data = GROUP employee_data BY department;
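After GROUP, each output record holds a field named group (the grouping key) and a bag named after the input relation; this is why the COUNT, SUM, and AVG lines refer to employee_data and employee_data.salary. The sketch below inspects that structure with DESCRIBE and also shows GROUP ... ALL for whole-relation totals. The input file employees.csv and its schema (id, name, department, salary) are hypothetical assumptions.

```pig
-- Hypothetical input and schema
employee_data = LOAD 'employees.csv' USING PigStorage(',')
    AS (id:int, name:chararray, department:chararray, salary:double);

-- One output record per department: (group, {bag of employee_data tuples})
grouped_data = GROUP employee_data BY department;
DESCRIBE grouped_data;

-- GROUP ... ALL places every record in a single group whose key is 'all',
-- which is how totals over the whole relation are computed
all_employees = GROUP employee_data ALL;
overall = FOREACH all_employees GENERATE
    COUNT(employee_data) AS n_employees,
    SUM(employee_data.salary) AS total_salary;

DUMP overall;
```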

• COUNT: Count the number of records per group.
count_data = FOREACH grouped_data GENERATE group, COUNT(employee_data);
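Pig's built-in COUNT ignores null values, while COUNT_STAR counts every tuple, so comparing the two on a projected field is a quick way to spot missing data per group. The sketch below also names the output columns with AS. It assumes the same hypothetical employees.csv file and schema (id, name, department, salary) used for illustration elsewhere.

```pig
-- Hypothetical input and schema
employee_data = LOAD 'employees.csv' USING PigStorage(',')
    AS (id:int, name:chararray, department:chararray, salary:double);
grouped_data = GROUP employee_data BY department;

-- employee_data.salary projects each bag down to single-field salary tuples.
-- COUNT_STAR counts every record in the group;
-- COUNT skips the tuples whose salary is null.
count_data = FOREACH grouped_data GENERATE
    group AS department,
    COUNT_STAR(employee_data) AS n_rows,
    COUNT(employee_data.salary) AS n_with_salary;

DUMP count_data;
```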
• SUM, AVG: Calculate the sum or average of a numeric field (here, salary) within each group.
salary_sum = FOREACH grouped_data GENERATE group, SUM(employee_data.salary);
avg_salary = FOREACH grouped_data GENERATE group, AVG(employee_data.salary);
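The Aim also lists joining datasets, which the steps above do not show. A minimal sketch of JOIN follows; it assumes two hypothetical files, employees.csv with schema (id, name, department, salary) and departments.csv with schema (dept_name, location), plus a hypothetical output directory emp_locations_out.

```pig
-- Hypothetical inputs: employees.csv and departments.csv (no header rows)
employee_data = LOAD 'employees.csv' USING PigStorage(',')
    AS (id:int, name:chararray, department:chararray, salary:double);
dept_data = LOAD 'departments.csv' USING PigStorage(',')
    AS (dept_name:chararray, location:chararray);

-- Inner join: only employees whose department appears in dept_data survive
joined = JOIN employee_data BY department, dept_data BY dept_name;

-- Joined fields can be qualified as relation::field;
-- the prefix is required only when a name appears in both inputs
emp_locations = FOREACH joined GENERATE
    employee_data::name AS name,
    dept_data::location AS location;

-- Left outer join: every employee is kept, with a null location when unmatched
emp_all = JOIN employee_data BY department LEFT OUTER, dept_data BY dept_name;
DUMP emp_all;

rmf emp_locations_out;
STORE emp_locations INTO 'emp_locations_out' USING PigStorage(',');
```

The choice between the inner and the outer form depends on whether employees with an unknown department should be dropped or kept with a null location.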

Conclusion:
The operations provided by Apache Pig, such as loading and storing data, performing
aggregation, filtering datasets, and joining multiple datasets, make it a highly effective tool
for managing large-scale data processing in a distributed Hadoop environment. Because Pig
turns short scripts into distributed jobs, complex data manipulations can be expressed in a
few lines instead of hand-written MapReduce code, which shortens analytics workflows and
helps developers and analysts get insights from big data quickly. Building these operations
into a data pipeline can significantly streamline the processing of massive datasets.
Submitted to: Mrs. Kiran Khandare
Name: Aman Raut Roll No.: 29 Reg. No.: 21071360
