BigData2

The document discusses Big Data and focuses on Pig, a high-level platform for parallel computation on large datasets, particularly in Hadoop's distributed file system. It explains the differences between traditional databases and Pig, introduces Pig Latin as a declarative language for data transformations, and provides examples of its usage, including user-defined functions (UDFs). The conclusion emphasizes Pig's effectiveness in analyzing large datasets and the ease of expressing complex transformations using Pig Latin.

Uploaded by

Saumya Singh
Copyright
© All Rights Reserved
09-12-2024

BIG DATA
GROUP ASSIGNMENT 2

SUBMITTED TO:
Mr. Shivam Bharadwaj
(Assistant Professor)

SUBMITTED BY:
Devi Prasanna Pati
Diksha Singh
Divyanshu Singh

Content
• Introduction
• Difference Between Traditional Databases and Pig
• Pig Latin: A High-Level Language for Data Flow
• Example Pig Latin Script
• User-Defined Functions (UDFs)
• Conclusion

Introduction
• Pig is a high-level platform for parallel computation on large datasets. It is designed to make it easier to analyze large datasets that reside in HDFS. Pig's programming language, Pig Latin, has operators similar to SQL's (for example FILTER, JOIN, and GROUP), which makes it easier to learn for programmers who are already familiar with relational databases.
• In the era of big data, efficiently processing and analyzing massive datasets is crucial. Traditional databases often struggle to handle the scale and complexity of modern data. Enter Pig, a high-level platform designed specifically for parallel computation on large datasets residing in distributed file systems such as the Hadoop Distributed File System (HDFS).

Difference Between Traditional Databases and Pig

Feature         | Traditional Databases                     | Pig
Data Storage    | Primarily in-memory or on disk            | Distributed file system (HDFS)
Data Processing | Primarily single-node processing          | Distributed processing across a cluster
Data Scale      | Limited by available memory and disk space | Can handle massive datasets
Query Language  | SQL                                       | Pig Latin
Performance     | High for small to medium datasets         | High for large datasets
Pig Latin: A High-Level Language for Data Flow
• Pig Latin is a high-level language for expressing data transformations on large datasets. It provides a declarative way to express data flow, allowing users to focus on the logic of their analysis rather than the low-level details of distributed computation.
Key Concepts in Pig Latin:
• Relations: A relation is a named collection of tuples, where each tuple is an ordered list of values.
• Operators: Operators are functions that transform relations into new relations. Common operators include:
1. LOAD: Loads data from a file into a relation.
2. STORE: Stores a relation to a file.
3. FILTER: Selects tuples from a relation that satisfy a given condition.
4. FOREACH: Applies an expression or function to each tuple in a relation.
5. JOIN: Joins two relations based on a common key.
6. GROUP: Groups tuples in a relation based on a key.
7. DUMP: Displays the contents of a relation.
• Scripts: A Pig script is a sequence of Pig Latin statements that define a data flow graph.
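For readers coming from Java, the meaning of FILTER, FOREACH, and GROUP can be sketched with in-memory collections. This is only an analogy under stated assumptions: Pig runs these steps as distributed jobs, not as Java streams, and the class `OperatorSketch`, the `Person` type, and the method names below are all hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// In-memory analogy for three Pig Latin operators. This is NOT how Pig
// executes a script; it only mirrors what each operator computes.
// Every name in this class is hypothetical.
public class OperatorSketch {

    // One tuple with the schema (name:chararray, age:int).
    public static final class Person {
        public final String name;
        public final int age;

        public Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // FILTER rel BY age > minAge: keep only tuples that satisfy the condition.
    public static List<Person> filterByAge(List<Person> rel, int minAge) {
        return rel.stream()
                .filter(p -> p.age > minAge)
                .collect(Collectors.toList());
    }

    // FOREACH rel GENERATE name: evaluate an expression once per tuple.
    public static List<String> foreachName(List<Person> rel) {
        return rel.stream()
                .map(p -> p.name)
                .collect(Collectors.toList());
    }

    // GROUP rel BY age: collect the tuples that share a key.
    public static Map<Integer, List<Person>> groupByAge(List<Person> rel) {
        return rel.stream()
                .collect(Collectors.groupingBy(p -> p.age, TreeMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        List<Person> a = List.of(
                new Person("ann", 17),
                new Person("bob", 30),
                new Person("cy", 30));

        // Names of the tuples that survive the filter.
        System.out.println(foreachName(filterByAge(a, 18)));
        // Distinct group keys, in ascending order.
        System.out.println(groupByAge(a).keySet());
    }
}
```

Each method takes a relation and returns a new one without modifying its input, which is the same shape as a Pig Latin statement that assigns the result of an operator to a new alias.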

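Example Pig Latin Script

-- Load tab-separated records with a declared schema.
A = LOAD 'data.txt' AS (name:chararray, age:int);
-- Keep only rows with age greater than 18.
B = FILTER A BY age > 18;
-- Project the name and twice the age.
C = FOREACH B GENERATE name, age * 2;
-- Write the result; the path names an output directory.
STORE C INTO 'output.txt';

User-Defined Functions (UDFs)
• UDFs allow users to extend Pig's functionality by defining custom functions that can be used in Pig Latin scripts. UDFs can be written in Java or Python.
• Example UDF (Java)

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Returns the first field of the input tuple in upper case.
public class MyUDF extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        // Pass nulls through instead of failing on missing data.
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        String str = (String) input.get(0);
        return str.toUpperCase();
    }
}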
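The null handling that an upper-casing UDF like this needs can be tried without Pig on the classpath by lifting the logic into a plain static method. The following is a minimal sketch under that assumption: `UpperCaseSketch` is a hypothetical name, and `List<Object>` is only a stand-in for Pig's `Tuple`, not part of Pig's API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Plain-Java stand-in for the exec logic of an upper-casing UDF.
// List<Object> is a hypothetical substitute for Pig's Tuple so that the
// null-handling cases can be exercised without a Pig installation.
public class UpperCaseSketch {

    // Returns the first field in upper case, or null when there is no
    // usable first field (null input, empty input, or null field).
    public static String exec(List<Object> input) {
        if (input == null || input.isEmpty()) {
            return null;
        }
        Object first = input.get(0);
        if (first == null) {
            return null;
        }
        return first.toString().toUpperCase();
    }

    public static void main(String[] args) {
        System.out.println(exec(Arrays.asList("bob")));             // normal field
        System.out.println(exec(Collections.emptyList()));          // empty tuple
        System.out.println(exec(Collections.singletonList(null)));  // null field
        System.out.println(exec(null));                             // null tuple
    }
}
```

Returning null for missing data, rather than throwing, lets a single bad record pass through as a null field instead of failing the whole computation; whether that is the right policy depends on the analysis.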
Example UDF Usage

-- Make the jar that contains MyUDF available to the script.
REGISTER 'myudf.jar';
A = LOAD 'data.txt' AS (name:chararray);
-- Apply the UDF to every tuple.
B = FOREACH A GENERATE MyUDF(name);

Conclusion
• Pig is a powerful tool for analyzing large datasets. Its high-level language, Pig Latin, and support for UDFs make it easy to express complex data transformations. By understanding the key concepts of Pig Latin and how to use UDFs, you can leverage Pig's power to gain insights from your data.
