BigData2
BIG DATA
GROUP ASSIGNMENT 2

SUBMITTED TO:
Mr. Shivam Bharadwaj
(Assistant Professor)

SUBMITTED BY:
Devi Prasanna Pati
Diksha Singh
Divyanshu Singh

Content
• Introduction
• Difference Between Traditional Databases and Pig
• Pig Latin: A High-Level Language for Data Flow
• Example Pig Latin Script
• User-Defined Functions (UDFs)
• Conclusion
Introduction
• Pig is a high-level platform for parallel computation on large datasets. It is designed to make it easier to analyze large datasets residing in distributed file systems such as Hadoop's HDFS. Pig's programming language, Pig Latin, is similar to SQL, making it easier to learn for programmers who are already familiar with relational databases.

Difference Between Traditional Databases and Pig
Feature | Traditional Databases | Pig
Data Storage | Primarily in-memory or on disk | Distributed file system (HDFS)
Data Processing | Primarily single-node processing | Distributed processing across a cluster
Performance | High for small to medium datasets | High for large datasets
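
To make the comparison with SQL concrete, here is a minimal Pig Latin sketch of a filter-and-aggregate data flow. The input file 'access_log.txt', its (user, bytes) layout, and the output path are assumptions made for illustration only. The comments give the rough SQL counterpart of each step.

```pig
-- Hypothetical input: tab-separated lines of (user, bytes)
logs = LOAD 'access_log.txt' AS (user:chararray, bytes:long);

-- Roughly: WHERE bytes > 0
valid = FILTER logs BY bytes > 0;

-- Roughly: GROUP BY user
by_user = GROUP valid BY user;

-- Roughly: SELECT user, COUNT(*), SUM(bytes)
totals = FOREACH by_user GENERATE group AS user,
                                  COUNT(valid) AS requests,
                                  SUM(valid.bytes) AS total_bytes;

-- Roughly: ORDER BY total_bytes DESC
ranked = ORDER totals BY total_bytes DESC;

-- Write the result to a (hypothetical) output directory
STORE ranked INTO 'output/bytes_per_user';
```

Unlike a single SQL statement, each step is assigned to a named relation, so the script reads as an explicit data flow that Pig can run as distributed jobs across the cluster.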
09-12-2024
Example UDF Usage
REGISTER 'myudf.jar';
A = LOAD 'data.txt' AS (name:chararray);
B = FOREACH A GENERATE MyUDF(name);

Conclusion
• Pig is a powerful tool for analyzing large datasets. Its high-level language, Pig Latin, and support for UDFs make it easy to express complex data transformations. By understanding the key concepts of Pig Latin and how to use UDFs, you can leverage Pig's power to gain insights from your data.
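
As a supplement to the UDF usage example, here is a slightly fuller sketch of the same script. It assumes that myudf.jar contains a Java class named com.example.pig.MyUDF. That package name is hypothetical, since the slides do not say how the UDF is packaged. If the class sits in a package, Pig needs either the fully qualified name at the call site or a DEFINE alias like the one below.

```pig
-- Make the UDF jar available to the script
REGISTER 'myudf.jar';

-- Assumption: the UDF class is com.example.pig.MyUDF (hypothetical name);
-- DEFINE gives it a short alias for use in the script
DEFINE MyUDF com.example.pig.MyUDF();

-- Load one chararray field per input line
A = LOAD 'data.txt' AS (name:chararray);

-- Apply the UDF to every record and name the output field
B = FOREACH A GENERATE MyUDF(name) AS result;

-- Print the result to the console for inspection
DUMP B;
```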