SlideShare a Scribd company logo
PIG ScriptingMaking Pig Turing-complete through embedding in a scripting languageJulien Le Dem - YahooOverviewPig execution model - Map/Reduce: a solution to process big data.  - Pig: makes it easier to manipulate data on the grid.  - Pig scripting: makes Pig easier with iterative algorithms and User Defined Functions.Jiras: PIG-1479, PIG-1794 (in 0.9), PIG-928 (in 0.8)Example: Transitive closure - Iterative process: requires a loop and a termination condition - Requires multiple join/group by: typical Pig usage - Requires User Defined FunctionsSolution 2: using Pig scriptingSolution 1: plain Pig1 file required:7 files required:UDFs take the elements of the tuple as parameters, not a tuple.The output schema is specified using a decorator (it can be a function if you need to manipulate the input schema)Modification flow:Modification flow:UDFs use standard Python constructs (tuple/list/dictionary) automatically converted to Pig.Python functions are automatically available as UDFsEmbedded Pig calls in PythonPython variables can be used in the pig scripts $n, $i, ...Unfortunately this solution does not fit here.It requires 7 artifacts: 3 Java UDFs: 3 classes must be compiled and packaged in a jar. The average UDF size is 50 to 100 lines of code.
Ad

More Related Content

What's hot (9)

Everything You Always Wanted to Know About Memory in Python - But Were Afraid...
Everything You Always Wanted to Know About Memory in Python - But Were Afraid...Everything You Always Wanted to Know About Memory in Python - But Were Afraid...
Everything You Always Wanted to Know About Memory in Python - But Were Afraid...
Piotr Przymus
 
Everything You Always Wanted to Know About Memory in Python But Were Afraid t...
Everything You Always Wanted to Know About Memory in Python But Were Afraid t...Everything You Always Wanted to Know About Memory in Python But Were Afraid t...
Everything You Always Wanted to Know About Memory in Python But Were Afraid t...
Piotr Przymus
 
Programming with Python - Adv.
Programming with Python - Adv.Programming with Python - Adv.
Programming with Python - Adv.
Mosky Liu
 
Boost.Python: C++ and Python Integration
Boost.Python: C++ and Python IntegrationBoost.Python: C++ and Python Integration
Boost.Python: C++ and Python Integration
GlobalLogic Ukraine
 
오픈소스 라이브러리 개발기
오픈소스 라이브러리 개발기오픈소스 라이브러리 개발기
오픈소스 라이브러리 개발기
겨울 정
 
What’s eating python performance
What’s eating python performanceWhat’s eating python performance
What’s eating python performance
Piotr Przymus
 
The Benefits of Type Hints
The Benefits of Type HintsThe Benefits of Type Hints
The Benefits of Type Hints
masahitojp
 
Using MPI
Using MPIUsing MPI
Using MPI
Kazuki Ohta
 
Open source projects with python
Open source projects with pythonOpen source projects with python
Open source projects with python
roskakori
 
Everything You Always Wanted to Know About Memory in Python - But Were Afraid...
Everything You Always Wanted to Know About Memory in Python - But Were Afraid...Everything You Always Wanted to Know About Memory in Python - But Were Afraid...
Everything You Always Wanted to Know About Memory in Python - But Were Afraid...
Piotr Przymus
 
Everything You Always Wanted to Know About Memory in Python But Were Afraid t...
Everything You Always Wanted to Know About Memory in Python But Were Afraid t...Everything You Always Wanted to Know About Memory in Python But Were Afraid t...
Everything You Always Wanted to Know About Memory in Python But Were Afraid t...
Piotr Przymus
 
Programming with Python - Adv.
Programming with Python - Adv.Programming with Python - Adv.
Programming with Python - Adv.
Mosky Liu
 
Boost.Python: C++ and Python Integration
Boost.Python: C++ and Python IntegrationBoost.Python: C++ and Python Integration
Boost.Python: C++ and Python Integration
GlobalLogic Ukraine
 
오픈소스 라이브러리 개발기
오픈소스 라이브러리 개발기오픈소스 라이브러리 개발기
오픈소스 라이브러리 개발기
겨울 정
 
What’s eating python performance
What’s eating python performanceWhat’s eating python performance
What’s eating python performance
Piotr Przymus
 
The Benefits of Type Hints
The Benefits of Type HintsThe Benefits of Type Hints
The Benefits of Type Hints
masahitojp
 
Open source projects with python
Open source projects with pythonOpen source projects with python
Open source projects with python
roskakori
 

Viewers also liked (8)

Embedding Pig in scripting languages
Embedding Pig in scripting languagesEmbedding Pig in scripting languages
Embedding Pig in scripting languages
Julien Le Dem
 
Data Eng Conf NY Nov 2016 Parquet Arrow
Data Eng Conf NY Nov 2016 Parquet ArrowData Eng Conf NY Nov 2016 Parquet Arrow
Data Eng Conf NY Nov 2016 Parquet Arrow
Julien Le Dem
 
Reducing the dimensionality of data with neural networks
Reducing the dimensionality of data with neural networksReducing the dimensionality of data with neural networks
Reducing the dimensionality of data with neural networks
Hakky St
 
Low Latency Execution For Apache Spark
Low Latency Execution For Apache SparkLow Latency Execution For Apache Spark
Low Latency Execution For Apache Spark
Jen Aman
 
Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013
Julien Le Dem
 
Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0
Cloudera, Inc.
 
If you have your own Columnar format, stop now and use Parquet 😛
If you have your own Columnar format,  stop now and use Parquet  😛If you have your own Columnar format,  stop now and use Parquet  😛
If you have your own Columnar format, stop now and use Parquet 😛
Julien Le Dem
 
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
StampedeCon
 
Embedding Pig in scripting languages
Embedding Pig in scripting languagesEmbedding Pig in scripting languages
Embedding Pig in scripting languages
Julien Le Dem
 
Data Eng Conf NY Nov 2016 Parquet Arrow
Data Eng Conf NY Nov 2016 Parquet ArrowData Eng Conf NY Nov 2016 Parquet Arrow
Data Eng Conf NY Nov 2016 Parquet Arrow
Julien Le Dem
 
Reducing the dimensionality of data with neural networks
Reducing the dimensionality of data with neural networksReducing the dimensionality of data with neural networks
Reducing the dimensionality of data with neural networks
Hakky St
 
Low Latency Execution For Apache Spark
Low Latency Execution For Apache SparkLow Latency Execution For Apache Spark
Low Latency Execution For Apache Spark
Jen Aman
 
Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013
Julien Le Dem
 
Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0
Cloudera, Inc.
 
If you have your own Columnar format, stop now and use Parquet 😛
If you have your own Columnar format,  stop now and use Parquet  😛If you have your own Columnar format,  stop now and use Parquet  😛
If you have your own Columnar format, stop now and use Parquet 😛
Julien Le Dem
 
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
StampedeCon
 
Ad

Similar to Poster Hadoop summit 2011: pig embedding in scripting languages (20)

Multithreaded_Programming_in_Python.pdf
Multithreaded_Programming_in_Python.pdfMultithreaded_Programming_in_Python.pdf
Multithreaded_Programming_in_Python.pdf
giridharsripathi
 
Multithreading by rj
Multithreading by rjMultithreading by rj
Multithreading by rj
Shree M.L.Kakadiya MCA mahila college, Amreli
 
Introduction to Chainer: A Flexible Framework for Deep Learning
Introduction to Chainer: A Flexible Framework for Deep LearningIntroduction to Chainer: A Flexible Framework for Deep Learning
Introduction to Chainer: A Flexible Framework for Deep Learning
Seiya Tokui
 
concurrency
concurrencyconcurrency
concurrency
Jonathan Wagoner
 
25 must know python for Interview by Tutort Academy
25 must know python for Interview by Tutort Academy25 must know python for Interview by Tutort Academy
25 must know python for Interview by Tutort Academy
yashikanigam1
 
pythontraining-201jn026043638.pptx
pythontraining-201jn026043638.pptxpythontraining-201jn026043638.pptx
pythontraining-201jn026043638.pptx
RohitKumar639388
 
Python training
Python trainingPython training
Python training
Kunalchauhan76
 
Unit-5 [Pig] working and architecture.pptx
Unit-5 [Pig] working and architecture.pptxUnit-5 [Pig] working and architecture.pptx
Unit-5 [Pig] working and architecture.pptx
tripathineeharika
 
Multicore
MulticoreMulticore
Multicore
Birgit Plötzeneder
 
Pacman game computer investigatory project
Pacman game computer investigatory projectPacman game computer investigatory project
Pacman game computer investigatory project
meenaloshiniG
 
Apache pig presentation_siddharth_mathur
Apache pig presentation_siddharth_mathurApache pig presentation_siddharth_mathur
Apache pig presentation_siddharth_mathur
Siddharth Mathur
 
Apache pig power_tools_by_viswanath_gangavaram_r&d_dsg_i_labs
Apache pig power_tools_by_viswanath_gangavaram_r&d_dsg_i_labsApache pig power_tools_by_viswanath_gangavaram_r&d_dsg_i_labs
Apache pig power_tools_by_viswanath_gangavaram_r&d_dsg_i_labs
Viswanath Gangavaram
 
Unit V.pdf
Unit V.pdfUnit V.pdf
Unit V.pdf
KennyPratheepKumar
 
Apache pig presentation_siddharth_mathur
Apache pig presentation_siddharth_mathurApache pig presentation_siddharth_mathur
Apache pig presentation_siddharth_mathur
Siddharth Mathur
 
Building A Linux Cluster Using Raspberry PI #2!
Building A Linux Cluster Using Raspberry PI #2!Building A Linux Cluster Using Raspberry PI #2!
Building A Linux Cluster Using Raspberry PI #2!
A Jorge Garcia
 
PHYTON-REPORT.pdf
PHYTON-REPORT.pdfPHYTON-REPORT.pdf
PHYTON-REPORT.pdf
PraveenKumar640562
 
Debugging Hung Python Processes With GDB
Debugging Hung Python Processes With GDBDebugging Hung Python Processes With GDB
Debugging Hung Python Processes With GDB
bmbouter
 
HiPEAC 2019 Tutorial - Maestro RTOS
HiPEAC 2019 Tutorial - Maestro RTOSHiPEAC 2019 Tutorial - Maestro RTOS
HiPEAC 2019 Tutorial - Maestro RTOS
Tulipp. Eu
 
W-334535VBE242 Using Python Libraries.pdf
W-334535VBE242 Using Python Libraries.pdfW-334535VBE242 Using Python Libraries.pdf
W-334535VBE242 Using Python Libraries.pdf
manassingh1509
 
unit-4-apache pig-.pdf
unit-4-apache pig-.pdfunit-4-apache pig-.pdf
unit-4-apache pig-.pdf
ssuser92282c
 
Multithreaded_Programming_in_Python.pdf
Multithreaded_Programming_in_Python.pdfMultithreaded_Programming_in_Python.pdf
Multithreaded_Programming_in_Python.pdf
giridharsripathi
 
Introduction to Chainer: A Flexible Framework for Deep Learning
Introduction to Chainer: A Flexible Framework for Deep LearningIntroduction to Chainer: A Flexible Framework for Deep Learning
Introduction to Chainer: A Flexible Framework for Deep Learning
Seiya Tokui
 
25 must know python for Interview by Tutort Academy
25 must know python for Interview by Tutort Academy25 must know python for Interview by Tutort Academy
25 must know python for Interview by Tutort Academy
yashikanigam1
 
pythontraining-201jn026043638.pptx
pythontraining-201jn026043638.pptxpythontraining-201jn026043638.pptx
pythontraining-201jn026043638.pptx
RohitKumar639388
 
Unit-5 [Pig] working and architecture.pptx
Unit-5 [Pig] working and architecture.pptxUnit-5 [Pig] working and architecture.pptx
Unit-5 [Pig] working and architecture.pptx
tripathineeharika
 
Pacman game computer investigatory project
Pacman game computer investigatory projectPacman game computer investigatory project
Pacman game computer investigatory project
meenaloshiniG
 
Apache pig presentation_siddharth_mathur
Apache pig presentation_siddharth_mathurApache pig presentation_siddharth_mathur
Apache pig presentation_siddharth_mathur
Siddharth Mathur
 
Apache pig power_tools_by_viswanath_gangavaram_r&d_dsg_i_labs
Apache pig power_tools_by_viswanath_gangavaram_r&d_dsg_i_labsApache pig power_tools_by_viswanath_gangavaram_r&d_dsg_i_labs
Apache pig power_tools_by_viswanath_gangavaram_r&d_dsg_i_labs
Viswanath Gangavaram
 
Apache pig presentation_siddharth_mathur
Apache pig presentation_siddharth_mathurApache pig presentation_siddharth_mathur
Apache pig presentation_siddharth_mathur
Siddharth Mathur
 
Building A Linux Cluster Using Raspberry PI #2!
Building A Linux Cluster Using Raspberry PI #2!Building A Linux Cluster Using Raspberry PI #2!
Building A Linux Cluster Using Raspberry PI #2!
A Jorge Garcia
 
Debugging Hung Python Processes With GDB
Debugging Hung Python Processes With GDBDebugging Hung Python Processes With GDB
Debugging Hung Python Processes With GDB
bmbouter
 
HiPEAC 2019 Tutorial - Maestro RTOS
HiPEAC 2019 Tutorial - Maestro RTOSHiPEAC 2019 Tutorial - Maestro RTOS
HiPEAC 2019 Tutorial - Maestro RTOS
Tulipp. Eu
 
W-334535VBE242 Using Python Libraries.pdf
W-334535VBE242 Using Python Libraries.pdfW-334535VBE242 Using Python Libraries.pdf
W-334535VBE242 Using Python Libraries.pdf
manassingh1509
 
unit-4-apache pig-.pdf
unit-4-apache pig-.pdfunit-4-apache pig-.pdf
unit-4-apache pig-.pdf
ssuser92282c
 
Ad

More from Julien Le Dem (19)

Data and AI summit: data pipelines observability with open lineage
Data and AI summit: data pipelines observability with open lineageData and AI summit: data pipelines observability with open lineage
Data and AI summit: data pipelines observability with open lineage
Julien Le Dem
 
Data pipelines observability: OpenLineage & Marquez
Data pipelines observability:  OpenLineage & MarquezData pipelines observability:  OpenLineage & Marquez
Data pipelines observability: OpenLineage & Marquez
Julien Le Dem
 
Open core summit: Observability for data pipelines with OpenLineage
Open core summit: Observability for data pipelines with OpenLineageOpen core summit: Observability for data pipelines with OpenLineage
Open core summit: Observability for data pipelines with OpenLineage
Julien Le Dem
 
Data platform architecture principles - ieee infrastructure 2020
Data platform architecture principles - ieee infrastructure 2020Data platform architecture principles - ieee infrastructure 2020
Data platform architecture principles - ieee infrastructure 2020
Julien Le Dem
 
Data lineage and observability with Marquez - subsurface 2020
Data lineage and observability with Marquez - subsurface 2020Data lineage and observability with Marquez - subsurface 2020
Data lineage and observability with Marquez - subsurface 2020
Julien Le Dem
 
Strata NY 2018: The deconstructed database
Strata NY 2018: The deconstructed databaseStrata NY 2018: The deconstructed database
Strata NY 2018: The deconstructed database
Julien Le Dem
 
From flat files to deconstructed database
From flat files to deconstructed databaseFrom flat files to deconstructed database
From flat files to deconstructed database
Julien Le Dem
 
Strata NY 2017 Parquet Arrow roadmap
Strata NY 2017 Parquet Arrow roadmapStrata NY 2017 Parquet Arrow roadmap
Strata NY 2017 Parquet Arrow roadmap
Julien Le Dem
 
The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache Arrow
Julien Le Dem
 
Improving Python and Spark Performance and Interoperability with Apache Arrow
Improving Python and Spark Performance and Interoperability with Apache ArrowImproving Python and Spark Performance and Interoperability with Apache Arrow
Improving Python and Spark Performance and Interoperability with Apache Arrow
Julien Le Dem
 
Mule soft mar 2017 Parquet Arrow
Mule soft mar 2017 Parquet ArrowMule soft mar 2017 Parquet Arrow
Mule soft mar 2017 Parquet Arrow
Julien Le Dem
 
Strata NY 2016: The future of column-oriented data processing with Arrow and ...
Strata NY 2016: The future of column-oriented data processing with Arrow and ...Strata NY 2016: The future of column-oriented data processing with Arrow and ...
Strata NY 2016: The future of column-oriented data processing with Arrow and ...
Julien Le Dem
 
Strata London 2016: The future of column oriented data processing with Arrow ...
Strata London 2016: The future of column oriented data processing with Arrow ...Strata London 2016: The future of column oriented data processing with Arrow ...
Strata London 2016: The future of column oriented data processing with Arrow ...
Julien Le Dem
 
Sql on everything with drill
Sql on everything with drillSql on everything with drill
Sql on everything with drill
Julien Le Dem
 
How to use Parquet as a basis for ETL and analytics
How to use Parquet as a basis for ETL and analyticsHow to use Parquet as a basis for ETL and analytics
How to use Parquet as a basis for ETL and analytics
Julien Le Dem
 
Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014
Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014
Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014
Julien Le Dem
 
Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013
Julien Le Dem
 
Parquet Twitter Seattle open house
Parquet Twitter Seattle open houseParquet Twitter Seattle open house
Parquet Twitter Seattle open house
Julien Le Dem
 
Parquet overview
Parquet overviewParquet overview
Parquet overview
Julien Le Dem
 
Data and AI summit: data pipelines observability with open lineage
Data and AI summit: data pipelines observability with open lineageData and AI summit: data pipelines observability with open lineage
Data and AI summit: data pipelines observability with open lineage
Julien Le Dem
 
Data pipelines observability: OpenLineage & Marquez
Data pipelines observability:  OpenLineage & MarquezData pipelines observability:  OpenLineage & Marquez
Data pipelines observability: OpenLineage & Marquez
Julien Le Dem
 
Open core summit: Observability for data pipelines with OpenLineage
Open core summit: Observability for data pipelines with OpenLineageOpen core summit: Observability for data pipelines with OpenLineage
Open core summit: Observability for data pipelines with OpenLineage
Julien Le Dem
 
Data platform architecture principles - ieee infrastructure 2020
Data platform architecture principles - ieee infrastructure 2020Data platform architecture principles - ieee infrastructure 2020
Data platform architecture principles - ieee infrastructure 2020
Julien Le Dem
 
Data lineage and observability with Marquez - subsurface 2020
Data lineage and observability with Marquez - subsurface 2020Data lineage and observability with Marquez - subsurface 2020
Data lineage and observability with Marquez - subsurface 2020
Julien Le Dem
 
Strata NY 2018: The deconstructed database
Strata NY 2018: The deconstructed databaseStrata NY 2018: The deconstructed database
Strata NY 2018: The deconstructed database
Julien Le Dem
 
From flat files to deconstructed database
From flat files to deconstructed databaseFrom flat files to deconstructed database
From flat files to deconstructed database
Julien Le Dem
 
Strata NY 2017 Parquet Arrow roadmap
Strata NY 2017 Parquet Arrow roadmapStrata NY 2017 Parquet Arrow roadmap
Strata NY 2017 Parquet Arrow roadmap
Julien Le Dem
 
The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache Arrow
Julien Le Dem
 
Improving Python and Spark Performance and Interoperability with Apache Arrow
Improving Python and Spark Performance and Interoperability with Apache ArrowImproving Python and Spark Performance and Interoperability with Apache Arrow
Improving Python and Spark Performance and Interoperability with Apache Arrow
Julien Le Dem
 
Mule soft mar 2017 Parquet Arrow
Mule soft mar 2017 Parquet ArrowMule soft mar 2017 Parquet Arrow
Mule soft mar 2017 Parquet Arrow
Julien Le Dem
 
Strata NY 2016: The future of column-oriented data processing with Arrow and ...
Strata NY 2016: The future of column-oriented data processing with Arrow and ...Strata NY 2016: The future of column-oriented data processing with Arrow and ...
Strata NY 2016: The future of column-oriented data processing with Arrow and ...
Julien Le Dem
 
Strata London 2016: The future of column oriented data processing with Arrow ...
Strata London 2016: The future of column oriented data processing with Arrow ...Strata London 2016: The future of column oriented data processing with Arrow ...
Strata London 2016: The future of column oriented data processing with Arrow ...
Julien Le Dem
 
Sql on everything with drill
Sql on everything with drillSql on everything with drill
Sql on everything with drill
Julien Le Dem
 
How to use Parquet as a basis for ETL and analytics
How to use Parquet as a basis for ETL and analyticsHow to use Parquet as a basis for ETL and analytics
How to use Parquet as a basis for ETL and analytics
Julien Le Dem
 
Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014
Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014
Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014
Julien Le Dem
 
Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013
Julien Le Dem
 
Parquet Twitter Seattle open house
Parquet Twitter Seattle open houseParquet Twitter Seattle open house
Parquet Twitter Seattle open house
Julien Le Dem
 

Recently uploaded (20)

Heap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and DeletionHeap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and Deletion
Jaydeep Kale
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Aqusag Technologies
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
Cyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of securityCyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of security
riccardosl1
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
Heap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and DeletionHeap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and Deletion
Jaydeep Kale
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Aqusag Technologies
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
Cyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of securityCyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of security
riccardosl1
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 

Poster Hadoop summit 2011: pig embedding in scripting languages

  • 1. PIG ScriptingMaking Pig Turing-complete through embedding in a scripting languageJulien Le Dem - YahooOverviewPig execution model - Map/Reduce: a solution to process big data. - Pig: makes it easier to manipulate data on the grid. - Pig scripting: makes Pig easier with iterative algorithms and User Defined Functions.Jiras: PIG-1479, PIG-1794 (in 0.9), PIG-928 (in 0.8)Example: Transitive closure - Iterative process: requires a loop and a termination condition - Requires multiple join/group by: typical Pig usage - Requires User Defined FunctionsSolution 2: using Pig scriptingSolution 1: plain Pig1 file required:7 files required:UDFs take the elements of the tuple as parameters, not a tuple.The output schema is specified using a decorator (it can be a function if you need to manipulate the input schema)Modification flow:Modification flow:UDFs use standard Python constructs (tuple/list/dictionary) automatically converted to Pig.Python functions are automatically available as UDFsEmbedded Pig calls in PythonPython variables can be used in the pig scripts $n, $i, ...Unfortunately this solution does not fit here.It requires 7 artifacts: 3 Java UDFs: 3 classes must be compiled and packaged in a jar. The average UDF size is 50 to 100 lines of code.
  • 2. 3 Pig scripts in 3 separate files: init, main loop content, finalization. The average Pig script size is 5 to 10 lines.
  • 3. Main program: executes and coordinates the Pig scripts. It contains about 50 lines of code.This solution does fit on the poster.It requires 1 artifact, containing: 3 UDFs provided as functions: The average UDF size is 8 to 12 lines of code.
  • 4. 3 Pig queries: init, main loop content, finalization. Each Pig query adds an average 5 to 10 lines.
  • 5. Main function: executes and coordinates the Pig scripts. It adds about 10 lines to the script.Access to the output of pig scripts