A Homomorphism-based Framework for
Systematic Parallel Programming with MapReduce
Yu Liu1, Zhenjiang Hu2
1 The Graduate University for Advanced Studies,Tokyo, Japan
yuliu@nii.ac.jp
2 National Institute of Informatics,Tokyo, Japan
2 hu@nii.ac.jp
Mar. 10th, 2011
Background
MapReduce
Google’s MapReduce is a popular parallel and distributed programming
model for processing large data sets. It has become the de facto
standard for large-scale data analysis.
Concepts from functional programming languages
Automatic parallel processing, fault tolerance
Good scalability
Programming with MapReduce
A user has to
design a divide-and-conquer (D&C) algorithm that fits the MapReduce paradigm;
map this algorithm to MapReduce.
Difficulties of programming with MapReduce
How to resolve the constraints on computation order.
How to resolve data dependencies.
Example
The Maximum Prefix Sum problem
mps [3, −1, 4, −1, −5, 9, 2, −6, 5, −10] = 11
A sequential program for MPS in O(n) time
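A minimal Python sketch of such a linear-time sequential program. It assumes a non-empty list and does not count the empty prefix; only the name mps comes from the slide, the rest is illustrative.

```python
from typing import Sequence


def mps(xs: Sequence[int]) -> int:
    """Maximum prefix sum of a non-empty list, in one left-to-right pass."""
    if not xs:
        raise ValueError("mps is defined here for non-empty lists only")
    running = 0      # sum of the prefix seen so far
    best = xs[0]     # best prefix sum so far (the first prefix is [xs[0]])
    for x in xs:
        running += x
        best = max(best, running)
    return best


print(mps([3, -1, 4, -1, -5, 9, 2, -6, 5, -10]))  # prints 11
```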
Hard to compute MPS with MapReduce
The computation is order-dependent.
The MPS values of sub-lists cannot be combined directly.
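The reason is the standard identity mps (xs ++ ys) = max (mps xs, sum xs + mps ys): the result for the right part must be shifted by the sum of the left part, which mps alone does not provide. A small Python sketch of this, with illustrative helper names and non-empty lists assumed:

```python
from itertools import accumulate


def mps(xs):
    """Maximum prefix sum of a non-empty list."""
    return max(accumulate(xs))


xs, ys = [3, -1, 4], [-1, -5, 9, 2, -6, 5, -10]

# Taking the larger of the two partial results is wrong ...
naive = max(mps(xs), mps(ys))
# ... because every prefix of ys must be offset by the total of xs.
correct = max(mps(xs), sum(xs) + mps(ys))

print(naive, correct, mps(xs + ys))  # prints 6 11 11
```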
Questions
Is there a systematic way to solve such problems with
MapReduce?
How can problems with a strict order be handled?
How can the divide-and-conquer algorithm be designed
systematically?
Motivation and objective
We propose a systematic approach to automatically generate fully
parallelized and scalable MapReduce programs.
A new framework which provides algorithmic programming
interfaces has been implemented.
A systematic approach for programming with MapReduce
First, derive a function h.
Then write its right inverse h◦.
A D&C algorithm is then obtained.
Map it to the MapReduce paradigm.
Parallelization happens inside a black box.
It is implemented by multi-phase MapReduce processing.
Conditions on the function f
Theorem
If there exists a binary operator ⊕ such that
f (xs ++ ys) = f xs ⊕ f ys
then ⊕ can be defined as:
x ⊕ y = f (f◦ x ++ f◦ y)
where ++ is list concatenation and f◦ is a right inverse of f.
Such an operator ⊕ exists if and only if the function can be defined
both leftwards and rightwards. We can then derive a divide-and-conquer
algorithm like this:
Divide-and-conquer
f (xs ++ ys) = f (f◦ (f xs) ++ f◦ (f ys))
Such functions are called homomorphisms.
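The derivation above can be sketched in Python as follows. The helper names derive_combine and divide_and_conquer are hypothetical, and sum with the right inverse s -> [s] is used only as an easy illustration:

```python
from functools import reduce


def derive_combine(f, f_inv):
    """Build the combining operator: combine(x, y) = f(f_inv(x) ++ f_inv(y))."""
    def combine(x, y):
        return f(f_inv(x) + f_inv(y))  # '+' on Python lists is concatenation
    return combine


def divide_and_conquer(f, f_inv, chunks):
    """Apply f to every chunk independently, then merge the partial results."""
    combine = derive_combine(f, f_inv)
    return reduce(combine, (f(chunk) for chunk in chunks))


# f = sum; one possible right inverse maps a total s to the list [s].
chunks = [[1, 2, 3], [4, 5], [6]]
print(divide_and_conquer(sum, lambda s: [s], chunks))  # prints 21
```

Each call to f on a chunk is independent of the others, which is what makes the per-chunk work parallelizable.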
Programming Interface
Fold and unfold
fold :: [α] → β
unfold :: β → [α].
The implementation in Java
A function that computes MPS, together with its right inverse, can be
written as follows:
fold xs = (mps xs, sum xs)
unfold (m, s) = [m, s − m]
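A Python sketch of this pair, assuming non-empty lists. It is not the framework's Java interface, only a check that unfold behaves as a right inverse and that the divide-and-conquer step gives the same answer:

```python
from itertools import accumulate


def fold(xs):
    """Return the pair (mps xs, sum xs) for a non-empty list xs."""
    return (max(accumulate(xs)), sum(xs))


def unfold(pair):
    """Right inverse of fold: a two-element list with the same (mps, sum)."""
    m, s = pair
    return [m, s - m]


xs = [3, -1, 4, -1, -5, 9, 2, -6, 5, -10]
print(fold(xs))                            # prints (11, 0)
print(fold(unfold(fold(xs))) == fold(xs))  # prints True

# Divide and conquer: fold each part, unfold, concatenate, fold again.
left, right = xs[:4], xs[4:]
print(fold(unfold(fold(left)) + unfold(fold(right))))  # prints (11, 0)
```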
The computation inside the framework
The framework uses the fold and unfold functions to do the computation:
Autonomous intermediate data
Each record of the intermediate data carries its position information,
so it does not matter how the data is distributed.
< id, val > → << parId, id >, val >
By making use of the sorting and grouping mechanisms of the MapReduce
framework, lists can be reconstructed when necessary.
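A minimal in-memory sketch of this idea in Python. The rule for choosing parId (id divided by a fixed partition size) is an assumption made for illustration; the slides do not say how the framework chooses it.

```python
from itertools import groupby


def tag(records, partition_size):
    """Turn (id, val) into ((par_id, id), val); par_id is chosen by id range here."""
    return [((i // partition_size, i), v) for i, v in records]


def reconstruct(tagged):
    """Sort by (par_id, id), then group by par_id to rebuild each sub-list in order."""
    ordered = sorted(tagged, key=lambda rec: rec[0])
    return {par_id: [val for _, val in group]
            for par_id, group in groupby(ordered, key=lambda rec: rec[0][0])}


# Records may arrive in any order.
records = [(3, "d"), (1, "b"), (2, "c"), (0, "a"), (4, "e")]
print(reconstruct(tag(records, 3)))  # prints {0: ['a', 'b', 'c'], 1: ['d', 'e']}
```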
A formal definition
homMR
homMR :: (α → β) → (β → β → β) → {(ID, α)} → β
homMR f (⊕) = getValue ◦ MapReduce mapper2 reducer2
◦ MapReduce mapper1 reducer1
where
mapper1 :: (ID, α) → [((PID, ID), α)]
mapper1 (i, a) = [((pid, i), a)]
where pid = makePid i
reducer1 :: ((PID, ID), [α]) → ((PID, ID), β)
reducer1 ((pid, j), ias) = ((pid, j), hom f (⊕) ias)
continued
mapper2 :: ((PID, ID), β) → ((PID, ID), β)
mapper2 ((pid, j), b) = ((c0, j), b)
where c0 is a predefined constant pid
reducer2 :: ((PID, ID), [β]) → ((PID, ID), β)
reducer2 ((c0, k), jbs) = ((c0, k), hom f (⊕) jbs)
getValue :: ((PID, ID), β) → β
getValue ((c0, k), c) = c
Where, hom f (⊕) denotes a sequential version of ([f , ⊕]).
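The two-pass structure of homMR can be simulated in memory with the Python sketch below. It is not the Hadoop API. Assumptions: makePid is taken to be id // partition_size, c0 is 0, and the second pass only applies ⊕, since its inputs are already partial results of type β.

```python
from collections import defaultdict
from functools import reduce


def map_reduce(mapper, reducer, records):
    """Tiny in-memory stand-in for one MapReduce pass."""
    groups = defaultdict(list)
    for record in records:
        for (pid, i), val in mapper(record):
            groups[pid].append((i, val))
    # Each group is sorted by id before reduction, mimicking the shuffle phase.
    return [reducer(pid, sorted(vals)) for pid, vals in sorted(groups.items())]


def hom_mr(f, plus, records, partition_size=2):
    """Two-pass evaluation of the homomorphism ([f, plus]) over (id, value) records."""
    def mapper1(rec):                  # (i, a) -> [((pid, i), a)]
        i, a = rec
        return [((i // partition_size, i), a)]

    def mapper2(rec):                  # ((pid, j), b) -> [((c0, j), b)] with c0 = 0
        (_, j), b = rec
        return [((0, j), b)]

    def reducer(g):
        def run(pid, vals):            # key the result by (pid, first id of the group)
            return ((pid, vals[0][0]), reduce(plus, (g(v) for _, v in vals)))
        return run

    phase1 = map_reduce(mapper1, reducer(f), records)
    phase2 = map_reduce(mapper2, reducer(lambda b: b), phase1)
    return phase2[0][1]                # getValue


records = [(3, 4), (1, 2), (2, 3), (0, 1), (4, 5)]
print(hom_mr(lambda a: a * a, lambda x, y: x + y, records))  # prints 55
```

Because each group is sorted by id before it is reduced, the sketch also works for operators that are associative but not commutative, such as string concatenation.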
Actual user-program for MPS
http://screwdriver.googlecode.com
Performance evaluation
Environment: hardware
We configured clusters with 2, 4, 8, and 16 nodes. Each
computing/data node has two Xeon CPUs (Nocona, single-core,
2.8 GHz), 2 GB memory. The nodes are connected with Gigabit
Ethernet.
Environment: software
Linux 2.6.26, Hadoop 0.21.0 + HDFS
Hadoop configuration: heap size = 1024 MB
maximum mappers per node: 2
maximum reducers per node: 1
Test cases
We implemented several programs for three problems, using our
framework and plain Hadoop:
1 The maximum-prefix-sum problem.
MPS-lh is implemented using our framework’s API.
MPS-mr is implemented using the Hadoop API.
2 Parallel sum of 64-bit integers.
SUM-lh is implemented using our framework’s API.
SUM-mr is implemented using the Hadoop API.
3 VAR-lh computes the variance of 32-bit floating-point
numbers.
Test data
100 million 64-bit integers (2.87 GB) for MPS, SUM.
100 million 32-bit floating-point numbers (2.76 GB) for VAR.
Performance
The experimental results are summarized:
With 16 nodes, the speedup in all cases is more than 7.
Time curves:
Concluding remarks
In this research:
Introduced a systematic way of parallel programming on
MapReduce.
Developed a framework on top of Hadoop.
Algorithmic programming interfaces let users focus on the
algebraic properties of the problem.
Details of MapReduce are hidden.
Achieved good scalability and parallelism.
Automatic optimization can be added.
Future work
Reduce the system overhead and apply more optimizations.
Extend to more complex data structures such as trees and
graphs.
Related work
Parallel programming with list homomorphisms (M. Cole, 95).
The Third Homomorphism Theorem (J. Gibbons, 96).
Systematic extraction and implementation of
divide-and-conquer parallelism (Gorlatch, PLILP 96).
Automatic inversion generates divide-and-conquer parallel
programs (Morita et al., PLDI 07).
The third homomorphism theorem on trees: downward &
upward lead to divide-and-conquer (Morihata, POPL 09).
Thank you very much.
Questions?
List Homomorphism
A function h is said to be a list homomorphism
if there are a function f and an associative operator ⊕ such that,
for any lists x and y,
h [a] = f a
h (x ++ y) = h x ⊕ h y
where ++ is list concatenation.
Instance of a list homomorphism
sum [a] = a
sum (x ++ y) = sum x + sum y.
Theorem (The Third Homomorphism Theorem (Gibbons, 96))
Let h be a given function and ⊙ and ⊖ be binary operators. If the
following two equations hold for any element a and list y
h ([a] ++ y) = a ⊙ h y
h (y ++ [a]) = h y ⊖ a
then the function h is a homomorphism.
In fact, for a function h, if we have one of its right inverses h◦, which
satisfies h ◦ h◦ ◦ h = h, then we can obtain the list-homomorphic
definition as follows.
h = ([f , ⊕]) where
f a = h [a]
l ⊕ r = h (h◦ l ++ h◦ r)
MapReduce programs can be obtained automatically from
two sequential functions:
either f and ⊕, which define a homomorphism ([f , ⊕])
f :: a → b
⊕ :: b → b → b
(a ⊕ b) ⊕ c = a ⊕ (b ⊕ c)
or fold and unfold, which satisfy the leftwards and rightwards conditions
fold([a] ++ x) = fold([a] ++ unfold(fold(x)))
fold(x ++ [a]) = fold(unfold(fold(x)) ++ [a]).
Currently, Screwdriver provides two kinds of programming
interfaces:
Programming interface corresponding to definition of list
homomorphism;
Programming interface corresponding to the 3rd
homomorphism theorem.
Basic Homomorphism-Programming Interface
Two functions that define a homomorphism
filter :: a → b
plus :: b → b → b.
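As an illustration of this interface, variance can be expressed with such a pair. The Python sketch below is an assumption about how a program like VAR-lh might be structured; the slides do not show its code. The name filter_ has a trailing underscore only to avoid shadowing Python's built-in filter.

```python
from functools import reduce


def filter_(a):
    """Map one element to a partial result: (count, sum, sum of squares)."""
    return (1, a, a * a)


def plus(x, y):
    """Associative merge of two partial results (component-wise addition)."""
    return (x[0] + y[0], x[1] + y[1], x[2] + y[2])


def variance(xs):
    """Population variance, computed from the three merged totals."""
    n, s, sq = reduce(plus, map(filter_, xs))
    return sq / n - (s / n) ** 2


print(variance([1.0, 2.0, 3.0, 4.0]))  # prints 1.25
```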
The implementation in Java
Programming Interface based on the 3rd homomorphism
theorem
A function and its right inverse
fold :: [a] → b
unfold :: b → [a].
The implementation in Java
The implementation of Screwdriver : list representation
To implement our programming interface with Hadoop, we need to
consider how to represent lists in a distributed manner.
Input data: index-value pairs
We use integers as the index type; the list [a, b, c, d, e] is
represented by {(3, d), (1, b), (2, c), (0, a), (4, e)}.
Partition of input list
The pid (partition id) of type PID is the index of a partial list. The
framework produces the same pid for records that will be
grouped together. These records have consecutive ids.
Intermediate data: nested pairs ((pid, id), val)
Suppose the above list is divided into two parts stored on different
nodes; they are then represented as
{((0, 1), b), ((0, 2), c), ((0, 0), a)} and {((1, 3), d), ((1, 4), e)}.
Grouping and sorting of intermediate data
We define two functions, comparatorG and comparatorS, for grouping
intermediate records with the same pid and sorting them by id:
comparatorG (pid1, id1) (pid2, id2) = if pid1 == pid2 then 0 else −1
comparatorS (pid1, id1) (pid2, id2) = if id1 > id2 then 1 else −1
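A Python sketch of how two such comparators can work together. It only imitates the sort-then-group behaviour on an in-memory list and does not use Hadoop's comparator classes. It assumes ids are unique, so comparator_s never has to report a tie.

```python
from functools import cmp_to_key


def comparator_g(key1, key2):
    """Grouping comparator: 0 means 'same group' (same pid)."""
    return 0 if key1[0] == key2[0] else -1


def comparator_s(key1, key2):
    """Sorting comparator on id; ids are assumed unique."""
    return 1 if key1[1] > key2[1] else -1


records = [((0, 1), "b"), ((1, 4), "e"), ((0, 0), "a"), ((1, 3), "d"), ((0, 2), "c")]

# Sort by id, then merge neighbouring records that comparator_g treats as equal.
ordered = sorted(records, key=cmp_to_key(lambda r1, r2: comparator_s(r1[0], r2[0])))
groups = []
for key, val in ordered:
    if groups and comparator_g(groups[-1][0], key) == 0:
        groups[-1][1].append(val)
    else:
        groups.append((key, [val]))

print([(key[0], vals) for key, vals in groups])
# prints [(0, ['a', 'b', 'c']), (1, ['d', 'e'])]
```

Sorting by id alone is enough here because records with the same pid have consecutive ids, so they end up adjacent after the sort.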
Data partition
1 In the MAP task,
intermediate records with the same pid are grouped together and
sorted by id;
a partitioner dispatches the groups to different reducers.
2 In the REDUCE task, reducers apply merge-sort to all groups
with the same pid.
Paper Introduction: Combinatorial Model and Bounds for Target Set SelectionPaper Introduction: Combinatorial Model and Bounds for Target Set Selection
Paper Introduction: Combinatorial Model and Bounds for Target Set Selection
Yu Liu
 
An accumulative computation framework on MapReduce ppl2013
An accumulative computation framework on MapReduce ppl2013An accumulative computation framework on MapReduce ppl2013
An accumulative computation framework on MapReduce ppl2013
Yu Liu
 
An Enhanced MapReduce Model (on BSP)
An Enhanced MapReduce Model (on BSP)An Enhanced MapReduce Model (on BSP)
An Enhanced MapReduce Model (on BSP)
Yu Liu
 
An Introduction of Recent Research on MapReduce (2011)
An Introduction of Recent Research on MapReduce (2011)An Introduction of Recent Research on MapReduce (2011)
An Introduction of Recent Research on MapReduce (2011)
Yu Liu
 
A Generate-Test-Aggregate Parallel Programming Library on Spark
A Generate-Test-Aggregate Parallel Programming Library on SparkA Generate-Test-Aggregate Parallel Programming Library on Spark
A Generate-Test-Aggregate Parallel Programming Library on Spark
Yu Liu
 
Introduction of A Lightweight Stage-Programming Framework
Introduction of A Lightweight Stage-Programming FrameworkIntroduction of A Lightweight Stage-Programming Framework
Introduction of A Lightweight Stage-Programming Framework
Yu Liu
 
Start From A MapReduce Graph Pattern-recognize Algorithm
Start From A MapReduce Graph Pattern-recognize AlgorithmStart From A MapReduce Graph Pattern-recognize Algorithm
Start From A MapReduce Graph Pattern-recognize Algorithm
Yu Liu
 
Introduction of the Design of A High-level Language over MapReduce -- The Pig...
Introduction of the Design of A High-level Language over MapReduce -- The Pig...Introduction of the Design of A High-level Language over MapReduce -- The Pig...
Introduction of the Design of A High-level Language over MapReduce -- The Pig...
Yu Liu
 
On Extending MapReduce - Survey and Experiments
On Extending MapReduce - Survey and ExperimentsOn Extending MapReduce - Survey and Experiments
On Extending MapReduce - Survey and Experiments
Yu Liu
 
Tree representation in map reduce world
Tree representation  in map reduce worldTree representation  in map reduce world
Tree representation in map reduce world
Yu Liu
 
Introduction to Ultra-succinct representation of ordered trees with applications
Introduction to Ultra-succinct representation of ordered trees with applicationsIntroduction to Ultra-succinct representation of ordered trees with applications
Introduction to Ultra-succinct representation of ordered trees with applications
Yu Liu
 
On Implementation of Neuron Network(Back-propagation)
On Implementation of Neuron Network(Back-propagation)On Implementation of Neuron Network(Back-propagation)
On Implementation of Neuron Network(Back-propagation)
Yu Liu
 
ScrewDriver Rebirth: Generate-Test-and-Aggregate Framework on Hadoop
ScrewDriver Rebirth: Generate-Test-and-Aggregate Framework on HadoopScrewDriver Rebirth: Generate-Test-and-Aggregate Framework on Hadoop
ScrewDriver Rebirth: Generate-Test-and-Aggregate Framework on Hadoop
Yu Liu
 
Implementing Generate-Test-and-Aggregate Algorithms on Hadoop
Implementing Generate-Test-and-Aggregate Algorithms on HadoopImplementing Generate-Test-and-Aggregate Algorithms on Hadoop
Implementing Generate-Test-and-Aggregate Algorithms on Hadoop
Yu Liu
 

A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce

  • 1. A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce. Yu Liu (1), Zhenjiang Hu (2). (1) The Graduate University for Advanced Studies, Tokyo, Japan, yuliu@nii.ac.jp; (2) National Institute of Informatics, Tokyo, Japan, hu@nii.ac.jp. Mar. 10th, 2011.
  • 2. Background: MapReduce. Google’s MapReduce is a popular parallel and distributed programming model for processing large data sets. It has become the de facto standard for large-scale data analysis. It builds on concepts from functional programming languages and offers automatic parallel processing, fault tolerance, and good scalability.
  • 3-7. MapReduce (figure slides; the diagrams are not captured in this transcript).
  • 9. Programming with MapReduce. A user has to design a D&C (divide-and-conquer) algorithm that fits the MapReduce paradigm, and then map this algorithm to MapReduce. Difficulties of programming with MapReduce: how to resolve the constraints on computing order, and how to resolve the data dependencies.
  • 10. Example: the Maximum Prefix Sum (MPS) problem. mps [3, −1, 4, −1, −5, 9, 2, −6, 5, −10] = 11. A sequential program computes MPS in O(n) time.
  • 11. Example: the Maximum Prefix Sum (MPS) problem. mps [3, −1, 4, −1, −5, 9, 2, −6, 5, −10] = 11. MPS is hard to compute with MapReduce: the computation is order-dependent, and the MPS values of sub-lists cannot be combined directly.
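The sequential O(n) program mentioned on slide 10 is not part of the transcript. A minimal Python sketch of such a program, assuming the empty prefix counts (so the result is never negative), might look as follows; only the name `mps` comes from the slides, the rest is illustrative.

```python
def mps(xs):
    """Maximum prefix sum in one left-to-right pass.

    Assumption: the empty prefix is allowed, so the result is at least 0.
    """
    best = 0      # best prefix sum seen so far (the empty prefix sums to 0)
    running = 0   # sum of the prefix that ends at the current element
    for x in xs:
        running += x
        best = max(best, running)
    return best


example = [3, -1, 4, -1, -5, 9, 2, -6, 5, -10]
print(mps(example))  # 11, the value given on the slide
```

The loop carries `running` from one element to the next, which is the order dependence the slide refers to.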
  • 13. Questions. Is there a systematic way to resolve such problems with MapReduce? How can problems with a strict order be handled? How can the divide-and-conquer algorithm be designed systematically?
  • 14. Motivation and objective. We propose a systematic approach to automatically generate fully parallelized and scalable MapReduce programs. A new framework that provides algorithmic programming interfaces has been implemented.
  • 15. A systematic approach for programming with MapReduce. First, derive a function h.
  • 16. A systematic approach for programming with MapReduce. Then write an inverse function h◦.
  • 17. A systematic approach for programming with MapReduce. A D&C algorithm can then be obtained.
  • 18. A systematic approach for programming with MapReduce. Map it to the MapReduce paradigm.
  • 19. A systematic approach for programming with MapReduce. The parallelization is hidden in a black box.
  • 20. A systematic approach for programming with MapReduce. It is implemented by multi-phase MapReduce processing.
  • 21. Conditions on the function f. Theorem: if there exists a binary operator ⊙ such that f (xs ++ ys) = f xs ⊙ f ys, then such a ⊙ can be defined as x ⊙ y = f (f◦ x ++ f◦ y), where ++ is list concatenation and f◦ is a right inverse of f.
  • 22. Such a ⊙ exists iff the function can be defined both rightwards and leftwards. We can then derive a divide-and-conquer algorithm: f (xs ++ ys) = f (f◦ (f xs) ++ f◦ (f ys)). Such functions are called homomorphisms.
  • 23. Programming interface: fold and unfold. fold :: [α] → β; unfold :: β → [α]. The implementation is in Java.
  • 24. A function that computes MPS and its right inverse can be written as follows: fold xs = (mps xs, sum xs); unfold (m, s) = [m, s − m].
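To see why this pair of functions is enough for divide-and-conquer, the following Python sketch writes out `fold` and `unfold` as given on the slide and merges two partial results by folding their unfolded forms. The helper `combine` is a hypothetical name introduced here, and the sketch assumes the empty prefix is allowed in `mps`.

```python
def fold(xs):
    """fold xs = (mps xs, sum xs); assumes the empty prefix is allowed."""
    best = running = 0
    for x in xs:
        running += x
        best = max(best, running)
    return (best, running)


def unfold(pair):
    """Right inverse from the slide: unfold (m, s) = [m, s - m]."""
    m, s = pair
    return [m, s - m]


def combine(left, right):
    """Hypothetical helper: merge two partial results via fold and unfold."""
    return fold(unfold(left) + unfold(right))


xs, ys = [3, -1, 4, -1, -5], [9, 2, -6, 5, -10]
print(fold(unfold(fold(xs))) == fold(xs))           # unfold undoes fold up to fold
print(combine(fold(xs), fold(ys)) == fold(xs + ys))  # the halves merge correctly
```

Each half can therefore be folded independently, and only the small pairs `(m, s)` need to be merged.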
  • 25. The computation inside the framework. The framework uses the fold and unfold functions to do the computation.
  • 26. Autonomous intermediate data. Each record of the intermediate data carries its position information, so it does not matter how the data are distributed: < id, val > → << parId, id >, val >. By making use of the sorting and grouping mechanism of the MapReduce framework, lists can be reconstructed when necessary.
  • 27. A formal definition of homMR
      homMR :: (α → β) → (β → β → β) → {(ID, α)} → β
      homMR f (⊕) = getValue ◦ MapReduce mapper2 reducer2 ◦ MapReduce mapper1 reducer1
      where
      mapper1 :: (ID, α) → [((PID, ID), α)]
      mapper1 (i, a) = [((pid, i), a)] where pid = makePid i
      reducer1 :: ((PID, ID), [α]) → ((PID, ID), β)
      reducer1 ((pid, j), ias) = ((pid, j), hom f (⊕) ias)
  • 28. A formal definition of homMR (continued)
      mapper2 :: ((PID, ID), β) → ((PID, ID), β)
      mapper2 ((pid, j), b) = ((c0, j), b), where c0 is a predefined constant pid
      reducer2 :: ((PID, ID), [β]) → ((PID, ID), β)
      reducer2 ((c0, k), jbs) = ((c0, k), hom id (⊕) jbs)
      getValue :: ((PID, ID), β) → β
      getValue ((c0, k), c) = c
      Here hom f (⊕) denotes a sequential version of ([f, ⊕]).
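The two-phase structure of homMR can be tried out in a single process. The sketch below is a toy simulation, not the Screwdriver implementation: `map_reduce` only imitates the sort-and-group step of a real MapReduce run, `make_pid` with a fixed `CHUNK` size is a hypothetical choice for makePid, and the second reducer only combines the partial results with the operator (an assumption about the intended reading).

```python
from functools import reduce
from itertools import groupby

CHUNK = 3  # assumption: consecutive ids are cut into fixed-size partitions


def make_pid(i):
    """Hypothetical makePid: ids 0..2 -> 0, ids 3..5 -> 1, and so on."""
    return i // CHUNK


def hom(f, op, values):
    """Sequential ([f, op]) on a non-empty list."""
    return reduce(op, map(f, values))


def map_reduce(mapper, reducer, records):
    """Toy MapReduce: map, sort by (pid, id), group by pid, reduce each group."""
    mapped = sorted((kv for rec in records for kv in mapper(rec)),
                    key=lambda kv: kv[0])
    results = []
    for _, group in groupby(mapped, key=lambda kv: kv[0][0]):
        group = list(group)
        results.append(reducer((group[0][0], [v for _, v in group])))
    return results


def hom_mr(f, op, records):
    def mapper1(rec):
        i, a = rec
        return [((make_pid(i), i), a)]

    def reducer1(kv):
        key, values = kv
        return (key, hom(f, op, values))

    def mapper2(kv):
        (_, j), b = kv
        return [((0, j), b)]  # c0 = 0 sends every partial result to one group

    def reducer2(kv):
        key, partials = kv
        return (key, reduce(op, partials))

    phase1 = map_reduce(mapper1, reducer1, records)
    [(_, value)] = map_reduce(mapper2, reducer2, phase1)  # getValue
    return value


# MPS as a homomorphism on pairs (mps, sum); the empty prefix is allowed.
def f(a):
    return (max(a, 0), a)


def op(l, r):
    return (max(l[0], l[1] + r[0]), l[1] + r[1])


data = [3, -1, 4, -1, -5, 9, 2, -6, 5, -10]
records = list(reversed(list(enumerate(data))))  # input order does not matter
print(hom_mr(f, op, records))  # (11, 0)
```

Because every record carries its id, the input can arrive in any order; sorting by `(pid, id)` restores the list order inside each partition and across partitions.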
  • 29. Actual user program for MPS: http://screwdriver.googlecode.com
  • 30. Performance evaluation. Environment (hardware): we configured clusters with 2, 4, 8, and 16 nodes. Each computing/data node has two Xeon CPUs (Nocona, single-core, 2.8 GHz) and 2 GB of memory. The nodes are connected with Gigabit Ethernet. Environment (software): Linux 2.6.26, Hadoop 0.21.0 with HDFS. Hadoop configuration: heap size = 1024 MB; maximum mappers per node: 2; maximum reducers per node: 1.
  • 31. Test cases. We implemented several programs for three problems on our framework and on Hadoop: (1) the maximum-prefix-sum problem: MPS-lh is implemented using our framework's API, and MPS-mr using the Hadoop API; (2) parallel sum of 64-bit integers: SUM-lh is implemented using our framework's API, and SUM-mr using the Hadoop API; (3) VAR-lh computes the variance of 32-bit floating-point numbers.
  • 32. Test cases: test data. 100 million 64-bit integers (2.87 GB) for MPS and SUM; 100 million 32-bit floating-point numbers (2.76 GB) for VAR.
  • 33. Performance. The experimental results are summarized as follows: with 16 nodes, the speedup in all cases is more than 7.
  • 34. Performance: time curves (figure not captured in this transcript).
  • 40. Concluding remarks. In this research we introduced a systematic way of parallel programming on MapReduce; developed a framework on top of Hadoop; provided algorithmic programming interfaces that let users focus on the algebraic properties of the problem, while the details of MapReduce are hidden; and achieved good scalability and parallelism. Automatic optimization can also be incorporated.
  • 41. Future work. Decrease the system overhead and do more optimization. Extend the approach to more complex data structures such as trees and graphs.
  • 42. Related work
      Parallel programming with list homomorphisms (M. Cole, 95).
      The third homomorphism theorem (J. Gibbons, 96).
      Systematic extraction and implementation of divide-and-conquer parallelism (Gorlatch, PLILP 96).
      Automatic inversion generates divide-and-conquer parallel programs (Morita et al., PLDI 07).
      The third homomorphism theorem on trees: downward & upward lead to divide-and-conquer (Morihata, POPL 09).
  • 43. Thank you very much. Questions?
  • 44. List homomorphism. A function h is said to be a list homomorphism if there are a function f and an associative operator ⊙ such that, for any lists x and y,
      h [a] = f a
      h (x ++ y) = h x ⊙ h y
      where ++ is list concatenation. An instance of a list homomorphism:
      sum [a] = a
      sum (x ++ y) = sum x + sum y
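The definition can be read directly as a recursive evaluator. The Python sketch below is one illustrative way to do so; the function name `hom` and the choice to split in the middle are assumptions made here, and any split point gives the same result because the operator is associative.

```python
def hom(f, op, xs):
    """Evaluate the list homomorphism ([f, op]) on a non-empty list."""
    if len(xs) == 1:
        return f(xs[0])                # h [a] = f a
    mid = len(xs) // 2                 # any split point would do
    return op(hom(f, op, xs[:mid]),    # h (x ++ y) = h x (op) h y
              hom(f, op, xs[mid:]))


# sum as a list homomorphism: f a = a, operator +
total = hom(lambda a: a, lambda l, r: l + r, [1, 2, 3, 4, 5])

# length is another instance: f a = 1, operator +
length = hom(lambda a: 1, lambda l, r: l + r, ["x", "y", "z"])

print(total, length)  # 15 3
```

The two recursive calls are independent of each other, which is what allows them to run on different nodes.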
  • 45. Theorem (the third homomorphism theorem; Gibbons, 96). Let h be a given function and ⊕ and ⊗ be binary operators. If the following two equations hold for any element a and list y,
      h ([a] ++ y) = a ⊕ h y
      h (y ++ [a]) = h y ⊗ a
      then the function h is a homomorphism. In fact, for a function h, if we have one of its right inverses h◦, which satisfies h ◦ h◦ ◦ h = h, then we can obtain the list-homomorphic definition as follows:
      h = ([f, ⊙]) where f a = h [a] and l ⊙ r = h (h◦ l ++ h◦ r)
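As an illustration of the two premises, the function h xs = (mps xs, sum xs) can be written both leftwards and rightwards. The Python sketch below uses hypothetical operator names `oplus` and `otimes` for the two binary operators and assumes the empty prefix is allowed, so h [] = (0, 0).

```python
from functools import reduce


def oplus(a, right):
    """Leftwards step: h ([a] ++ y) = a (+) h y."""
    m, s = right
    return (max(0, a + m), a + s)


def otimes(left, a):
    """Rightwards step: h (y ++ [a]) = h y (x) a."""
    m, s = left
    return (max(m, s + a), s + a)


def h_leftwards(xs):
    """Consume the list from the right end, prepending one element at a time."""
    return reduce(lambda acc, a: oplus(a, acc), reversed(xs), (0, 0))


def h_rightwards(xs):
    """Consume the list from the left end, appending one element at a time."""
    return reduce(otimes, xs, (0, 0))


example = [3, -1, 4, -1, -5, 9, 2, -6, 5, -10]
print(h_leftwards(example), h_rightwards(example))  # (11, 0) (11, 0)
```

Since both definitions exist, the theorem guarantees that h is a homomorphism, even though plain mps on its own is not.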
  • 46. MapReduce programs can be obtained automatically from two sequential functions, given in either of two forms. As a homomorphism ([f, ⊕]):
      f :: a → b
      ⊕ :: b → b → b
      (a ⊕ b) ⊕ c = a ⊕ (b ⊕ c)
      Or as fold and unfold, which compose into leftwards and rightwards functions:
      fold ([a] ++ x) = fold ([a] ++ unfold (fold x))
      fold (x ++ [a]) = fold (unfold (fold x) ++ [a])
  • 47. Currently, Screwdriver provides two kinds of programming interfaces: one corresponding to the definition of a list homomorphism, and one corresponding to the third homomorphism theorem.
  • 48. Basic homomorphism programming interface. Two functions define a homomorphism: filter :: a → b and plus :: b → b → b. The implementation is in Java.
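The Java interface itself is not captured in the transcript. As a rough illustration of what a filter/plus pair looks like, the Python sketch below computes a population variance; this is an assumption about how a program such as VAR-lh could be expressed, not the authors' code, and `filter_` is named with a trailing underscore only to avoid shadowing Python's built-in `filter`.

```python
from functools import reduce


def filter_(x):
    """filter :: a -> b, mapping a number to (count, sum, sum of squares)."""
    return (1, x, x * x)


def plus(l, r):
    """plus :: b -> b -> b, an associative componentwise addition."""
    return (l[0] + r[0], l[1] + r[1], l[2] + r[2])


def variance(xs):
    """Population variance from the combined triple (naive formula)."""
    n, s, sq = reduce(plus, map(filter_, xs))
    mean = s / n
    return sq / n - mean * mean


print(variance([1.0, 2.0, 3.0, 4.0]))  # 1.25
```

Because `plus` is associative, the triples can be combined in any grouping, so partial triples computed on different nodes merge into the same result.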
  • 49. Programming interface based on the third homomorphism theorem. A function and its right inverse: fold :: [a] → b and unfold :: b → [a]. The implementation is in Java.
  • 50. The implementation of Screwdriver: list representation. To implement our programming interface on Hadoop, we need to consider how to represent lists in a distributed manner. Input data are index-value pairs: using integers as the index type, the list [a, b, c, d, e] is represented by {(3, d), (1, b), (2, c), (0, a), (4, e)}.
  • 51. Partition of the input list. The pid (partition id) of type PID is the index of a partial list. The framework produces the same pid for records that will be grouped together; these records have contiguous ids. Intermediate data are nested pairs ((pid, id), val). Suppose the above list is divided into two parts on different nodes; then the parts are represented as {((0, 1), b), ((0, 2), c), ((0, 0), a)} and {((1, 3), d), ((1, 4), e)}.
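The slide's example can be reproduced with a few lines of Python. The partition function `make_pid` and its chunk size of 3 are hypothetical; the slides do not say how the framework actually picks partition boundaries.

```python
def make_pid(i, chunk=3):
    """Hypothetical partition function: ids 0..2 -> pid 0, ids 3..5 -> pid 1."""
    return i // chunk


# Input data: index-value pairs in arbitrary order, as on the slide.
records = [(3, "d"), (1, "b"), (2, "c"), (0, "a"), (4, "e")]

# Intermediate data: nested pairs ((pid, id), val).
intermediate = [((make_pid(i), i), v) for i, v in records]

part0 = [r for r in intermediate if r[0][0] == 0]
part1 = [r for r in intermediate if r[0][0] == 1]
print(part0)  # [((0, 1), 'b'), ((0, 2), 'c'), ((0, 0), 'a')]
print(part1)  # [((1, 3), 'd'), ((1, 4), 'e')]
```

Records inside a partition may still be out of order at this point; the id kept in the key is what allows the order to be restored later.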
  • 52. Grouping and sorting of intermediate data. We define two functions, comparatorG and comparatorS, for grouping intermediate records with the same pid and sorting them by id:
      comparatorG (pid1, id1) (pid2, id2) = if pid1 == pid2 then 0 else −1
      comparatorS (pid1, id1) (pid2, id2) = if id1 > id2 then 1 else −1
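A small Python sketch shows how the two comparators work together to rebuild partial lists from shuffled records. It mimics the sort-then-group behaviour in a single process and is not Hadoop code; the names `comparator_s` and `comparator_g` follow the slide, while the grouping loop is an illustrative stand-in for what the framework does.

```python
from functools import cmp_to_key


def comparator_s(k1, k2):
    """comparatorS from the slide: order keys (pid, id) by id."""
    return 1 if k1[1] > k2[1] else -1


def comparator_g(k1, k2):
    """comparatorG from the slide: 0 means the keys belong to the same group."""
    return 0 if k1[0] == k2[0] else -1


shuffled = [((1, 4), "e"), ((0, 1), "b"), ((1, 3), "d"),
            ((0, 2), "c"), ((0, 0), "a")]

# Sort records by id; ids are unique, so the comparator never needs to return 0.
by_id = sorted(shuffled,
               key=cmp_to_key(lambda r1, r2: comparator_s(r1[0], r2[0])))

# Walk the sorted records and start a new group whenever the pid changes.
groups = []
for key, val in by_id:
    if groups and comparator_g(groups[-1][0], key) == 0:
        groups[-1][1].append(val)
    else:
        groups.append((key, [val]))

print(groups)  # [((0, 0), ['a', 'b', 'c']), ((1, 3), ['d', 'e'])]
```

Since the ids within one partition are contiguous, sorting by id alone already places records with the same pid next to each other.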
  • 53. Data partition. (1) In the MAP task, intermediate records with the same pid are grouped together and sorted by id; a partitioner dispatches the groups to different reducers. (2) In the REDUCE task, reducers apply merge sort to all groups with the same pid.