Stateful MapReduce
Ahmed Elgohary
Electrical and Computer Engineering Department
University of Waterloo
200 University Avenue West, Waterloo, ON, Canada
ABSTRACT
Hadoop is considered the cornerstone of today's cloud analytics, and much work is being carried out towards developing and enhancing its capabilities. However, an opposite research direction has started to emerge, in which researchers argue that Hadoop is not suitable for some applications and that new frameworks therefore need to be developed. Examples of such applications are graph analytics, online incremental processing, and iterative algorithms.
In this paper, we envision that by adding and maintaining states across multiple Hadoop jobs, a wide range of these applications will fit into Hadoop, eliminating the need to develop new frameworks. We present a Stateful MapReduce API together with an efficient design and implementation that extend Hadoop. Our experimental evaluation demonstrates the effectiveness of the proposed extensions.
Categories and Subject Descriptors
D.3 [Programming Techniques]: Concurrent Programming, Distributed Programming

General Terms
Design

Keywords
Distributed Computing, Cloud Computing, MapReduce, Hadoop

1. INTRODUCTION
Hadoop [7] is the most commonly used MapReduce implementation, and much work is being carried out to develop and improve its capabilities. For example, the authors in [2] presented policies for grouping and scheduling multiple MapReduce jobs in order to improve the overall system throughput. In [12], opportunities for sharing portions of the work carried out by multiple MapReduce jobs were identified, and an analytical model for grouping jobs together was developed accordingly. Another interesting direction for Hadoop development was presented in [8], where the authors considered the problem of automatically tuning Hadoop parameters based on the expected behaviour of the submitted jobs. Also, [1] presented a hybrid model that combines Hadoop with relational databases in order to enhance the system's performance.
Recently, researchers have started to argue that Hadoop (or the MapReduce framework in general) is not suitable for some applications, so new frameworks need to be developed to suit them. For example, the authors in [11] stated that graph algorithms do not fit into MapReduce, so they built a totally new framework (Pregel) designed specifically for graph processing. In [10], a new architecture for stateful bulk processing of dataflow programs was presented. In [5], the authors were concerned about using Hadoop for online analytics due to the latency introduced by materializing the intermediate data; hence, they proposed a modified MapReduce architecture that allows data to be pipelined between operators. For iterative processing using Hadoop, the authors in [4] developed HaLoop, in which loop-invariant data are cached locally at the worker machines.
It can be noticed from the paragraph above that we will eventually end up with several frameworks (totally new frameworks or different variations of Hadoop). In this paper, we envision instead that by adding and maintaining states across multiple Hadoop jobs, a wide range of these applications will fit into Hadoop without the need for new frameworks.
2. STATEFUL MAPREDUCE API
We modified the MapReduce API to provide users with access to task states. Users can store and retrieve key-value pairs to and from the state of each task (Mapper or Reducer). The stateful Mapper and Reducer are defined as:

map(keyIn, valIn, state): <keyOut, valOut>
reduce(keyIn, List<valIn>, state): <keyOut, valOut>

Users can access the state as follows:

int count = state.get("count")
state.set("count", count + 1)

Users also need to specify which tasks should be stateful and which should be stateless. The API is flexible enough that users can combine stateful and stateless tasks in the same job.
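To make the pattern above concrete, the following is a minimal Java sketch of a stateful map function. The State interface shown here is an assumption: the paper only specifies its get/set usage, so the exact types and class names are illustrative.

import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Assumed State interface; only get/set usage appears in the API above.
interface State {
    Integer get(String key);          // null when no value is stored for the key
    void set(String key, int value);
}

public class StatefulMapExample {
    // map(keyIn, valIn, state): <keyOut, valOut>
    public static Map.Entry<Long, String> map(long offset, String line, State state) {
        // Carry a record counter across jobs instead of recomputing it each time.
        Integer stored = state.get("count");
        int count = (stored == null) ? 0 : stored;
        state.set("count", count + 1);
        return new SimpleEntry<>(offset, line);   // pass the record through unchanged
    }
}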
3. EXAMPLES
3.1 Sessionization
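As described in the evaluation in Section 5, stateful reducers can maintain, for each user, the running total of requested objects, so each job only needs to process the newly arriving logs. Below is a hedged sketch of such a reducer, reusing the State interface assumed in the sketch in Section 2.

import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

interface State { Integer get(String key); void set(String key, int value); } // assumed

public class SessionizationReducer {
    // reduce(keyIn, List<valIn>, state): <keyOut, valOut>
    public static Map.Entry<String, Integer> reduce(String userId,
                                                    Iterable<Integer> requests, State state) {
        Integer carried = state.get(userId);       // total carried over from previous jobs
        int total = (carried == null) ? 0 : carried;
        for (int r : requests) total += r;         // add counts from the new logs only
        state.set(userId, total);                  // persist for the next notification
        return new SimpleEntry<>(userId, total);
    }
}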
3.2 PageRank
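One way PageRank can benefit from states is by keeping loop-invariant graph information (such as each node's out-degree) in task state, so successive iterations only shuffle rank contributions. The sketch below is hypothetical; the key naming and the fact that state holds integers (so ranks are stored scaled) are assumptions.

import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

interface State { Integer get(String key); void set(String key, int value); } // assumed

public class PageRankReducer {
    static final double DAMPING = 0.85;

    // reduce(nodeId, rankContributions, state): <nodeId, newRank>
    public static Map.Entry<String, Double> reduce(String nodeId,
                                                   Iterable<Double> contributions, State state) {
        double sum = 0.0;
        for (double c : contributions) sum += c;           // incoming rank shares
        double newRank = (1.0 - DAMPING) + DAMPING * sum;  // standard PageRank update
        // The assumed State stores ints, so persist the rank scaled; keeping the
        // previous rank in state allows convergence checks without an extra job.
        state.set("rank:" + nodeId, (int) (newRank * 1_000_000));
        return new SimpleEntry<>(nodeId, newRank);
    }
}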
3.3 Single-Source Shortest Path
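A hedged sketch of the relaxation step for this example, again using the assumed State interface: a node re-emits candidate distances (minDistance + edge.weight) to its neighbours only when its best-known distance improves, and that best distance is carried in state across jobs. All class and field names here are illustrative.

import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

interface State { Integer get(String key); void set(String key, int value); } // assumed

public class ShortestPathMapper {
    static class Edge {
        final String target;
        final int weight;
        Edge(String target, int weight) { this.target = target; this.weight = weight; }
    }

    // map(nodeId, minDistance, state): list of <neighbour, candidateDistance>
    public static List<Map.Entry<String, Integer>> map(String nodeId, int minDistance,
                                                       List<Edge> edges, State state) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        Integer best = state.get(nodeId);              // best distance from previous jobs
        if (best == null || minDistance < best) {      // distance improved: relax edges
            state.set(nodeId, minDistance);
            for (Edge edge : edges) {
                out.add(new SimpleEntry<>(edge.target, minDistance + edge.weight));
            }
        }
        return out;                                    // unchanged nodes emit nothing
    }
}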
[Figure 1: Modified system architecture. The JobTracker initializes each new job, creates its tasks (setting up backup state), adds the job to the jobs queue, and maintains a BackupStates table; the scheduler tries to schedule each task on its previous TaskTracker. TaskTrackers communicate with the JobTracker through heartbeats and keep a local States table that is updated after task execution. The execution JVM, reached over an RPC-based protocol, retrieves the state from HDFS if needed, invokes the stateful API, writes the new state to HDFS, and returns the new state and backup location.]
4. DESIGN AND IMPLEMENTATION
In this section, we present the proposed design and implementation details of extending Hadoop to support the Stateful MapReduce API described in Section 2. For Stateful MapReduce to be acceptable, maintaining states should be achieved with minimal additional overhead. Also, the new API should not affect the scalability or the fault tolerance of Hadoop.
In the basic architecture of Hadoop, a JobTracker process runs on the master node and a TaskTracker process runs on each slave node. When a job is submitted to the system, the JobTracker initializes the job, creates the map and reduce tasks, and then adds the job to the execution queue. TaskTrackers communicate with the JobTracker through heartbeat messages. When the JobTracker receives a heartbeat from a TaskTracker indicating that the TaskTracker can accept new tasks, the task scheduler picks suitable tasks from the jobs queue and assigns them to that TaskTracker. The task scheduler tries to assign map tasks to the machines where their inputs already reside. The TaskTracker creates a new execution JVM for each task.
The proposed extensions are based on three principles: 1) states are maintained locally at TaskTrackers; 2) a persistent copy of each state is written to HDFS; and 3) at the end of each task, the JobTracker is informed of the location of the task's persistent state.
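As an illustration of the second and third principles, a task's state can be serialized to an HDFS file whose location is then reported to the JobTracker. This is a sketch only: the path layout, serialization format, and class names are assumptions, not the actual implementation.

import java.io.IOException;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StateBackup {
    // Writes a task's state to HDFS and returns the backup location, which the
    // TaskTracker can then report to the JobTracker at the end of the task.
    public static Path persist(Configuration conf, String taskId,
                               Map<String, Integer> state) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path backup = new Path("/stateful/" + taskId + "/state");  // assumed layout
        try (FSDataOutputStream out = fs.create(backup, true)) {   // overwrite old copy
            for (Map.Entry<String, Integer> e : state.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeInt(e.getValue());
            }
        }
        return backup;
    }
}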
Figure 1 shows our modifications to the overall system architecture. A BackupStates table is maintained by the JobTracker to store the location of the persistent state of each task. Each TaskTracker also keeps a local States table, updated after task execution, and the scheduler tries to place a stateful task on its previous TaskTracker, where the state can be retrieved from the local table instead of from HDFS.
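The following sketch shows how such a table might back the scheduler's placement preference; the class and method names are hypothetical, not Hadoop's actual scheduler code.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class BackupStatesTable {
    // taskId -> (TaskTracker that last ran the task, HDFS path of its persisted state)
    static final class Entry {
        final String trackerName;
        final String hdfsPath;
        Entry(String trackerName, String hdfsPath) {
            this.trackerName = trackerName;
            this.hdfsPath = hdfsPath;
        }
    }

    private final Map<String, Entry> table = new ConcurrentHashMap<>();

    // Called when a task finishes and reports its backup location.
    public void record(String taskId, String trackerName, String hdfsPath) {
        table.put(taskId, new Entry(trackerName, hdfsPath));
    }

    // The scheduler prefers the TaskTracker that already holds the task's state
    // locally; any other tracker must first fetch the state from HDFS.
    public boolean prefersTracker(String taskId, String candidateTracker) {
        Entry e = table.get(taskId);
        return e != null && e.trackerName.equals(candidateTracker);
    }
}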
5. EVALUATION
[Figure 2: Running time (in minutes) of the sessionization task after each notification, using stateless versus stateful MapReduce.]
In our first experiment, we compared the running time of two MapReduce jobs: 1) a stateless MapReduce job and 2) a stateful MapReduce job. In the stateless job, the system combines all the logs received after each notification and resubmits all of them as a new MapReduce job. In the stateful job, stateful reducers are used to maintain the total number of objects requested by each user so far, and only the newly arriving logs are submitted as the job input. The running time of processing each notification is recorded, in addition to the latency overhead introduced by maintaining the states in the stateful program.
The evaluation infrastructure consisted of a cluster of 10 slave Amazon EC2 small instances in addition to 1 master small instance. Each instance had 1.7 GB of memory, 1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit), and 160 GB of local storage. All the machines ran Fedora Core Linux, Java 1.6.0_07, and Hadoop 0.203.0. We created a new customized Amazon Machine Image (AMI) on which the Stateful MapReduce implementation inside Hadoop 0.203.0 was deployed, and recreated a similar cluster to run the stateful jobs. All the default Hadoop configurations were left unchanged except for the number of reducers: we used 25 reducers for both experiments.
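For reference, setting the reducer count in the Hadoop 0.20-era API looks like the following sketch; the class name is hypothetical, and this only illustrates how such a configuration is typically expressed, not the paper's actual driver code.

import org.apache.hadoop.mapred.JobConf;

public class JobSetup {
    public static JobConf configure() {
        JobConf conf = new JobConf(JobSetup.class);
        conf.setNumReduceTasks(25);   // the 25 reducers used in both experiments
        return conf;
    }
}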
Figure 2 shows the running time of the jobs launched after each notification. Using stateless MapReduce, the running time keeps increasing as more data is received, which indicates the poor performance of stateless MapReduce in such applications, especially when much data needs to be processed. On the other hand, stateful MapReduce achieves an almost constant running time as more data arrives, since it avoids all the redundant communication (resubmitting all the previously received logs after each new notification) and computation (recounting the number of objects requested by each user).
To provide an estimate of the overhead incurred by maintaining states, we measured the latency of writing each task state to persistent storage (HDFS) as well as the size of the state. Figure 3 shows the average latency and state size for each run.
[Figure 3: Average state-write latency and average state size for each run (x-axis: run number).]
6. DISCUSSION
A second set of experiments, in which we investigate the performance of stateful MapReduce using other types of jobs, is currently in progress. In these experiments, we consider the PageRank and Single-Source Shortest Path problems described in Sections 3.2 and 3.3, respectively. We prepared a semi-synthetic large graph dataset based on the LiveJournal graph [9], which consists of 4,847,571 nodes and 68,993,773 edges. Weights for the edges were generated randomly from the range [0, 1]. To enlarge the dataset, a long string was appended to each node id, bringing the graph size to around 12 GB. We plan to compare the running times of the stateless and stateful versions of the two jobs.
There are three other possible directions to investigate towards the development of stateful MapReduce:
1. Building a system that is aware of the states introduces many optimization opportunities. For example, as described in Section 4, the scheduler at the JobTracker uses information about the TaskTracker on which a task previously ran to make better scheduling decisions that avoid retrieving states from persistent storage.

2. In our current implementation, we optimized the scheduling of reduce tasks. However, it is more challenging to consider the states when scheduling map tasks. Map task scheduling is based on avoiding loading map input from a remote machine, so loading the task state should also be considered when deciding on the machine on which each map task should run; a hedged sketch of such a combined cost follows this list.

3. Online MapReduce [5] can be considered complementary to stateful MapReduce, since online MapReduce is concerned with avoiding the latency of materializing the intermediate data, while our work is concerned with avoiding the latency of repeating computations and data transfers.
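To make the second direction concrete, a placement decision could weigh remote input bytes against remote state bytes. The cost model below is purely illustrative; all names and the equal weighting are assumptions.

public class MapPlacementScore {
    // Hypothetical cost of running a map task on a candidate node: bytes that
    // would have to cross the network for the input split and the task state.
    public static long cost(boolean inputIsLocal, boolean stateIsLocal,
                            long inputBytes, long stateBytes) {
        long remoteInput = inputIsLocal ? 0 : inputBytes;
        long remoteState = stateIsLocal ? 0 : stateBytes;
        return remoteInput + remoteState;   // lower is better
    }
}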
7. CONCLUSION
8. REFERENCES