
A

Project Report on

COMPUTING WITH NEARBY SERVER: A WORK SHARING

ALGORITHM FOR CLOUD SERVERS
Submitted in partial fulfilment of the requirements for the award of the degree of

BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE AND ENGINEERING
by

GADE PRATHYUSHA 18K81A0522


CH CHANDANA 18K81A0514
BOGARAM AKANKSHA 18K81A0512
SYED ABDUL MANNAN H HASHMI 16K81A0554

Under the Guidance of

Dr. M. NARAYANAN

PROFESSOR & HEAD

DEPARTMENT OF CSE

DEPARTMENT OF COMPUTER SCIENCE AND


ENGINEERING

St. MARTIN'S ENGINEERING COLLEGE


UGC Autonomous
Affiliated to JNTUH, Approved by AICTE,
Accredited by NBA & NAAC A+, ISO 9001:2008 Certified

JULY - 2022
St. MARTIN'S ENGINEERING COLLEGE
An Autonomous Institute
NBA & NAAC A+ Accredited
Dhulapally, Secunderabad - 500 100
www.smec.ac.in

CERTIFICATE

This is to certify that the project entitled “Computing with nearby server: A work

sharing algorithm for cloud servers” is being submitted by MS. GADE

PRATHYUSHA 18K81A0522, MS. CH CHANDANA 18K81A0514, MS.

BOGARAM AKANKSHA 18K81A0512, MR. SYED ABDUL MANNAN H

HASHMI 16K81A0554, in partial fulfilment of the requirement for the award of the degree of

BACHELOR OF TECHNOLOGY IN COMPUTER SCIENCE AND ENGINEERING, is a record of

bonafide work carried out by them. The results embodied in this report have been verified

and found satisfactory.

Guide Head of the Department


Dr. M. NARAYANAN Dr. M. NARAYANAN
Professor & Head Professor & Head

Department of CSE Department of CSE

Internal Examiner External Examiner

Date:

Place:

ii
St. MARTIN'S ENGINEERING COLLEGE
An Autonomous Institute
NBA & NAAC A+ Accredited
Dhulapally, Secunderabad - 500 100
www.smec.ac.in

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

DECLARATION

We, the students of ‘Bachelor of Technology in Department of Computer Science and


Engineering’, session: 2018 - 2022, St. Martin’s Engineering College, Dhulapally,
Kompally, Secunderabad, hereby declare that the work presented in this Project Work
entitled ‘Computing with nearby server: A work sharing algorithm for cloud servers’ is
the outcome of our own bonafide work, is correct to the best of our knowledge, and has been
undertaken with due care for engineering ethics. The results embodied in this
project report have not been submitted to any other university for the award of any degree.

Gade Prathyusha 18K81A0522


Ch Chandana 18K81A0514
Bogaram Akanksha 18K81A0512
Syed Abdul Mannan H Hashmi 16K81A0554

iii
ACKNOWLEDGEMENT

The satisfaction and euphoria that accompany the successful completion of any task
would be incomplete without the mention of the people who made it possible and whose
encouragement and guidance have crowned our efforts with success.

We extend our deep sense of gratitude to our Principal, Dr. P. SANTOSH KUMAR PATRA,
St. Martin’s Engineering College, Dhulapally, for permitting us to undertake this project.

We are also thankful to Dr. M. NARAYANAN, Head of the Department, Department of


Computer Science and Engineering, St. Martin’s Engineering College, Dhulapally,
Secunderabad, for his support and guidance throughout our project.

We would like to thank our Project Coordinators Dr. B. RAJALINGAM, Associate


Professor and Dr. G. GOVINDARAJULU, Professor, Department of Computer Science and
Engineering for their valuable support.

We would like to express our sincere gratitude and indebtedness to our project supervisor
Dr. M. NARAYANAN, Professor and Head of the Department, Department of Computer
Science and Engineering, St. Martin's Engineering College, Dhulapally, for his support and
guidance throughout our project.

Finally, we would like to thank our family and friends for their moral support and
encouragement, and we express our thanks to all those who have helped us in successfully
completing this project.

Gade Prathyusha 18K81A0522


Ch Chandana 18K81A0514
Bogaram Akanksha 18K81A0512
Syed Abdul Mannan H Hashmi 16K81A0554

iv
ABSTRACT
As mobile devices evolve to be powerful and pervasive computing tools, their usage
also continues to increase rapidly. However, mobile device users frequently experience
problems when running intensive applications on the device itself, or offloading to remote
clouds, due to resource shortage and connectivity issues. Ironically, most users’ environments
are saturated with devices with significant computational resources. This paper argues that
nearby mobile devices can efficiently be utilised as a crowd-powered resource cloud to
complement the remote clouds. Node heterogeneity, unknown worker capability, and
dynamism are identified as essential challenges to be addressed when scheduling work among
nearby mobile devices. We present a work sharing model, called Honeybee, using an
adaptation of the well-known work stealing method to load balance independent jobs among
heterogeneous mobile nodes, able to accommodate nodes randomly leaving and joining the
system. The overall strategy of Honeybee is to focus on short-term goals, taking advantage of
opportunities as they arise, based on the concepts of proactive workers and opportunistic
delegator. We evaluate our model using a prototype framework built using Android and
implement two applications. We report speedups of up to 4x with seven devices and energy
savings of up to 71% with eight devices.

The Mobile Edge Cloud (MEC) is a network architecture concept that offers a cloud-
like capability at the edge of the network. Being close to the end users, MECs decrease the
latency and increase the performance of high-bandwidth applications.

Crowdsourcing involves a large group of dispersed participants contributing or


producing goods or services—including ideas, voting, micro-tasks, and finances—for payment
or as volunteers. Contemporary crowdsourcing often involves digital platforms to attract and
divide work between participants to achieve a cumulative result; however, it may not always
be an online activity, and there are various historical examples of crowdsourcing. On the other
hand, mobile crowd computing is an amalgamation of machine and human intelligence to
achieve a given set of tasks in a distributed manner. Thus, crowd computing utilizes the
computation and communication capability of the devices.

v
CONTENTS

CERTIFICATE ii

DECLARATION iii

ACKNOWLEDGEMENT iv

ABSTRACT v

LIST OF FIGURES viii

LIST OF TABLES ix

LIST OF ACRONYMS AND DEFINITIONS x

CHAPTER 1: INTRODUCTION 01

1.1 OBJECTIVE OF THE PROJECT 03

CHAPTER 2: LITERATURE SURVEY 04

CHAPTER 3: SYSTEM ANALYSIS AND DESIGN 11

3.1 Existing System 11

3.2 Proposed System 11

CHAPTER 4: SYSTEM REQUIREMENTS & SPECIFICATIONS 13

4.1 INTRODUCTION OF TECHNOLOGIES USED 13

4.2 DESIGN 32

4.2.1 UML Diagrams 32

4.2.2 Class Diagram 32

4.2.3 Use Case Diagram 33

4.2.4 Sequence Diagram 34

4.2.5 Collaborative Diagram 35

4.2.6 Component Diagram 36

4.2.7 Deployment Diagram 37

vi
4.2.8 Activity Diagram 37

4.2.9 Data Flow Diagram 38

4.3 MODULES 39

4.3.1 Mapper Module 39

4.3.2 Aggregator Module 39

4.3.3 Reducing Module 39

4.4 TEST CASES 40

4.5 SYSTEM REQUIREMENTS 41

4.5.1 Hardware Requirements 41

4.5.2 Software Requirements 41

4.6 TESTING 42

4.6.1 Implementation and Testing 42

4.6.2 Implementation 42

4.6.3 Testing 42

4.6.4 System Testing 42

4.6.5 Module Testing 43

4.6.6 Integration Testing 43

4.6.7 Acceptance Testing 43

CHAPTER 5: SOURCE CODE 44

CHAPTER 6: EXPERIMENTAL RESULTS 57

CHAPTER 7: CONCLUSION AND SCOPE 65

7.1 CONCLUSION 65

7.2 FUTURE SCOPE 65

CHAPTER 8: REFERENCES 66

ONE PAGE STUDENT PROFILES 67

vii
LIST OF FIGURES
FIGURE No. FIGURE TITLE PAGE NO.

1.1 Classification of Cloud Computing Subsets 02

4.1 Compiling & Interpreting Java Source Code 14

4.2 Types of Components 16

4.3 Types of Containers 17

4.4 Java Swing Class Hierarchy 28

4.5 Class Diagram 33

4.6 Use Case Diagram 34

4.7 Sequence Diagram 35

4.8 Collaborative Diagram 36

4.9 Component Diagram 36

4.10 Deployment Diagram 37

4.11 Activity Diagram 38

6.1 Mapper Application Home Screen 57

6.2 Defining Reducer1 Location 58

6.3 Defining Reducer2 Location 59

6.4 Upload Documents 60

6.5 MapReduce Aggregation 61

6.6 Aggregated Data 62

6.7 Results on Reducer Nodes 63

6.8 Traffic Cost Graph 64

viii
LIST OF TABLES
TABLE NO. TABLE NAME PAGE NO.

4.1 Swing & AWT Differences 27

4.2 Swing & AWT Components 28, 29

4.3 Test Cases 40

ix
LIST OF ACRONYMS AND DEFINITIONS

S. NO. ACRONYM DEFINITION

01 ADMM Alternating Direction Method of Multiplier

02 DDP Distributed Data Parallelization

03 DFD Data Flow Diagram

04 HDFS Hadoop Distributed File System

05 MSJO MapReduce Server Job Organizer Problem

06 NGS Next Generation Sequencing

07 NMF Non-Negative Matrix Factorization

08 RTM Requirements Traceability Matrix

09 SDLC System Development Life Cycle

10 SNMF Separable Non-Negative Matrix Factorization

x
CHAPTER 1
INTRODUCTION
Today’s environments are becoming embedded with mobile devices with
augmented capabilities, equipped with various sensors, wireless connectivity as well as
limited computational resources. Whether we are on the move, on a train, or at an airport, in
a shopping centre or on a bus, a plethora of mobile devices surround us every day, thus
creating a resource-saturated ecosystem of machine and human intelligence. However,
beyond some traditional web-based applications, current technology does not facilitate
exploiting this resource rich space of machine and human resources. Collaboration among
such smart mobile devices can pave the way for greater computing opportunities, not just
by creating crowd-sourced computing opportunities needing a human element, but also
by solving the resource limitation problem inherent to mobile devices. While there are
research projects in areas such as mobile grid computing, where mobile work sharing is
centrally coordinated by a remote server (e.g., HTC Power To Give), and crowd-powered systems
using mobile devices (e.g., Kamino, Parko), a gap exists for supporting collective resource
sharing without relying on a remote entity for connectivity and coordination. However, such
mobile crowds (also referred to as mobile edge clouds) are not meant to replace the remote
cloud computing model, but to complement it as given below:

 • As an alternative resource cloud in environments where connectivity to remote clouds
is minimal.
 • To decrease the strain on the network.
 • To utilize machine resources of idle mobile devices.
 • To exploit mobile devices’ sensor capabilities, which have enabled the mobile crowd
sensing paradigm. A resource cloud capable of such multi-modality sensing can
enable innovative applications.
 • As mobile devices are usually accompanied by users, they also possess an element of
human intelligence which can be leveraged to solve issues that require human
intervention, such as qualitative classification.

A mobile crowd can be viewed as a specialized form of a mobile cloud which, in turn,
can be viewed from two main perspectives:

 • migrating the computation and storage in mobile devices to resource-rich centralized
remote servers, and
 • leveraging the computational capabilities of the mobile devices by having them as
resource nodes, as has been adopted in research such as the Mobile Device Cloud, Hyrax,
Mobile Edge-Clouds, MClouds, MMPI, and virtual cloud computing for mobile devices.

Fig. 1.1: Classification of Cloud Computing Subsets


Both of these views have the same objective of moving computation and/or storage
away from the resource-constrained mobile device to an external entity. As illustrated in
Fig. 1.1, the difference lies in the nature of external resource providers used to augment the
computing potential of mobile devices. The focus of this paper is on mobile crowd (or edge-
cloud). In our view, the human user of a mobile device is also a resource, which adds an
element of crowd computing to the mobile cloud as well. Therefore, we refer to this
specialized mobile cloud as the Mobile Crowd.

There are several unique features that differentiate mobile crowd environments
from a typical grid/distributed computing cluster, such as less computation power and
limited energy on nodes, node mobility resulting in frequent disconnections, and node
heterogeneity. Hence, solutions from grid/distributed computing cannot be used as they are,
and need to be adapted to suit the requirements of mobile crowd environments.

2
1.1 OBJECTIVE OF THE PROJECT

As mobile devices evolve to be powerful and pervasive computing tools, their


usage also continues to increase rapidly. However, mobile device users frequently
experience problems when running intensive applications on the device itself, or offloading
to remote clouds, due to resource shortage and connectivity issues. Ironically, most users’
environments are saturated with devices with significant computational resources. This
paper argues that nearby mobile devices can efficiently be utilized as a crowd-powered
resource cloud to complement the remote clouds. Node heterogeneity, unknown worker
capability, and dynamism are identified as essential challenges to be addressed when
scheduling work among nearby mobile devices. We present a work sharing model, called
Honeybee, using an adaptation of the well-known work stealing method to load balance
independent jobs among heterogeneous mobile nodes, able to accommodate nodes randomly
leaving and joining the system. The overall strategy of Honeybee is to focus on short-term
goals, taking advantage of opportunities as they arise, based on the concepts of proactive
workers and opportunistic delegator. We evaluate our model using a prototype framework
built using Android and implement two applications. We report speedups of up to 4x with
seven devices and energy savings of up to 71% with eight devices.

3
CHAPTER 2
LITERATURE SURVEY
MapReduce: Simplified Data Processing on Large Clusters

MapReduce is a programming model and an associated implementation for


processing and generating large data sets. Users specify a map function that processes a
key/value pair to generate a set of intermediate key/value pairs, and a reduce function that
merges all intermediate values associated with the same intermediate key. Many real-world
tasks are expressible in this model, as shown in the paper. Programs written in this functional
style are automatically parallelized and executed on a large cluster of commodity machines.
The run-time system takes care of the details of partitioning the input data, scheduling the
program’s execution across a set of machines, handling machine failures, and managing the
required inter-machine communication. This allows programmers without any experience
with parallel and distributed systems to easily utilize the resources of a large distributed
system. Our implementation of MapReduce runs on a large cluster of commodity machines
and is highly scalable: a typical MapReduce computation processes many terabytes of data
on thousands of machines. Programmers find the system easy to use: hundreds of
MapReduce programs have been implemented and upwards of one thousand MapReduce
jobs are executed on Google’s clusters every day.

The MapReduce programming model has been successfully used at Google for
many different purposes. We attribute this success to several reasons. First, the model is
easy to use, even for programmers without experience with parallel and distributed systems,
since it hides the details of parallelization, fault-tolerance, locality optimization, and load
balancing. Second, a large variety of problems are easily expressible as MapReduce
computations. For example, MapReduce is used for the generation of data for Google’s
production web search service, for sorting, for data mining, for machine learning, and many
other systems. Third, we have developed an implementation of MapReduce that scales to
large clusters of machines comprising thousands of machines. The implementation makes
efficient use of these machine resources and therefore is suitable for use on many of the
large computational problems encountered at Google. We have learned several things from
this work. First, restricting the programming model makes it easy to parallelize and
distribute computations and to make such computations fault-tolerant. Second, network
bandwidth is a scarce resource. A number of optimizations in our system are therefore

4
targeted at reducing the amount of data sent across the network: the locality optimization
allows us to read data from local disks, and writing a single copy of the intermediate data to
local disk saves network bandwidth. Third, redundant execution can be used to reduce the
impact of slow machines, and to handle machine failures and data loss [1].
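To make the programming model concrete, the following is a minimal word-count sketch written against the public Hadoop MapReduce API (org.apache.hadoop.mapreduce). It is an illustrative example only, not code from the surveyed paper; in practice each class would go in its own source file.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map: emit (word, 1) for every word in the input line.
public class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer it = new StringTokenizer(value.toString());
        while (it.hasMoreTokens()) {
            word.set(it.nextToken());
            context.write(word, ONE);
        }
    }
}

// Reduce: sum the counts received for each word (same intermediate key).
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        context.write(key, new IntWritable(sum));
    }
}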

Map Task Scheduling in MapReduce with Data Locality: Throughput and Heavy-
Traffic Optimality

Scheduling map tasks to improve data locality is crucial to the performance of


MapReduce. Many works have been devoted to increasing data locality for better efficiency.
However, to the best of our knowledge, fundamental limits of MapReduce computing
clusters with data locality, including the capacity region and theoretical bounds on the delay
performance, have not been studied. In this paper, we address these problems from a
stochastic network perspective. Our focus is to strike the right balance between data-locality
and load-balancing to simultaneously maximize throughput and minimize delay. We present
a new queuing architecture and propose a map task scheduling algorithm constituted by the
Join the Shortest Queue policy together with the MaxWeight policy. We identify an outer
bound on the capacity region, and then prove that the proposed algorithm stabilizes any
arrival rate vector strictly within this outer bound. This shows that the algorithm is throughput
optimal and that the outer bound coincides with the actual capacity region. Further, we study
the number of backlogged tasks under the proposed algorithm, which is directly related to
the delay performance based on Little’s law. We prove that the proposed algorithm is heavy-
traffic optimal, i.e., it asymptotically minimizes the number of backlogged tasks as the
arrival rate vector approaches the boundary of the capacity region. Therefore, the proposed
algorithm is also delay optimal in the heavy-traffic regime.

We considered map scheduling algorithms in MapReduce with data locality. We


first presented the capacity region of a MapReduce computing cluster with data locality and
then we proved the throughput optimality. Beyond throughput, we showed that the proposed
algorithm asymptotically minimizes the number of backlogged tasks as the arrival rate
vector approaches the boundary of the capacity region, i.e., it is heavy-traffic optimal [2].
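The Join-the-Shortest-Queue idea underlying the above policy can be illustrated with a very small sketch. The code below is a simplified, locality-oblivious illustration only; it ignores the MaxWeight component and the data-locality constraints analysed in the paper, and the queue lengths are hypothetical.

import java.util.ArrayList;
import java.util.List;

public class JsqDemo {
    // Return the index of the server whose task queue is currently shortest.
    static int pickShortestQueue(List<Integer> queueLengths) {
        int best = 0;
        for (int i = 1; i < queueLengths.size(); i++) {
            if (queueLengths.get(i) < queueLengths.get(best)) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical backlog (number of queued map tasks) per server.
        List<Integer> queues = new ArrayList<>(List.of(3, 1, 4));
        int target = pickShortestQueue(queues);
        queues.set(target, queues.get(target) + 1);  // dispatch one new map task
        System.out.println("Task dispatched to server " + target);
    }
}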

Joint Scheduling of MapReduce Jobs with Servers: Performance Bounds and


Experiments

MapReduce has achieved tremendous success for large-scale data processing in


data centres. A key feature distinguishing MapReduce from previous parallel models is that

5
it interleaves parallel and sequential computation. Past schemes, and especially their
theoretical bounds, on general parallel models are therefore unlikely to be applied to
MapReduce directly. There are many recent studies on MapReduce job and task scheduling.
These studies assume that the servers are assigned in advance. In current data centres,
multiple MapReduce jobs of different importance levels run together. In this paper, we
investigate a scheduling problem for MapReduce, taking server assignment into consideration
as well. We formulate a MapReduce server-job organizer problem (MSJO) and show that it
is NP-complete. We develop a 3-approximation algorithm and a fast heuristic. We evaluate
our algorithms through both simulations and experiments on Amazon EC2 with an
implementation in Hadoop. The results confirm the advantage of our algorithms.

In this paper, we studied MapReduce job scheduling with consideration of server


assignment. We showed that without such joint consideration, there can be great
performance loss. We formulated a MapReduce server-job organizer problem. This problem
is NP-complete, and we developed a 3-approximation algorithm, MarS. We evaluated our
algorithm through extensive simulation. The results show that MarS can outperform state-
of-the-art strategies by as much as 40% in terms of total weighted job completion time. We
also implemented a prototype of MarS in Hadoop and tested it with experiments on Amazon EC2.
The experimental results confirm the advantage of our algorithm [3].

The Hadoop Distributed File System

The Hadoop Distributed File System (HDFS) is designed to store very large data
sets reliably, and to stream those data sets at high bandwidth to user applications. In a large
cluster, thousands of servers both host directly attached storage and execute user
application tasks. By distributing storage and computation across many servers, the resource
can grow with demand while remaining economical at every size. We describe the
architecture of HDFS and report on experience using HDFS to manage 25 petabytes of
enterprise data at Yahoo.

This section presents some of the future work that the Hadoop team at Yahoo is
considering. Hadoop being an open-source project implies that new features and changes
are decided by the Hadoop development community at large. The Hadoop cluster is
effectively unavailable when its NameNode is down. Given that Hadoop is used primarily
as a batch system, restarting the NameNode has been a satisfactory recovery means.
However, we have taken steps towards automated failover. Currently, a BackupNode

6
receives all transactions from the primary NameNode. This will allow a failover to a warm
or even a hot BackupNode if we send block reports to both the primary NameNode and
BackupNode. A few Hadoop users outside Yahoo! have experimented with manual failover.
Our plan is to use ZooKeeper, Yahoo's distributed consensus technology, to build an
automated failover solution. Scalability of the NameNode has been a key struggle. Because
the NameNode keeps all the namespace and block locations in memory, the size of the
NameNode heap has limited the number of files, as well as the number of blocks addressable [4].
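As an illustration of how client applications interact with HDFS, the sketch below uses the public org.apache.hadoop.fs.FileSystem API to write and read back a small file. The path is a placeholder, and the snippet assumes a reachable HDFS deployment configured via the usual site files.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();    // reads core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/tmp/hdfs-demo.txt");  // placeholder path

        // Write a small file; HDFS replicates its blocks across DataNodes.
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.writeBytes("hello hdfs\n");
        }

        // Read it back through the NameNode/DataNode pipeline.
        try (BufferedReader in = new BufferedReader(new InputStreamReader(fs.open(file)))) {
            System.out.println(in.readLine());
        }
    }
}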

Map-Reduce Meets Wider Varieties of Applications

Recent studies and industry practices build data-center-scale computer systems to


meet the high storage and processing demands of data-intensive and compute-intensive
applications, such as web searches. The Map-Reduce programming model is one of the most
popular programming paradigms on these systems. In this paper, we report our experiences
and insights gained from implementing three data-intensive and compute-intensive tasks
that have different characteristics from previous studies: a large-scale machine learning
computation, a physical simulation task, and a digital media processing task. We identify
desirable features and places to improve in the Map-Reduce model. Our goal is to better
understand such large-scale computation and data processing in order to design better
supports for them.

In this paper, we studied three data-intensive and compute-intensive applications


that have very different characteristics from previously reported Map-Reduce applications.
We find that although we can easily implement a semantically correct Map-Reduce program,
achieving good performance is tricky. For example, a computation that looks similar to word
counting at the first sight may turn out to have very different characteristics, such as the
number and variance of intermediate results, thus resulting in unexpected performance.
Learning from the application studies, we explore the design space for supporting data-
intensive and compute-intensive applications on data-centre-scale computer systems. We
find two directions are promising: (i) enhancing a job control system with a set of desirable
features; (ii) supporting flexible composable components and including more optimization
support in the Map-Reduce system. We plan to investigate these directions in future work [5].

Distributed Machine Learning and Graph Processing with Sparse Matrices

It is cumbersome to write machine learning and graph algorithms in data-parallel


models such as MapReduce and Dryad. We observe that these algorithms are based on

7
matrix computations and, hence, are inefficient to implement with the restrictive
programming and communication interface of such frameworks. In this paper we show that
array-based languages such as R are suitable for implementing complex algorithms and
can outperform current data-parallel solutions. Since R is single-threaded and does not scale
to large datasets, we have built Presto, a distributed system that extends R and addresses
many of its limitations. Presto efficiently shares sparse structured data, can leverage multi-
cores, and dynamically partitions data to mitigate load imbalance. Our results show the
promise of this approach: many important machine learning and graph algorithms can be
expressed in a single framework and are substantially faster than those in Hadoop and Spark.
Presto advocates the use of sparse matrix operations to simplify the implementation of
machine learning and graph algorithms in a cluster. Presto uses distributed arrays for
structured processing, efficiently uses multi-cores, and dynamically partitions data to reduce
load imbalance. Our experience shows that Presto is a flexible computation model that can
be used to implement a variety of complex algorithms [6].

Using Bioinformatics Applications on the Cloud

Dealing with large genomic data on a limited computing resource has been an
inevitable challenge in life science. Bioinformatics applications have required high
performance computation capabilities for next-generation sequencing (NGS) data and the
human genome sequencing data with single nucleotide polymorphisms (SNPs). From 2008,
Cloud computing platforms have been widely adopted to deal with the large data sets with
parallel processing tools. The MapReduce parallel programming framework is dominantly used
due to its fast and efficient performance for data processing on cloud clusters. This study
introduces various research projects aimed at reducing data analysis time and
improving usability. Hadoop implementations and workflow toolkits
are focused on providing parallel data processing tools and easy-to-use environments.

These days, an individual research laboratory is able to generate terabytes of data (or
even more), which is no surprise given the new sequencing technologies in genomic research.
High performance computation environments keep improving on processing large-scale data
at low cost. The combination of MapReduce and cloud computing facilitates fast and
efficient parallel processing on the virtual environment for terabyte-scale data analysis in
bioinformatics, if the analysis consists of embarrassingly parallel problems. MapReduce
framework is suitable for the simple and dividable tasks such as read alignment, sequence

8
search and image recognition. Easy-to-use methods and user-friendly cloud platforms have
been provided to researchers so that they can easily have access to the cloud with their
large data sets uploaded on the cloud in a secure manner. Scientific workflows may focus on
improving data transfer and handling tasks regarding these usability problems. More
challenges are expected in data storage and analysis, since the data grows at unprecedented
scales [7].

Comparison of Distributed Data-Parallelization Patterns for Big Data Analysis: A


Bioinformatics Case Study

As a distributed data-parallelization (DDP) pattern, MapReduce has been adopted


by many new big data analysis tools to achieve good scalability and performance in Cluster
or Cloud environments. This paper explores how two binary DDP patterns, i.e., CoGroup
and Match, could also be used in these tools. We re-implemented an existing bioinformatics
tool, called CloudBurst, with three different DDP pattern combinations. We identify two
factors, namely, input data balancing and value sparseness, which could greatly affect the
performance using different DDP patterns. Our experiments show: (i) a simple DDP pattern
switch could speed up performance by almost two times; (ii) the identified factors can
explain the differences well. In the “big data” era, it is very popular and effective to use DDP
patterns in order to achieve scalability and parallelization. These DDP patterns also bring
challenges on which pattern or pattern combination is the best for a certain tool. This paper
demonstrates that different DDP patterns could have a great impact on the performance of the
same tool. We find that although MapReduce can be used for a wider range of applications
with either one or two input datasets, it is not always the best choice in terms of application
complexity and performance. To understand the differences, we identified two affecting
factors, namely input data balancing and value sparseness, behind their performance differences.
The feasibility of these two factors is verified through experiments. We believe many tools
in bioinformatics and other domains have a similar logic with CloudBurst as they need to
match two input datasets, and therefore could also benefit from our findings. For future
work, we plan to investigate more tools that are suitable for multiple DDP patterns and their
performances on other DDP engines like Hadoop, which will generalize our findings. We
will also study how to utilize the identified factors to automatically select the best DDP
pattern combination from multiple available ones [8].

9
Compressed Nonnegative Matrix Factorization Is Fast and Accurate

Nonnegative matrix factorization (NMF) has an established reputation as a useful


data analysis technique in numerous applications. However, its usage in practical situations
has faced challenges in recent years. The fundamental reason for this is the ever-
growing size of the datasets available and needed in the information sciences. To address
this, in this work we propose to use structured random compression, that is, random
projections that exploit the data structure, for two NMF variants: classical and separable. In
separable NMF (SNMF), the left factors are a subset of the columns of the input matrix. We
present suitable formulations for each problem, dealing with different representative
algorithms within each one. We show that the resulting compressed techniques are faster
than their uncompressed variants, vastly reduce memory demands, and do not encompass
any significant deterioration in performance. The proposed structured random projections
for SNMF allow dealing with arbitrarily shaped large matrices, beyond the standard limit of
tall-and-skinny matrices, granting access to very efficient computations in this general
setting. We accompany the algorithmic presentation with theoretical foundations and
numerous and diverse examples, showing the suitability of the proposed approaches.

In this work we proposed to use structured random projections for NMF and SNMF.
For NMF, we presented formulations for three popular techniques, namely, multiplicative
updates, active set method for nonnegative least squares and ADMM. For SNMF, we
presented a general technique that can be used with any algorithm. In all cases, we showed
that the resulting compressed techniques are faster than their uncompressed variants and, at
the same time, do not introduce significant errors in the final result. There are in the literature
very efficient SNMF algorithms for tall-and-skinny matrices. Interestingly, the use of
structured random projections allows computing SNMF for arbitrarily large matrices,
granting access to very efficient computations in the general setting. As a by-product, we
also propose an algorithmic solution for computing structured random projections of
extremely large matrices (i.e., matrices so large that even after compression they do not fit
in main memory). This is useful as a general tool for computing many different matrix
decompositions, such as the singular value decomposition, for example. We are currently
investigating the problem of replacing the Frobenius norm with the ℓ1 norm in our compressed
variants of NMF and SNMF. In this setting, the fast Cauchy transform is a suitable
alternative to structured random projections [9].
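For reference, the classical multiplicative update rules mentioned above are written here in the standard Lee–Seung form for minimizing the Frobenius objective \|X - WH\|_F^2; the compressed variants proposed in the paper modify these updates, so this is background notation rather than the paper's exact formulation:

W \leftarrow W \circ \frac{X H^{\top}}{W H H^{\top}}, \qquad
H \leftarrow H \circ \frac{W^{\top} X}{W^{\top} W H}

where \circ and the fraction bar denote element-wise multiplication and division, and W and H are kept entry-wise nonnegative.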

10
CHAPTER 3
SYSTEM ANALYSIS AND DESIGN
3.1 EXISTING SYSTEM
Intermediate data are shuffled according to a hash function in Hadoop, which would
lead to large network traffic because it ignores network topology and data size associated
with each key. To tackle this problem incurred by the traffic-oblivious partition scheme, we
take into account both task locations and the data size associated with each key in this paper.
By assigning keys with larger data size to reduce tasks closer to map tasks, network traffic
can be significantly reduced.

To further reduce network traffic within a MapReduce job, we consider

aggregating data with the same keys before sending them to remote reduce tasks. Although a
similar function, called combiner, has been already adopted by Hadoop, it operates
immediately after a map task solely for its generated data, failing to exploit the data
aggregation opportunities among multiple tasks on different machines.
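For illustration, Hadoop's built-in per-map-task combiner is enabled in the job driver as sketched below, using the public org.apache.hadoop.mapreduce.Job API. The WordCountMapper and WordCountReducer classes are the illustrative word-count classes from the literature survey above, reused here purely as placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CombinerJob {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count with combiner");
        job.setJarByClass(CombinerJob.class);
        job.setMapperClass(WordCountMapper.class);
        // The combiner runs on each map task's local output only,
        // which is exactly the limitation discussed above.
        job.setCombinerClass(WordCountReducer.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}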

Disadvantages:

 • Traditionally, a hash function is used to partition intermediate data among reduce
tasks, which, however, is not traffic-efficient because network topology and the data size
associated with each key are not taken into consideration.
 • This leads to large network traffic, because the partitioning ignores network topology and
the data size associated with each key.
 • Hence there is significant scope for reducing network traffic.

3.2 PROPOSED SYSTEM

In this paper, we jointly consider data partition and aggregation for a MapReduce
job with the objective of minimizing the total network traffic. In particular, we propose
a distributed algorithm for big data applications by decomposing the original large-scale
problem into several subproblems that can be solved in parallel. Moreover, an online
algorithm is designed to deal with the data partition and aggregation in a dynamic manner.
Finally, extensive simulation results demonstrate that our proposals can significantly reduce
network traffic cost in both offline and online cases.

11
Advantages:

 • Each aggregator can reduce merged traffic from multiple map tasks. It is designed
to adjust data partition and aggregation in a dynamic manner.
 • It can significantly reduce network traffic cost in both offline and online cases.
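The toy sketch below only illustrates the partitioning idea described in Section 3.1 (send keys carrying more intermediate data to closer, cheaper reduce tasks); it is not the report's actual distributed or online algorithm, and the key sizes and distances are hypothetical inputs.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class GreedyPartitionDemo {
    // Hypothetical network distance from the map task to each reducer.
    static int[] distance = {1, 3};
    // Hypothetical bytes of intermediate data generated per key.
    static Map<String, Integer> keySize = Map.of("k1", 500, "k2", 80, "k3", 300);

    public static void main(String[] args) {
        // Place the heaviest keys first, so they get the cheapest reducers.
        List<String> keys = new ArrayList<>(keySize.keySet());
        keys.sort((a, b) -> keySize.get(b) - keySize.get(a));

        int[] load = new int[distance.length];
        for (String k : keys) {
            // Toy cost: network cost (distance * size) plus a load-balancing term.
            int best = 0;
            long bestCost = Long.MAX_VALUE;
            for (int r = 0; r < distance.length; r++) {
                long cost = (long) distance[r] * keySize.get(k) + load[r];
                if (cost < bestCost) { bestCost = cost; best = r; }
            }
            load[best] += keySize.get(k);
            System.out.println(k + " -> reducer " + best);
        }
    }
}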

12
CHAPTER 4
SYSTEM REQUIREMENTS & SPECIFICATIONS
4.1. INTRODUCTION OF TECHNOLOGIES USED
About Java:

Initially the language was called “Oak”, but it was renamed “Java” in 1995. The
primary motivation for this language was the need for a platform-independent (i.e.,
architecture-neutral) language that could be used to create software to be embedded in
various consumer electronic devices. In short, Java is to Internet programming what C was
to systems programming.

Java is a programmer’s language


Java is cohesive and consistent
Except for those constraints imposed by the Internet environment, Java gives
the programmer full control.
Importance of Java to the Internet

Java has had a profound effect on the Internet. This is because Java expands the
universe of objects that can move about freely in cyberspace. In a network, two categories
of objects are transmitted between the server and the personal computer: passive
information and dynamic, active programs. Such active programs raise concerns in the areas
of security and portability, but Java addresses these concerns and, by doing so, has opened
the door to an exciting new form of program called the applet.

Applications and applets

An application is a program that runs on our computer under the operating system
of that computer. It is more or less like one created using C or C++. Java's ability to create
applets makes it important. An applet is an application designed to be transmitted over the
Internet and executed by a Java-compatible web browser. An applet is actually a tiny Java
program, dynamically downloaded across the network, just like an image. But the difference
is that it is an intelligent program, not just a media file.

Java Architecture

13
Java architecture provides a portable, robust, high performing environment for
development. Java provides portability by compiling the byte codes for the Java Virtual
Machine, which is then interpreted on each platform by the run-time environment. Java is a
dynamic system, able to load code when needed from a machine in the same room or across
the planet.

Compilation of code

When you compile the code, the Java compiler creates machine code (called byte
code) for a hypothetical machine called the Java Virtual Machine (JVM). The JVM is supposed
to execute the byte code. The JVM was created to overcome the issue of portability:
the code is written and compiled for one machine and interpreted on all machines. This
hypothetical machine is called the Java Virtual Machine.
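As a minimal illustration (the file and class names are arbitrary), the same source file is compiled once to byte code and the resulting class file runs unchanged on any platform's JVM:

// Hello.java -- compile with:  javac Hello.java   (produces platform-neutral Hello.class)
// then run on any platform's JVM with:  java Hello
public class Hello {
    public static void main(String[] args) {
        System.out.println("Same bytecode, any JVM");
    }
}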

Compiling and interpreting java source code

[Figure: source code is translated by a platform-specific Java compiler (PC, Macintosh, SPARC) into platform-independent byte code, which is then executed by the Java interpreter on each platform.]

Fig. 4.1: Compiling and interpreting Java source code

At run time, the Java interpreter makes the byte code behave as if it were running on a real
Java Virtual Machine. Fig. 4.1 depicts the compilation and interpretation of Java code.
In reality, the underlying hardware could be an Intel Pentium running Windows 95, a Sun
SPARCstation running Solaris, or an Apple Macintosh; all of them could receive code from
any computer through the Internet and run the applets.

Simple:

14
Java was designed to be easy for the professional programmer to learn and to use
effectively. If you are an experienced C++ programmer, learning Java requires little effort,
since Java retains the familiar syntax and many of the object-oriented features of C++.
Most of the confusing concepts from C++ are either left out of Java or implemented
in a cleaner, more approachable manner. In Java there are a small number of clearly defined
ways to accomplish a given task.

Object oriented
Java was not designed to be source-code compatible with any other language. This
allowed the Java team the freedom to design with a blank slate. One outcome of this was a
clean, usable, pragmatic approach to objects. The object model in Java is simple and easy to
extend, while simple types, such as integers, are kept as high-performance non-objects.

Robust
The multi-platform environment of the web places extraordinary demands on a
program, because the program must execute reliably in a variety of systems. The ability to
create robust programs was given a high priority in the design of Java. Java is a strictly typed
language; it checks your code at compile time and run time.

Java virtually eliminates the problems of memory management and deallocation,
which is completely automatic. In a well-written Java program, all run-time errors can and
should be managed by your program.

AWT And Swings:


AWT:
Graphical User Interface:

The user interface is that part of a program that interacts with the user of the
program. GUI is a type of user interface that allows users to interact with electronic devices
with images rather than text commands. A class library is provided by the Java programming
language which is known as Abstract Window Toolkit (AWT) for writing graphical
programs. The Abstract Window Toolkit (AWT) contains several graphical widgets which
can be added and positioned to the display area with a layout manager.

Unlike the Java programming language itself, the AWT is not platform-independent. AWT

uses system peer objects for constructing graphical widgets. A common set of tools is
provided by the AWT for graphical user interface design. The implementation of the user

15
interface elements provided by the AWT is done using every platform's native GUI toolkit.
One significant aspect of the AWT is that the look and feel of each platform can be preserved.

Components:
A graphical user interface is built of graphical elements called components.
A component is an object having a graphical representation that can be displayed on the
screen and that can interact with the user. Components allow the user to interact with the
program and provide the input to the program. In the AWT, all user interface components
are instances of class Component or one of its subtypes. Typical components include such
items as buttons, scrollbars, and text fields.

Types of Components:

Fig. 4.2: Types of Components

Before proceeding, we first need to know what containers are. Fig. 4.2 gives
information about the types of components present in the AWT package and their further
division.

Containers:

Components do not stand alone, but rather are found within containers. In order to
make components visible, we need to add all components to the container. Containers
contain and control the layout of components.

16
In the AWT, all containers are instances of class Container or one of its subtypes.
Components must fit completely within the container that contains them. For adding
components to the container, we will use add() method.

Types of containers:

Fig. 4.3: Types of Containers

Different types of containers are shown in Fig. 4.3; each of them is explained
in detail below.

Basic GUI Logic: The GUI application or applet is created in three steps. These are:

 • Add components to Container objects to make your GUI.
 • Set up event handlers for the user's interaction with the GUI.
 • Explicitly display the GUI for the application.

A new thread is started by the interpreter for user interaction when an AWT GUI is
displayed. When any event, such as a mouse click or a key press, is received by this new
thread, one of the event handlers set up for the GUI is called by it. One
important point to note here is that the event handler code is executed within that thread.
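A minimal sketch of this event-handling flow, using the standard java.awt.event API, is shown below; the handler method is invoked on the event-dispatching thread started by the toolkit, and the component names are illustrative.

import java.awt.*;
import java.awt.event.*;

public class ButtonEventDemo extends Frame implements ActionListener {
    Label status = new Label("Waiting...");

    ButtonEventDemo() {
        setLayout(new FlowLayout());
        Button ok = new Button("OK");
        ok.addActionListener(this);   // register the event handler
        add(ok);
        add(status);
        setTitle("Event Demo");
        setSize(300, 150);
        setVisible(true);
    }

    // Called by the AWT event-dispatching thread when the button is clicked.
    public void actionPerformed(ActionEvent e) {
        status.setText("Button clicked");
    }

    public static void main(String[] args) {
        new ButtonEventDemo();
    }
}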

Creating a Frame:
Method1:
In the first method we create a frame by extending the Frame class, which is
defined in the java.awt package. The following program demonstrates the creation of a frame.

17
import java.awt.*;
public class FrameDemo1 extends Frame
{
FrameDemo1()
{
setTitle("Label Frame");
setVisible(true);
setSize(500,500);
}
public static void main(String[] args)
{
new FrameDemo1 ();
}
}

In the above program we are using three methods:

setTitle: For setting the title of the frame we use this method. It takes a String as
an argument, which will be the title.

setVisible: For making our frame visible we use this method. It
takes a boolean value as an argument. If we pass true, the window will be visible;
otherwise the window will not be visible.

setSize: For setting the size of the window we use this method. The first
argument is the width of the frame and the second argument is the height of the frame.

Method 2:

In this method we create an instance of the Frame class to create the frame
window. The following program demonstrates Method 2.

import java.awt.*;

public class FrameDemo2


{
public static void main(String[] args)
{

18
Frame f = new Frame();
f.setTitle("My first frame");
f.setVisible(true);
f.setSize(500,500);
}
}

Types of Components:

1) Labels:

This is the simplest component of the Java Abstract Window Toolkit. This component
is generally used to show text or a string in your application, and a label never performs any
type of action.

Label l1 = new Label("One");

Label l2 = new Label("Two");

Label l3 = new Label("Three",Label.CENTER);

In the above three lines we have created three labels with the names "One", "Two", and "Three".
In the third label we are passing two arguments; the second argument is the alignment of the
label. After creating the components, we add them to the container.

add(l1);

add(l2);

add(l3);

We can set or change the text in a label by using the setText( ) method. You can
obtain the current label by calling getText( ). These methods are shown here:

void setText(String str)

String getText( )

2) Buttons:

This component of the Java Abstract Window Toolkit is used to trigger
actions and other events required by your application.

19
The syntax of defining the button is as follows:

Button l1 = new Button("One");

Button l2 = new Button("Two");

Button l3 = new Button("Three");

We can change the Button's label or get the label's text by using the
Button.setLabel(String) and Button.getLabel() method.

3) CheckBox:

A check box is a control that is used to turn an option on or off. It consists of a


small box that can either contain a check mark or not. There is a label associated with each
check box that describes what option the box represents. You change the state of a check
box by clicking on it. The syntax of the definition of a Checkbox is as follows:

Checkbox Win98 = new Checkbox("Windows 98/XP", null, true);

Checkbox winNT = new Checkbox("Windows NT/2000");

Checkbox solaris = new Checkbox("Solaris");

Checkbox mac = new Checkbox("MacOS");

To retrieve the current state of a check box, call getState( ). To set its state, call
setState( ). You can obtain the current label associated with a check box by calling
getLabel(). To set the label, call setLabel( ). These methods are as follows:
boolean getState( )
void setState(boolean on)
String getLabel( )
void setLabel(String str)
Here, if on is true, the box is checked. If it is false, the box is cleared. The string
passed in str becomes the new label associated with the invoking check box.

4) Radio Button:
This is a special case of the Checkbox component of the Java AWT package. It is
used as a group of checkboxes that share the same CheckboxGroup. Only one Checkbox from a
CheckboxGroup can be selected at a time. Syntax for creating radio buttons is as follows:

CheckboxGroup cbg = new CheckboxGroup();

20
Checkbox Win98 = new Checkbox("Windows 98/XP", cbg , true);

Checkbox winNT = new Checkbox("Windows NT/2000",cbg, false);

Checkbox solaris = new Checkbox("Solaris",cbg, false);

Checkbox mac = new Checkbox("MacOS",cbg, false);

For radio buttons we use the Checkbox class. The only difference between
check boxes and radio buttons is that for plain check boxes we specify null for the CheckboxGroup,
whereas for radio buttons we specify the CheckboxGroup object in the second
parameter.

5) Choice:

The Choice class is used to create a pop-up list of items from which the user may
choose. Thus, a Choice control is a form of menu. Syntax for creating choice is as follows:

Choice os = new Choice();

/* adding items to choice */

os.add("Windows 98/XP");

os.add("Windows NT/2000");

os.add("Solaris");

os.add("MacOS");

We create a choice with the help of the Choice class. The pop-up list is created along with
the object, but it will not have any items initially. For adding items we use the add()
method defined in the Choice class.

To determine which item is currently selected, you may call either getSelectedItem( ) or
getSelectedIndex( ). These methods are shown here:

String getSelectedItem( )
int getSelectedIndex( )
The getSelectedItem( ) method returns a string containing the name of the item.
getSelectedIndex( ) returns the index of the item. The first item is at index 0. By default,
the first item added to the list is selected.

21
6) List:

The List class is similar to Choice, but the difference between them is that in a
Choice the user can select only one item, whereas in a List the user can select more than one item.
Syntax for creating a list is as follows:

List os = new List(4, true);

The first argument in the List constructor specifies the number of visible rows in the
list. The second argument specifies whether multiple selections are allowed or not.

/* Adding items to the list */

os.add("Windows 98/XP");

os.add("Windows NT/2000");

os.add("Solaris");

os.add("MacOS");

In a list we can retrieve the items selected by the user. With multiple
selection the user may select several values; for retrieving all of them there is a method
called getSelectedItems(), whose return type is a String array. For retrieving a single value
we can again use the same method as in Choice, i.e. getSelectedItem().

7)TextField:

Text fields allow the user to enter strings and to edit the text using the arrow keys,
cut, and paste keys. TextField is a subclass of TextComponent. Syntax for creating text fields is as
follows:

TextField tf1 = new TextField(25);

TextField tf2 = new TextField();

In the first case we specify the size of the text field, and the second text
field is created with the default size. TextField (and its superclass TextComponent)
provides several methods that allow you to utilize a text field. To obtain the string currently
contained in the text field, call getText( ). To set the text, call setText( ). These methods are
as follows:

String getText( )

void setText(String str)

22
We can control whether the contents of a text field may be modified by the user by
calling setEditable( ). You can determine editability by calling isEditable( ). These methods
are shown here:

boolean isEditable( )

void setEditable(boolean canEdit)

isEditable( ) returns true if the text may be changed and false if not. In setEditable( ), if
canEdit is true, the text may be changed. If it is false, the text cannot be altered.

There may be times when we will want the user to enter text that is not displayed,
such as a password. We can disable the echoing of the characters as they are typed by calling
setEchoChar( ).
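For example (a small sketch; the field name is arbitrary):

TextField password = new TextField(12);
password.setEchoChar('*');   // typed characters are echoed as '*' instead of the actual text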

8)TextArea:

TextArea is a multiple-line editor. Syntax for creating a text area is as follows:

TextArea area = new TextArea(20,30);

Above code will create one text area with 20 rows and 30 columns. TextArea is a
subclass of TextComponent. Therefore, it supports the getText( ), setText( ),
getSelectedText( ), select( ), isEditable( ), and setEditable( ) methods described in the
preceding section.

TextArea adds the following methods:

void append(String str)

void insert(String str, int index)

void replaceRange(String str, int startIndex, int endIndex)

The append( ) method appends the string specified by str to the end of the current
text. insert( ) inserts the string passed in str at the specified index. To replace text, call
replaceRange( ). It replaces the characters from startIndex to endIndex–1, with the
replacement text passed in str.

Layout Managers:

A layout manager automatically arranges controls within a window by using some


type of algorithm. Each Container object has a layout manager associated with it. A layout

23
manager is an instance of any class that implements the LayoutManager interface. The
layout manager is set by the setLayout( ) method. If no call to setLayout( ) is made, then
the default layout manager is used. Whenever a container is resized (or sized for the first
time), the layout manager is used to position each of the components within it. The
setLayout( ) method has the following general form:

void setLayout(LayoutManager layoutObj)

Here, layoutObj is a reference to the desired layout manager. If you wish to disable
the layout manager and position components manually, pass null for layoutObj. If we do
this, you will need to determine the shape and position of each component manually, using
the setBounds( ) method defined by Component.

void setBounds(int x, int y, int width, int height)

Here the first two arguments are the x and y coordinates, the third argument is the width, and
the fourth argument is the height of the component.

Java has several predefined LayoutManager classes, several of which are


described next. You can use the layout manager that best fits your application.

FlowLayout:

FlowLayout is the default layout manager. This is the layout manager that the
preceding examples have used. FlowLayout implements a simple layout style, which is
similar to how words flow in a text editor. Components are laid out from the upper-left
corner, left to right and top to bottom. When no more components fit on a line, the next one
appears on the next line. A small space is left between each component, above and below,
as well as left and right. Here are the constructors for FlowLayout:

FlowLayout( )

FlowLayout(int how)

FlowLayout(int how, int horz, int vert)

The first form creates the default layout, which centers components and leaves five
pixels of space between each component. The second form lets you specify how each line is
aligned. Valid values for how are as follows:

24
FlowLayout.LEFT

FlowLayout.CENTER

FlowLayout.RIGHT

These values specify left, center, and right alignment, respectively. The third form
allows you to specify the horizontal and vertical space left between components in horz and
vert, respectively.
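A minimal sketch using FlowLayout is shown below (the component labels are illustrative):

import java.awt.*;

public class FlowLayoutDemo extends Frame {
    FlowLayoutDemo() {
        // Left-aligned flow layout with 10-pixel horizontal and vertical gaps.
        setLayout(new FlowLayout(FlowLayout.LEFT, 10, 10));
        add(new Button("One"));
        add(new Button("Two"));
        add(new Button("Three"));
        setTitle("FlowLayout Demo");
        setSize(300, 150);
        setVisible(true);
    }

    public static void main(String[] args) {
        new FlowLayoutDemo();
    }
}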

BorderLayout:

The BorderLayout class implements a common layout style for top-level windows.
It has four narrow, fixed-width components at the edges and one large area in the center.
The four sides are referred to as north, south, east, and west. The middle area is called the
center. Here are the constructors defined by BorderLayout:

BorderLayout( )

BorderLayout(int horz, int vert)

The first form creates a default border layout. The second allows you to specify the
horizontal and vertical space left between components in horz and vert, respectively.
BorderLayout defines the following constants that specify the regions:

BorderLayout.CENTER BorderLayout.SOUTH

BorderLayout.EAST BorderLayout.WEST

BorderLayout.NORTH

When adding components, you will use these constants with the following form of add( ),
which is defined by Container:

void add(Component compObj, Object region);

Here, compObj is the component to be added, and region specifies where the component
will be added.
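A minimal sketch using BorderLayout is shown below (the component labels are illustrative):

import java.awt.*;

public class BorderLayoutDemo extends Frame {
    BorderLayoutDemo() {
        setLayout(new BorderLayout());
        add(new Button("North"), BorderLayout.NORTH);
        add(new Button("South"), BorderLayout.SOUTH);
        add(new Button("East"), BorderLayout.EAST);
        add(new Button("West"), BorderLayout.WEST);
        add(new TextArea("Center"), BorderLayout.CENTER);  // large centre region
        setTitle("BorderLayout Demo");
        setSize(400, 300);
        setVisible(true);
    }

    public static void main(String[] args) {
        new BorderLayoutDemo();
    }
}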

GridLayout:

GridLayout lays out components in a two-dimensional grid. When you instantiate


a GridLayout, you define the number of rows and columns.

25
The constructors supported by GridLayout are shown here:

GridLayout( )

GridLayout(int numRows, int numColumns )

GridLayout(int numRows, int numColumns, int horz, int vert)

The first form creates a grid layout with a single row, one column per component. The second form creates a grid
layout with the specified number of rows and columns. The third form allows you to specify
the horizontal and vertical space left between components in horz and vert, respectively.
Either numRows or numColumns can be zero. Specifying numRows as zero allows for
unlimited-length columns. Specifying numColumns as zero allows for unlimited-length
rows.
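A minimal sketch using GridLayout is shown below (the component labels are illustrative):

import java.awt.*;

public class GridLayoutDemo extends Frame {
    GridLayoutDemo() {
        // 3 rows x 2 columns, with 5-pixel gaps between cells.
        setLayout(new GridLayout(3, 2, 5, 5));
        for (int i = 1; i <= 6; i++) {
            add(new Button("Button " + i));
        }
        setTitle("GridLayout Demo");
        setSize(300, 200);
        setVisible(true);
    }

    public static void main(String[] args) {
        new GridLayoutDemo();
    }
}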

Swings:

Swing is used to develop Java programs with a graphical user interface (GUI).
The Swing toolkit consists of many components for building a GUI. These components are also
helpful in providing interactivity to Java applications. The following are components which are
included in the Swing toolkit:

 • list controls
 • buttons
 • labels
 • tree controls
 • table controls

All flexible AWT components can be handled by Java Swing. The Swing toolkit
contains far more components than the simple AWT component toolkit. It is unique among
toolkits in that it supports integrated internationalization, a highly customizable text
package, rich undo support, etc. Not only this, you can also create your own look and feel
using Swing, other than the ones supported by it; a customized look and feel can
be created using Synth, which is specially designed for this purpose. Swing also provides
basic user interface facilities such as customizable painting, event handling, and drag and drop.

The Java Foundation Classes (JFC), which support many more features important to a GUI program, comprise Swing as well. Among the features supported by the JFC are the ability to create programs that can work in different languages and the ability to add rich graphics functionality.

There are several components contained in the Swing toolkit, such as check boxes, buttons, tables, and text. Even very simple components provide sophisticated functionality; for instance, text fields provide formatted text input or password-field behaviour. Furthermore, the file browsers and dialogs can be used according to one's needs and can even be customized. As a small illustration, the sketch below shows a password field and a file chooser in use.
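This is a minimal sketch only; the class name and the sizes used are our own assumptions and are not taken from the project code:

import javax.swing.JFileChooser;
import javax.swing.JFrame;
import javax.swing.JPasswordField;

public class SimpleComponentsDemo {
    public static void main(String args[]) {
        JFrame frame = new JFrame("Swing Components Demo");
        frame.setLayout(new java.awt.FlowLayout());
        // A password field hides the characters typed by the user.
        frame.add(new JPasswordField(15));
        frame.setSize(300, 120);
        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        frame.setVisible(true);

        // A file chooser dialog lets the user browse for a file.
        JFileChooser chooser = new JFileChooser();
        if (chooser.showOpenDialog(frame) == JFileChooser.APPROVE_OPTION) {
            System.out.println("Selected: " + chooser.getSelectedFile());
        }
    }
}
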
Difference between Swings and AWT:
AWT stands for Abstract Window Toolkit. It is a platform-dependent API to develop GUI (graphical user interface) or window-based applications in Java, and it was developed by Sun Microsystems in 1995. Swing is a lightweight Java graphical user interface (GUI) toolkit that is used to create various applications, and its components are platform-independent. Table 4.1 summarizes the differences between Swing and AWT.

Swings                                                  AWT

Swings are lightweight components.                      AWTs are heavyweight components.

Swings are developed using pure Java.                   AWTs are developed using C and C++.

Different look and feel is possible in Swings.          This feature is not available in AWT.

Swing has many advanced components such as              These are not available in AWT.
JTable, JTabbedPane, and JTree.

Table. 4.1: Difference between Swings and AWT

Java Swing Class Hierarchy:

Java Swing is a part of the Java Foundation Classes (JFC) that is used to create window-based applications; Fig. 4.4 depicts its class hierarchy. It is built on top of the AWT (Abstract Windowing Toolkit) API and is entirely written in Java. Unlike AWT, Java Swing provides platform-independent and lightweight components.

Fig. 4.4: Java swing class hierarchy

Swing Components:

All the components that are supported in AWT are also supported in Swing, with a slight change in their class names. Table 4.2 lists the AWT components and their Swing counterparts; both serve the same use cases.

AWT Components Swing Components

Label JLabel

TextField JTextField

TextArea JTextArea

Choice JComboBox

Checkbox JCheckBox

List JList

Button JButton

- JRadioButton

- JPasswordField

- JTable

- JTree

- JTabbedPane

MenuBar JMenuBar

Menu JMenu

MenuItem JMenuItem

- JFileChooser

- JOptionPane

Table. 4.2: Swing & AWT Components

JTabbedPane class:

The JTabbedPane container allows many panels to occupy the same area of the
interface, and the user may select which to show by clicking on a tab.

Constructor

JTabbedPane tp = new JTabbedPane();

Adding tabs to the JTabbedPane

Add tabs to a tabbed pane by calling addTab and passing it a String title and an instance of the component to show when that tab is selected. That component should be a subclass of JPanel.

addTab("String", instance);

Example program:

import javax.swing.*;
import java.awt.*;

public class TabbedPaneDemo extends JFrame
{
    TabbedPaneDemo()
    {
        setLayout(new FlowLayout(FlowLayout.LEFT));
        setTitle("Tabbed Demo");
        setSize(500, 500);
        setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        JTabbedPane pane = new JTabbedPane();
        pane.addTab("Countries", new Count());
        pane.addTab("Cities", new Cit());
        add(pane);
        setVisible(true);
    }

    public static void main(String a[])
    {
        new TabbedPaneDemo();
    }
}

class Count extends JPanel
{
    Count()
    {
        JButton b1 = new JButton("India");
        JButton b2 = new JButton("SriLanka");
        JButton b3 = new JButton("Australia");
        add(b1);
        add(b2);
        add(b3);
    }
}

class Cit extends JPanel
{
    Cit()
    {
        JCheckBox cb1 = new JCheckBox("Hyderabad");
        JCheckBox cb2 = new JCheckBox("Banglore");
        JCheckBox cb3 = new JCheckBox("Pune");
        add(cb1);
        add(cb2);
        add(cb3);
    }
}

JMenuBar, JMenu, JMenuItem

A top-level window can have a menu bar associated with it. A menu bar displays a
list of top-level menu choices. Each choice is associated with a drop-down menu. This
concept is implemented in Java by the following classes: JMenuBar, JMenu, and
JMenuItem. In general, a menu bar contains one or more JMenu objects. Each JMenu object
contains a list of JMenuItem objects. Each JMenuItem object represents something that can
be selected by the user. To create a menu bar, first create an instance of JMenuBar. This
class only defines the default constructor. Next, create instances of JMenu that will define
the selections displayed on the bar. Following are some of the constructors for JMenu and JMenuItem:

JMenu( )
JMenu(String optionName)
JMenuItem( )
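A minimal sketch of building a menu bar is shown below; the frame title and menu item labels are illustrative only and are not taken from the project code:

import javax.swing.JFrame;
import javax.swing.JMenu;
import javax.swing.JMenuBar;
import javax.swing.JMenuItem;

public class MenuDemo {
    public static void main(String args[]) {
        JFrame frame = new JFrame("Menu Demo");
        // Create the menu bar, add one menu to it, and add items to that menu.
        JMenuBar mb = new JMenuBar();
        JMenu file = new JMenu("File");
        file.add(new JMenuItem("Open"));
        file.add(new JMenuItem("Save"));
        file.addSeparator();
        file.add(new JMenuItem("Exit"));
        mb.add(file);
        frame.setJMenuBar(mb);
        frame.setSize(300, 200);
        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        frame.setVisible(true);
    }
}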

4.2 DESIGN
4.2.1 Uml Diagrams

The Unified Modeling Language allows the software engineer to express an analysis model using a modeling notation that is governed by a set of syntactic, semantic, and pragmatic rules.

A UML system is represented using five different views that describe the system from distinctly different perspectives. Each view is defined by a set of diagrams, as follows.

 User Model View

i. This view represents the system from the user's perspective.
ii. The analysis representation describes a usage scenario from the end user's perspective.

 Structural Model View

i. In this model the data and functionality are viewed from inside the system.
ii. This view models the static structures.

 Behavioral Model View

It represents the dynamic (behavioral) aspects of the system, depicting the interactions among the structural elements described in the user model and structural model views.

 Implementation Model View

In this view the structural and behavioral parts of the system are represented as they are to be built.

 Environmental Model View

In this view the structural and behavioral aspects of the environment in which the system is to be implemented are represented.

4.2.2 Class Diagram:

The class diagram is the main building block of object-oriented modeling. It is used both for general conceptual modeling of the structure of the application and for detailed modeling, translating the models into programming code. Fig. 4.5 gives information about the classes involved. Class diagrams can also be used for data modeling. The classes in a class diagram represent both the main objects and interactions in the application and the classes to be programmed.

In the diagram, classes are represented with boxes which contain three parts:

 The upper part holds the name of the class


 The middle part contains the attributes of the class
 The bottom part gives the methods or operations the class can take or undertake.

[Fig. 4.5 shows the classes of the application — Main, DBCon, Chart, Reducer1, Reducer2, DefineReducer, WordFrequency, WordFrequencyMapper, and DocumentReducer — together with their attributes (such as reducer name, latitude, longitude, and distance) and operations (such as getCon(), addReducer(), setReducer(), setDistance(), start(), append(), createChart(), reduce(), and toString()).]
Fig. 4.5: Class diagram

4.2.3 Use Case Diagram:


A use case diagram, at its simplest, is a representation of a user's interaction with the system, depicting the specification of a use case. A use case diagram can portray the different types of users of a system and the various ways in which they interact with the system. This type of diagram is typically used in conjunction with the textual use case and will often be accompanied by other types of diagrams as well. The purpose of a use case diagram in UML is to demonstrate the different ways that a user might interact with a system. A use case diagram does not go into a lot of detail; for example, do not expect it to model the order in which steps are performed. Instead, a proper use case diagram depicts a high-level overview of the relationship between use cases, actors, and systems. Fig. 4.6 gives information about the use case diagram for our project.

[Fig. 4.6 shows the User actor interacting with four use cases: define reducer locations, upload the documents, start the MapReduce aggregation, and calculate the network traffic graph.]

Fig. 4.6: Use case diagram

4.2.4 Sequence Diagram:

A sequence diagram is a kind of interaction diagram that shows how processes operate with one another and in what order. It is a construct of a Message Sequence Chart. A sequence diagram shows object interactions arranged in time sequence. It depicts the objects and classes involved in the scenario and the sequence of messages exchanged between the objects needed to carry out the functionality of the scenario. Sequence diagrams are typically associated with use case realizations in the Logical View of the system under development. Sequence diagrams are sometimes called event diagrams, event scenarios, or timing diagrams.

A sequence diagram simply depicts the interaction between objects in a sequential order, i.e., the order in which these interactions take place. Fig. 4.7 gives details about the sequence diagram for our project. We can also use the terms event diagrams or event scenarios to refer to a sequence diagram. Sequence diagrams describe how and in what order the objects in a system function. These diagrams are widely used by business analysts and software developers to document and understand the requirements for new and existing systems.

[Fig. 4.7 shows the messages exchanged between the Mapper home page, Reducer, Upload, MapReduce aggregation, and Graph objects: the reducer location details (name, latitude, longitude) are entered and saved, the reducer application is run, the input documents are uploaded, the MapReduce aggregation is started and the processing time and aggregated data are produced, the details are updated at the reducer node, and finally the network traffic cost graph is displayed, comparing processing time against technique.]

Fig. 4.7: Sequence diagram

4.2.5 Collaborative Diagram

A collaboration diagram describes interactions among objects in terms of sequenced messages. Fig. 4.8 depicts the collaboration diagram for our project. Collaboration diagrams represent a combination of information taken from class, sequence, and use case diagrams, describing both the static structure and dynamic behaviour of a system.

Collaboration diagrams are created by first identifying the structural elements required to carry out the functionality of an interaction; a model is then built using the relationships between those elements. Several vendors offer software for creating and editing collaboration diagrams. A collaboration diagram is used to show the relationships between the objects in a system. Both the sequence and the collaboration diagram represent the same information, but differently: instead of showing the flow of messages, the collaboration diagram depicts the architecture of the objects residing in the system, as it is based on object-oriented programming. An object consists of several features.

[Fig. 4.8 shows the same interaction in collaboration form, with numbered messages between the Mapper home page, Reducer, Upload, MapReduce aggregation, and Graph objects: (1) add reducer location details, (2) enter reducer name, latitude, and longitude values to save, (3) reducer details added, (4) run the reducer application, (5) upload the data into the application, (6) input documents loaded, (7) click on start MapReduce application, (8) it is started and produces the processing time, (9) after processing, the aggregated data is displayed, (10) details are updated at the reducer node, (11) click on the network traffic cost graph, and (12) the graph is displayed between processing time and technique.]

Fig. 4.8: Collaboration diagram

4.2.6 COMPONENT DIAGRAM


In the Unified Modeling Language, a component diagram depicts how components are wired together to form larger components and/or software systems. Component diagrams are used to illustrate the structure of arbitrarily complex systems.

Components are wired together by using an assembly connector to connect the required interface of one component with the provided interface of another component in the system. Fig. 4.9 shows the components involved in this project; it illustrates the service consumer / service provider relationship between the components.

[Fig. 4.9 shows the Mapper component wired to the components that define the reducer location, run the reducer applications, upload the document, start the aggregation, and display the cost graph.]
Fig. 4.9: Component diagram

4.2.7 DEPLOYMENT DIAGRAM

A deployment diagram in the Unified Modeling Language models the physical deployment of artifacts on nodes; Fig. 4.10 shows the deployment diagram for our project. To describe a web site, for example, a deployment diagram would show what hardware components ("nodes") exist (e.g., a web server, an application server, and a database server), what software components ("artifacts") run on each node (e.g., web application, database), and how the different pieces are connected (e.g., JDBC, REST, RMI).

The nodes appear as boxes, and the artifacts allocated to each node appear as
rectangles within the boxes. Nodes may have sub nodes, which appear as nested boxes. A
single node in a deployment diagram may conceptually represent multiple physical nodes,
such as a cluster of database servers.

[Fig. 4.10 shows the Mapper node connected to the artifacts that define the reducer location, run the reducer applications, upload the document, start the aggregation, and produce the network cost graph.]
Fig. 4.10: Deployment diagram

4.2.8 Activity Diagram:

Activity diagram is another important diagram in UML used to describe the dynamic aspects of the system. It is basically a flow chart representing the flow from one activity to another activity, where an activity can be described as an operation of the system in our context. Fig. 4.11 depicts the activities involved.

The control flow is drawn from one operation to another. This flow can be sequential, branched, or concurrent.

[Fig. 4.11 shows the activity flow: define the reducer location details (with a yes/no decision branch), run the reducer application, load the input data, start the aggregation, and produce the time chart graph.]
Fig. 4.11: Activity Diagram

4.2.9 Data Flow Diagram:

Data flow diagrams illustrate how data is processed by a system in terms of inputs
and outputs.

Data flow diagrams can be used to provide a clear representation of any business
function. The technique starts with an overall picture of the business and continues by
analyzing each of the functional areas of interest. This analysis can be carried out in
precisely the level of detail required. The technique exploits a method called top-down
expansion to conduct the analysis in a targeted way.

As the name suggests, Data Flow Diagram (DFD) is an illustration that explicates
the passage of information in a process. A DFD can be easily drawn using simple symbols.
Additionally, complicated processes can be easily automated by creating DFDs using easy-
to-use, free downloadable diagramming tools. A DFD is a model for constructing and
analyzing information processes.

4.3 MODULES
4.3.1 Mapper Module:

In the mapper module, map tasks are launched in parallel to convert the original input splits into intermediate data in the form of key/value pairs. These key/value pairs are stored on the local machine and organized into multiple data partitions. Map is a user-defined function that takes a series of key-value pairs and processes each one of them to generate zero or more key-value pairs. A hedged sketch of such a map function is given below.
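The following is a minimal sketch written against the standard Hadoop Mapper API; the class name WordCountMapper and the whitespace tokenization are our own illustrative assumptions and are not taken from the project source code:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Emits a (word, 1) pair for every word found in an input split.
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
            if (token.length() > 0) {
                word.set(token);
                context.write(word, ONE);   // intermediate key/value pair
            }
        }
    }
}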

4.3.2 Aggregator Module:

In the aggregator module, each aggregator can reduce the merged traffic coming from multiple map tasks before it is sent across the network to the reducers. In Hadoop terms this role is comparable to a combiner, sketched below.
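A minimal combiner-style sketch is given below; the class name CountAggregator is our own, and wiring it into a job (for example with job.setCombinerClass(CountAggregator.class)) is mentioned only as an assumption about how such an aggregator would typically be attached:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sums the counts produced by one map task so that only a single
// (word, partialCount) pair per word is shuffled to the reducers.
public class CountAggregator extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        context.write(key, new IntWritable(sum));
    }
}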

4.3.3 Reducing Module:

In the reducer module, each reduce task fetches its own share of data partitions from all map tasks to generate the final result. The Reducer takes the grouped key-value paired data as input and runs a reducer function on each group. Here, the data can be aggregated, filtered, and combined in a number of ways, and it may require a wide range of processing. Once the execution is over, it gives zero or more key-value pairs to the final step. A hedged sketch of such a reduce function follows.
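A minimal sketch of such a reduce function, again against the standard Hadoop API (the class name WordCountReducer is our own illustration, not the project's DocumentReducer):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Combines all partial counts received for a word into the final total.
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int total = 0;
        for (IntWritable v : values) {
            total += v.get();
        }
        context.write(key, new IntWritable(total));   // zero or more final key/value pairs
    }
}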

4.4 TEST CASES
Table 4.3 gives information about the different test cases present in our project, each with a name, description, steps, status, and priority.

Test Case Id: 1
Test Case Name: Reducer location details
Description: Defines a particular reducer's location by providing latitude and longitude values.
Test Step: Latitude and longitude values are not provided.
Expected Result: Location details will not be saved.
Actual Result: Reducer details are saved successfully.
Test Case Status: High    Test Priority: High

Test Case Id: 2
Test Case Name: Run reducer
Description: Start the reducer nodes, and all details will be updated at the reducer node.
Test Step: The reducer application is not run.
Expected Result: The reducer does not know the updated details.
Actual Result: Reducer node is started.
Test Case Status: High    Test Priority: High

Test Case Id: 3
Test Case Name: Upload the input data
Description: Data will be uploaded from the shuffle phase.
Test Step: The data cannot be uploaded.
Expected Result: The network traffic cannot be reduced.
Actual Result: Input data loaded successfully.
Test Case Status: High    Test Priority: High

Test Case Id: 4
Test Case Name: Aggregation using MapReduce
Description: Aggregates all the partitioned data.
Test Step: The aggregation is not started.
Expected Result: The network traffic count cannot be reduced.
Actual Result: After processing the aggregated data, the result is displayed.
Test Case Status: High    Test Priority: High

Test Case Id: 5
Test Case Name: Network traffic cost graph
Description: Displays the graph between processing time and technique.
Test Step: No aggregation is performed.
Expected Result: Nothing will be displayed.
Actual Result: Graph is displayed using aggregated / non-aggregated data.
Test Case Status: High    Test Priority: High

Table. 4.3: Test cases

4.5 SYSTEM REQUIREMENTS
4.5.1 Hardware Requirements:

 Processor : Intel(R) Core(TM) i5-1035G1 CPU
 Speed : 1.19 GHz
 Hard Disk : 1 TB
 RAM : 8 GB
 System Type : 64-bit Operating System

4.5.2 SOFTWARE REQUIREMENTS:

 Operating System : Windows XP/10
 Coding Language : JAVA 1.6.0_11
 Frontend : AWT, Swing
 Backend : MySQL
 Tools : Cygwin 3.3

4.6 TESTING

4.6.1 Implementation and Testing:

Implementation is one of the most important tasks in a project; it is the phase in which one has to be cautious, because all the efforts undertaken during the project depend on it. Implementation is the most crucial stage in achieving a successful system and in giving the users confidence that the new system is workable and effective. Each program is tested individually at the time of development using sample data, and it has been verified that these programs link together in the way specified in the program specification. The computer system and its environment are tested to the satisfaction of the user.

4.6.2 Implementation

The implementation phase is less creative than system design. It is primarily concerned with user training and file conversion, and the system may require extensive user training. The initial parameters of the system should be modified as a result of programming. A simple operating procedure is provided so that the user can understand the different functions clearly and quickly. The different reports can be obtained on either the inkjet or the dot-matrix printer, whichever is available at the disposal of the user. The proposed system is very easy to implement. In general, implementation means the process of converting a new or revised system design into an operational one.

4.6.3 Testing

Testing is the process in which the test data is prepared and used for testing the modules individually, after which validation is given for the fields. Then system testing takes place, which makes sure that all components of the system properly function as a unit. The test data should be chosen such that it passes through all possible conditions. Testing is the stage of implementation aimed at ensuring that the system works accurately and efficiently before actual operation commences. The following is a description of the testing strategies that were carried out during the testing period.

4.6.4 System Testing

Testing has become an integral part of any system or project, especially in the field of information technology. The importance of testing, as a method of judging whether one is ready to move further and whether the system can withstand the rigors of a particular situation, cannot be underplayed, and that is why testing before deployment is so critical. When the software is developed, it must be tested before it is given to the user, to check whether it serves the purpose for which it was developed. This testing involves various types through which one can ensure that the software is reliable. The program was tested logically, and the pattern of execution of the program for a set of data was repeated. Thus the code was exhaustively checked for all possible correct data, and the outcomes were also checked.

4.6.5 Module Testing

To locate errors, each module is tested individually. This enables us to detect an error and correct it without affecting any other modules. Whenever a program does not satisfy the required function, it must be corrected to get the required result. Thus all the modules are individually tested bottom-up, starting with the smallest and lowest-level modules and proceeding to the next level. Each module in the system is tested separately. For example, the job classification module is tested separately with different jobs and their approximate execution times, and the result of the test is compared with results prepared manually. The comparison shows that the proposed system works more efficiently than the existing system. In this system, the resource classification and job scheduling modules are tested separately and their corresponding results are obtained, which reduces the process waiting time.

4.6.6 Integration Testing

After module testing, integration testing is applied. When linking the modules there is a chance for errors to occur; these errors are detected and corrected by this testing. In this system all modules are connected and tested, and the testing results are correct. Thus, the mapping of jobs to resources is done correctly by the system.

4.6.7 Acceptance Testing

When the user finds no major problems with its accuracy, the system passes through a final acceptance test. This test confirms that the system meets the original goals, objectives, and requirements established during analysis. By placing acceptance testing on the shoulders of users and management, wastage of time and money is eliminated, and the system is finally accepted and ready for operation.

CHAPTER 5
SOURCE CODE
Main.java

package mapreduce;
import javax.swing.JFrame;
import javax.swing.JLabel;
import javax.swing.JButton;
import javax.swing.JPanel;
import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;
import javax.swing.UIManager;
import java.awt.BorderLayout;
import java.awt.Dimension;
import java.awt.Color;
import java.awt.Font;
import javax.swing.JOptionPane;
import javax.swing.JFileChooser;
import java.io.File;
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.net.Socket;
import java.sql.Statement;
import java.sql.ResultSet;
import org.jfree.ui.RefineryUtilities;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
public class Main extends JFrame
{
GradientPanel p1;
JPanel p2;
JLabel title;
JButton b1,b2,b3,b4,b5;
Font f1;
JFileChooser chooser;
File file;
static StringBuilder sb = new StringBuilder();
ArrayList<Location> location = new ArrayList<Location>();
public void remove(){
File remove = new File("output");
File list[] = remove.listFiles();
if(list != null){
for(int i=0;i<list.length;i++){
if(list[i] != null)

System.out.println(list[i].delete()+" delete =======");
}
}
if(remove != null)
remove.delete();
}
public Main(){
super("Map Reduce System");
p1 = new GradientPanel(600,200);
p1.setLayout(null);
f1 = new Font("Courier New",Font.BOLD,14);
p2 = new JPanel();
p2.setBackground(new Color(204, 110, 155));
title = new JLabel("<HTML><BODY><CENTER>ON TRAFFIC-AWARE PARTITION AND AGGREGATION IN MAPREDUCE<BR/>FOR BIG DATA APPLICATIONS</CENTER></BODY></HTML>");
title.setForeground(Color.white);
title.setFont(new Font("Times New ROMAN",Font.PLAIN,17));
p2.add(title);
chooser = new JFileChooser(new File("."));
chooser.setFileSelectionMode(JFileChooser.DIRECTORIES_ONLY);
JPanel pan3 = new JPanel();
b1 = new JButton("Define Reducer Location");
b1.setFont(f1);
b1.setBounds(220,50,250,30);
p1.add(b1);
b1.addActionListener(new ActionListener(){
public void actionPerformed(ActionEvent ae){
DefineReducer dr = new DefineReducer();
dr.setVisible(true);
dr.setSize(600,360);
dr.setLocationRelativeTo(null);
dr.setResizable(false);
}
});
b2 = new JButton("Upload Documents");
b2.setFont(f1);
b2.setBounds(220,100,250,30);
p1.add(b2);
b2.addActionListener(new ActionListener(){
public void actionPerformed(ActionEvent ae){
int option = chooser.showOpenDialog(Main.this);
if(option == chooser.APPROVE_OPTION){
sb.delete(0,sb.length());
file = chooser.getSelectedFile();
JOptionPane.showMessageDialog(Main.this,"Input
documents loaded");
}
}
});

b3 = new JButton("Start MapReduce Aggregation");
b3.setFont(f1);
b3.setBounds(220,150,250,30);
p1.add(b3);
b3.addActionListener(new ActionListener(){
public void actionPerformed(ActionEvent ae){
try{
remove();
long start = System.currentTimeMillis();
String a[] = {"test"};
WordFrequency.setInputPath(file.getPath());
WordFrequency.start(a);
Location loc = location.get(0);
Socket soc = null;
if(loc.getReducer().equals("Reducer1"))
soc = new Socket("localhost",2222);
if(loc.getReducer().equals("Reducer2"))
soc = new Socket("localhost",3333);
ObjectOutputStream out = new
ObjectOutputStream(soc.getOutputStream());
Object req[] = {"input",sb.toString()};
out.writeObject(req);
out.flush();
ObjectInputStream in = new
ObjectInputStream(soc.getInputStream());
Object res[] = (Object[])in.readObject();
String msg = (String)res[0];
long end = System.currentTimeMillis();
ViewResult vr = new ViewResult();
vr.setVisible(true);
vr.setSize(600,400);
if(msg.equals("output")){
String output = (String)res[1];
String arr[] = output.split("\n");
for(int i=0;i<arr.length;i++){
Object ar[] = arr[i].split("\t");
vr.dtm.addRow(ar);
}
}
FileWriter fw = new FileWriter("D:/agg.txt");
fw.write(""+(end-start));
fw.close();
JOptionPane.showMessageDialog(Main.this,"Processing Time "+(end-start));
}catch(Exception e){
e.printStackTrace();
}
}
});
b4 = new JButton("Network Traffic Cost Graph");
b4.setFont(f1);

b4.setBounds(220,200,250,30);
p1.add(b4);
b4.addActionListener(new ActionListener(){
public void actionPerformed(ActionEvent ae){
try{
BufferedReader br = new BufferedReader(new FileReader("D:/no_agg.txt"));
int noagg = Integer.parseInt(br.readLine().trim());
br.close();
br = new BufferedReader(new FileReader("D:/agg.txt"));
int agg = Integer.parseInt(br.readLine().trim());
br.close();
Chart chart1 = new Chart("Processing Time Chart",noagg,agg);
chart1.pack();
RefineryUtilities.centerFrameOnScreen(chart1);
chart1.setVisible(true);
}catch(Exception e){
e.printStackTrace();
}
}
});
b5 = new JButton("Exit");
b5.setFont(f1);
b5.setBounds(220,250,250,30);
p1.add(b5);
b5.addActionListener(new ActionListener(){
public void actionPerformed(ActionEvent ae){
System.exit(0);
}
});
getContentPane().add(p1,BorderLayout.CENTER);
getContentPane().add(p2,BorderLayout.NORTH);
}
public static void main(String a[])throws Exception{
UIManager.setLookAndFeel("com.sun.java.swing.plaf.nimbus.NimbusLookAndFeel");
Main main = new Main();
main.setVisible(true);
main.setExtendedState(JFrame.MAXIMIZED_BOTH);
main.readReducerLoc();
}
public double distance(double lat1, double lon1, double lat2, double lon2, char unit){
double theta = lon1 - lon2;
double dist = Math.sin(deg2rad(lat1)) * Math.sin(deg2rad(lat2)) +
Math.cos(deg2rad(lat1)) * Math.cos(deg2rad(lat2)) * Math.cos(deg2rad(theta));
dist = Math.acos(dist);
dist = rad2deg(dist);
dist = dist * 60 * 1.1515;
if (unit == 'K') {
dist = dist * 1.609344;
} else if (unit == 'N') {

dist = dist * 0.8684;
}
return (dist);
}
public double deg2rad(double deg) {
return (deg * Math.PI / 180.0);
}
public double rad2deg(double rad) {
return (rad * 180.0 / Math.PI);
}
public void readReducerLoc(){
try{
location.clear();
Statement stmt = DBCon.getCon().createStatement();
ResultSet rs = stmt.executeQuery("select * from reducer");
while(rs.next()){
String reducer = rs.getString(1);
double lat = rs.getDouble(2);
double lon = rs.getDouble(3);
double dis = distance(17.4359786,78.4481956,lat,lon,'M');
Location loc = new Location();
loc.setReducer(reducer);
loc.setDistance(dis);
location.add(loc);
}
java.util.Collections.sort(location,new Location());
for(int i=0;i<location.size();i++){
Location loc = location.get(i);
System.out.println(loc.getReducer());
}
}catch(Exception e){
e.printStackTrace();
}
}
}

DefineReducer.java

package mapreduce;
import javax.swing.JFrame;
import javax.swing.JLabel;
import javax.swing.JTextField;
import javax.swing.JButton;
import javax.swing.JPanel;
import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;
import javax.swing.UIManager;
import java.awt.BorderLayout;
import java.awt.Dimension;
import java.awt.Color;

import java.awt.Font;
import javax.swing.JComboBox;
import javax.swing.JOptionPane;
public class DefineReducer extends JFrame{
GradientPanel p1;
JPanel p2;
JLabel l1,l2,l3,l4;
JTextField tf1,tf2;
JButton b1;
JComboBox c1;
Font f1;
public DefineReducer(){
super("Define Reducer");
p1 = new GradientPanel(600,200);
p1.setLayout(null);
l1 = new JLabel("Define Reducer Location Screen");
l1.setFont(new Font("Courier New",Font.BOLD,18));
l1.setBounds(250,20,400,30);
p1.add(l1);
l2 = new JLabel("Reducer Name");
l2.setFont(f1);
l2.setBounds(200,60,100,30);
p1.add(l2);
c1 = new JComboBox();
c1.addItem("Reducer1");
c1.addItem("Reducer2");
c1.setFont(f1);
c1.setBounds(300,60,130,30);
p1.add(c1);
l3 = new JLabel("Latitude");
l3.setFont(f1);
l3.setBounds(200,110,100,30);
p1.add(l3);
tf1 = new JTextField(15);
tf1.setFont(f1);
tf1.setBounds(300,110,130,30);
p1.add(tf1);
l4 = new JLabel("Longitude");
l4.setFont(f1);
l4.setBounds(200,160,100,30);
p1.add(l4);
tf2 = new JTextField(15);
tf2.setFont(f1);
tf2.setBounds(300,160,130,30);
p1.add(tf2);
b1 = new JButton("Save Reducer");
b1.setFont(f1);
b1.setBounds(220,210,140,30);
p1.add(b1);
b1.addActionListener(new ActionListener(){

public void actionPerformed(ActionEvent ae){
login();
}
});
getContentPane().add(p1,BorderLayout.CENTER);
}
public void clear(){
tf1.setText("");
tf2.setText("");
}
public void login(){
String reducer = c1.getSelectedItem().toString().trim();
String lat = tf1.getText();
String lon = tf2.getText();
if(lat == null || lat.trim().length() <= 0){
JOptionPane.showMessageDialog(this,"Latitude must be enter");
tf1.requestFocus();
return;
}
if(lon == null || lon.trim().length() <= 0){
JOptionPane.showMessageDialog(this,"Longitude must be enter");
tf2.requestFocus();
return;
}
double lat1 = 0;
double lon1 = 0;
try{
lat1 = Double.parseDouble(lat.trim());
}catch(NumberFormatException nfe){
JOptionPane.showMessageDialog(this,"Latitude must be decimal value
only");
tf1.requestFocus();
return;
}
try{
lon1 = Double.parseDouble(lon.trim());
}catch(NumberFormatException nfe){
JOptionPane.showMessageDialog(this,"Longitude must be decimal value
only");
tf2.requestFocus();
return;
}
try{
String msg = DBCon.addReducer(reducer,lat.trim(),lon.trim());
if(msg.equals("success")){
JOptionPane.showMessageDialog(this,"Reducer details added");
setVisible(false);
}else{
JOptionPane.showMessageDialog(this,"Error in adding reducer
details");

}
} catch(Exception e) {
e.printStackTrace();
}
}
}

Chart.java

package mapreduce;
import java.awt.Color;
import java.awt.Dimension;
import java.awt.GradientPaint;
import javax.swing.JPanel;
import org.jfree.chart.ChartFactory;
import org.jfree.chart.ChartPanel;
import org.jfree.chart.JFreeChart;
import org.jfree.chart.axis.CategoryAxis;
import org.jfree.chart.axis.CategoryLabelPositions;
import org.jfree.chart.axis.NumberAxis;
import org.jfree.chart.labels.StandardCategorySeriesLabelGenerator;
import org.jfree.chart.plot.CategoryPlot;
import org.jfree.chart.plot.PlotOrientation;
import org.jfree.chart.renderer.category.BarRenderer;
import org.jfree.data.category.CategoryDataset;
import org.jfree.data.category.DefaultCategoryDataset;
import org.jfree.ui.ApplicationFrame;
import org.jfree.ui.RefineryUtilities;
import java.util.ArrayList;
import java.awt.event.WindowEvent;
import javax.swing.JScrollPane;
import org.jfree.chart.ChartUtilities;
public class Chart extends ApplicationFrame{
static int noagg;
static int agg;
static String title;
public Chart(String paramString,int a1,int a2){
super(paramString);
noagg = a1;
agg = a2;
title = paramString; // store the title so that createChart() can use it
JPanel localJPanel = createDemoPanel();
localJPanel.setPreferredSize(new Dimension(800, 370));
JScrollPane jsp = new JScrollPane(localJPanel);
setContentPane(localJPanel);
}
private static CategoryDataset createDataset(){
DefaultCategoryDataset localDefaultCategoryDataset = new
DefaultCategoryDataset();
localDefaultCategoryDataset.addValue(noagg,"No Aggregation","No Aggregation");

localDefaultCategoryDataset.addValue(agg,"Aggregation","Aggregation");
return localDefaultCategoryDataset;
}
public void windowClosing(WindowEvent we) {
this.setVisible(false);
}
private static JFreeChart createChart(CategoryDataset paramCategoryDataset){
JFreeChart localJFreeChart = ChartFactory.createBarChart(title, "Technique Name", "Processing Time", paramCategoryDataset, PlotOrientation.VERTICAL, true, true, false);
CategoryPlot localCategoryPlot = (CategoryPlot)localJFreeChart.getPlot();
localCategoryPlot.setDomainGridlinesVisible(true);
localCategoryPlot.setRangeCrosshairVisible(true);
localCategoryPlot.setRangeCrosshairPaint(Color.blue);
NumberAxis localNumberAxis = (NumberAxis)localCategoryPlot.getRangeAxis();
localNumberAxis.setStandardTickUnits(NumberAxis.createIntegerTickUnits());
BarRenderer localBarRenderer = (BarRenderer)localCategoryPlot.getRenderer();
localBarRenderer.setDrawBarOutline(false);
GradientPaint localGradientPaint1 = new GradientPaint(0.0F, 0.0F, Color.blue, 0.0F,
0.0F, new Color(0, 0, 64));
GradientPaint localGradientPaint2 = new GradientPaint(0.0F, 0.0F, Color.green, 0.0F,
0.0F, new Color(0, 64, 0));
GradientPaint localGradientPaint3 = new GradientPaint(0.0F, 0.0F, Color.red, 0.0F,
0.0F, new Color(64, 0, 0));
localBarRenderer.setSeriesPaint(0, localGradientPaint1);
localBarRenderer.setSeriesPaint(1, localGradientPaint2);
localBarRenderer.setSeriesPaint(2, localGradientPaint3);
localBarRenderer.setLegendItemToolTipGenerator(new
StandardCategorySeriesLabelGenerator("Tooltip: {0}"));
CategoryAxis localCategoryAxis = localCategoryPlot.getDomainAxis();
localCategoryAxis.setCategoryLabelPositions(CategoryLabelPositions.createUpRotationLabelPositions(0.5235987755982988D));
return localJFreeChart;
}
public static JPanel createDemoPanel(){
JFreeChart localJFreeChart = createChart(createDataset());
return new ChartPanel(localJFreeChart);
}}

DBCon.java

package mapreduce;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
public class DBCon{
private static Connection con;

public static Connection getCon()throws Exception {
Class.forName("com.mysql.jdbc.Driver");
con = DriverManager.getConnection("jdbc:mysql://localhost/mapreduce","root","root");
return con;
}
public static String addReducer(String reducer,String lat,String lon)throws Exception{
String msg="fail";
con = getCon();
PreparedStatement stmt=con.prepareStatement("delete from reducer where reducer_name=?");
stmt.setString(1,reducer);
stmt.executeUpdate();
PreparedStatement stat=con.prepareStatement("insert into reducer values(?,?,?)");
stat.setString(1,reducer);
stat.setString(2,lat);
stat.setString(3,lon);
int i=stat.executeUpdate();
if(i > 0)
msg = "success";
return msg;
}}

Reducer1.java

package mapreduce;
import java.io.ObjectOutputStream;
import java.io.ObjectInputStream;
import java.net.Socket;
import java.net.ServerSocket;
import java.awt.BorderLayout;
import java.awt.Color;
import java.awt.Container;
import java.awt.Font;
import java.awt.event.WindowAdapter;
import java.awt.event.WindowEvent;
import javax.swing.JFrame;
import javax.swing.JLabel;
import javax.swing.JPanel;
import javax.swing.JButton;
import javax.swing.JScrollPane;
import javax.swing.JTextArea;
import javax.swing.SwingUtilities;
import java.awt.event.ActionListener;
import java.awt.event.ActionEvent;
import javax.swing.UIManager;
public class Reducer1 extends JFrame{
ProcessThread thread;
JPanel p1,p2;
JLabel l1;
JScrollPane jsp;

JTextArea area;
Font f1,f2;
ServerSocket server;
Socket socket;
static StringBuilder sb = new StringBuilder();
public void start(){
try{
area.append("Node1 startup\n");
server = new ServerSocket(2222);
while(true){
socket = server.accept();
socket.setKeepAlive(true);
thread = new ProcessThread(socket,area);
thread.start();
}
}catch(Exception e){
e.printStackTrace();
}
}
public Reducer1(){
setTitle("Reducer1 Node");
p1 = new JPanel();
l1 = new JLabel("<html><body><center><font size=6 color=#f5ea01>Reducer1 Node</font></center></body></html>");
l1.setForeground(Color.white);
p1.add(l1);
p1.setBackground(Color.black);
f2 = new Font("Courier New", 1, 13);
p2 = new JPanel();
p2.setLayout(new BorderLayout());
area = new JTextArea();
area.setFont(f2);
jsp = new JScrollPane(area);
area.setEditable(false);
p2.add(jsp);
getContentPane().add(p1, "North");
getContentPane().add(p2, "Center");
addWindowListener(new WindowAdapter(){
@Override
public void windowClosing(WindowEvent we){
try{
if(socket != null){
socket.close();
}
server.close();
}catch(Exception e){
//e.printStackTrace();
}
}
});

}
public static void main(String a[])throws Exception{
UIManager.setLookAndFeel("com.sun.java.swing.plaf.nimbus.NimbusLookAndFeel");
Reducer1 r1 = new Reducer1();
r1.setVisible(true);
r1.setSize(600,400);
new ServerThread(r1);
}
}

Reducer2.java

package mapreduce;
import java.io.ObjectOutputStream;
import java.io.ObjectInputStream;
import java.net.Socket;
import java.net.ServerSocket;
import java.awt.BorderLayout;
import java.awt.Color;
import java.awt.Container;
import java.awt.Font;
import java.awt.event.WindowAdapter;
import java.awt.event.WindowEvent;
import javax.swing.JFrame;
import javax.swing.JLabel;
import javax.swing.JPanel;
import javax.swing.JButton;
import javax.swing.JScrollPane;
import javax.swing.JTextArea;
import javax.swing.SwingUtilities;
import java.awt.event.ActionListener;
import java.awt.event.ActionEvent;
import javax.swing.UIManager;
public class Reducer2 extends JFrame{
ProcessThread thread;
JPanel p1,p2;
JLabel l1;
JScrollPane jsp;
JTextArea area;
Font f1,f2;
ServerSocket server;
Socket socket;
static StringBuilder sb = new StringBuilder();
public void start(){
try{
area.append("Node2 startup\n");
server = new ServerSocket(3333);
while(true){
socket = server.accept();

socket.setKeepAlive(true);
thread = new ProcessThread(socket,area);
thread.start();
}
}catch(Exception e){
e.printStackTrace();
}
}
public Reducer2(){
setTitle("Reducer2 Node");
p1 = new JPanel();
l1 = new JLabel("<html><body><center><font size=6 color=#f5ea01>Reducer2 Node</font></center></body></html>");
l1.setForeground(Color.white);
p1.add(l1);
p1.setBackground(Color.black);
f2 = new Font("Courier New", 1, 13);
p2 = new JPanel();
p2.setLayout(new BorderLayout());
area = new JTextArea();
area.setFont(f2);
jsp = new JScrollPane(area);
area.setEditable(false);
p2.add(jsp);
getContentPane().add(p1, "North");
getContentPane().add(p2, "Center");
addWindowListener(new WindowAdapter(){
@Override
public void windowClosing(WindowEvent we){
try{
if(socket != null){
socket.close();
}
server.close();
}catch(Exception e){
//e.printStackTrace();
}}
});
}
public static void main(String a[])throws Exception{
UIManager.setLookAndFeel("com.sun.java.swing.plaf.nimbus.NimbusLookAndFeel");
Reducer2 r2 = new Reducer2();
r2.setVisible(true);
r2.setSize(600,400);
new ServerThread(r2);
}
}

CHAPTER 6
EXPERIMENTAL RESULTS
A GUI (graphical user interface) is a system of interactive visual components for computer software. A GUI displays objects that convey information and represent actions that can be taken by the user; the objects change color, size, or visibility when the user interacts with them.

Upon running our code, we get the GUI shown in Fig. 6.1, containing the following buttons.

Fig. 6.1: Mapper application home screen


It presents the following options:
 Defining reducer location
 Uploading documents
 Start MapReduce Aggregation
 Network traffic cost graph
 Exit

Fig. 6.2: Defining Reducer1 location

Click on Define Reducer Location and add the two reducer locations. In our project we define two reducers. First, we define reducer 1 at a particular location by providing its latitude and longitude; we set reducer 1's latitude to 17.4293823 and longitude to 78.4507852. On clicking the Save Reducer button, the reducer 1 location details are saved, as can be seen in Fig. 6.2.

After successfully adding the reducer 1 details, the screen displays "reducer details added". We then click on the OK button so that we can define the next reducer location.

We need to repeat the same steps for the second reducer and start both reducer nodes before starting the main application.

Fig. 6.3: Defining Reducer2 location

Now we define reducer 2 at a particular location by providing its latitude and longitude; in our project, we set reducer 2's latitude to 17.4080325 and longitude to 78.4489493, as shown in Fig. 6.3. On clicking the Save Reducer button, the reducer 2 location details are saved.

After successfully adding the reducer 2 details, the screen displays "reducer details added", and we click on the OK button.

Soon after adding the locations of reducer1 and reducer2, we are required to start the
reducer nodes by following the below steps

 Run the reducer application for Reducer 1
 Run the reducer application for Reducer 2

Fig. 6.4: Upload Documents

Now upload the document or dataset on which the processing needs to be done. After running the reducer applications (reducer 1 and reducer 2), the next step is to upload the file containing the data that needs to be processed to obtain the required results.

The data is present in a file named s1, so we select the s1 data file and upload the document. After successfully uploading the document, the input data gets loaded, as demonstrated in Fig. 6.4 above.

Click on start MapReduce Aggregation

Fig. 6.5: MapReduce Aggregation

MapReduce is a software framework and programming model used for processing huge amounts of data. MapReduce programs work in two phases, namely Map and Reduce: map tasks deal with the splitting and mapping of the data, while reduce tasks shuffle and reduce the data.

Hence, after successfully uploading the document, click on Start MapReduce Aggregation. By the definition of MapReduce, the data is first split into input splits and passed through the mapping function to produce intermediate values; these values are then shuffled and reduced to return a single output value. Fig. 6.5 illustrates the MapReduce aggregation.

The screen below shows the aggregated data after it has been successfully processed in the previous steps.

Fig. 6.6: Aggregated data

Here the processed data from the reducer nodes is obtained, depending on the calculation of which reducer node is nearer to the request. Aggregate data is high-level data which is acquired by combining individual-level data; for instance, the output of an industry is an aggregate of the individual outputs of the firms within that industry. Aggregate data are applied in statistics, data warehouses, and economics.

Upon calculating the distance between each node and the query location, the request is sent to the nearest node, as shown in the reducer application windows below.

Fig. 6.7: Results on Reducer nodes
In the above, the request has been processed by Reducer 1, because Reducer 1 is nearer to the mapper location (for the mapper application location details, refer to Main.java, line number 214). When the distance between both reducers and the query is the same, the request is sent to either one of the reducer nodes. In practice, the reducers are placed away from each other in order to control the associated traffic problems.

In our case the distances between the reducers and the query location are calculated with the help of longitude and latitude, the smaller distance is taken into consideration, and the job is scheduled to that node. As we can see in Fig. 6.7, the request has been sent to Reducer 1.

Click on network traffic cost graph in order to check the traffic patterns

Fig. 6.8: Traffic cost graph


The above graph represents the network traffic cost, in terms of processing time, with and without aggregation. The graph differs between the aggregation and no-aggregation conditions, as shown in Fig. 6.8.

Here the aggregator's function is to reduce the merged traffic from multiple map tasks.

CHAPTER 7
CONCLUSION AND FUTURE SCOPE
7.1. CONCLUSION

In this project, we study the joint optimization of intermediate data partitioning and aggregation in MapReduce to minimize the network traffic cost for big data applications. We propose a three-layer model for this problem and formulate it as a mixed-integer nonlinear program, which is then transformed into a linear form that can be solved by mathematical tools. We report speedups of up to 4 with seven devices and energy savings of up to 71% with eight devices. To deal with the large-scale formulation due to big data, we design a distributed algorithm to solve the problem on multiple machines. Our project achieves better results when compared with other algorithms such as the Honeybee model. Furthermore, we extend our algorithm to handle MapReduce jobs in an online manner when some system parameters are not given. Finally, we conduct extensive simulations to evaluate our proposed algorithm under both offline and online cases. The simulation results demonstrate that our proposals can effectively reduce the network traffic cost under various network settings.

7.2. FUTURE SCOPE

For future enhancements, one can consider the following:

 Attaching load balancers and autoscaling groups to increase the accuracy and efficiency of the results during high traffic.
 Distributed scheduling can be improved with the help of the YARN tool.
 Including the ability to calculate the location of the sender.
 More sophisticated tools can be built by combining existing ones.

CHAPTER 8
REFERENCES
[1]. J. Dean and S. Ghemawat, “Mapreduce: simplified data processing on large
clusters,” Communications of the ACM, vol. 51, no. 1, pp. 107–113, 2008.
[2]. W. Wang, K. Zhu, L. Ying, J. Tan, and L. Zhang, “Map task scheduling in
MapReduce with data locality: Throughput and heavy-traffic optimality,” in
INFOCOM 2013 Proceedings. IEEE, 2013, pp. 1609–1617.
[3]. F. Chen, M. Kodialam, and T. Lakshman, “Joint scheduling of processing and
shuffle phases in MapReduce systems,” in INFOCOM,2012 Proceedings IEEE.
IEEE, 2012, pp. 1143–1151.
[4]. Y. Wang, W. Wang, C. Ma, and D. Meng, “Zput: A speedy data uploading
approach for the Hadoop distributed file system,” in Cluster Computing
(CLUSTER), 2013 IEEE International Conference on. IEEE, 2013, pp. 1–5.
[5]. S. Chen and S. W. Schlosser, “Map-reduce meets wider varieties of applications,”
Intel Research Pittsburgh, Tech. Rep. IRP-TR-08-05, 2008.
[6]. S. Venkataraman, E. Bodzsar, I. Roy, A. AuYoung, and R. S. Schreiber, “Presto:
distributed machine learning and graph processing with sparse matrices,” in
Proceedings of the 8th ACM European Conference on Computer Systems. ACM,
2013, pp. 197–210.
[7]. A. Matsunaga, M. Tsugawa, and J. Fortes, “Cloudblast: Combining mapreduce
and virtualization on distributed resources for bioinformatics applications,” in
eScience, 2008. eScience’08. IEEE Fourth International Conference on 2008, pp.
222–229.
[8]. T. White, Hadoop: The Definitive Guide. O'Reilly Media, Inc., 2009.

ONE PAGE: STUDENT PROFILE

PRATHYUSHA GADE

Prathyusha Gade is currently pursuing her final year, Bachelor of Technology in


the Computer Science and Engineering stream at St. Martin’s Engineering College. She
completed her schooling from Bhashyam Public School with a CGPA of 10/10 and intermediate from Sri Chaitanya Junior College with a 93% aggregate. She has a fair understanding of Java and C, basic knowledge of C++ and Python, and hands-on experience with HTML and PHP. She is good at SQL and data structures. She has participated in some events, seminars, and project expos conducted by the college. She has been placed in 7 companies, with the highest package of 27 LPA at Amazon. She is doing her internship at Axiom, where she learned AWS cloud services such as EC2, VPC, S3, and Lambda, and DevOps tools like Jenkins, Docker, and Terraform. She has experience with blockchain development using Substrate. She has completed many certificate courses from Coursera, Sololearn, and AWS Partner Central.

CH CHANDANA

Ch Chandana is currently pursuing her final year, Bachelor of Technology in the Computer Science and Engineering stream, at St. Martin's Engineering College. She completed her secondary education from Sister Nivedita School with a CGPA of 8.8/10 and intermediate from Sri Chaitanya Junior College, securing an aggregate of 94%. She is good at programming languages like C, Java, and Python, has basic knowledge of C++, is quite knowledgeable in SQL and data structures, and has hands-on experience with scripting languages like HTML and JavaScript. She is involved in many activities and has participated in events, seminars, project expos, and workshops conducted by the college; she also participated in the event SPEED conducted by MLRIT. She got placed in 3 companies, namely TCS, Infosys, and HCL. She did certification courses from Coursera and Sololearn.

BOGARAM AKANKSHA

Bogaram Akanksha is currently pursuing her final year, Bachelor of Technology in


the Computer Science and Engineering stream at St. Martin’s Engineering College. She
completed her schooling from Balaji High School with a CGPA of 7.8/10 and intermediate from Sri Chaitanya Junior College with a 78.8% aggregate. She has a fair understanding of Java and C, basic knowledge of C++ and Python, and hands-on experience with HTML and PHP. She is good at SQL and data structures. She has participated in some events, seminars, and project expos conducted by the college. She has been placed in TCS with a package of 3.36 LPA. She has completed many certificate courses from Coursera and Sololearn.

SYED ABDUL MANNAN H HASHMI

Syed Abdul Mannan H Hashmi is currently pursuing his final year, Bachelor of
Technology in the Computer Science and Engineering stream at St. Martin's Engineering
College. He completed his schooling from FIITJEE World School with a CGPA of 9.3 and intermediate from Sri Chaitanya Narayana Junior Kalasala with an 85% aggregate. He has a fair understanding of Java and C, basic knowledge of C++ and Python, and hands-on experience with HTML. He is good at data structures. He has participated in some events, seminars, and project expos conducted by the college.

