VOLUME 9, ISSUE 3, MAY 2016 ISSN-2347-8047
INTERNATIONAL JOURNAL OF COGNITIVE SCIENCE,
ENGINEERING AND TECHNOLOGY
Available online at: http://airnetjournal.org/
A Survey on Data Mapping Strategy for data
stored in the storage cloud
Navneet Kumar
Department of CSE
K S Institute of
Technology
Bengaluru, India
Naseeruddin V N
Department of CSE
K S Institute of
Technology
Bengaluru, India
Murali Krishna V
Department of CSE
K S Institute of
Technology
Bengaluru, India
S K Manu
Department of CSE
K S Institute of
Technology
Bengaluru, India
Swathi K
Department of CSE
K S Institute of
Technology
Bengaluru, India
Abstract— In the recent past, the volume of data processed over
the internet has grown exponentially, making such data difficult
to store and computationally inefficient to analyze. There is
currently considerable enthusiasm around the MapReduce paradigm
for large-scale data analysis. It is inspired by functional
programming and allows distributed computations to be expressed
over massive amounts of data. It is designed for large-scale data
processing because it runs on clusters of commodity hardware. As
a prominent parallel data processing tool, MapReduce is gaining
significant momentum in both industry and academia as the volume
of data to analyze grows rapidly. In this paper we propose a
method for processing large amounts of data over the internet.
The method involves storing the data to be processed in the cloud
and processing it in a Hadoop multi-cluster environment.
Keywords— Storage Cloud, Hadoop cluster, Hadoop,
Distributed File System, Parallel Processing, MapReduce
I. Introduction
Analyzing big data is a very challenging problem. For the
effective handling of such massive data and applications, the
MapReduce framework has come into wide focus. Over the last few
years, MapReduce has emerged as the most popular computing
paradigm for parallel, batch-style analysis of large amounts of
data, and it is used in many areas where massive data analysis is
required. A growing number of applications handle big data, but
handling such huge collections of data remains a very challenging
problem today. MapReduce, or its open-source equivalent Hadoop,
is a powerful tool for building such applications. Data-intensive
processing is fast becoming a necessity for handling large
databases efficiently, and it requires algorithms that are
capable of scaling to real-world datasets. There is currently
considerable enthusiasm around the MapReduce paradigm for
large-scale data analysis. It is inspired by functional
programming and allows distributed computations to be expressed
over massive amounts of data. It is designed for large-scale data
processing because it runs on clusters of commodity hardware.
MapReduce is used in areas where the volume of data to analyze
grows rapidly. Although it has these abilities, there is still
debate about its performance, effectiveness, and simple concept.
With the present explosion of data, parallel processing is
important for handling such massive volumes in a timely manner.
MapReduce gained its popularity when it was used successfully by
Google. In practice, it is a scalable and fault-tolerant data
processing tool that makes it possible to process very large
volumes of data in parallel on many low-end computing nodes. By
virtue of its simplicity, scalability, and fault tolerance,
MapReduce is becoming ubiquitous, gaining significant momentum in
both industry and academia.
However, MapReduce has inherent limitations in performance and
efficiency, and many studies have therefore endeavoured to
overcome them. The goal of this analysis is to provide a timely
review of the status of MapReduce studies and related work,
focusing on current research aimed at improving and enhancing the
MapReduce framework. This paper is intended to assist the
database community in understanding various technical aspects of
the MapReduce framework. In this paper, we focus on the working
of the MapReduce framework and examine its built-in advantages
and drawbacks. We then introduce applications and effective ways
to improve its properties so that optimized results can be
obtained. We also highlight the issues and challenges raised
about MapReduce. It is well known for its simplicity,
effectiveness, and capability to handle “Big Data” in a timely
manner. Despite these valuable features, it still has some
limitations that need to be resolved.
II. Working
MapReduce is a programming model and an associated
implementation for processing and generating large datasets
that is amenable to a broad variety of real-world tasks [3]. The
MapReduce paradigm of parallel programming provides simplicity
while at the same time offering load balancing and fault
tolerance. The Google File System (GFS) that typically underlies
a MapReduce system provides the efficient and reliable
distributed data storage needed for applications involving large
databases [10]. MapReduce is inspired by the map and reduce
primitives present in functional languages. In its pure form,
various implementations of the MapReduce interface are possible,
depending on the desired context. Some currently available
implementations target shared-memory multi-core systems [11][12],
asymmetric multi-core processors [13], graphics processors, and
clusters of networked machines [4]. The most popular
implementation is probably the one introduced by Google, which
utilizes large clusters of commodity computers connected with
switched Ethernet. In essence, Google’s MapReduce technique
simplifies the development and lowers the cost of large-scale
distributed applications on clusters of commodity machines. The
MapReduce framework executes its tasks using a runtime scheduling
scheme: it does not build, before execution, any execution plan
that specifies which tasks will run on which nodes [14]. The
MapReduce model can process in parallel large datasets
distributed across many nodes. The main goal is to simplify
large-scale data processing by using inexpensive cluster
computers and to make this easy for users while achieving both
load balancing and fault tolerance. MapReduce has two primary
functions, the Map function and the Reduce function, which are
defined by the user to meet specific requirements. The original
MapReduce software is a proprietary system of Google and is
therefore not available for public use [15]. Although distributed
computing is largely simplified by the Map and Reduce primitives,
the underlying infrastructure needed to achieve the desired
performance is non-trivial [2]. A key piece of infrastructure in
Google’s MapReduce is the underlying distributed file system,
which ensures data locality and availability [3]. By combining
the MapReduce programming technique with an efficient distributed
file system, one can achieve distributed computing with data
parallelism over thousands of computing nodes.
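The roles of the user-defined Map and Reduce functions can be illustrated with a minimal single-process sketch in Python. It is not Hadoop or Google code: the names `map_words`, `reduce_counts`, and `run_mapreduce` are hypothetical, and word counting is chosen here only as a familiar example. A real framework would run the map and reduce calls on many nodes and perform the grouping step over the network.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_words(_doc_id: str, text: str) -> Iterator[Tuple[str, int]]:
    """Map: emit an intermediate (word, 1) pair for every word in a record."""
    for word in text.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values that share one intermediate key."""
    return word, sum(counts)


def run_mapreduce(
    records: Iterable[Tuple[str, str]],
    mapper: Callable[[str, str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], Tuple[str, int]],
) -> Dict[str, int]:
    """Single-process stand-in for the framework (illustrative only)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    # Map phase followed by grouping of intermediate pairs by key.
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    # Reduce phase: one reducer call per distinct key.
    return dict(reducer(k, vs) for k, vs in sorted(groups.items()))


documents = [
    ("doc1", "big data needs big clusters"),
    ("doc2", "data locality matters"),
]
word_counts = run_mapreduce(documents, map_words, reduce_counts)
```

Because the mapper sees one record at a time and the reducer sees one key at a time, both phases can be spread across nodes without changing the user's functions.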
III. Methodology
The architecture illustrates the layout of the project. The user
uploads the data to the cloud over the internet and then selects
the operation to be carried out. The controller, which acts as
middleware, interprets the request and forwards it to the Hadoop
master. The Hadoop master starts the JobTracker and connects the
cloud as the data node, and the MapReduce algorithm is run, which
maps and reduces the data according to the implemented algorithm.
The result is collected, concatenated, and stored back in the
cloud for the user to download.
The use of a storage cloud allows the user to upload and download
data from any place connected to the internet without knowing
where the actual processing is done.
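The flow described in this section (upload, operation selection, forwarding by the controller, map and reduce, storing the result back) can be sketched as follows. This is a toy in-memory model under stated assumptions, not the project's implementation: `StorageCloud`, `OPERATIONS`, `controller`, and the `count_lines` operation are hypothetical names, and the Hadoop master and JobTracker are replaced by plain function calls.

```python
from dataclasses import dataclass, field


@dataclass
class StorageCloud:
    """Hypothetical in-memory stand-in for the storage cloud."""
    containers: dict = field(default_factory=dict)

    def upload(self, container: str, name: str, data: str) -> None:
        self.containers.setdefault(container, {})[name] = data

    def download(self, container: str, name: str) -> str:
        return self.containers[container][name]


# Each selectable operation is a (map, reduce) pair. "count_lines" is an
# illustrative example, not an operation named in the paper.
OPERATIONS = {
    "count_lines": (lambda chunk: len(chunk), sum),
}


def controller(cloud, operation, in_container, name, out_container, splits=2):
    """Interpret the request, run the job, and store the result in the cloud."""
    map_fn, reduce_fn = OPERATIONS[operation]
    lines = cloud.download(in_container, name).splitlines()
    size = max(1, -(-len(lines) // splits))  # ceiling division
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    partials = [map_fn(chunk) for chunk in chunks]  # map phase, one call per split
    result = str(reduce_fn(partials))               # reduce phase combines partials
    result_name = name + ".result"
    cloud.upload(out_container, result_name, result)
    return result_name


cloud = StorageCloud()
cloud.upload("input", "log.txt", "a\nb\nc")
result_name = controller(cloud, "count_lines", "input", "log.txt", "output")
```

The user only interacts with `upload` and `download`, which mirrors the point that the location of the actual processing stays hidden behind the storage cloud.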
IV. Design
The goal of the application is to provide an easy-to-use
interface so that a user with even little knowledge of websites
can use it through a browser. We have designed a few models and
structures to explain the design and structure of the application
under discussion.
Data Flow Model
A data flow diagram (DFD) is a graphical representation of
the "flow" of data through an information system, modeling
its process aspects. Often they are a preliminary step used to
create an overview of the system which can later be
elaborated. DFDs can also be used for the visualization of
data processing (structured design).
A DFD shows what kinds of information will be input to and
output from the system, where the data will come from and
go to, and where the data will be stored. It does not show
information about the timing of processes, or information
about whether processes will operate in sequence or in
parallel.
Data flow: A data flow shows the flow of information from
its source to its destination. A data flow is represented by a
line, with arrowheads showing the direction of flow.
Data Store: A data store is a holding place for information
within the system. It is represented by an open-ended narrow
rectangle.
External Entities: It is normal for all information represented
within a system to have been obtained from and/or to be passed
on to an external source or recipient.
Processes: When naming processes, avoid glossing over them
without really understanding their role. Vague terms in the
descriptive title area, such as ‘process’ or ‘update’, indicate
that this has happened.
Data Flows: Double-headed arrows can be used on all but
bottom-level diagrams. Furthermore, in common with most of the
other symbols used, a data flow at a particular level of a
diagram may be decomposed into multiple data flows.
V. Snapshots
Description: The admin logs in using “admin” as both the admin
name and the password. If the admin name or password does not
match, the message “Wrong admin or wrong password” is displayed;
if they match, the message “Login Success” is displayed.
Description: The website contains options through which the user
can carry out the necessary tasks.
Description: The user can select containers and upload data to
the cloud.
Description: The user can select the output container where the
result is stored and download the data from the cloud.
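The login check described for this snapshot could look roughly like the sketch below. The function name `admin_login` and the constants are hypothetical; the credentials and messages are the ones given in the description. The use of `hmac.compare_digest` is an added assumption to avoid timing differences, and a deployed system would normally compare salted password hashes rather than plain text.

```python
import hmac

ADMIN_NAME = "admin"
ADMIN_PASSWORD = "admin"  # as described in the snapshot; plain text for illustration only


def admin_login(name: str, password: str) -> str:
    """Return the message shown to the admin after a login attempt."""
    name_ok = hmac.compare_digest(name.encode(), ADMIN_NAME.encode())
    password_ok = hmac.compare_digest(password.encode(), ADMIN_PASSWORD.encode())
    if name_ok and password_ok:
        return "Login Success"
    return "Wrong admin or wrong password"
```

Returning one combined failure message, as the snapshot does, avoids revealing which of the two fields was wrong.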
Acknowledgment
The satisfaction and euphoria that accompany the successful
completion of any task would be incomplete without mention of
the individuals to whom we are greatly indebted, who through
guidance and the provision of facilities have served as a beacon
of light and crowned our efforts with success. We are thankful to
Mrs. Swathi K, Assistant Professor, CSE, KSIT, for being our
project guide, under whose able guidance this project work was
carried out and completed successfully.
We thank the management, the principal, and the Department of
Computer Science and Engineering, KSIT. We thank VGST (Vision
Group on Science and Technology), Government of Karnataka, India,
for providing infrastructure facilities through the K-FIST
Level II project at the KSIT CSE R&D Department, Bengaluru.
References
[1] Maitrey S, Jha. An Integrated Approach for CURE Clustering using
Map-Reduce Technique. In Proceedings of Elsevier, ISBN 978-81-910691-6-3,
2nd August 2013.
[2] Kyuseok Shim. MapReduce Algorithms for Big Data Analysis. In
Proceedings of the VLDB Endowment, Vol. 5, No. 12, August 27th 2012,
Istanbul, Turkey.
[3] Jeffrey Dean et al. MapReduce: Simplified data processing on large
clusters. In Proceedings of the 6th USENIX OSDI, pages 137–150, 2004.
[4] J. Dean et al. MapReduce: Simplified data processing on large clusters.
Communications of the ACM, 51(1):107–113, 2008.
[5] D. DeWitt and M. Stonebraker. MapReduce: A major step backwards. The
Database Column, 1, 2008.
[6] A. Pavlo et al. A comparison of approaches to large-scale data analysis. In
Proceedings of the ACM SIGMOD, pages 165–178, 2009.
[7] M. Stonebraker et al. MapReduce and parallel DBMSs: friends or foes?
Communications of the ACM, 53(1):64–71, 2010.
[8] A. Thusoo et al. Hive: a warehousing solution over a map-reduce
framework. Proceedings of the VLDB Endowment, 2(2):1626–1629, 2009.
[9] A.F. Gates et al. Building a high-level dataflow system on top of
Map-Reduce: the Pig experience. Proceedings of the VLDB Endowment,
2(2):1414–1425, 2009.
[10] S. Ghemawat et al. The Google file system. ACM SIGOPS Operating
Systems Review, 37(5):29–43, 2003.
[11] OpenStack Installation Guide for Ubuntu 14.04, February 26, 2015.
[12] http://www.stackoverflow.com/
[13] https://github.com/
Ad

More Related Content

What's hot (16)

Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
IJECEIAES
 
Final Report_798 Project_Nithin_Sharmila
Final Report_798 Project_Nithin_SharmilaFinal Report_798 Project_Nithin_Sharmila
Final Report_798 Project_Nithin_Sharmila
Nithin Kakkireni
 
A New Multi-Dimensional Hyperbolic Structure for Cloud Service Indexing
A New Multi-Dimensional Hyperbolic Structure for Cloud Service IndexingA New Multi-Dimensional Hyperbolic Structure for Cloud Service Indexing
A New Multi-Dimensional Hyperbolic Structure for Cloud Service Indexing
IJDMS
 
Survey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization MethodsSurvey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization Methods
paperpublications3
 
TheETLBottleneckinBigDataAnalytics(1)
TheETLBottleneckinBigDataAnalytics(1)TheETLBottleneckinBigDataAnalytics(1)
TheETLBottleneckinBigDataAnalytics(1)
ruchabhandiwad
 
IRJET- Cost Effective Workflow Scheduling in Bigdata
IRJET-  	  Cost Effective Workflow Scheduling in BigdataIRJET-  	  Cost Effective Workflow Scheduling in Bigdata
IRJET- Cost Effective Workflow Scheduling in Bigdata
IRJET Journal
 
MAP-REDUCE IMPLEMENTATIONS: SURVEY AND PERFORMANCE COMPARISON
MAP-REDUCE IMPLEMENTATIONS: SURVEY AND PERFORMANCE COMPARISONMAP-REDUCE IMPLEMENTATIONS: SURVEY AND PERFORMANCE COMPARISON
MAP-REDUCE IMPLEMENTATIONS: SURVEY AND PERFORMANCE COMPARISON
ijcsit
 
Big Data Analysis and Its Scheduling Policy – Hadoop
Big Data Analysis and Its Scheduling Policy – HadoopBig Data Analysis and Its Scheduling Policy – Hadoop
Big Data Analysis and Its Scheduling Policy – Hadoop
IOSR Journals
 
Survey of Parallel Data Processing in Context with MapReduce
Survey of Parallel Data Processing in Context with MapReduce Survey of Parallel Data Processing in Context with MapReduce
Survey of Parallel Data Processing in Context with MapReduce
cscpconf
 
Research Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and ScienceResearch Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and Science
inventy
 
Big Data & Hadoop
Big Data & HadoopBig Data & Hadoop
Big Data & Hadoop
Krishna Sujeer
 
Which NoSQL Database to Combine with Spark for Real Time Big Data Analytics?
Which NoSQL Database to Combine with Spark for Real Time Big Data Analytics?Which NoSQL Database to Combine with Spark for Real Time Big Data Analytics?
Which NoSQL Database to Combine with Spark for Real Time Big Data Analytics?
IJCSIS Research Publications
 
SURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCE
SURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCESURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCE
SURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCE
AM Publications,India
 
Workflow Scheduling Techniques and Algorithms in IaaS Cloud: A Survey
Workflow Scheduling Techniques and Algorithms in IaaS Cloud: A Survey Workflow Scheduling Techniques and Algorithms in IaaS Cloud: A Survey
Workflow Scheduling Techniques and Algorithms in IaaS Cloud: A Survey
IJECEIAES
 
Big data with hadoop
Big data with hadoopBig data with hadoop
Big data with hadoop
Anusha sweety
 
B04 06 0918
B04 06 0918B04 06 0918
B04 06 0918
International Journal of Engineering Inventions www.ijeijournal.com
 
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
IJECEIAES
 
Final Report_798 Project_Nithin_Sharmila
Final Report_798 Project_Nithin_SharmilaFinal Report_798 Project_Nithin_Sharmila
Final Report_798 Project_Nithin_Sharmila
Nithin Kakkireni
 
A New Multi-Dimensional Hyperbolic Structure for Cloud Service Indexing
A New Multi-Dimensional Hyperbolic Structure for Cloud Service IndexingA New Multi-Dimensional Hyperbolic Structure for Cloud Service Indexing
A New Multi-Dimensional Hyperbolic Structure for Cloud Service Indexing
IJDMS
 
Survey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization MethodsSurvey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization Methods
paperpublications3
 
TheETLBottleneckinBigDataAnalytics(1)
TheETLBottleneckinBigDataAnalytics(1)TheETLBottleneckinBigDataAnalytics(1)
TheETLBottleneckinBigDataAnalytics(1)
ruchabhandiwad
 
IRJET- Cost Effective Workflow Scheduling in Bigdata
IRJET-  	  Cost Effective Workflow Scheduling in BigdataIRJET-  	  Cost Effective Workflow Scheduling in Bigdata
IRJET- Cost Effective Workflow Scheduling in Bigdata
IRJET Journal
 
MAP-REDUCE IMPLEMENTATIONS: SURVEY AND PERFORMANCE COMPARISON
MAP-REDUCE IMPLEMENTATIONS: SURVEY AND PERFORMANCE COMPARISONMAP-REDUCE IMPLEMENTATIONS: SURVEY AND PERFORMANCE COMPARISON
MAP-REDUCE IMPLEMENTATIONS: SURVEY AND PERFORMANCE COMPARISON
ijcsit
 
Big Data Analysis and Its Scheduling Policy – Hadoop
Big Data Analysis and Its Scheduling Policy – HadoopBig Data Analysis and Its Scheduling Policy – Hadoop
Big Data Analysis and Its Scheduling Policy – Hadoop
IOSR Journals
 
Survey of Parallel Data Processing in Context with MapReduce
Survey of Parallel Data Processing in Context with MapReduce Survey of Parallel Data Processing in Context with MapReduce
Survey of Parallel Data Processing in Context with MapReduce
cscpconf
 
Research Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and ScienceResearch Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and Science
inventy
 
Which NoSQL Database to Combine with Spark for Real Time Big Data Analytics?
Which NoSQL Database to Combine with Spark for Real Time Big Data Analytics?Which NoSQL Database to Combine with Spark for Real Time Big Data Analytics?
Which NoSQL Database to Combine with Spark for Real Time Big Data Analytics?
IJCSIS Research Publications
 
SURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCE
SURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCESURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCE
SURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCE
AM Publications,India
 
Workflow Scheduling Techniques and Algorithms in IaaS Cloud: A Survey
Workflow Scheduling Techniques and Algorithms in IaaS Cloud: A Survey Workflow Scheduling Techniques and Algorithms in IaaS Cloud: A Survey
Workflow Scheduling Techniques and Algorithms in IaaS Cloud: A Survey
IJECEIAES
 
Big data with hadoop
Big data with hadoopBig data with hadoop
Big data with hadoop
Anusha sweety
 

Viewers also liked (12)

Dokumen standard dunia seni visual tahun 3
Dokumen standard dunia seni visual tahun 3Dokumen standard dunia seni visual tahun 3
Dokumen standard dunia seni visual tahun 3
mohanasuriya
 
Presentation
PresentationPresentation
Presentation
NavNeet KuMar
 
Liz Marble Resume
Liz Marble ResumeLiz Marble Resume
Liz Marble Resume
Liz Marble
 
Introducing ourselves
Introducing ourselvesIntroducing ourselves
Introducing ourselves
Didem Kızıltoprak Irmaklı
 
Slide ukiran
Slide ukiran Slide ukiran
Slide ukiran
mohanasuriya
 
Arnaud giroux curriculum vitae, resume 2015
Arnaud giroux curriculum vitae, resume 2015Arnaud giroux curriculum vitae, resume 2015
Arnaud giroux curriculum vitae, resume 2015
Arnaud Giroux
 
Awaken Assurance Brochure - Advantages
Awaken Assurance Brochure - AdvantagesAwaken Assurance Brochure - Advantages
Awaken Assurance Brochure - Advantages
Douglas Schwartz
 
Publications list_JDR_Jan-2016
Publications list_JDR_Jan-2016Publications list_JDR_Jan-2016
Publications list_JDR_Jan-2016
John Ray
 
Background of Yuan depreciation and central parity reform
Background of Yuan depreciation and central parity reformBackground of Yuan depreciation and central parity reform
Background of Yuan depreciation and central parity reform
yun zhang
 
Our school
Our schoolOur school
Our school
Didem Kızıltoprak Irmaklı
 
Problems and challenges of animal husbandry extension
Problems and challenges of animal husbandry extensionProblems and challenges of animal husbandry extension
Problems and challenges of animal husbandry extension
Preethi Sundar
 
Nematodes
NematodesNematodes
Nematodes
Ghadeer Khaled
 
Dokumen standard dunia seni visual tahun 3
Dokumen standard dunia seni visual tahun 3Dokumen standard dunia seni visual tahun 3
Dokumen standard dunia seni visual tahun 3
mohanasuriya
 
Liz Marble Resume
Liz Marble ResumeLiz Marble Resume
Liz Marble Resume
Liz Marble
 
Arnaud giroux curriculum vitae, resume 2015
Arnaud giroux curriculum vitae, resume 2015Arnaud giroux curriculum vitae, resume 2015
Arnaud giroux curriculum vitae, resume 2015
Arnaud Giroux
 
Awaken Assurance Brochure - Advantages
Awaken Assurance Brochure - AdvantagesAwaken Assurance Brochure - Advantages
Awaken Assurance Brochure - Advantages
Douglas Schwartz
 
Publications list_JDR_Jan-2016
Publications list_JDR_Jan-2016Publications list_JDR_Jan-2016
Publications list_JDR_Jan-2016
John Ray
 
Background of Yuan depreciation and central parity reform
Background of Yuan depreciation and central parity reformBackground of Yuan depreciation and central parity reform
Background of Yuan depreciation and central parity reform
yun zhang
 
Problems and challenges of animal husbandry extension
Problems and challenges of animal husbandry extensionProblems and challenges of animal husbandry extension
Problems and challenges of animal husbandry extension
Preethi Sundar
 
Ad

Similar to A Survey on Data Mapping Strategy for data stored in the storage cloud 111 (20)

LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENTLARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
ijwscjournal
 
Big Data Storage System Based on a Distributed Hash Tables System
Big Data Storage System Based on a Distributed Hash Tables SystemBig Data Storage System Based on a Distributed Hash Tables System
Big Data Storage System Based on a Distributed Hash Tables System
IJDMS
 
Big Data Storage System Based on a Distributed Hash Tables System
Big Data Storage System Based on a Distributed Hash Tables SystemBig Data Storage System Based on a Distributed Hash Tables System
Big Data Storage System Based on a Distributed Hash Tables System
ijdmsjournal
 
Hadoop
HadoopHadoop
Hadoop
Veera Sundari
 
Dataintensive
DataintensiveDataintensive
Dataintensive
sulfath
 
Leveraging Map Reduce With Hadoop for Weather Data Analytics
Leveraging Map Reduce With Hadoop for Weather Data Analytics Leveraging Map Reduce With Hadoop for Weather Data Analytics
Leveraging Map Reduce With Hadoop for Weather Data Analytics
iosrjce
 
B017320612
B017320612B017320612
B017320612
IOSR Journals
 
An Energy Efficient Data Transmission and Aggregation of WSN using Data Proce...
An Energy Efficient Data Transmission and Aggregation of WSN using Data Proce...An Energy Efficient Data Transmission and Aggregation of WSN using Data Proce...
An Energy Efficient Data Transmission and Aggregation of WSN using Data Proce...
IRJET Journal
 
Cost-aware optimal resource provisioning Map-Reduce scheduler for hadoop fram...
Cost-aware optimal resource provisioning Map-Reduce scheduler for hadoop fram...Cost-aware optimal resource provisioning Map-Reduce scheduler for hadoop fram...
Cost-aware optimal resource provisioning Map-Reduce scheduler for hadoop fram...
IAESIJAI
 
Data-Intensive Technologies for Cloud Computing
Data-Intensive Technologies for CloudComputingData-Intensive Technologies for CloudComputing
Data-Intensive Technologies for Cloud Computing
huda2018
 
B04 06 0918
B04 06 0918B04 06 0918
B04 06 0918
International Journal of Engineering Inventions www.ijeijournal.com
 
IJSRED-V2I3P43
IJSRED-V2I3P43IJSRED-V2I3P43
IJSRED-V2I3P43
IJSRED
 
Iaetsd mapreduce streaming over cassandra datasets
Iaetsd mapreduce streaming over cassandra datasetsIaetsd mapreduce streaming over cassandra datasets
Iaetsd mapreduce streaming over cassandra datasets
Iaetsd Iaetsd
 
Implementation of p pic algorithm in map reduce to handle big data
Implementation of p pic algorithm in map reduce to handle big dataImplementation of p pic algorithm in map reduce to handle big data
Implementation of p pic algorithm in map reduce to handle big data
eSAT Publishing House
 
Using BIG DATA implementations onto Software Defined Networking
Using BIG DATA implementations onto Software Defined NetworkingUsing BIG DATA implementations onto Software Defined Networking
Using BIG DATA implementations onto Software Defined Networking
IJCSIS Research Publications
 
B1803031217
B1803031217B1803031217
B1803031217
IOSR Journals
 
Map reduce advantages over parallel databases report
Map reduce advantages over parallel databases reportMap reduce advantages over parallel databases report
Map reduce advantages over parallel databases report
Ahmad El Tawil
 
Design architecture based on web
Design architecture based on webDesign architecture based on web
Design architecture based on web
csandit
 
DESIGN ARCHITECTURE-BASED ON WEB SERVER AND APPLICATION CLUSTER IN CLOUD ENVI...
DESIGN ARCHITECTURE-BASED ON WEB SERVER AND APPLICATION CLUSTER IN CLOUD ENVI...DESIGN ARCHITECTURE-BASED ON WEB SERVER AND APPLICATION CLUSTER IN CLOUD ENVI...
DESIGN ARCHITECTURE-BASED ON WEB SERVER AND APPLICATION CLUSTER IN CLOUD ENVI...
cscpconf
 
NETWORK TRAFFIC ANALYSIS: HADOOP PIG VS TYPICAL MAPREDUCE
NETWORK TRAFFIC ANALYSIS: HADOOP PIG VS TYPICAL MAPREDUCENETWORK TRAFFIC ANALYSIS: HADOOP PIG VS TYPICAL MAPREDUCE
NETWORK TRAFFIC ANALYSIS: HADOOP PIG VS TYPICAL MAPREDUCE
cscpconf
 
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENTLARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
ijwscjournal
 
Big Data Storage System Based on a Distributed Hash Tables System
Big Data Storage System Based on a Distributed Hash Tables SystemBig Data Storage System Based on a Distributed Hash Tables System
Big Data Storage System Based on a Distributed Hash Tables System
IJDMS
 
Big Data Storage System Based on a Distributed Hash Tables System
Big Data Storage System Based on a Distributed Hash Tables SystemBig Data Storage System Based on a Distributed Hash Tables System
Big Data Storage System Based on a Distributed Hash Tables System
ijdmsjournal
 
Dataintensive
DataintensiveDataintensive
Dataintensive
sulfath
 
Leveraging Map Reduce With Hadoop for Weather Data Analytics
Leveraging Map Reduce With Hadoop for Weather Data Analytics Leveraging Map Reduce With Hadoop for Weather Data Analytics
Leveraging Map Reduce With Hadoop for Weather Data Analytics
iosrjce
 
An Energy Efficient Data Transmission and Aggregation of WSN using Data Proce...
An Energy Efficient Data Transmission and Aggregation of WSN using Data Proce...An Energy Efficient Data Transmission and Aggregation of WSN using Data Proce...
An Energy Efficient Data Transmission and Aggregation of WSN using Data Proce...
IRJET Journal
 
Cost-aware optimal resource provisioning Map-Reduce scheduler for hadoop fram...
Cost-aware optimal resource provisioning Map-Reduce scheduler for hadoop fram...Cost-aware optimal resource provisioning Map-Reduce scheduler for hadoop fram...
Cost-aware optimal resource provisioning Map-Reduce scheduler for hadoop fram...
IAESIJAI
 
Data-Intensive Technologies for Cloud Computing
Data-Intensive Technologies for CloudComputingData-Intensive Technologies for CloudComputing
Data-Intensive Technologies for Cloud Computing
huda2018
 
IJSRED-V2I3P43
IJSRED-V2I3P43IJSRED-V2I3P43
IJSRED-V2I3P43
IJSRED
 
Iaetsd mapreduce streaming over cassandra datasets
Iaetsd mapreduce streaming over cassandra datasetsIaetsd mapreduce streaming over cassandra datasets
Iaetsd mapreduce streaming over cassandra datasets
Iaetsd Iaetsd
 
Implementation of p pic algorithm in map reduce to handle big data
Implementation of p pic algorithm in map reduce to handle big dataImplementation of p pic algorithm in map reduce to handle big data
Implementation of p pic algorithm in map reduce to handle big data
eSAT Publishing House
 
Using BIG DATA implementations onto Software Defined Networking
Using BIG DATA implementations onto Software Defined NetworkingUsing BIG DATA implementations onto Software Defined Networking
Using BIG DATA implementations onto Software Defined Networking
IJCSIS Research Publications
 
Map reduce advantages over parallel databases report
Map reduce advantages over parallel databases reportMap reduce advantages over parallel databases report
Map reduce advantages over parallel databases report
Ahmad El Tawil
 
Design architecture based on web
Design architecture based on webDesign architecture based on web
Design architecture based on web
csandit
 
DESIGN ARCHITECTURE-BASED ON WEB SERVER AND APPLICATION CLUSTER IN CLOUD ENVI...
DESIGN ARCHITECTURE-BASED ON WEB SERVER AND APPLICATION CLUSTER IN CLOUD ENVI...DESIGN ARCHITECTURE-BASED ON WEB SERVER AND APPLICATION CLUSTER IN CLOUD ENVI...
DESIGN ARCHITECTURE-BASED ON WEB SERVER AND APPLICATION CLUSTER IN CLOUD ENVI...
cscpconf
 
NETWORK TRAFFIC ANALYSIS: HADOOP PIG VS TYPICAL MAPREDUCE
NETWORK TRAFFIC ANALYSIS: HADOOP PIG VS TYPICAL MAPREDUCENETWORK TRAFFIC ANALYSIS: HADOOP PIG VS TYPICAL MAPREDUCE
NETWORK TRAFFIC ANALYSIS: HADOOP PIG VS TYPICAL MAPREDUCE
cscpconf
 
Ad

A Survey on Data Mapping Strategy for data stored in the storage cloud 111

  • 1. VOLUME 9,ISSUE 3,MAY 2016 ISSN-2347-8047 INTERNATIONAL JOURNAL OF COGNITIVE SCIENCE, ENGINEERING AND TECHNOLOGY Available online at: http://airnetjournal.org/ A Survey on Data Mapping Strategy for data stored in the storage cloud Navneet Kumar Department Of cse K S Institute Of Technology Bengaluru,India Naseeruddin V N Department Of cse K S Institute Of Technology Bengaluru,India Murali Krishna V Department Of cse K S Institute Of Technology Bengaluru,India S K Manu Department Of cse K S Institute Of Technology Bengaluru,India Swathi K Department Of cse K S Institute Of Technology Bengaluru,India Abstract— In the recent past the data being processed over the internet is increasing exponentially so it’s difficult to store such huge amount of data and It becomes computationally inefficient to analyze such huge data. There is currently considerable enthusiasm around the Map Reduce paradigm for large-scale data analysis. It is inspired by functional programming which allows expressing distributed computation massive amounts of data. It is designed for large-scale data processing as it allows to run on clusters of commodity hardware. A prominent parallel data processing tool Map Reduce is gaining significant momentum from both industry and academia as the volume of data to analyze grows rapidly. In this paper we propose a method to process huge amount of data over the internet. This method involves storing the data to be processed on the cloud and processing the data on hadoop multicluster environment. Keywords— Storage Cloud, Hadoop cluster, Hadoop, Distributed File System, Parallel Processing, MapReduce I.Introduction The very challenging problem is to analyze big data. For the effective handling of such massive data or applications, the use of MapReduce framework has been widely came into focus. 
Over the last few years, MapReduce has emerged as one of the most popular computing paradigms for parallel, batch-style analysis of large amounts of data. It is used in many areas where massive data analysis is required. A growing number of applications handle big data, but handling such huge collections of data remains a very challenging problem. MapReduce, or its open-source equivalent Hadoop, is a powerful tool for building such applications. Data-intensive processing is fast becoming a necessity for handling large databases efficiently, and it requires algorithms that scale to real-world datasets. There is currently considerable enthusiasm around the MapReduce paradigm for large-scale data analysis. It is inspired by functional programming and allows distributed computations to be expressed over massive amounts of data. It is designed for large-scale data processing, as it runs on clusters of commodity hardware. MapReduce is used in areas where the volume of data to analyze grows rapidly. Despite these abilities, there is still debate about its performance, its efficiency, and the simplicity of its model.
With the present explosion of data, parallel processing is important for handling such massive volumes in a timely manner. MapReduce gained its popularity when it was used successfully by Google. In practice, it is a scalable and fault-tolerant data processing tool that can process very large volumes of data in parallel on many low-end computing nodes. By virtue of its simplicity, scalability, and fault tolerance, MapReduce is becoming ubiquitous, gaining significant momentum from both industry and academia. However, MapReduce has inherent limitations on its performance and efficiency. Therefore, many studies have endeavoured to overcome the limitations of the MapReduce framework.
The goal of this survey is to provide a timely remark on the status of MapReduce studies and related work, focusing on current research aimed at improving and enhancing the MapReduce framework. This paper is intended to assist the database community in understanding various technical aspects of the MapReduce framework. In this paper, we focus on the working of the MapReduce framework and examine its inherent advantages and drawbacks. We then introduce applications and effective ways to improve its properties so that optimized results can be obtained. We also bring into focus the issues and challenges raised about MapReduce. It is well known for its simplicity, effectiveness, and capability to handle “Big Data” in a timely manner. Despite these valuable features, it still has some limitations that need to be addressed.
II.Working
MapReduce is a programming model and an associated implementation for processing and generating large datasets that is amenable to a broad variety of real-world tasks [3]. The MapReduce paradigm of parallel programming provides simplicity while at the same time offering load balancing and fault tolerance. The Google File System (GFS) that typically underlies a MapReduce system provides the efficient and reliable distributed data storage needed for applications involving large databases [10]. MapReduce is inspired by the map and reduce primitives present in functional languages.
In its pure form, various implementations of the MapReduce interface are possible, depending on the desired context. Currently available implementations target shared-memory multi-core systems [11][12], asymmetric multi-core processors [13], graphics processors, and clusters of networked machines [4]. The most popular implementation is probably the one introduced by Google, which uses large clusters of commodity computers connected with switched Ethernet. In essence, Google’s MapReduce technique simplifies the development and lowers the cost of large-scale distributed applications on clusters of commodity machines.
The MapReduce framework executes its tasks using a runtime scheduling scheme. This means that MapReduce does not build, before execution, any plan that specifies which tasks will run on which nodes [14]. The MapReduce model is capable of processing large data sets distributed across many nodes in parallel.
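The map and reduce primitives mentioned above can be illustrated with a small single-process sketch. This is not Hadoop or Google code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the word-count task are assumptions chosen for illustration, and a real framework would run the map and reduce calls on many nodes rather than in one loop.

```python
from collections import defaultdict
from functools import reduce


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group intermediate values by key, as a framework does between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: fold all values collected for one key into a single result."""
    return key, reduce(lambda a, b: a + b, values)


def run_job(documents):
    """Run map over every record, group by key, then reduce each group."""
    intermediate = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


if __name__ == "__main__":
    print(run_job(["big data on the cloud", "data on hadoop"]))
```

Because each `map_phase` call depends only on its own record and each `reduce_phase` call only on its own key, both steps could in principle be distributed across nodes, which is the property the paradigm relies on.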
The main goal is to simplify large-scale data processing by using inexpensive cluster computers, and to make this easy for users while achieving both load balancing and fault tolerance. MapReduce has two primary functions: the Map function and the Reduce function. These functions are defined by the user to meet specific requirements. The original MapReduce software is a proprietary system of Google and is therefore not available for public use [15]. Although distributed computing is largely simplified by the notions of the Map and Reduce primitives, the underlying infrastructure needed to achieve the desired performance is non-trivial [2]. A key piece of infrastructure in Google’s MapReduce is the underlying distributed file system, which ensures data locality and availability [3]. By combining the MapReduce programming technique with an efficient distributed file system, one can achieve distributed computing with data parallelism over thousands of computing nodes.
III.Methodology
The architecture above illustrates the layout of the project. The user uploads the data to the cloud over the internet and then selects the operation to be carried out. The controller, acting as middleware, interprets the request and forwards it to the Hadoop master. The Hadoop master starts the JobTracker and connects the cloud as the data node, and the MapReduce algorithm is run, which maps and reduces the data according to the implemented algorithm. The result is collected, concatenated, and stored back onto the cloud for the user to download. The use of a storage cloud allows the user to upload and download data from any place connected to the internet without knowing where the actual processing is done.
IV.Design
The goal of the application is to provide an easy-to-use interface, so that even a user with little experience of websites can use it through a browser.
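The request flow described in Section III (upload, controller dispatch, MapReduce job, result stored for download) can be sketched as follows. Everything here is a hypothetical stand-in: the `cloud` dictionary replaces the storage cloud, `word_count` replaces the job that the Hadoop master would run, and `controller`, `upload`, `download`, and `OPERATIONS` are names invented for this illustration rather than part of the project or of any Hadoop API.

```python
from collections import Counter

# Hypothetical in-memory stand-in for the storage cloud:
# container name -> {object name: text content}.
cloud = {"input": {}, "output": {}}


def upload(container, name, text):
    """User step: store one object in a container on the 'cloud'."""
    cloud[container][name] = text


def word_count(chunks):
    """Stand-in for the job run by the Hadoop master."""
    partial = [Counter(chunk.split()) for chunk in chunks]  # map: one count per chunk
    total = Counter()
    for counts in partial:  # reduce: merge the partial counts
        total.update(counts)
    return total


# Operations the user may select; only one is sketched here.
OPERATIONS = {"wordcount": word_count}


def controller(operation, in_container, out_container):
    """Middleware step: interpret the request, run the job, store the result."""
    job = OPERATIONS[operation]
    result = job(cloud[in_container].values())
    lines = [f"{word}\t{count}" for word, count in sorted(result.items())]
    cloud[out_container]["result.txt"] = "\n".join(lines)
    return "result.txt"


def download(container, name):
    """User step: fetch the stored result from the output container."""
    return cloud[container][name]
```

A typical call sequence under these assumptions is `upload("input", "a.txt", ...)`, then `controller("wordcount", "input", "output")`, then `download("output", "result.txt")`; the user never needs to know where `word_count` actually ran.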
We have designed a few models and structures to explain the design and structure of the application under discussion.
Data Flow Model
A data flow diagram (DFD) is a graphical representation of the "flow" of data through an information system, modeling its process aspects. DFDs are often a preliminary step used to create an overview of the system, which can later be elaborated. They can also be used to visualize data processing (structured design). A DFD shows what kinds of information will be input to and output from the system, where the data will come from and go to, and where the data will be stored. It does not show the timing of processes, or whether processes will operate in sequence or in parallel.
Data flow: A data flow shows the flow of information from its source to its destination. A data flow is represented by a line, with arrowheads showing the direction of flow.
Data Store: A data store is a holding place for information within the system. It is represented by an open-ended narrow rectangle.
External Entities: It is normal for all information represented within a system to have been obtained from, and/or to be passed on to, an external source or recipient.
Processes: When naming processes, avoid glossing over them without really understanding their role. Vague terms in the descriptive title, such as ‘process’ or ‘update’, are a sign that this has happened.
Data Flows: Double-headed arrows can be used on all but bottom-level diagrams. Furthermore, in common with most of the other symbols used, a data flow at a particular level of a diagram may be decomposed into multiple data flows.
V.Snapshots
Description: The admin logs in using admin as the admin name and password. If the admin name or password does not match, the message "Wrong admin or wrong password" is displayed; if they match, the message "Login Success" is displayed.
Description: The website contains options for the user, through
which he can carry out the necessary tasks.
Description: The user can select the containers and upload the data to the cloud.
Description: The user can select the output container where the result is stored and download the data from the cloud.
Acknowledgment
The satisfaction and euphoria that accompany the successful completion of any task would be incomplete without mention of the individuals to whom we are greatly indebted, who through their guidance and provision of facilities have served as a beacon of light and crowned our efforts with success. We are thankful to Mrs. Swathi K, Assistant Professor, CSE, KSIT, for being our Project Guide, under whose able guidance this project work has been carried out and completed successfully. We thank the management, the principal, and the Department of Computer Science and Engineering, KSIT. We thank VGST (Vision Group on Science and Technology), Government of Karnataka, India, for providing infrastructure facilities through the K-FIST Level II project at the KSIT CSE R&D Department, Bengaluru.
References
[1] Maitrey S, Jha. An Integrated Approach for CURE Clustering using Map-Reduce Technique. In Proceedings of Elsevier, ISBN 978-81-910691-6-3, 2nd August 2013.
[2] Kyuseok Shim. MapReduce Algorithms for Big Data Analysis. In Proceedings of the VLDB Endowment, Vol. 5, No. 12, August 27th 2012, Istanbul, Turkey.
[3] Jeffrey Dean et al. MapReduce: Simplified data processing on large clusters. In Proceedings of the 6th USENIX OSDI, pages 137–150, 2004.
[4] J. Dean et al. MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51(1):107–113, 2008.
[5] D. DeWitt and M. Stonebraker. MapReduce: A major step backwards. The Database Column, 1, 2008.
[6] A. Pavlo et al. A comparison of approaches to large-scale data analysis. In Proceedings of the ACM SIGMOD, pages 165–178, 2009.
[7] M. Stonebraker et al. MapReduce and parallel DBMSs: friends or foes? Communications of the ACM, 53(1):64–71, 2010.
[8] A. Thusoo et al. Hive: a warehousing solution over a map-reduce framework. Proceedings of the VLDB Endowment, 2(2):1626–1629, 2009.
[9] A.F. Gates et al. Building a high-level dataflow system on top of Map-Reduce: the Pig experience. Proceedings of the VLDB Endowment, 2(2):1414–1425, 2009.
[10] S. Ghemawat et al. The Google file system. ACM SIGOPS Operating Systems Review, 37(5):29–43, 2003.
[11] OpenStack Installation Guide for Ubuntu 14.04, February 26, 2015.
[12] http://www.stackoverflow.com/
[13] https://github.com/