
ISSN No. 0976-5697
Volume 8, No. 5, May-June 2017
International Journal of Advanced Research in Computer Science
RESEARCH PAPER
Available Online at www.ijarcs.info

MapReduce with Hadoop for Simplified Analysis of Big Data

Ch. Shobha Rani
Research Scholar
Department of Computer Science
Kakatiya University, Warangal, Telangana

Dr. B. Rama
Assistant Professor
Department of Computer Science
Kakatiya University, Warangal, Telangana

Abstract: With the development of web-based applications and mobile computing technology, data and the computations and analyses performed on it have grown rapidly and continuously in recent years. Various fields around the globe face a big problem with this large-scale data, which strongly supports decision making. Traditional relational DBMSs are unable to handle this Big Data, and most classical data mining methods are also not suitable for it. Efficient algorithms are required to process Big Data. Among the many parallel approaches, MapReduce has been adopted by many large and popular IT companies such as Google, Yahoo and Facebook. In the Big Data world, MapReduce has been playing a vital role in meeting the increasing demands on computing resources caused by voluminous data sets. MapReduce is a popular programming model suitable for Big Data analysis in distributed and parallel computing, and its high scalability is one of the reasons for adopting this model. Hadoop is an open-source, distributed programming framework which enables the storage and processing of large data sets. [1] In this paper we focus especially on MapReduce with Hadoop for the analytical processing of Big Data.

Keywords: Big Data, Hadoop, MapReduce, Big Data Analytics.

I. INTRODUCTION

In the current era, enormous amounts of data are being generated continuously, day by day. With this rapid expansion of data, we are moving from the petabyte age to the exabyte and zettabyte age. At the same time, new technologies progressing at high speed make it possible to organize and manipulate the voluminous amounts of data presently being generated. With this trend there is a greater demand for new data storage and analysis methods. [2] In particular, the real-world task of extracting knowledge from huge data sets has become of utmost importance.
"Big Data" is the biggest observable phenomenon that has captured the attention of the modern computing industry since the global expansion of the Internet. Big Data is gaining popularity today because the technological revolutions that have emerged provide the capability to process data of multiple formats and structures without worrying about the constraints associated with traditional systems and database platforms.

II. IMPORTANCE OF BIG DATA

Big Data can be defined as large volumes of data, either structured or unstructured, generated at high speed globally by various new technological devices. Big Data includes the data that is generated every second by sensors, mobile devices, and consumer-driven data from social networks. Big Data is evolving from various facets within organizations: legal, sales, marketing, procurement, finance, human resources departments, etc.

Fig. 1: 5 V's of Big Data

Along with the three V's, there also exist ambiguity, viscosity, and virality.
• Ambiguity — comes into existence when the metadata lags behind the clarity of the data in Big Data. For example, in a graph, 1 and 0 can depict degree or can depict status as true and false.
• Viscosity — measures the resistance (slow-down) to flow in the volume of data. Resistance can manifest in dataflow, business rules, and even be a limitation of technology. For example, social network predictions come under this category, where a number of enterprises just cannot understand what impact there is on business and how it resists the usage of the data in many cases.
• Virality — measures and describes how quickly data is shared in a people-to-people (peer) network.

• Big Data is non-relational.
• MapReduce is complementary to the DBMS, not a competing technology. [3]
• Parallel DBMSs are for efficient querying of large data sets. [4]

• Big Data exists mostly in a real-time manner rather than in traditional Data Warehouse applications. [8]
• Traditional DW architectures (like Exadata and Teradata) are not well suited for Big Data applications.
• Architectures like shared-nothing and massively parallel processing are very well suited for Big Data applications.
• MR-style systems are suitable for complex analytics and especially for ETL tasks.
• Parallel DBMSs require data to fit into the traditional relational representation of rows and columns.
• In contrast, the MapReduce architecture does not require data files to stick to a particular schema such as the relational data model. That is, the MR programmer can structure the data in any manner, or even have no structure at all. This is possible because MR supports both structured and unstructured data.

Big Data Pillars
• Big Table – consisting of relational tables.
• Big Text – comprising text in the form of structured and semi-structured data, natural language, and semantic data. [4]
• Big Metadata – collects and stores the data about the data stored in Big Data.
• Big Graphs – graphs include connections between objects, their semantic discovery, the degree of separation, linguistic analytics, and subject predicates.

III. MAPREDUCE

MapReduce is an emerging programming paradigm designed for processing extremely large volumes of data in parallel by splitting a job into various independent tasks. [3] A MapReduce program in general is a combination of a Map() function and a Reduce() function. The job of Map() is to perform filtering and sorting operations, such as sorting customers by first name into queues by generating one queue for each name, while Reduce() performs summary/aggregate operations, such as counting the number of customers in each queue, thereby yielding the name counts. [3] The "MapReduce system", also known as the MapReduce "framework" or "architecture", orchestrates the processing on the distributed servers, running the various tasks in parallel, managing all communications and data transfers between the various parts of the system, and providing for data redundancy and fault tolerance. [3]
MapReduce is a framework for processing voluminous data split and distributed across huge data sets using a large number of computers (nodes). The group of nodes is collectively treated as a cluster if all nodes have similar hardware configurations and work on the same local network, or as a grid if the nodes are geographically distributed with varying hardware specifications. Processing may occur on data that is stored either in system log files (unstructured) or in a database (structured). MapReduce takes advantage of data locality to minimise the data transfer distance.

Fig. 2: MapReduce workflow

Map Phase: In the map phase, the master node takes the input, divides it into smaller sub-tasks, and distributes them to worker nodes. A worker node may do this again repeatedly, leading to a multi-level tree structure. The worker node processes its smaller task and passes the intermediate result back to its master node.
Reduce Phase: During the reduce phase, the master node collects the intermediate outputs of all the sub-tasks generated by the various worker nodes and combines them in some way to form the final output – the solution to the problem it was originally trying to solve.
a) Input reader: The input reader splits the input file into splits of appropriate size (in practice typically 64 MB to 512 MB, as per HDFS), and one split is assigned to one Map function by the MapReduce framework. The input reader takes its input from stable storage (in our case, typically the Hadoop Distributed File System) and generates the output as key/value pairs.
b) Map function: Each Map function takes a series of key/value pairs generated by the input reader, processes each, and in turn produces zero or more output key/value pairs. [5] The input and output types of the map can be, and often are, different from each other.
c) Partition function: Each Map function output is assigned to a particular reducer by the application's partition function for sharding purposes. The partition function is given the key and the number of reducers as input and returns the index of the desired reducer.
d) Comparison function: The input for every Reduce is fetched from the machine where the Map ran and is sorted using the comparison function.
e) Reduce function: The framework calls the application's Reduce function for each unique key in sorted order. It iterates through the values associated with that key and produces zero or more outputs. [6]
f) Output writer: It writes the output of the Reduce function to stable storage, usually the Hadoop Distributed File System.

Performance:
MapReduce programs do not necessarily produce their output quickly. The main benefit of this programming model is to make use of the optimized shuffle operation of the platform, so that the programmer only has to write the Map and Reduce functions of the program. In practice, however, the author of a MapReduce program also has to take the shuffle of the intermediate results into account. [8] The partition function and the amount of data generated by the Map function highly influence the performance of the program. In addition to the partitioner, a Combiner function helps to reduce the amount of data written to storage (disk) and transmitted over the network.
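To make the Map and Reduce roles above concrete, the following sketch shows the classic word-count job written against Hadoop's Java MapReduce API (org.apache.hadoop.mapreduce). It is an illustrative example rather than code from the paper: the Map function emits a (word, 1) pair for every token it reads, and the Reduce function sums the counts delivered for each unique key by the shuffle/sort step described above.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

    // Map function: input key = byte offset of the line, input value = the line text.
    // Output: one (word, 1) intermediate pair per token.
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {

        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce function: called once per unique word with all of its counts.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }
}

The same reducer class can also be registered as a combiner, which is what the performance note above refers to: partial sums are computed on the map side, cutting down the data written to disk and shuffled across the network.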


IV. HADOOP

Apache Hadoop [1] is an open-source software framework in Java used mainly for distributed storage and processing of extremely large data sets on computer clusters. Apache Hadoop is mainly composed of a storage part (the Hadoop Distributed File System, HDFS) and a processing part (MapReduce). Splitting files into large blocks and distributing them to the nodes in the cluster is taken care of by Hadoop itself. It is not the job of the programmer to handle the distribution over the cluster; Hadoop looks after it. In Hadoop the data processing is done with MapReduce by transferring the program code to the nodes in parallel, based on the data requirements of each node, i.e., the processing code travels to the node. [7]
The Hadoop framework is encapsulated with the following modules: [1]
• Hadoop Common
• Hadoop Distributed File System (HDFS)
• Hadoop MapReduce

Fig. 3: HDFS architecture

V. MAPREDUCE IMPLEMENTATION

While designing MapReduce programs, the user may not specify the number of mappers, since it depends on the file size and the block size, whereas the number of reducers can be configured by the user based on the number of mappers. In general the Partitioner decides which reducer to use, or else Hadoop takes over that job. With the help of the combiner the network traffic is greatly reduced.
If map() is not defined by the user, the output of the RecordReader is sent to the identity mapper (without any logic) and then to reduce; if no reducer is defined in the program, the output of the identity reducer is stored in the data node itself and is not sent to HDFS.
When multiple mappers are running, there may be a situation where some mappers run very slowly. Hadoop identifies such slow-running tasks and launches the same task on another data node; this concept is called speculative execution in Hadoop.
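As an illustration of these configuration points (again a sketch, not code taken from the paper, using hypothetical paths and the WordCount classes from the earlier example), a typical Hadoop driver might look as follows. The number of map tasks is not set explicitly because it follows from the input splits (file size and block size), while the number of reduce tasks, the combiner, and speculative execution are configured through the Job and Configuration API.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Speculative execution: re-launch unusually slow map/reduce tasks on another node.
        conf.setBoolean("mapreduce.map.speculative", true);
        conf.setBoolean("mapreduce.reduce.speculative", true);

        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);

        job.setMapperClass(WordCount.TokenizerMapper.class);
        // Registering the reducer as a combiner aggregates on the map side
        // and reduces the volume of data shuffled over the network.
        job.setCombinerClass(WordCount.IntSumReducer.class);
        job.setReducerClass(WordCount.IntSumReducer.class);

        // Reducers are configured explicitly; mappers follow from the input splits.
        job.setNumReduceTasks(2);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));    // e.g. an HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output path must not already exist

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Such a driver is typically packaged into a jar and launched with the hadoop jar command against input that has already been copied into HDFS.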

Fig. 4: MapReduce program execution sequence (INPUT → SPLITTING → MAPPING → SHUFFLE/SORT → REDUCE → OUTPUT)

Consider a sample input file consisting of the text:

hello hadoop bye hadoop
hello google goodbye google

The internal execution process will be as follows:
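Following the Fig. 4 sequence, a reconstructed trace of this word-count example (an illustration, as the original figure is not reproduced here) is:

Splitting: the file is divided into two records, one per line: "hello hadoop bye hadoop" and "hello google goodbye google".
Mapping: each record is tokenized and emitted as intermediate (word, 1) pairs:
(hello, 1) (hadoop, 1) (bye, 1) (hadoop, 1)
(hello, 1) (google, 1) (goodbye, 1) (google, 1)
Shuffle/Sort: the pairs are grouped by key and sorted:
(bye, [1]) (goodbye, [1]) (google, [1, 1]) (hadoop, [1, 1]) (hello, [1, 1])
Reduce: the values for each key are summed, giving the final output:
bye 1, goodbye 1, google 2, hadoop 2, hello 2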


VII. CONCLUSION

With the advent of new technologies emerging at a rapid rate, one must be very careful to understand the global competition and the big data analysis that supports decision making. This paper analyzes the concept of big data analysis and how it can be simplified compared with existing traditional relational database technologies. The paper describes the Hadoop environment, its architecture, and how it can be implemented using MapReduce along with its various functions. As Big Data analysis is still in its infancy, we are sure that this paper helps researchers to better understand the concepts of Big Data and its processing and analysis. Big Data will definitely bring a major social change. Though tools such as R and SPSS are evolving for Big Data analytics, further research is still required to ensure integrity and security for the large data sets being processed. Big Data Analytics should be exploited for a sustainable and unbiased society.

REFERENCES

[1] Apache Hadoop, http://hadoop.apache.org, 2010.
[2] V. Patil, V. B. Nikam, "Study of Mining Algorithm in Cloud Computing using MapReduce Framework", Journal of Engineering, Computers & Applied Sciences (JEC&AS), Vol. 2, No. 7, July 2013.
[3] MapReduce, https://en.wikipedia.org/wiki/MapReduce
[4] D. Usha, A. P. S. Aslin Jenil, "A Survey of Big Data Processing in Perspective of Hadoop and MapReduce", International Journal of Current Engineering and Technology, Vol. 4, No. 2, April 2014.
[5] S. Ghemawat et al., "The Google File System", ACM SIGOPS Operating Systems Review, 37(5):29–43, 2003.
[6] J. Dean and S. Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters", in Proceedings of OSDI'04: Sixth Symposium on Operating System Design and Implementation, December 2004.
[7] T. White, "Hadoop: The Definitive Guide", Yahoo Press, 2010.
[8] P. Russom, "Big Data Analytics", TDWI Best Practices Report, pp. 1-40, 2011.

