
International Journal of Innovative Research in Engineering & Management (IJIREM)
ISSN: 2350-0557, Volume-3, Issue-5, September-2016

Implementing K-Means for Achievement Study between Apache Spark and Map Reduce

Dr. E. Laxmi Lydia, Associate Professor, Department of Computer Science and Engineering, Vignan's Institute of Information Technology, Visakhapatnam, Andhra Pradesh, India.
Dr. A. Krishna Mohan, Professor, Department of Computer Science and Engineering, JNTUK, Andhra Pradesh, India.
Dr. M. Ben Swarup, Professor, Department of Computer Science and Engineering, Vignan's Institute of Information Technology, Visakhapatnam, Andhra Pradesh, India.

ABSTRACT
Big data has long been a subject of interest for computer science enthusiasts around the globe, and it has gained even more prominence recently with the continuous explosion of data produced by the likes of social media and the quest of tech giants to gain access to deeper analysis of their data. MapReduce and its variants have been very successful in implementing large-scale data-intensive applications on commodity clusters. However, most of these frameworks are built around an acyclic data flow model that is not suitable for other popular applications. The original MapReduce executes jobs in a simple but rigid structure: a transformation step ("map"), a synchronization step ("shuffle"), and a step to combine results from all the nodes in a cluster ("reduce"). To overcome this inflexible map-and-reduce structure, the recently introduced Apache Spark has been proposed; both frameworks provide a processing model for analyzing big data. The leading candidate for "successor to MapReduce" today is Apache Spark. Like MapReduce, it is a general-purpose engine, but it is designed to run many more workloads, and to do so much faster than the older system. In this paper we compare the two frameworks and give a performance analysis using a standard machine learning algorithm for clustering (K-Means), considering parameters such as scheduling delay, speedup, and energy consumption relative to existing systems.

Keywords:
Spark, MapReduce, Hadoop, Big Data

1. INTRODUCTION
A new model of cluster computing has become widely popular, in which data-parallel computations are executed on clusters of unreliable machines by systems that automatically provide locality-aware scheduling, fault tolerance, and load balancing. MapReduce [11] pioneered this model, while systems like Dryad [17] and Map-Reduce-Merge [24] generalized the types of data flows supported. These systems achieve their scalability and fault tolerance by providing a programming model in which the user creates acyclic data flow graphs that pass input data through a set of operators. This allows the underlying system to manage scheduling and to react to faults without user intervention.

While this data flow programming model is useful for a large class of applications, there are applications that cannot be expressed efficiently as acyclic data flows. In this paper, we focus on one such class of applications: those that reuse a working set of data across multiple parallel operations. This includes two use cases where we have seen Hadoop users report that MapReduce is lacking:

Iterative jobs: Many common machine learning algorithms apply a function repeatedly to the same dataset to optimize a parameter (e.g., through gradient descent). While each iteration can be expressed as a MapReduce/Dryad job, each job must reload the data from disk, incurring a significant performance penalty.

Interactive analytics: Hadoop is often used to run ad-hoc exploratory queries on large datasets, through SQL interfaces such as Pig [21] and Hive [1]. Ideally, a user would be able to load a dataset of interest into memory across a number of machines and query it repeatedly. With Hadoop, however, each query incurs significant latency (several seconds) because it runs as a separate MapReduce job and reads data from disk.

This paper presents a new cluster computing framework called Spark, which supports applications with working sets while providing scalability and fault tolerance properties similar to MapReduce. The main abstraction in Spark is the resilient distributed dataset (RDD), which represents a read-only collection of objects partitioned across a set of machines that can be rebuilt if a partition is lost. Users can explicitly cache an RDD in memory across machines and reuse it in multiple MapReduce-like parallel operations.


RDDs achieve fault tolerance through a notion of lineage: if a partition of an RDD is lost, the RDD has enough information about how it was derived from other RDDs to rebuild just that partition. Although RDDs are not a general shared-memory abstraction, they represent a sweet spot between expressivity on the one hand and scalability and reliability on the other, and we have found them well suited to a variety of applications.
Spark is implemented in Scala [5], a statically typed high-level programming language for the Java VM, and exposes a functional programming interface similar to DryadLINQ [25]. In addition, Spark can be used interactively from a modified version of the Scala interpreter, which allows the user to define RDDs, functions, variables, and classes and use them in parallel operations on a cluster. We believe that Spark is the first framework to allow an efficient, general-purpose programming language to be used interactively to process large datasets on a cluster.

Although our implementation of Spark is still a prototype, early experience with the framework is encouraging. We show that Spark can outperform Hadoop by 10x in iterative machine learning workloads and can be used interactively to scan a 39 GB dataset with sub-second latency.

1.1 HADOOP ALONG WITH SPARK
Hadoop as a big data processing technology has been around for a long time and has proved to be the solution of choice for processing large datasets. MapReduce is a great solution for one-pass computations, but it is not very efficient for use cases that require multi-pass computations and algorithms. Each step in the data processing workflow has one Map phase and one Reduce phase, and any use case has to be converted into the MapReduce pattern to leverage this solution.

The job output data between steps has to be stored in the distributed file system before the next step can begin, so this approach tends to be slow due to replication and disk storage. Hadoop solutions also typically involve clusters that are hard to set up and manage, and they require the integration of several tools for different big data use cases (such as Mahout for machine learning and Storm for streaming data processing).

If you wanted to do something complicated, you would have to string together a series of MapReduce jobs and execute them in sequence. Each of those jobs was high-latency, and none could start until the previous job had finished completely.

Spark allows programmers to develop complex, multi-step data pipelines using a directed acyclic graph (DAG) pattern, as sketched below. It also supports in-memory data sharing across DAGs, so that different jobs can work with the same data.
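As an illustration of such a pipeline, here is a word-count-style PySpark sketch; the input and output paths are hypothetical. All the stages run as one Spark job, with no intermediate writes to HDFS between steps:

```python
from pyspark import SparkContext

sc = SparkContext(appName="DagPipelineSketch")

# A multi-step pipeline expressed as one DAG of transformations;
# Spark schedules it as a single job rather than a chain of
# separate MapReduce jobs with HDFS writes in between.
counts = (sc.textFile("hdfs:///data/input.txt")
            .flatMap(lambda line: line.split())
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b)
            .sortBy(lambda kv: kv[1], ascending=False))

counts.saveAsTextFile("hdfs:///data/word_counts")
```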
Spark runs on top of the existing Hadoop Distributed File System (HDFS) to provide enhanced and additional functionality. It supports deploying Spark applications on an existing Hadoop v1 cluster (with SIMR – Spark In MapReduce), on a Hadoop v2 YARN cluster, or even on Apache Mesos. We should look at Spark as an alternative to Hadoop MapReduce rather than a replacement for Hadoop. It is not intended to supplant Hadoop but to provide a comprehensive, unified solution to manage different big data use cases and requirements. Figure 1 shows the difference between Hadoop and Spark.

Figure 1. Difference between Hadoop and Spark

1.2 SPARK ARCHITECTURE
The Spark architecture includes the following three main components:

Data Storage: Spark uses the HDFS file system for data storage purposes. It works with any Hadoop-compatible data source, including HDFS, HBase, Cassandra, and so on.

Programming Interface: The API enables application developers to create Spark-based applications using a standard API interface. Spark provides APIs for the Scala, Java, and Python programming languages.

Resource Management: Spark can be deployed as a standalone server, or it can run on a distributed computing framework such as Mesos or YARN (see the configuration sketch below). Figure 2 below shows these components of the Spark architecture model.

Figure 2. Spark Architecture
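As a sketch of how this resource-management choice surfaces in application code, the snippet below switches deployment targets by changing only the master URL; the host names are illustrative assumptions, and the master strings follow Spark 1.x conventions:

```python
from pyspark import SparkConf, SparkContext

# The same application can target different resource managers by
# changing only the master URL (hypothetical hosts shown):
conf = (SparkConf()
        .setAppName("DeploymentSketch")
        .setMaster("spark://master-host:7077"))  # standalone Spark cluster
# .setMaster("yarn-client")                      # Hadoop v2 YARN (Spark 1.x syntax)
# .setMaster("mesos://master-host:5050")         # Apache Mesos
# .setMaster("local[4]")                         # local testing with 4 threads

sc = SparkContext(conf=conf)
```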
1.3 REASONS TO CHOOSE SPARK
Spark uses the concept of the RDD, which allows us to store data in memory and persist it according to our requirements. This permits a massive increase in batch processing performance (up to ten to a hundred times that of conventional MapReduce).


Spark also allows us to cache data in memory, which is valuable for iterative algorithms such as those used in machine learning.

Conventional MapReduce and DAG engines are problematic for these applications because they rely on acyclic data flow: an application has to run as a series of distinct jobs, each of which reads data from stable storage (e.g., a distributed file system) and writes it back to stable storage. They incur significant cost loading the data at each step and writing it back to replicated storage.

Spark also allows us to perform stream processing on large input data and deal with just a chunk of data on the fly. This can likewise be used for online machine learning, and it suits use cases requiring real-time analysis, which happens to be a practically universal requirement in industry.

MapReduce is inefficient for multi-pass applications that require low-latency data sharing across multiple parallel operations. Such applications are very common in analytics, and include:

• Iterative algorithms, including many machine learning algorithms and graph algorithms like PageRank (a sketch of this pattern follows the list).
• Interactive data mining, where a user may want to load data into RAM across a cluster and query it repeatedly.
• Streaming applications that maintain aggregate state over time.
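A minimal sketch of the iterative pattern, assuming a toy dataset of (x, y) pairs and a one-parameter gradient-descent update; the data, learning rate, and iteration count are illustrative, not from the paper:

```python
from pyspark import SparkContext

sc = SparkContext(appName="IterativeSketch")

# Toy (x, y) points; cached so every iteration reads memory, not disk.
points = sc.parallelize([(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]).cache()

w = 0.0  # single model parameter, fit so that y ~ w * x
for _ in range(10):
    # Bind the current w into the closure shipped to the executors.
    grad = points.map(lambda p, w=w: (w * p[0] - p[1]) * p[0]).mean()
    w -= 0.1 * grad  # gradient-descent step

print(w)  # approaches ~2.0 for this toy data
```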
2. IMPLEMENTATION

2.1 K-MEANS CLUSTERING
K-Means is a simple learning algorithm for clustering analysis. The goal of the K-Means algorithm is to find the best division of n entities into k groups, so that the total distance between each group's members and its corresponding centroid, representative of the group, is minimized. The k-means algorithm is used for partitioning, where each cluster's centre is represented by the mean value of the objects in the cluster. The pseudo-code is as follows:

Step 1: Begin with n clusters, each containing one object, and number the clusters 1 through n.
Step 2: Compute the between-cluster distance D(r, s) as the between-object distance of the two objects in r and s respectively, for r, s = 1, 2, ..., n. Let the square matrix D = (D(r, s)). If the objects are represented by vectors, we can use the Euclidean distance.
Step 3: Next, find the most similar pair of clusters r and s, such that the distance D(r, s) is minimum among all the pairwise distances.
Step 4: Merge r and s into a new cluster t and compute the between-cluster distance D(t, k) for any existing cluster k ≠ r, s. Once the distances are obtained, delete the rows and columns corresponding to the old clusters r and s in the D matrix, since r and s no longer exist. Then add a new row and column in D corresponding to cluster t.
Step 5: Repeat Step 3 a total of n − 1 times until there is only one cluster left.
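For the Spark runs reported later (Table 2), MLlib's K-Means was used; MLlib implements the standard iterative centroid-update formulation of K-Means. A minimal PySpark sketch follows, assuming the records have already been reduced to numeric feature vectors in a CSV file; the path, k, and iteration count are illustrative:

```python
from numpy import array
from pyspark import SparkContext
from pyspark.mllib.clustering import KMeans

sc = SparkContext(appName="KMeansSketch")

# Hypothetical file of comma-separated numeric features derived from
# the records (non-numeric fields dropped or encoded beforehand).
data = sc.textFile("hdfs:///data/healthcare_features.csv")
points = data.map(lambda line: array([float(x) for x in line.split(",")]))
points.cache()  # K-Means is iterative: every pass reuses the cached data

model = KMeans.train(points, k=3, maxIterations=10)
print(model.clusterCenters)       # learned centroids
print(model.computeCost(points))  # within-cluster sum of squared errors
```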
3. COMPARISON
In order to arrive at a conclusion about the practical comparison of Apache Spark and MapReduce, we performed a comparative analysis using these frameworks on a dataset that allows us to perform clustering using the K-Means algorithm.

3.1 DATASET DESCRIPTION
The dataset comprises healthcare_Sample_datasets, 3.13 MB in size, collected over the years; it includes patientID, name, and other values for the respective records. The record schema is shown in Table 1, followed by sample records.

Table 1: Healthcare_sample_datasets
PatientID: int
Name: chararray
DOB: chararray
PhoneNumber: chararray
EmailAddress: chararray
SSN: chararray
Gender: chararray
Disease: chararray
weight: float

Sample Records:
111, aa1, 12/10/1950, 1234, [email protected], 11, M, Diabetes, 78
112, aa2, 12/10/1984, 1234, [email protected], 11, F, PCOS, 67

3.2 PERFORMANCE ANALYSIS AND DESCRIPTION
After running the K-Means algorithm on the described dataset, we obtained the following results for comparison (shown in the tables). To gain a varied analysis, we considered 64 MB, 3.13 MB with a single node, and 3.13 MB with two nodes, and monitored the performance in terms of the time taken for clustering using the K-Means algorithm. The machines used had the following configuration:
• 4 GB RAM
• Linux Ubuntu
• 500 GB hard drive

The results clearly showed that the performance of Spark turned out to be considerably better in terms of time: for each dataset size, the processing time decreased by up to three times compared to that of MapReduce. Although there is a minor fluctuation in this result, it is due to the random nature of the K-Means algorithm and does not affect the analysis to a large extent.


Table 2: Results for K-Means using Spark (MLlib)

Dataset Size   Nodes   Time (s)
64 MB          1       18
3.13 MB        1       149

Table 3: Results for K-Means using MapReduce (Mahout)

Dataset Size   Nodes   Time (s)
64 MB          1       44
3.13 MB        1       291
3.13 MB        2       163
The performance of Spark and MapReduce is compared using the following metrics: scheduling delay, speedup, and energy consumption, each with respect to the number of nodes in the cluster.

3.2.1 SCHEDULING DELAY: SPARK VS MAP REDUCE
Figure 3 shows the result for scheduling delay with respect to Spark and MapReduce in the Hadoop cluster. Spark shows a shorter scheduling length compared to MapReduce.

Figure 3. The result of scheduling delay with respect to Spark and Map Reduce in the Hadoop Cluster

3.2.2 SPEED UP: SPARK VS MAP REDUCE
The speedup is the ratio of the sequential execution time to the schedule length of the output schedule. Figure 4 shows the result for speedup with respect to Spark and MapReduce. The speedup of the Spark model is higher than that of the MapReduce approaches, and its value gradually increases with the number of nodes in the cluster.

Figure 4. Result of speed up with respect to Spark and Map Reduce
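As a worked example of this definition using the MapReduce numbers from Table 3 (taking the single-node time as the sequential execution time):

```python
# Speedup = sequential execution time / parallel schedule length.
# Values from Table 3: MapReduce on the 3.13 MB dataset.
seq_time = 291.0  # seconds, 1 node
par_time = 163.0  # seconds, 2 nodes

speedup = seq_time / par_time
print(round(speedup, 2))  # ~1.79
```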
3.2.3 ENERGY CONSUMPTION: SPARK VS MAP REDUCE
Figure 5 shows the result for energy consumption with respect to the Spark and MapReduce models. Spark consumes less energy than MapReduce; the consumption gradually increases with the number of cluster resources.

Figure 5. Results of energy consumption with respect to Spark and Map Reduce

4. CONCLUSION
This research paper gives a review of both frameworks and compares them on different parameters, followed by a performance analysis using the K-Means algorithm. Our results for this study show that Spark is a very strong contender and undoubtedly brings about an improvement through its use of in-memory processing. Observing Spark's ability to perform batch processing, streaming, and machine learning on the same cluster, and looking at the current rate of adoption of Spark throughout the industry, Spark will become the de facto framework for a large number of use cases involving big data processing.


REFERENCES
[1] Apache Hive. http://hadoop.apache.org/hive
Scala programming language. http://www.scala-lang.org
[2] C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins. Pig Latin: a not-so-foreign language for data processing. In SIGMOD '08, 2008.
[3] Y. Yu, M. Isard, D. Fetterly, M. Budiu, Ú. Erlingsson, P. K. Gunda, and J. Currey. DryadLINQ: A system for general-purpose distributed data-parallel computing using a high-level language. In OSDI '08, San Diego, CA, 2008.
[4] J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. Commun. ACM, 51(1):107–113, 2008.
[5] M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: Distributed data-parallel programs from sequential building blocks. In EuroSys 2007, pages 59–72, 2007.
[6] B. Nitzberg and V. Lo. Distributed shared memory: a survey of issues and algorithms. Computer, 24(8):52–60, Aug 1991.
[7] Spark Main Website.
[8] Spark Examples.
[9] Spark Summit 2014 Conference Presentations and Videos.
[10] Spark on Databricks website.
