A20528094_Assign2

This document summarizes two research papers on big data technologies: one surveys the landscape, technologies, applications, and challenges of big data, while the other analyzes the performance of Apache Hadoop and Apache Spark. The survey highlights key characteristics of big data and its applications across various sectors, while the performance analysis reveals that Spark outperforms Hadoop in specific workloads. Together, these papers provide valuable insights into the current state and future directions of big data research.

Uploaded by

gaurav

1.

Summary
This document summarizes two research papers focused on different facets of big data technologies: a
survey of big data technologies, terminologies, and applications, and a performance analysis of Apache
Hadoop and Apache Spark.

Big Data Survey

The first paper provides a broad overview of the rapidly evolving landscape of big data. It addresses the
increasing importance of managing and processing vast amounts of data for decision-making, scientific
research, and business intelligence. The survey defines big data by its key characteristics: volume,
velocity, variety, veracity, and value (the 5 Vs). It examines various big data technologies related to
storage, processing, and security. These include tools such as NoSQL databases (e.g., Cassandra),
Storm, Spark, and OpenRefine, as well as Hadoop and the components of its ecosystem, such as
Apache Flume, Sqoop, Pig, Hive, and ZooKeeper.
The survey highlights the importance of visual analytics (VA), which depends on three main
layers: visualization, analytics, and data management. Besides the technologies, the survey also
explores the application of big data across sectors such as smart cities, network communication,
business management, IoT, cloud computing, fog computing, edge computing, health care, and
agriculture, along with the challenges associated with them. Most of the papers reviewed utilized
databases and methods such as machine learning, deep learning, cloud computing, and big data
analytics. Big data analytics is the most commonly used method across the different applications.
Finally, the survey identifies big data management issues, such as the separation of the data storage
layer from the management layer, discusses open challenges, and proposes future research directions
such as dynamic edge computing and ensemble algorithms for classifying data and management layers.

Performance Analysis of Hadoop and Spark

The second paper presents a comprehensive performance comparison between Apache Hadoop and
Apache Spark. Both are distributed computing frameworks that are essential for analyzing large-scale
datasets. The study aims to identify the parameters that most strongly influence their performance
(resource utilization, input splits, and shuffle behavior). The analysis is carried out on a real cluster
with a large-scale dataset, using the WordCount and TeraSort workloads. The performance metrics are
execution time, throughput, and speedup.
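To make the WordCount workload and two of the parameters named above (input splits and shuffle behavior) more concrete, here is a minimal single-process Python sketch of the map, shuffle, and reduce phases. It is only an illustration and not the code used in the paper: the function names (split_input, map_phase, shuffle_phase, reduce_phase, word_count) and the split size are hypothetical, and a real Hadoop or Spark job distributes these steps across cluster nodes.

```python
from collections import defaultdict


def split_input(lines, split_size):
    """Group input lines into fixed-size chunks (a stand-in for input splits)."""
    return [lines[i:i + split_size] for i in range(0, len(lines), split_size)]


def map_phase(split):
    """Emit a (word, 1) pair for every word found in one split."""
    return [(word.lower(), 1) for line in split for word in line.split()]


def shuffle_phase(mapped_splits):
    """Group the emitted counts by word, as the shuffle step does between nodes."""
    grouped = defaultdict(list)
    for pairs in mapped_splits:
        for word, count in pairs:
            grouped[word].append(count)
    return grouped


def reduce_phase(grouped):
    """Sum the grouped counts to get one total per word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines, split_size=2):
    """Run all phases in sequence; split_size is an assumed, illustrative value."""
    splits = split_input(lines, split_size)
    mapped = [map_phase(split) for split in splits]
    return reduce_phase(shuffle_phase(mapped))
```

In this sketch, a smaller split_size produces more map tasks and more intermediate lists to shuffle, which hints at why split and shuffle settings can affect runtime on a real cluster.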
The experimental results show that performance depends heavily on input data size and on correct
parameter selection. Spark performs better than Hadoop overall, and the default (factory) settings are
not always optimal: with the right reconfiguration, performance can be improved further. Spark
also excels in workloads with small data sets, achieving a two-times speedup in WordCount and up to a
14-times speedup in TeraSort.
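The throughput and speedup metrics can be expressed as simple ratios. The sketch below assumes the common definitions (throughput as input size divided by execution time, speedup as baseline time divided by improved time); the paper may define them differently. The function names and the timings in the usage note are hypothetical and are not measurements from the study.

```python
def throughput(input_size_mb, execution_time_s):
    """Data processed per second (MB/s), under the assumed definition."""
    if execution_time_s <= 0:
        raise ValueError("execution time must be positive")
    return input_size_mb / execution_time_s


def speedup(baseline_time_s, improved_time_s):
    """How many times faster the improved run is than the baseline run."""
    if baseline_time_s <= 0 or improved_time_s <= 0:
        raise ValueError("execution times must be positive")
    return baseline_time_s / improved_time_s
```

For example, with made-up timings of 280 seconds for one framework and 20 seconds for the other on the same input, speedup(280.0, 20.0) gives 14.0, which is what a "14 times" speedup means under this definition.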

Combined Insights & Conclusion

Together, these papers provide a holistic view of big data. The survey gives insight into terminologies,
technologies, applications, and challenges, while the performance analysis shows the practical
considerations of big data implementation. Both papers show the impact of big data, and both point to
challenges and constraints that should be considered in future research.
2. hadoop fs -ls /
3. /user
4. /user/csp554

5. /user/csp554-5
6. Copy

7. Copy

8. GCS to hadoop master node
