Big Data Hadoop Insight
Data Science Analytics & Research Centre
9/20/2014
Big Data
HDFS
Hadoop
Hadoop Overview
Inputs & Outputs
Data Types
What is MapReduce (MR)
Example
Functionalities of MR
Speculative Execution
Hadoop Streaming
Hadoop Job Scheduling
[Figure: Enterprise data landscape. Sources (Core ERP & legacy applications, the data warehouse, unstructured web/telemetry data, and Big Data platforms such as Hadoop) feed consumption layers ranging from detailed events/facts used for predictive analytics, through aggregated analytic marts & cubes, up to highly summarized visualization & dashboards. Latency ranges from real time and near real time through hourly, daily, weekly, monthly, quarterly, and yearly, with retention horizons of 3, 5, and 10 years.]
[Figure: Data volume versus time horizon, from gigabytes (GB) at real-time and daily scale, to terabytes (TB) at monthly scale, to petabytes (PB) at yearly scale.]
Financial Services: detect fraud; personalize banking and insurance products.
Healthcare: personalized medicine.
Retail: in-store behavior analysis; cross-selling; optimize pricing, placement, and design; optimize inventory and distribution.
Government: location-based marketing; reduce fraud; social segmentation; sentiment analysis.
Manufacturing: design to value; crowd-sourcing; digital factory for lean manufacturing; improve service via product sensor data.
[Figure: A file split into HDFS blocks of 128 MB, 128 MB, and 36 MB.]
[Figure: HDFS file write and block replication. A client issues Create and, on success, Complete for File 1 via the Namenode; the file's blocks B1, B2, and B3 are each replicated on datanodes n1-n4 spread across Rack 1, Rack 2, and Rack 3.]
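To make the write path concrete, here is a minimal client-side sketch (not taken from the slides; the path and replication factor are illustrative) using the Hadoop FileSystem API that drives the Create/Complete flow shown above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();              // picks up core-site.xml / hdfs-site.xml
    FileSystem fs = FileSystem.get(conf);                   // client handle backed by the Namenode
    Path file = new Path("/user/demo/file1.txt");           // hypothetical path
    FSDataOutputStream out = fs.create(file, (short) 3);    // "Create": Namenode allocates the file, replication 3
    out.writeUTF("hello hdfs");                             // data is pipelined to datanodes, block by block
    out.close();                                            // "Complete": the Namenode finalizes the file's blocks
  }
}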
Command, usage, and syntax:
cat: Copies source paths to stdout. Syntax: hadoop dfs -cat URI [URI ...]
chgrp: Changes the group association of files; with -R, the change is made recursively. Syntax: hadoop dfs -chgrp [-R] GROUP URI [URI ...]
chmod: Changes the permissions of files; with -R, the change is made recursively through the directory structure. Syntax: hadoop dfs -chmod [-R] <MODE[,MODE]... | OCTALMODE> URI [URI ...]
chown: Changes the owner of files; with -R, the change is made recursively. Syntax: hadoop dfs -chown [-R] [OWNER][:[GROUP]] URI [URI ...]
copyFromLocal: Similar to put, except that the source is restricted to a local file. Syntax: hadoop dfs -copyFromLocal <localsrc> URI
copyToLocal: Similar to get, except that the destination is restricted to a local file. Syntax: hadoop dfs -copyToLocal URI <localdst>
cp: Copies files from source to destination. Syntax: hadoop dfs -cp URI [URI ...] <dest>
du: Displays the size of a file, or the aggregate size of the files in a directory. Syntax: hadoop dfs -du URI [URI ...]
dus: Displays a summary of file sizes. Syntax: hadoop dfs -dus <args>
expunge: Empties the trash. Syntax: hadoop dfs -expunge
get: Copies files to the local file system. Syntax: hadoop dfs -get <src> <localdst>
getmerge: Concatenates the files in a source directory into a single local destination file. Syntax: hadoop dfs -getmerge <src> <localdst> [addnl]
ls (or) lsr: Lists files and directories; lsr is the recursive form. Syntax: hadoop dfs -ls <args> / hadoop dfs -lsr <args>
Command, usage, and syntax (continued):
mkdir: Creates directories for the given path URIs. Syntax: hadoop dfs -mkdir <paths>
moveFromLocal: Similar to put, except that the local source is deleted after it is copied. Syntax: hadoop dfs -moveFromLocal <localsrc> <dst>
mv: Moves files from source to destination. Syntax: hadoop dfs -mv URI [URI ...] <dest>
setrep: Changes the replication factor of a file; with -R, applies recursively to a directory tree. Syntax: hadoop dfs -setrep [-R] <rep> <path>
stat: Returns stat information on the path. Syntax: hadoop dfs -stat URI [URI ...]
tail: Displays the last kilobyte of the file to stdout; -f follows the file. Syntax: hadoop dfs -tail [-f] URI
text: Outputs a source file in text format. Syntax: hadoop dfs -text <src>
touchz: Creates a file of zero length. Syntax: hadoop dfs -touchz URI [URI ...]
put: Copies one or more sources from the local file system to the destination filesystem. Syntax: hadoop dfs -put <localsrc> ... <dst>
rm (or) rmr: Deletes files; rmr is the recursive form. Syntax: hadoop dfs -rm URI [URI ...] / hadoop dfs -rmr URI [URI ...]
test: Tests a path: -e (exists), -z (zero length), -d (is a directory). Syntax: hadoop dfs -test -[ezd] URI
Low-latency data access: HDFS is not optimized for low-latency access; it trades latency for high data throughput.
Lots of small files: with a default block size of 64 MB, every file, directory, and block is tracked in the namenode's memory, so large numbers of small files sharply increase the namenode's memory requirements.
Multiple writers and arbitrary modifications: HDFS files are written by a single writer, and writes are only appended at the end of the file; there is no support for multiple writers or arbitrary in-place modification.
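As a rough, illustrative calculation (using the commonly quoted rule of thumb of about 150 bytes of namenode heap per file, directory, or block object): 10 million small files, each fitting in a single block, represent roughly 20 million namenode objects, on the order of 3 GB of namenode memory, no matter how little data the files actually hold.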
Hadoop Overview
Inputs & Outputs
Data Types
What is MR
Example
Functionalities of MR
Speculative Execution
How Hadoop runs MR
Hadoop Streaming
Hadoop Job Scheduling
Hadoop is an open-source framework for distributed computing that provides a simple MapReduce programming interface and its own distributed filesystem, HDFS. It facilitates scalability and takes care of detecting and handling failures.
Hadoop release series: 1.0.X, 1.1.X, 2.X.X
Risk Modeling: How businesses can better understand and quantify risk.
Recommendation Engine: How to predict customer preferences.
Ad Targeting:
Threat Analysis:
Trade Surveillance: Helping a business spot the rogue trader.
Search Quality: Delivering more relevant search results to customers.
Sorts the outputs of the maps, which are then input to the reduce tasks.
Takes care of scheduling tasks, monitoring them, and re-executing the failed tasks.
The MapReduce framework operates exclusively on <key, value> pairs: the framework views the input to the job as a set of <key, value> pairs and produces a set of <key, value> pairs as the output of the job, conceivably of different types.
The key and value classes have to be serializable by the framework and hence need to implement the Writable interface. Additionally, the key classes have to implement the WritableComparable interface to facilitate sorting by the framework.
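A minimal sketch (not taken from the slides) of a hypothetical custom key type that meets both requirements, implementing WritableComparable so the framework can both serialize it and sort it:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

// Hypothetical composite key: (customerId, timestamp), sorted by customerId, then timestamp.
public class EventKey implements WritableComparable<EventKey> {
  private long customerId;
  private long timestamp;

  public void write(DataOutput out) throws IOException {    // serialization
    out.writeLong(customerId);
    out.writeLong(timestamp);
  }

  public void readFields(DataInput in) throws IOException { // deserialization
    customerId = in.readLong();
    timestamp = in.readLong();
  }

  public int compareTo(EventKey other) {                    // sort order used during the shuffle
    int c = Long.compare(customerId, other.customerId);
    return c != 0 ? c : Long.compare(timestamp, other.timestamp);
  }
  // hashCode() should also be overridden so the default HashPartitioner spreads keys evenly.
}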
Serialization is the process of turning structured objects into a byte stream for transmission over
a network or for writing to persistent storage.
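As an illustration (a sketch, not from the slides), a Hadoop Writable can be turned into bytes and back with plain Java data streams:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;

public class SerializationSketch {
  public static void main(String[] args) throws IOException {
    IntWritable before = new IntWritable(163);

    // Serialize: structured object -> byte stream
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    before.write(new DataOutputStream(bytes));

    // Deserialize: byte stream -> structured object
    IntWritable after = new IntWritable();
    after.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));

    System.out.println(after.get());   // prints 163
  }
}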
8. BytesWritable
9. NullWritable
10. MD5Hash
11. ObjectWritable
12. GenericWritable
Apart from the above, there are four Writable collection types:
1. ArrayWritable
2. TwoDArrayWritable
3. MapWritable
4. SortedMapWritable
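For example (a sketch, not from the slides), MapWritable carries a heterogeneous map of Writable keys and values, so an entire record can be emitted as a single map or reduce value:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;

public class MapWritableSketch {
  public static void main(String[] args) {
    MapWritable record = new MapWritable();
    record.put(new Text("pageViews"), new IntWritable(42));
    record.put(new Text("country"), new Text("IN"));
    // MapWritable itself implements Writable, so 'record' can be used as a map/reduce value.
    System.out.println(((IntWritable) record.get(new Text("pageViews"))).get()); // 42
  }
}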
[Figure: MapReduce data flow. The input data format turns the input data into <K1, V1> pairs for the MapperClass; the Mapper emits <K2, V2> pairs; the framework groups them into <K2, List(V2)> for the ReducerClass; the Reducer emits the final <K3, V3> pairs.]
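As an illustration (a sketch using the old org.apache.hadoop.mapred API; the concrete classes are assumptions matching a word-count-style job), the figure's type parameters correspond to the classes configured on a job:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;

public class TypeFlowSketch {
  public static JobConf configure(JobConf conf) {
    conf.setInputFormat(TextInputFormat.class);      // input format produces <K1, V1> = <LongWritable, Text>
    conf.setMapOutputKeyClass(Text.class);           // K2: mapper output key
    conf.setMapOutputValueClass(IntWritable.class);  // V2: mapper output value
    conf.setOutputKeyClass(Text.class);              // K3: reducer output key
    conf.setOutputValueClass(IntWritable.class);     // V3: reducer output value
    return conf;
  }
}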
Mapper implementation (Lines: 18 - 25)
The first map emits:
< Hello, 1>
< World, 1>
< Bye, 1>
< World, 1>

Combiner implementation (Line: 46)
Output of the first map after the combiner:
< Bye, 1>
< Hello, 1>
< World, 2>

Reducer implementation (Lines: 29 - 35)
Output of the job:
< Bye, 1>
< Goodbye, 1>
< Hadoop, 2>
< Hello, 2>
< World, 2>
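The slide's line references point at the classic WordCount v1.0 listing from the Hadoop MapReduce tutorial, which is not reproduced in this extract. A condensed sketch of that mapper/reducer pair (old mapred API; the same Reduce class is reused as the combiner) is shown below; in the tutorial the job reads two input files, "Hello World Bye World" and "Hello Hadoop Goodbye Hadoop", which is why Goodbye and Hadoop appear in the final output.

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class WordCount {

  // Mapper: emits <word, 1> for every token in the input line.
  public static class Map extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
      StringTokenizer tokenizer = new StringTokenizer(value.toString());
      while (tokenizer.hasMoreTokens()) {
        word.set(tokenizer.nextToken());
        output.collect(word, one);
      }
    }
  }

  // Reducer (also used as the combiner): sums the counts for each word.
  public static class Reduce extends MapReduceBase
      implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      output.collect(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);
    conf.setMapperClass(Map.class);
    conf.setCombinerClass(Reduce.class);   // the combiner (the slide's "Line: 46") is the same Reduce class
    conf.setReducerClass(Reduce.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}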
Name | Value | Description
mapred.map.tasks.speculative.execution | true | If true, multiple instances of some map tasks may be executed in parallel.
mapred.reduce.tasks.speculative.execution | true | If true, multiple instances of some reduce tasks may be executed in parallel.
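These defaults can be overridden per job. A small sketch (assuming the old JobConf API) that turns speculative execution off, for example for a job whose tasks have side effects:

import org.apache.hadoop.mapred.JobConf;

public class DisableSpeculation {
  public static JobConf configure(JobConf conf) {
    conf.setMapSpeculativeExecution(false);      // mapred.map.tasks.speculative.execution = false
    conf.setReduceSpeculativeExecution(false);   // mapred.reduce.tasks.speculative.execution = false
    return conf;
  }
}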
Default Scheduler
Scheduling tries to balance the map and reduce load across all tasktrackers in the cluster.
Capacity Scheduler
Within a queue, jobs with higher priority get access to the queue's resources before jobs with lower priority.
To prevent one or more users from monopolizing a queue, each queue enforces a limit on the percentage of its resources that a single user may use at any given time, if there is competition for them.
Fair Scheduler
Each pool is guaranteed a minimum capacity, and excess capacity is shared among running jobs using a fairness algorithm.
The scheduler tries to ensure that, over time, all jobs receive a roughly equal share of cluster resources.
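As an illustration (a sketch using the old JobConf API; the queue name is hypothetical and must match a queue defined in the cluster's Capacity Scheduler configuration), a job can be directed to a specific queue and given a priority:

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobPriority;

public class QueueAndPriority {
  public static JobConf configure(JobConf conf) {
    conf.setQueueName("analytics");          // hypothetical Capacity Scheduler queue
    conf.setJobPriority(JobPriority.HIGH);   // served before lower-priority jobs in the same queue
    return conf;
  }
}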
Thank you!!
Data Science Analytics & Research Centre