
Big Data

Streaming Spark

Prafullata Kiran Auradkar


Department of Computer Science and Engineering
[email protected]
Acknowledgements:
Significant portions of the material presented in Unit 4 of this course were created by Dr. K V Subramaniam, and I would like to acknowledge and thank him for the same. I may have supplemented it with content from books and other Internet sources, and I sincerely thank and acknowledge the original authors/publishers, with whom the credit/rights remain. These slides are intended for classroom presentation only.
BIG DATA
Overview of lecture

• Introduction to streaming analysis
• Streaming query types
• Streaming analysis requirements
• Streaming Spark – DStreams
• Job execution
• Stateless transformations
• Stateful processing
• Fault tolerance
• Performance
• Putting it all together
Introduction to streaming analysis
BIG DATA
Examples of Streaming

• Sensor data, e.g.,


• Temperature sensor in the ocean
■ Reports GPS
■ 1 sensor every 150 square miles => 1,000,000 sensors
■ 10 readings/sec => 3.5 TB/day
• Images
• London: 6 million video cameras
• Internet / Web traffic
• Google: hundreds of millions of queries/day
• Yahoo: billions of clicks/day
BIG DATA
Motivation

• Many important applications must process large streams of live data and provide results in near-real-time
  • Social network trends
  • Website statistics
  • Intrusion detection systems
  • Transportation systems – Uber
  • etc.
• Require large clusters to handle workloads
• Require latencies of a few seconds


Streaming data model and queries
BIG DATA
Stream Data Model

• Multiple streams
• Different rates, not synchronized
• Archival store
• Offline analysis, not real-time
• Working store
• Disk or memory
• Summaries
• Parts of streams
• Queries
• Standing queries
• Ad-hoc queries
BIG DATA
Examples of Stream Queries

• Standing queries: produce outputs at appropriate time


• Query is continuously running
• Constantly reading new data
• Query execution can be optimized
• Example: maximum temperature ever recorded
• Ad hoc query: not predetermined, arbitrary query
• Need to store stream
• Approach: store sliding window in SQL DB
• Do SQL query
• Example: number of unique users over last 30 days
• Store logins for last 30 days
BIG DATA
Exercise 1

Consider the queries below. Which among them are STANDING QUERIES and which are AD HOC?

• Alert when temperature > threshold
• Display average of last n temperature readings; n arbitrary
• List of countries from which visits have been received over last year
• Alert if website receives visit from a black-listed country
BIG DATA
Exercise 1 Solution

• Alert when temperature > threshold – standing
• Display average of last n temperature readings – ad hoc
• List of countries from which visits have been received over last year – ad hoc
• Alert if website receives visit from a black-listed country – standing
BIG DATA
Issues in Stream Processing

• Velocity
• Streams can have high data rate
• Need to process very fast
• Volume
• Low data rate, but large number of streams
• Ocean sensors, pollution sensors
• Need to store in memory
• May not have huge memory
• Approximate solutions
BIG DATA
Need for a framework …

… for building such complex stream processing applications.

But what are the requirements from such a framework?
Streaming analysis framework requirements
BIG DATA
Requirements

• Scalable to large clusters


• Second-scale latencies
• Simple programming model
BIG DATA
Exercise 2

CAN WE USE HADOOP?

• Consider the simple program below.
• The input is a stream of records from the stock market.
• Each time a stock is sold, a new record is created.
• The record contains a field num_stock, which is the number of stocks sold.
• Find_max is a program that updates a variable Max_num_stock, which is the maximum of num_stock.

Pipeline: Input Stock Data Stream (num_stock) → Find_max → Max_num_stock
BIG DATA
Exercise 2 - Solution

CAN WE USE HADOOP?

• Write the pseudo-code for Find_max:
  • If num_stock > Max_num_stock then Max_num_stock = num_stock
• Can this be implemented in Hadoop? (Similar problems can arise in Spark.)
  • We need to process one record at a time; Hadoop processes a full file in the Map.
  • Max_num_stock is a global variable.
• Does your solution assume that Find_max runs on a single node? Is this a reasonable assumption?
  • No, the number of transactions could be more than what a single node can handle.

Pipeline: Input Stock Data Stream (num_stock) → Find_max → Max_num_stock
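The pseudo-code above can be made concrete as a tiny single-process sketch (illustration only; the in-memory iterator and its values stand in for the live feed). It shows that Find_max keeps one mutable global maximum and must see every record in sequence, which is exactly what a Map task working over a whole file does not give us.

object FindMax {
  def main(args: Array[String]): Unit = {
    // Global mutable state: the running maximum seen so far.
    var maxNumStock = Long.MinValue
    // Stand-in for the live stock feed (made-up values, for illustration only).
    val numStockStream = Iterator(120L, 45L, 300L, 7L)
    for (numStock <- numStockStream) {
      if (numStock > maxNumStock) maxNumStock = numStock
      println(s"Max_num_stock = $maxNumStock") // emit an update per record
    }
  }
}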
BIG DATA
Case study: Conviva, Inc.

• Real-time monitoring of online video metadata (HBO, ESPN, ABC, SyFy, …)
• Since we can’t use Hadoop: a custom-built distributed stream processing system
  • 1000s of complex metrics on millions of video sessions
  • Requires many dozens of nodes for processing
• Hadoop backend for offline analysis
  • Generating daily and monthly reports
  • Similar computation as the streaming system
• Result: two processing stacks
BIG DATA
Case study: XYZ, Inc.

• Any company that wants to process live streaming data has this problem
• Two processing stacks
  • Custom-built distributed stream processing system
    • 1000s of complex metrics on millions of video sessions
    • Requires many dozens of nodes for processing
  • Hadoop backend for offline analysis
    • Generating daily and monthly reports
    • Similar computation as the streaming system
• Twice the effort to implement any new function
• Twice the number of bugs to solve
• Twice the headache
BIG DATA
Requirements

• Scalable to large clusters


• Second-scale latencies
• Simple programming model
• Integrated with batch & interactive processing
BIG DATA
Stateful Stream Processing

• Traditional streaming systems have an event-driven, record-at-a-time processing model
  • Each node has mutable state
  • For each input record, update the state & send new records
• State is lost if a node dies!
• Making stateful stream processing fault-tolerant is challenging

[Figure: input records flow into node 1 and node 2, each holding mutable state, which forward derived records to node 3]
BIG DATA
Exercise 3: Stateful Stream Processing

• Traditional streaming systems have an event-driven, record-at-a-time processing model
  • Each node has mutable state
  • For each input record, update the state & send new records
• State is lost if a node dies!
• Making stateful stream processing fault-tolerant is challenging

Question: In the stock market example, where is the state?

[Figure: same node diagram as the previous slide]
BIG DATA
Exercise 3 (solution): Stateful Stream Processing

In the stock market example, where is the state?

• Recall the pseudo-code for Find_max:
  • If num_stock > Max_num_stock then Max_num_stock = num_stock
• Max_num_stock is a global state
  • Its value depends upon the entire stream sequence
  • The first num_stock could have been the largest

Pipeline: Input Stock Data Stream (num_stock) → Find_max → Max_num_stock
BIG DATA
Existing Streaming Systems

• Storm
• Replays record if not processed by a node
• Processes each record at least once
• May update mutable state twice!
• Mutable state can be lost due to failure!

• Trident – Use transactions to update state


• Processes each record exactly once
• Per-state transaction updates are slow

https://storm.apache.org/
https://storm.apache.org/releases/current/Trident-tutorial.html
BIG DATA
Requirements

• Scalable to large clusters


• Second-scale latencies
• Simple programming model
• Integrated with batch & interactive processing
• Efficient fault-tolerance in stateful computations
Spark Streaming
BIG DATA
What is Spark Streaming?

• Framework for large scale stream processing


• Scales to 100s of nodes
• Can achieve second scale latencies
• Integrates with Spark’s batch and
interactive processing
• Provides a simple batch-like API for implementing complex algorithms
• Can absorb live data streams from Kafka,
Flume, ZeroMQ, etc.
BIG DATA
Exercise 4:

Can we modify Hadoop?

• Suppose we don’t want instantaneous updates
  • Say 1-second updates are acceptable
• Can we modify Hadoop to do stream processing?
  • Ignore the global variable problem for now

Pipeline: Input Stock Data Stream (num_stock) → Find_max (If num_stock > Max_num_stock then Max_num_stock = num_stock) → Max_num_stock
BIG DATA
Exercise 4: Solution

• We can batch the input records together every second into an HDFS file
• This file can be processed using MapReduce
• We can produce an update every second
• We have ignored the global variable problem for now

Pipeline: Input Stock Data Stream (num_stock) → Batcher (group together records every second into an HDFS file) → Find_max (If num_stock > Max_num_stock then Max_num_stock = num_stock) → Max_num_stock
BIG DATA
Discretized Stream Processing

Run a streaming computation as a series of very small, deterministic batch jobs

• Chop up the live data stream into batches of X seconds
• Spark treats each batch of data as an RDD and processes it using RDD operations
• Finally, the processed results of the RDD operations are returned in batches

[Figure: live data stream → Spark Streaming → batches of X seconds → Spark → processed results]
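A minimal sketch of this micro-batching in code, assuming a local master and a text socket on localhost:9999 as the source (both placeholders): the StreamingContext is created with a 1-second batch interval, and each 1-second batch is processed as an ordinary RDD job.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object MicroBatchSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MicroBatchSketch").setMaster("local[2]")
    // Chop the live stream into batches of 1 second.
    val ssc = new StreamingContext(conf, Seconds(1))
    // Assumed source: lines arriving on a TCP socket (e.g. started with `nc -lk 9999`).
    val lines = ssc.socketTextStream("localhost", 9999)
    // Each batch is an RDD; count() here runs as a small deterministic batch job.
    lines.count().print()
    ssc.start()
    ssc.awaitTermination()
  }
}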
BIG DATA
Discretized Stream Processing

Run a streaming computation as a series of very small, deterministic batch jobs

• Batch sizes as low as ½ second, latency ~1 second
• Potential for combining batch processing and streaming processing in the same system

[Figure: live data stream → Spark Streaming → batches of X seconds → Spark → processed results]
Streaming Spark - DStreams
BIG DATA
Remember!!!!

• In Spark (not Streaming Spark)
  • Every variable is an RDD
  • There are two kinds of RDDs
    • Pair RDDs are RDDs that consist of key-value pairs
    • Special operations, such as reduceByKey, are defined on pair RDDs
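As a refresher, here is a small, self-contained sketch of a pair RDD and reduceByKey in plain (non-streaming) Spark; the ticker symbols and numbers are made up for illustration.

import org.apache.spark.{SparkConf, SparkContext}

object PairRddRefresher {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("PairRddRefresher").setMaster("local[2]"))
    // A pair RDD: every element is a (key, value) tuple.
    val sales = sc.parallelize(Seq(("INFY", 100), ("TCS", 250), ("INFY", 50)))
    // reduceByKey is one of the special operations defined only on pair RDDs.
    val totals = sales.reduceByKey(_ + _)
    totals.collect().foreach(println) // prints (INFY,150) and (TCS,250)
    sc.stop()
  }
}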
BIG DATA
Example 1 – Get hashtags from Twitter

val tweets = ssc.twitterStream(<Twitter username>, <Twitter password>)

• Input: Twitter Streaming API
• tweets is a DStream: batch @ t, batch @ t+1, batch @ t+2, …
• Each batch is stored in memory as an RDD (immutable, distributed)
BIG DATA
Example 1 – Get hashtags from Twitter

val tweets = ssc.twitterStream(<Twitter username>, <Twitter password>)
val hashTags = tweets.flatMap (status => getTags(status))

• hashTags is a new DStream
• transformation: modify data in one DStream to create another DStream
• getTags is a function. A DStream is a sequence of RDDs; the function is applied to each RDD. The result is another DStream.
• Each batch of tweets (batch @ t, t+1, t+2, …) is passed through flatMap to produce the corresponding batch of the hashTags DStream, e.g. [#cat, #dog, …]
BIG DATA
Example 1 – Get hashtags from Twitter

val tweets = ssc.twitterStream(<Twitter username>, <Twitter password>)
val hashTags = tweets.flatMap (status => getTags(status))
hashTags.saveAsHadoopFiles("hdfs://...")

• output operation: pushes data to external storage
• Every batch of the hashTags DStream (batch @ t, t+1, t+2, …) produced by flatMap is saved to HDFS
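Putting the three slide lines into a complete program skeleton might look like the sketch below. The twitterStream and saveAsHadoopFiles calls are kept exactly as the slide (Spark 0.7-era API) writes them; the credentials, getTags, and the HDFS path are placeholders that must be supplied.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object HashTagsToHdfs {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("HashTagsToHdfs"), Seconds(1)) // 1-second batches

    // Slide's calls, verbatim; username/password/getTags are placeholders.
    val tweets   = ssc.twitterStream(<Twitter username>, <Twitter password>)
    val hashTags = tweets.flatMap(status => getTags(status))
    hashTags.saveAsHadoopFiles("hdfs://...")   // one set of output files per batch

    ssc.start()            // start receiving and processing
    ssc.awaitTermination() // block until stopped
  }
}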
BIG DATA
Java Example

Scala
val tweets = ssc.twitterStream(<Twitter username>, <Twitter password>)
val hashTags = tweets.flatMap (status => getTags(status))
hashTags.saveAsHadoopFiles("hdfs://...")

Java
JavaDStream<Status> tweets = ssc.twitterStream(<Twitter username>, <Twitter password>);
JavaDStream<String> hashTags = tweets.flatMap(new Function<...>() { ... });
hashTags.saveAsHadoopFiles("hdfs://...");

A Function object defines the transformation.

Streaming Spark - execution
BIG DATA
Streaming Spark Flow

DStreams and Receivers
• Twitter, HDFS, Kafka, Flume

Transformations
• Standard RDD operations – map, countByValue, reduce, join, …
• Stateful operations – window, countByValueAndWindow, …

Output Operations on DStreams
• saveAsHadoopFiles – saves to HDFS
• foreach – do anything with each batch of results
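The foreach-style output operation can be sketched as below; stockTotals is an assumed pair DStream built by earlier transformations, and foreachRDD simply hands each batch's RDD (plus its timestamp) to arbitrary user code.

// Assumed: stockTotals is a DStream[(String, Int)] produced by earlier transformations.
stockTotals.foreachRDD { (rdd, time) =>
  // Runs once per batch on the driver; RDD operations still execute on the cluster.
  println(s"Batch at $time has ${rdd.count()} records")
}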
BIG DATA
DStreams and Receivers

• Streaming Spark processes data in batches
  • Together termed the DStream
• Every DStream is associated with a Receiver
  • Receivers read data from a source and store it into Spark memory for processing
• Types of sources
  • Basic – file systems and sockets
  • Advanced – Kafka, Flume
• Relationship between DStream and RDD: a DStream is a sequence of RDDs, one per batch
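The two flavours of basic sources can be sketched as follows; the directory and host/port values are placeholders, and ssc is an existing StreamingContext. The file-based source watches a directory for new files and needs no receiver, while the socket source starts a receiver on an executor.

// Basic file-system source: picks up new files written into the directory.
val fileLines = ssc.textFileStream("hdfs://namenode:8020/incoming")
// Basic socket source: a receiver reads text lines from a TCP connection.
val socketLines = ssc.socketTextStream("localhost", 9999)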
BIG DATA
DStreams and Receivers

• When a Streaming Spark job runs:
  • A Receiver is started on an executor as a long-running task
  • The driver starts tasks to process the received blocks on every batch interval

Source: https://www.youtube.com/watch?v=NHfggxItokg
BIG DATA
Transformations in Spark

• Stateless transformations

• Stateful Transformations
Spark Streaming – stateless processing
BIG DATA
Stateless transformations in Spark

• The transformation is applied to every batch of data independently.
• No information is carried forward between one batch and the next batch.

• Examples
  • map()
  • flatMap()
  • filter()
  • repartition()
  • reduceByKey()
  • groupByKey()
BIG DATA
Class Exercise: Stateless stream processing (10 mins)

• Consider a DStream of stock quotes, generated as in the earlier example, that contains
  • A sequence of tuples of the form <company name, stock sold>
• Find the total shares sold per company in the last 1 minute
• Show the Streaming Spark design for the same.
BIG DATA
Solution – count stock in every window

• Apply reduceByKey to each batch of the stocks-sold DStream (batch @ t, t+1, t+2, …) to produce the corresponding batch of the counts DStream
• Every batch is processed independently, so no state is carried across time
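One possible Streaming Spark sketch for the exercise, assuming the quotes arrive as comma-separated "company,shares" lines on a socket (a stand-in source): the batch interval is set to one minute, so each batch is reduced independently with reduceByKey and no state crosses batches.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SharesPerCompany {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("SharesPerCompany"), Seconds(60)) // 1-minute batches

    // Assumed input format per line: "INFY,100"
    val quotes = ssc.socketTextStream("localhost", 9999)
    val pairs = quotes.map { line =>
      val Array(company, sold) = line.split(",")
      (company, sold.toInt)
    }

    // Stateless: each 1-minute batch is reduced on its own.
    val totalsPerMinute = pairs.reduceByKey(_ + _)
    totalsPerMinute.print()

    ssc.start()
    ssc.awaitTermination()
  }
}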
Stateful processing
BIG DATA
Stateful transformations

• Sometimes we need to keep some state across different


batches of data
• For example, what’s the max amount of stock sold across
the whole day for a company?
• In this case we need to store max value for each
company
• How do we store state across batches?
• First we ensure that data is in pairRDDs
• Key, value format
• Helps to ensure that we have a state per key
BIG DATA
Stateful transformations

• Spark provides two options


• Window operator – when we want a state to be
maintained across short periods of time
• Session based – where state is maintained for longer
BIG DATA
Example 3 – Count the hashtags over last 10 mins

val tweets = ssc.twitterStream(<Twitter username>, <Twitter password>)


val hashTags = tweets.flatMap (status => getTags(status))
val tagCounts = hashTags.window(Minutes(10), Seconds(1)).countByValue()

sliding window operation: window length = Minutes(10), sliding interval = Seconds(1)


BIG DATA
Example 3 – Counting the hashtags over last 10 mins

val tagCounts = hashTags.window(Minutes(10),Seconds(1)).countByValue()

• The sliding window covers the hashTags batches from t-1 through t+3 (window length 10 minutes, sliding every second)
• countByValue counts over all the data in the window to produce tagCounts
BIG DATA
Class Exercise: (5 mins)

val tagCounts = hashTags.window(Minutes(10), Seconds(1)).countByValue()

How can we make this count operation more efficient?


BIG DATA
Smart window-based countByValue

val tagCounts = hashtags.countByValueAndWindow(Minutes(10), Seconds(1))

• Instead of recounting the whole window each second, add the counts from the new batch entering the window and subtract the counts from the batch that just left the window
• The previous window’s tagCounts are reused: tagCounts(new window) = tagCounts(old window) + counts(new batch) – counts(departed batch)
BIG DATA
Smart window-based reduce

• The technique of incrementally computing the count generalizes to many reduce operations
• Needs a function to “inverse reduce” (“subtract” for counting)
• Counting could have been implemented as:
  hashTags.reduceByKeyAndWindow(_ + _, _ - _, Minutes(1), …)
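A hedged sketch of the incremental window reduce on the hashtag stream: tags are first mapped to (tag, 1) pairs, and the inverse function subtracts the counts of the batch that slides out of the window (using the 10-minute window from the earlier example). This variant requires checkpointing to be enabled on the StreamingContext.

import org.apache.spark.streaming.{Minutes, Seconds}

// Assumed: hashTags is a DStream[String] and ssc.checkpoint(...) has been called.
val tagCounts = hashTags
  .map(tag => (tag, 1))
  .reduceByKeyAndWindow(
    _ + _,       // add counts from the batch entering the window
    _ - _,       // subtract counts from the batch leaving the window
    Minutes(10), // window length
    Seconds(1))  // sliding interval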
BIG DATA
Session based state

▪ Maintaining arbitrary state, tracking sessions
  - Maintain per-user mood as state, and update it with his/her tweets

tweets.updateStateByKey(tweet => updateMood(tweet))

• updateStateByKey uses the current mood and the mood in the tweet to update the user’s mood
BIG DATA
Exercise 4 – Maintaining State (10 minutes)

▪ Consider the code below, which maintains arbitrary state to track sessions: per-user mood is kept as state and updated with his/her tweets.

tweets.updateStateByKey(tweet => updateMood(tweet))

▪ What has to be the structure of the RDD tweets?
  ▪ Hint – note that updateStateByKey needs a key
▪ What does the function updateMood do?
  ▪ Hint – note that it should update per-user mood
BIG DATA
Exercise 4 – Maintaining State: solution

tweets.updateStateByKey(tweet => updateMood(tweet))

▪ What has to be the structure of the RDD tweets?
  ▪ Must consist of key-value pairs with user as key and mood as value
    ▪ Dinkar → Happy
    ▪ KVS → VeryHappy
    ▪ …
▪ What does the function updateMood do?
  ▪ Computes the new mood based upon the old mood and the tweet
  ▪ Suppose user KVS (key) tweets “Eating icecream” (value)
BIG DATA
Exercise 4 – Maintaining State: solution

tweets.updateStateByKey(tweet => updateMood(tweet))

▪ The RDD tweets must consist of key-value pairs with user as key and mood as value
  ▪ Dinkar → Happy
  ▪ KVS → VeryHappy
  ▪ …
▪ Suppose user Dinkar (key) tweets “Eating icecream” (value)
  ▪ updateStateByKey finds the current mood – Happy
  ▪ The current mood (Happy) and the tweet (Eating icecream) are passed to updateMood
  ▪ updateMood calculates the new mood as VeryHappy
  ▪ updateStateByKey stores the new mood for Dinkar as VeryHappy
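A sketch of this solution in code, under the assumptions above: tweets is a DStream of (user, tweetText) pairs, scoreMood is a hypothetical helper that combines the old mood with one tweet, and checkpointing is enabled (updateStateByKey needs it).

// Hypothetical helper: derive a new mood from the current mood and one tweet.
def scoreMood(currentMood: String, tweet: String): String =
  if (tweet.contains("icecream")) "VeryHappy" else currentMood

// Called once per key per batch: all new tweets for the user plus the stored mood.
def updateMood(newTweets: Seq[String], oldMood: Option[String]): Option[String] = {
  val start = oldMood.getOrElse("Neutral")
  Some(newTweets.foldLeft(start)(scoreMood))
}

// Assumed: tweets is a DStream[(String, String)] of (user, tweetText) pairs.
val moods = tweets.updateStateByKey(updateMood _) // DStream[(user, mood)]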
Fault Tolerant stateful processing
BIG DATA
Fault-tolerant Stateful Processing

• All intermediate data are RDDs, hence can be recomputed if lost.


BIG DATA
Fault-tolerance

▪ RDDs remember the sequence of operations that created them from the original fault-tolerant input data
▪ Batches of input data are replicated in the memory of multiple worker nodes, and are therefore fault-tolerant
▪ Data lost due to worker failure can be recomputed from the input data

[Figure: a tweets input-data RDD replicated in memory feeds a flatMap into the hashTags RDD; lost partitions are recomputed on other workers]
BIG DATA
Fault Tolerance

• What happens if there is a failure, with stateless processing or with stateful processing?
• Stateless
  • Previous history is not required; processing can simply be recomputed
• Stateful
  • State from previous batches is required for the computation
  • How much state to retain?
BIG DATA
Checkpointing

• Sometimes there may be too much data to be stored
  • For a streaming algorithm, we may have to store all of the streams
• Checkpointing
  • Stores an RDD
  • Forgets the lineage
• A checkpoint at t+2 will
  • store the hashTags and tagCounts at t+2
  • forget the rest of the lineage

[Figure: hashTags and tagCounts lineage across batches t-1 … t+3, with a checkpoint taken at t+2]
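Enabling checkpointing is a single call on the StreamingContext; the HDFS directory below is a placeholder. Stateful operations such as updateStateByKey and the inverse-function window reduce require it.

// Periodically persist stateful DStream RDDs here and truncate their lineage,
// so recovery does not need to replay the entire stream history.
ssc.checkpoint("hdfs://namenode:8020/checkpoints/streaming-app")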
Performance
BIG DATA
Performance

• Can process 6 GB/sec (60M records/sec) of data on 100 nodes at sub-second latency
  - Tested with 100 streams of data on 100 EC2 instances with 4 cores each
BIG DATA
Fast Fault Recovery

• Recovers from faults/stragglers within 1 sec
BIG DATA
Real Applications: Conviva

• Real-time monitoring of video metadata
• Achieved 1-2 second latency
• Millions of video sessions processed
• Scales linearly with cluster size

[Chart: active sessions (millions) vs. # nodes in cluster]
BIG DATA
Real Applications: Mobile Millennium Project

• Traffic transit time estimation using online machine learning on GPS observations
• Markov chain Monte Carlo simulations on GPS observations
• Very CPU intensive, requires dozens of machines for useful computation
• Scales linearly with cluster size

[Chart: GPS observations processed per second vs. # nodes in cluster]
Putting it all together
BIG DATA
Vision - one stack to rule them all
BIG DATA
Spark program vs Spark Streaming program

Spark Streaming program on Twitter stream


val tweets = ssc.twitterStream(<Twitter username>,
<Twitter password>)
val hashTags = tweets.flatMap (status => getTags(status))
hashTags.saveAsHadoopFiles("hdfs://...")

Spark program on Twitter log file


val tweets = sc.hadoopFile("hdfs://...")
val hashTags = tweets.flatMap (status => getTags(status))
hashTags.saveAsHadoopFile("hdfs://...")
BIG DATA
Vision - one stack to rule them all

▪ Explore data interactively using Spark Shell / PySpark to identify problems

$ ./spark-shell
scala> val file = sc.hadoopFile("smallLogs")
...
scala> val filtered = file.filter(_.contains("ERROR"))
...
scala> val mapped = file.map(...)
...

▪ Use the same code in Spark stand-alone programs to identify problems in production logs

object ProcessProductionData {
  def main(args: Array[String]) {
    val sc = new SparkContext(...)
    val file = sc.hadoopFile("productionLogs")
    val filtered = file.filter(_.contains("ERROR"))
    val mapped = file.map(...)
    ...
  }
}

▪ Use similar code in Spark Streaming to identify problems in live log streams

object ProcessLiveStream {
  def main(args: Array[String]) {
    val sc = new StreamingContext(...)
    val stream = sc.kafkaStream(...)
    val filtered = stream.filter(_.contains("ERROR"))
    val mapped = stream.map(...)
    ...
  }
}
BIG DATA
Alpha Release with Spark 0.7

• Integrated with Spark 0.7


• Import spark.streaming to get all the functionality

• Both Java and Scala API

• Give it a spin!
• Run locally or in a cluster

• Try it out in the hands-on


https://spark.apache.org/docs/latest/streaming-programming-guide.html
BIG DATA
Limitations

• Streaming Spark processes data in batches


• Near Real Time

• Not necessarily acceptable for certain scenarios


BIG DATA
Summary

• Stream processing framework that is ...


• Scalable to large clusters
• Achieves second-scale latencies
• Has simple programming model
• Integrates with batch & interactive workloads
• Ensures efficient fault-tolerance in stateful computations

• For more information, checkout the paper:


https://www.usenix.org/system/files/conference/hotcloud12/hotcloud12-
final28.pdf
BIG DATA
References

• https://spark.apache.org/docs/latest/streaming-programming-guide.html
• https://spark.apache.org/streaming/
• Mining of Massive Datasets, Anand Rajaraman, Jure Leskovec, Jeffrey D. Ullman
• Big Data Analytics Beyond Hadoop: Real-Time Applications with Storm, Spark,
and More Hadoop Alternatives, Vijay Srinivasa Agneeswaran
THANK YOU

Prafullata Kiran Auradkar


Department of Computer Science and Engineering
[email protected]
