DA Unit 3

Data streaming involves processing continuous data in real-time as it is generated, as opposed to batch processing of stored data. A data stream consists of a continuous flow of data elements ordered in a sequence. Unlike batch processing, data streaming allows processing data as soon as it is created. Key features of data streams include their continuous flow, infinite length, high velocity, and variability. Data streams are essential for modern data processing and decision making by enabling real-time insights from continuous data sources.


WHAT IS A DATA STREAM:

Data streaming is a modern approach to processing and analyzing
data in real time, as opposed to batch processing methods. A data
stream is a continuous flow of data elements that are ordered in a
sequence and processed as they are generated. Data streams differ
from traditional batch processing in that they are continuous,
unbounded, and potentially high-velocity with high variability.
Unlike traditional data processing, where data is collected and
processed in batches, data streams deliver data continuously,
making it possible to process each element as soon as it is created.
Key features of data streams include their continuous flow,
unbounded (potentially infinite) length, high velocity, and
potentially high variability. They are often handled by stream
processing systems such as Apache Spark Streaming.
Importance of data streams in modern data processing
Data streams play a critical role in modern data processing, enabling
real-time insights and automated actions.
In the healthcare industry, data streams enable continuous
monitoring of patient data, allowing for early detection and
intervention in the event of critical health issues. Data streaming can
also be used in machine learning algorithms to derive insights from
continuous data and improve predictive analytics.
Evidently, data streams are essential for modern data processing
and decision-making, enabling businesses to derive valuable insights
from the continuous stream of data generated by their internal
IT systems and external data sources.
• Examples of stream sources: sensor data, image data, Internet and
Web traffic.
Issues in Stream Processing
• Streams often deliver elements very rapidly.
• We must process elements in real time.
• It is important that the stream-processing algorithm executes in
main memory, without access to secondary storage.
Sampling data in stream
• Data sampling is a statistical analysis technique used to select,
manipulate and analyze a representative subset of data
points to identify patterns and trends in the larger data
set being examined.
• The method of collecting data from a population by taking a
sample (a group of items) and examining it to draw conclusions
is known as the sample method.
• Probability sampling allows every member of the population a
chance to get selected. It is mainly used in quantitative research
when you want to produce results representative of the whole
population.
• In non-probability sampling, not every individual has a chance
of being included in the sample. This sampling method is easier
and cheaper but also has high risks of sampling bias.
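A standard probability-sampling technique for streams is reservoir sampling, which keeps a uniform random sample of fixed size k from a stream of unknown length. A minimal Python sketch (the function name and stream are illustrative):

```python
import random

def reservoir_sample(stream, k):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Replace a random reservoir slot with probability k / (i + 1).
            j = random.randint(0, i)
            if j < k:
                sample[j] = item
    return sample

sample = reservoir_sample(range(10_000), 5)
print(sample)  # 5 items drawn uniformly from the stream
```

Each element ends up in the sample with equal probability k/n, no matter how long the stream turns out to be.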
Filtering:
Another common operation on streams is selection, or filtering.
We want to accept those tuples in the stream that meet a criterion.
Accepted tuples are passed on to another process as a stream, while
the other tuples are dropped.
Bloom filtering is a space-efficient way to eliminate most of the
tuples that do not meet the criterion.
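As a sketch, a Bloom filter keeps an array of m bits and k hash functions; adding a key sets k bits, and a lookup answers "definitely absent" or "possibly present". A minimal Python version (the parameters m and k, and the use of salted SHA-256 to simulate k hash functions, are illustrative choices):

```python
import hashlib

class BloomFilter:
    def __init__(self, m, k):
        self.m = m              # number of bits in the filter
        self.k = k              # number of hash functions
        self.bits = [0] * m

    def _positions(self, item):
        # Derive k bit positions by salting one hash function k ways.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means definitely absent; True may be a false positive.
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter(m=1000, k=3)
bf.add("alice")
print(bf.might_contain("alice"))   # True
```

Tuples whose keys fail the test can be dropped immediately; tuples that pass are forwarded for an exact check, since false positives are possible but false negatives are not.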
RTAP (Real-Time Analytics Platform)
A real-time analytics platform enables organizations to make
the most out of real-time data by helping them to extract the
valuable information and trends from it.
Such platforms help in measuring data from the business
point of view in real time, further making the best use of
data.
An ideal real-time analytics platform would help in analyzing
the data, correlating it and predicting the outcomes on a real-
time basis.

Widely used RTAPs:

1. Apache Spark Streaming: a big-data platform for data
stream analytics in real time.
2. Oracle Stream Analytics (OSA): a platform that provides
a graphical interface to "fast data".
3. SAP HANA: a streaming analytics tool that also performs
real-time analysis.
4. SQLstream Blaze: an analytics platform offering a real-time,
easy-to-use, and powerful visual development environment.
RTAP Applications:
 Fraud detection systems for online transactions.
 Social media analytics
 Click analysis for online recommendations.
Advantages of RTAP:
 Create your own interactive analytics tools.
 Make use of machine learning.
 Transparent dashboards allow users to share information.

Stock market prediction:

What is the Stock Market

A stock market is a public market where you can buy and sell shares of publicly listed
companies. The stocks, also known as equities, represent ownership in the company. The
stock exchange is the mediator that enables the buying and selling of shares.

Importance of Stock Market

 Stock markets help companies to raise capital.

 It helps generate personal wealth.

 Stock markets serve as an indicator of the state of the economy.

 It is a widely used source for people to invest money in companies with high
growth potential.

Stock Price Prediction

Stock price prediction using machine learning helps you discover the future value of a
company's stock and other financial assets traded on an exchange. The entire idea of
predicting stock prices is to gain significant profits. Predicting how the stock market will
perform is a hard task. Many factors are involved in the prediction, such as physical and
psychological factors, rational and irrational behaviour, and so on. All these factors combine
to make share prices dynamic and volatile, which makes it very difficult to predict stock
prices with high accuracy.

FLAJOLET-MARTIN ALGORITHM (counting
distinct elements in a stream):
The Flajolet-Martin algorithm is a probabilistic
algorithm that is mainly used to estimate the number of
distinct elements in a stream or database.

The steps for the Flajolet-Martin algorithm are:

 First, choose a hash function that maps the elements in
the dataset to fixed-length binary strings. The length of
the binary string can be chosen based on the accuracy
desired.
 Apply the hash function to each data item in the dataset
to obtain its binary string representation.
 For each binary string, determine the length of its
trailing run of zeros (equivalently, the position of the
rightmost 1-bit).
 Compute R, the maximum number of trailing zeros over
all the binary strings.
 Estimate the number of distinct elements in the dataset
as 2 raised to the power R.

Pseudo Code-Stepwise Solution:

1. Select a hash function h that maps each element in the
set to a binary string of at least log2 n bits.
2. For each element x, r(x) = number of trailing zeroes in
h(x).
3. R = max(r(x))

=> Distinct elements ≈ 2^R

Example:
S = 1, 3, 2, 1, 2, 3, 4, 3, 1, 2, 3, 1
h(x) = (6x + 1) mod 5
Assume b = 5

h(1) = 7 mod 5 = 2 = (010)2, so r(1) = 1
h(2) = 13 mod 5 = 3 = (011)2, so r(2) = 0
h(3) = 19 mod 5 = 4 = (100)2, so r(3) = 2
h(4) = 25 mod 5 = 0 = (000)2, so r(4) = 0 (taking r = 0 for an all-zero string)

R = max( r(a) ) = 2
So no. of distinct elements = N = 2^2 = 4, which matches the
true count of distinct values {1, 2, 3, 4}.
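The hand calculation above can be checked with a short Python sketch of the algorithm (helper names are illustrative):

```python
def trailing_zeros(v):
    # Tail length of v's binary representation; take r = 0 for v == 0.
    if v == 0:
        return 0
    r = 0
    while v & 1 == 0:
        v >>= 1
        r += 1
    return r

def fm_estimate(stream, h):
    # Flajolet-Martin: estimate the distinct count as 2^R, where R is
    # the maximum tail length over the hashed elements of the stream.
    R = max(trailing_zeros(h(x)) for x in stream)
    return 2 ** R

S = [1, 3, 2, 1, 2, 3, 4, 3, 1, 2, 3, 1]
h = lambda x: (6 * x + 1) % 5
print(fm_estimate(S, h))  # → 4, matching the hand calculation
```

In practice many hash functions are used and their estimates are combined (e.g. median of means) to reduce the variance of the 2^R estimate.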

DGIM (Datar-Gionis-Indyk-Motwani, for counting
1's in a window):
• Suppose we have a window of length N on a binary
stream. We want at all times to be able to answer
queries of the form “how many 1’s are there in the last
k bits?” for any k≤ N. For this purpose we use the DGIM
algorithm.
• The basic version of the algorithm uses O(log² N) bits
to represent a window of N bits, and allows us to
estimate the number of 1's in the window with an error
of no more than 50%.
• To begin, each bit of the stream has a timestamp, the
position in which it arrives. The first bit has timestamp
1, the second has timestamp 2, and so on.
• We divide the window into buckets, each consisting of:
1. The timestamp of its right (most recent) end.
2. The number of 1's in the bucket. This number must
be a power of 2, and we refer to the number of 1's
as the size of the bucket.
There are six rules that must be followed when representing
a stream by buckets.
• The right end of a bucket is always a position with a 1.
• Every position with a 1 is in some bucket.
• No position is in more than one bucket.
• There are one or two buckets of any given size, up to
some maximum size.
• All sizes must be a power of 2.
• Buckets cannot decrease in size as we move to the left
(back in time).
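Under these rules, updating and querying can be sketched in Python; this minimal version keeps buckets as (timestamp, size) pairs, newest first, and merges whenever three buckets share a size (class and method names are illustrative):

```python
class DGIM:
    def __init__(self, N):
        self.N = N          # window length in bits
        self.t = 0          # current timestamp
        self.buckets = []   # (timestamp of right end, size), newest first

    def add(self, bit):
        self.t += 1
        # Drop buckets that have slid entirely out of the window.
        while self.buckets and self.buckets[-1][0] <= self.t - self.N:
            self.buckets.pop()
        if bit == 1:
            self.buckets.insert(0, (self.t, 1))
            # Restore the invariant: at most two buckets of any size.
            i = 0
            while i + 2 < len(self.buckets):
                if self.buckets[i][1] == self.buckets[i + 1][1] == self.buckets[i + 2][1]:
                    # Merge the two OLDEST of the three equal-size buckets,
                    # keeping the more recent right-end timestamp.
                    ts = self.buckets[i + 1][0]
                    size = self.buckets[i + 1][1] * 2
                    self.buckets[i + 1:i + 3] = [(ts, size)]
                else:
                    i += 1

    def count_ones(self, k):
        # Estimate the 1's among the last k bits: sum the sizes of all
        # qualifying buckets, then subtract half of the oldest one,
        # since it may straddle the query boundary.
        total = last = 0
        for ts, size in self.buckets:
            if ts > self.t - k:
                total += size
                last = size
        return total - last // 2

d = DGIM(N=10)
for _ in range(10):
    d.add(1)
print(d.count_ones(10))  # → 8 (true answer is 10; within the 50% bound)
```

Only O(log N) buckets exist at any time, and each needs O(log N) bits, giving the O(log² N) space bound quoted above.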

ESTIMATING MOMENTS:
Estimating moments is a generalization of the problem of
counting distinct elements in a stream. The problem, called
computing "moments," involves the distribution of
frequencies of the different elements in the stream. The k-th
frequency moment of a stream is the sum, over all distinct
elements, of the k-th power of each element's frequency:
F0 is the number of distinct elements, F1 is the length of
the stream, and F2 (the "surprise number") measures how
uneven the frequency distribution is.
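With enough memory to count every distinct element, the k-th moment is straightforward to compute exactly, which is a useful baseline before turning to streaming estimators. A small Python sketch:

```python
from collections import Counter

def moment(stream, k):
    # k-th frequency moment: sum over distinct elements of (frequency)^k.
    counts = Counter(stream)
    return sum(m ** k for m in counts.values())

s = ["a", "b", "a", "c", "a", "b"]   # frequencies: a=3, b=2, c=1
print(moment(s, 0))  # → 3  (distinct elements)
print(moment(s, 1))  # → 6  (stream length)
print(moment(s, 2))  # → 14 (surprise number: 9 + 4 + 1)
```

For streams too large to store per-element counts, F2 is typically estimated with the Alon-Matias-Szegedy (AMS) sketch instead.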
