
72 Soham Naik BDA EXP7

1. The document describes the Datar-Gionis-Indyk-Motwani (DGIM) algorithm for estimating the number of 1s in a sliding window of a binary stream using only O(log^2 N) bits of space. 2. The algorithm divides the window into buckets whose sizes (numbers of 1s) are powers of 2. It stores the timestamp and size of each bucket and answers queries with an error of at most 50%. 3. An example estimates the number of 1s in the last 16 bits by identifying the oldest relevant bucket and summing bucket sizes, obtaining an estimate of 6 against an actual count of 7.

Uploaded by

Soham Naik

EXPERIMENT NO. 7
Name: Soham Naik
Div: A
Roll No. : 72

AIM: To implement the DGIM algorithm in any programming language.

THEORY:

1. Datar-Gionis-Indyk-Motwani Algorithm :
Suppose we have a window of length N on a binary stream. We want at all times to be able to
answer queries of the form "how many 1's are there in the last k bits?" for any k ≤ N. For this
purpose we use the DGIM algorithm.
The basic version of the algorithm uses O(log^2 N) bits to represent a window of N bits, and
allows us to estimate the number of 1's in the window with an error of no more than 50%.
To begin, each bit of the stream has a timestamp, the position in which it arrives. The first bit
has timestamp 1, the second has timestamp 2, and so on.
Since we only need to distinguish positions within the window of length N, we shall represent
timestamps modulo N, so they can be represented by log2 N bits. If we also store the total
number of bits ever seen in the stream (i.e., the most recent timestamp) modulo N, then we
can determine from a timestamp modulo N where in the current window the bit with that
timestamp is.
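
The modulo-N bookkeeping described above can be illustrated with a minimal sketch. The function name age_in_window and the sample numbers are assumptions made for illustration; they do not come from the experiment.

```python
def age_in_window(ts_mod: int, now_mod: int, n: int) -> int:
    """Return how many positions behind the newest bit a stored timestamp lies.

    ts_mod  : timestamp of the bit, stored modulo n
    now_mod : timestamp of the newest bit, stored modulo n
    The result is 0 for the newest bit and n - 1 for the oldest bit in the window.
    """
    return (now_mod - ts_mod) % n


N = 8
now = 21  # true timestamp of the newest bit (hypothetical value)
ts = 18   # true timestamp of a bit that is still inside the window

# Only the values modulo N are stored, yet the position is still recovered.
print(age_in_window(ts % N, now % N, N))  # 3
```

This works only for bits that are still inside the window, which is why buckets that fall out of the window have to be dropped.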
We divide the window into buckets, each consisting of:
1. The timestamp of its right (most recent) end.
2. The number of 1’s in the bucket. This number must be a power of 2, and we refer to
the number of 1’s as the size of the bucket.
To represent a bucket, we need log2 N bits to represent the timestamp (modulo N) of its right
end. To represent the number of 1's we only need log2 log2 N bits. The reason is that we know
this number i is a power of 2, say 2^j, so we can represent i by coding j in binary. Since j is at
most log2 N, it requires log2 log2 N bits. Thus, O(log N) bits suffice to represent a bucket.
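
As a rough check of this bit count, the following sketch computes the bits per bucket for a given N. The helper names bucket_bits and window_bits_upper_bound are hypothetical, and the upper bound assumes at most two buckets of each size.

```python
def bucket_bits(n: int) -> int:
    """Bits for one bucket: timestamp modulo n plus the exponent j of its size 2^j."""
    timestamp_bits = (n - 1).bit_length()  # enough for the values 0 .. n-1
    max_j = n.bit_length() - 1             # floor(log2 n), the largest exponent
    size_bits = max_j.bit_length()         # enough for the values 0 .. max_j
    return timestamp_bits + size_bits


def window_bits_upper_bound(n: int) -> int:
    """Upper bound on total storage, assuming at most two buckets per size."""
    max_j = n.bit_length() - 1
    return 2 * (max_j + 1) * bucket_bits(n)


print(bucket_bits(1024))              # 14
print(window_bits_upper_bound(1024))  # 308
```

For N = 1024 the bound is 308 bits, compared with 1024 bits for storing the window itself. Since there are O(log N) buckets of O(log N) bits each, this is consistent with the O(log^2 N) total stated earlier.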
There are six rules that must be followed when representing a stream by buckets.
• The right end of a bucket is always a position with a 1.
• Every position with a 1 is in some bucket.
• No position is in more than one bucket.
• There are one or two buckets of any given size, up to some maximum size.
• All sizes must be a power of 2.
• Buckets cannot decrease in size as we move to the left (back in time).
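
Some of these rules can be checked mechanically from the bucket list alone. The sketch below assumes a hypothetical layout, a list of (timestamp, size) pairs ordered newest first. It checks only the rules on sizes and ordering; the first two rules would also need the stream itself.

```python
from collections import Counter


def is_power_of_two(x: int) -> bool:
    return x > 0 and (x & (x - 1)) == 0


def check_bucket_rules(buckets) -> bool:
    """buckets: list of (timestamp, size) pairs, newest first (assumed layout)."""
    timestamps = [ts for ts, _ in buckets]
    sizes = [size for _, size in buckets]
    # Right ends must be distinct and ordered, so buckets do not overlap.
    if any(a <= b for a, b in zip(timestamps, timestamps[1:])):
        return False
    # All sizes must be a power of 2.
    if not all(is_power_of_two(s) for s in sizes):
        return False
    # Sizes cannot decrease as we move back in time.
    if any(a > b for a, b in zip(sizes, sizes[1:])):
        return False
    # One or two buckets of every size up to the maximum size.
    counts = Counter(sizes)
    size, largest = 1, max(sizes, default=0)
    while size <= largest:
        if counts[size] not in (1, 2):
            return False
        size *= 2
    return True


print(check_bucket_rules([(100, 1), (98, 1), (95, 2), (92, 4)]))  # True
print(check_bucket_rules([(100, 1), (98, 1), (97, 1), (95, 2)]))  # False
```

The second call fails because it contains three buckets of size 1.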

2. Explanation
We divide the window into buckets, each consisting of:
1. The timestamp of its right (most recent) end.
2. The number of 1s in the bucket. This number must be a power of 2, and we refer to
the number of 1s as the size of the bucket.
For example, in our figure, a bucket with timestamp 1 and size 2 represents a bucket that
contains the two most recent 1s: the newer one at the bucket's own timestamp, 1, and the
other at timestamp 3. Note that the timestamp of a bucket increases as new elements arrive.
When the timestamp of a bucket expires, that is, it reaches N + 1, we are no longer
interested in the data elements contained in it, so we drop that bucket and reclaim its
memory. If a bucket is still active, we are guaranteed that it contains at least a single 1
that has not expired. Thus, at any instant, there is at most one bucket (the last bucket)
containing 1s that may have expired.
To represent a bucket, we need log2 N bits to represent the timestamp (modulo N) of its
right end. To represent the number of 1s we only need log2 log2 N bits. Since we insist
that the number of 1s, i, is a power of 2, say 2^j, we can represent i by coding j in
binary. Since j is at most log2 N, it requires log2 log2 N bits. Thus, O(log N) bits suffice
to represent a bucket.
There are certain constraints that must be satisfied for representing a stream by buckets
using the DGIM algorithm.
1. The right end of a bucket is always a position with a 1.
2. The number of 1s must be a power of 2. That explains the O(log log N) bits needed for
storing the number of 1s.
3. Either one or two buckets with the same power-of-2 number of 1s exist.
4. Buckets do not overlap in timestamps.
5. Buckets are sorted by size. Earlier buckets are not smaller than later buckets.
6. Buckets disappear when their end-time is more than N time units in the past.
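
One way to keep these constraints satisfied as bits arrive is sketched below. The merging step (when three buckets of the same size appear, combine the two oldest into one bucket of twice the size) is the standard DGIM update rule and is not spelled out in this write-up. The class name DgimSketch is hypothetical, and for simplicity the sketch stores absolute arrival positions rather than timestamps modulo N.

```python
class DgimSketch:
    """Minimal bucket maintenance for a window of n bits (illustrative only)."""

    def __init__(self, n: int):
        self.n = n
        self.now = 0        # arrival position of the newest bit
        self.buckets = []   # (timestamp of right end, size), newest first

    def update(self, bit) -> None:
        self.now += 1
        # Constraint 6: drop the oldest bucket once its right end leaves the window.
        if self.buckets and self.buckets[-1][0] <= self.now - self.n:
            self.buckets.pop()
        if not bit:
            return
        # Constraint 1: a new bucket of size 1 ends at the position of the new 1.
        self.buckets.insert(0, (self.now, 1))
        # Constraint 3: never keep three buckets of one size; merge the two oldest.
        i = 0
        while i + 2 < len(self.buckets) and self.buckets[i][1] == self.buckets[i + 2][1]:
            ts, size = self.buckets[i + 1]        # keep the newer right end
            del self.buckets[i + 2]
            self.buckets[i + 1] = (ts, 2 * size)  # merged bucket is twice as large
            i += 1                                # the merge may cascade upward


sketch = DgimSketch(n=1000)
for _ in range(10):
    sketch.update(1)
print(sketch.buckets)  # [(10, 1), (9, 1), (8, 2), (6, 2), (4, 4)]
```

After ten 1s the sizes are 1, 1, 2, 2, 4, which sum to 10 and satisfy the size constraints.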

3. Example :
Query Answering in the DGIM Algorithm
If the query is to calculate the number of 1's in the last k bits of the window, then:
1. Find the bucket b with the oldest timestamp that contains at least one of the most
recent k bits.
2. Estimate the number of 1's as half the size of bucket b plus the sizes of all the newer
buckets.
We now apply the above steps to the buckets in the previous example. Suppose we
want the number of 1's in the most recent 16 bits. Starting from the rightmost side, we
observe that the latest 16 bits cover both size 1 buckets and one size 2 bucket completely,
and the size 4 bucket partially. In this case the oldest bucket is therefore the size 4
bucket.
Thus the estimate of the number of 1's in the latest 16 bits = (4/2)+2+1+1 = 6. The
actual number of 1's is 7.
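
The query rule and the arithmetic of this example can be sketched as follows. The function name estimate_ones and the bucket timestamps are assumptions: the timestamps are hypothetical values chosen so that two size 1 buckets, one size 2 bucket and the size 4 bucket fall within the last 16 bits. Only the sizes and the actual count of 7 come from the example.

```python
def estimate_ones(buckets, now: int, k: int) -> float:
    """Estimate the number of 1s among the last k bits.

    buckets: list of (timestamp of right end, size), newest first.
    """
    cutoff = now - k  # positions greater than cutoff belong to the last k bits
    sizes = [size for ts, size in buckets if ts > cutoff]
    if not sizes:
        return 0
    # Half of the oldest relevant bucket plus all newer buckets in full.
    return sum(sizes[:-1]) + sizes[-1] / 2


# Hypothetical timestamps; the last bucket lies outside the last 16 bits.
buckets = [(100, 1), (98, 1), (95, 2), (90, 4), (80, 8)]
estimate = estimate_ones(buckets, now=100, k=16)
actual = 7  # actual count stated in the example

print(estimate)                                   # 6.0
print(round(abs(actual - estimate) / actual, 3))  # 0.143
```

The relative error here is about 14%, well inside the 50% bound.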

Program :
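# Install the library first (run in a shell, not inside Python):
#   pip install dgim
from dgim import Dgim

# Sliding window of N = 32 bits, with a requested maximum error rate of 10%
dgim = Dgim(N=32, error_rate=0.1)
# Feed a stream of 100 bits, all of them 1s
for i in range(100):
    dgim.update(True)
dgim_result = dgim.get_count()  # 30 (exact result is 32)
print(dgim_result)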

Output :

CONCLUSION: The DGIM algorithm was implemented, and the output for query
answering in the DGIM algorithm is shown.
