
Assignment

1. Explain
A. What is data compression?

Answer
Data compression is the process of reducing the size of data in order to optimize storage space,
transmission bandwidth, or processing time. It involves encoding information in a more concise
representation that requires fewer bits to store or transmit the same information accurately.
Data compression is widely used in various domains, including computer science,
telecommunications, multimedia, and data storage. It aims to achieve efficient data
representation by eliminating redundancy or exploiting statistical patterns in the data.
There are two main types of data compression: lossless compression and lossy compression.
1. Lossless Compression: Lossless compression algorithms reduce the size of data without losing
any information. The original data can be perfectly reconstructed from the compressed data. This
type of compression is essential for applications where data integrity is crucial, such as text
documents, databases, and program files. Lossless compression techniques typically exploit
redundancy in the data, such as repeated sequences or patterns, and replace them with more
compact representations. Examples of lossless compression algorithms include ZIP, GZIP, and
PNG.
2. Lossy Compression: Lossy compression algorithms achieve higher compression ratios by
selectively removing non-essential information from the data. While lossy compression can
achieve greater compression, it results in a loss of some data fidelity. It is commonly used for
multimedia data, such as images, audio, and video, where minor loss of quality is acceptable.
Lossy compression algorithms employ techniques like quantization and perceptual coding to
discard less perceptually important information. Examples of lossy compression algorithms
include JPEG (for images), MP3 (for audio), and MPEG (for video).
The choice of compression method depends on the specific requirements of the application.
Lossless compression is preferred when data integrity is paramount, while lossy compression is
suitable for scenarios where achieving higher compression ratios outweighs the loss in quality.
Overall, data compression plays a vital role in optimizing storage, transmission, and processing
of data, enabling efficient utilization of resources and improving overall system performance.

B. Define the compression algorithms: Shannon-Fano algorithm, Dictionary Encoding, LZW compression, Huffman compression, and Arithmetic compression.

Answer
1. Shannon-Fano Algorithm: The Shannon-Fano algorithm is a simple entropy
coding technique used for lossless data compression. It operates by assigning
variable-length codes to symbols based on their probabilities of occurrence.
The algorithm recursively splits the symbols into two groups, attempting to
divide them as evenly as possible according to their probabilities. This process
continues until each symbol is assigned a unique code. Shannon-Fano coding
is a precursor to more advanced techniques like Huffman coding.
2. Dictionary Encoding: Dictionary encoding, also known as phrase-based
encoding, is a lossless compression method that replaces repetitive sequences
or phrases with shorter codes. It maintains a dictionary or table that maps
frequently occurring phrases to shorter codes. When encoding, the algorithm
scans the input data and replaces repeated phrases with their corresponding
codes from the dictionary. This technique is commonly used in compression
algorithms like LZW and LZ77.
3. LZW Compression: LZW (Lempel-Ziv-Welch) compression is a widely used
dictionary-based compression algorithm. It is particularly effective for
compressing text and image data. LZW works by building a dictionary of
frequently occurring sequences of characters in the input data. Initially, the
dictionary contains all individual symbols. As the algorithm progresses, it adds
new sequences to the dictionary and replaces repeated sequences with
shorter codes. LZW is the basis for popular file formats like GIF.
4. Huffman Compression: Huffman compression is another widely used entropy
coding technique for lossless data compression. It assigns variable-length
codes to symbols based on their frequencies, with more frequent symbols
represented by shorter codes. Huffman compression builds a binary tree
called a Huffman tree, where each symbol corresponds to a leaf node. The
codes are determined by traversing the tree, assigning '0' or '1' when moving
left or right. Huffman coding is commonly used in file compression formats
like ZIP and JPEG.
5. Arithmetic Compression: Arithmetic compression is an advanced entropy
coding method that achieves higher compression ratios compared to Huffman
coding. It operates by representing the input data as a single fractional
number within a specified range. The algorithm successively subdivides the
number range based on the probabilities of the symbols, encoding the data as
a fraction within each subrange. Arithmetic compression requires a precise
decoder that can reconstruct the original data. It is used in various
applications, including video compression (e.g., H.264) and file compression
(e.g., 7-Zip). A minimal sketch of the interval-narrowing idea follows below.
These compression algorithms employ different strategies to reduce the size
of data while ensuring lossless or near-lossless reconstruction. Each algorithm
has its strengths and use cases based on the characteristics of the data being
compressed and the desired compression efficiency.
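To make the interval-narrowing idea from item 5 concrete, here is a minimal float-based sketch of an arithmetic coder. It is illustrative only: the helper names and the three-symbol distribution are assumed for this example, and real coders use integer arithmetic with renormalization to avoid floating-point precision loss.

```python
def build_intervals(probs):
    """Map each symbol to a half-open subinterval of [0, 1)."""
    intervals, low = {}, 0.0
    for sym, p in probs.items():
        intervals[sym] = (low, low + p)
        low += p
    return intervals

def arith_encode(message, probs):
    """Narrow [low, high) once per symbol; any number inside the
    final interval identifies the whole message."""
    intervals = build_intervals(probs)
    low, high = 0.0, 1.0
    for sym in message:
        s_low, s_high = intervals[sym]
        width = high - low
        low, high = low + width * s_low, low + width * s_high
    return (low + high) / 2  # a representative value in the final interval

def arith_decode(value, length, probs):
    """Invert the encoder: find the subinterval containing the value,
    emit that symbol, then rescale the value into that subinterval."""
    intervals = build_intervals(probs)
    out = []
    for _ in range(length):
        for sym, (s_low, s_high) in intervals.items():
            if s_low <= value < s_high:
                out.append(sym)
                value = (value - s_low) / (s_high - s_low)
                break
    return "".join(out)

probs = {"a": 0.5, "b": 0.3, "c": 0.2}   # assumed example distribution
code = arith_encode("abac", probs)
assert arith_decode(code, 4, probs) == "abac"
```

Note that the decoder is told the message length; in practice an end-of-message symbol (or an explicit length header) is needed, because every prefix of the message maps to a nested interval around the same value.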

C. Encode the message aababacbaacbaadaaa using the LZW compression algorithm and
decode the encoded message back.

Answer

Encoding:
1. Initialize the dictionary with all individual characters as the initial entries: a = 1, b = 2, c = 3, d = 4. The next free code is 5.
2. Keep a current phrase w, initially empty. Scan the message from left to right, one character c at a time: if w + c is already in the dictionary, extend the phrase (w = w + c); otherwise output the code for w, add w + c to the dictionary under the next free code, and restart the phrase with w = c.
3. When the message ends, output the code for the final w.
Applying this to aababacbaacbaadaaa (the first character simply initializes w = a; a dash means the phrase was extended and nothing was output):

w     c   w+c in dict?    output      new entry
a     a   no              1 (a)       aa = 5
a     b   no              1 (a)       ab = 6
b     a   no              2 (b)       ba = 7
a     b   yes (ab)        -           -
ab    a   no              6 (ab)      aba = 8
a     c   no              1 (a)       ac = 9
c     b   no              3 (c)       cb = 10
b     a   yes (ba)        -           -
ba    a   no              7 (ba)      baa = 11
a     c   yes (ac)        -           -
ac    b   no              9 (ac)      acb = 12
b     a   yes (ba)        -           -
ba    a   yes (baa)       -           -
baa   d   no              11 (baa)    baad = 13
d     a   no              4 (d)       da = 14
a     a   yes (aa)        -           -
aa    a   no              5 (aa)      aaa = 15
a     (end of message)    1 (a)       -

The encoding process is complete. The encoded message is: [1, 1, 2, 6, 1, 3, 7, 9, 11, 4, 5, 1].
Decoding:
1. Initialize the dictionary with all individual characters as the initial entries: a = 1, b = 2, c = 3, d = 4.
2. Read the first code, output its string, and remember it as the previous string prev.
3. For each subsequent code:
 If the code exists in the dictionary, its string becomes the current output.
 If the code is not yet in the dictionary (the special LZW case, which does not occur in this message), the current output is prev plus the first character of prev.
 Append the current output to the decoded message, add prev + (first character of the current output) to the dictionary under the next free code, and set prev to the current output.

code   output   new entry
1      a        -
1      a        aa = 5
2      b        ab = 6
6      ab       ba = 7
1      a        aba = 8
3      c        ac = 9
7      ba       cb = 10
9      ac       baa = 11
11     baa      acb = 12
4      d        baad = 13
5      aa       da = 14
1      a        aaa = 15

4. The decoded message is: "aababacbaacbaadaaa".
By following the LZW compression algorithm, we encoded the message "aababacbaacbaadaaa" as [1, 1, 2, 6, 1, 3, 7, 9, 11, 4, 5, 1]. Then, by decoding the encoded message, we obtained the original message "aababacbaacbaadaaa" again.
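
The hand trace above can be checked with a short Python sketch of LZW (lzw_encode and lzw_decode are our own helper names; the start parameter selects the 1-based initial dictionary {a: 1, b: 2, c: 3, d: 4} used in this answer):

```python
def lzw_encode(text, alphabet, start=1):
    """LZW-encode text; initial single-character codes begin at `start`."""
    dictionary = {ch: i for i, ch in enumerate(alphabet, start=start)}
    next_code = start + len(alphabet)
    w, output = "", []
    for c in text:
        if w + c in dictionary:
            w += c                          # keep extending the match
        else:
            output.append(dictionary[w])    # emit the longest known phrase
            dictionary[w + c] = next_code   # learn the new phrase
            next_code += 1
            w = c
    output.append(dictionary[w])            # flush the final phrase
    return output

def lzw_decode(codes, alphabet, start=1):
    """Invert lzw_encode, rebuilding the dictionary on the fly."""
    dictionary = {i: ch for i, ch in enumerate(alphabet, start=start)}
    next_code = start + len(alphabet)
    prev = dictionary[codes[0]]
    output = [prev]
    for code in codes[1:]:
        # Special case: the code may refer to the entry being built right now.
        entry = dictionary.get(code, prev + prev[0])
        output.append(entry)
        dictionary[next_code] = prev + entry[0]
        next_code += 1
        prev = entry
    return "".join(output)

codes = lzw_encode("aababacbaacbaadaaa", "abcd")
print(codes)   # [1, 1, 2, 6, 1, 3, 7, 9, 11, 4, 5, 1]
assert lzw_decode(codes, "abcd") == "aababacbaacbaadaaa"
```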

D. Let us assume the source alphabet S = {X1, X2, X3, …, Xn} and associated probabilities P = {P1, P2, P3, …, Pn}. Based on this, encode data using the Shannon-Fano coding algorithm.

Answer

1. Sort the source alphabet S in descending order based on the associated probabilities P.
Let's assume the sorted order is X1, X2, X3, ..., Xn.
2. Divide the sorted source alphabet into two groups, such that the total probability of one
group is as close as possible to half of the total probability of the entire alphabet. This
division is done by considering consecutive symbols from the sorted list, assigning them
to different groups until the desired balance is achieved.
3. Assign a '0' to the symbols in the first group and a '1' to the symbols in the second
group.
4. For each group with more than one symbol, recursively repeat steps 2 and 3 until each
symbol has been assigned a unique code.
5. Concatenate the codes assigned to each symbol to form the final Shannon-Fano codes
for the source alphabet.
Let's go through a concrete example to illustrate these steps:
Assume the source alphabet S = {X1, X2, X3, X4} with associated probabilities P = {0.4,
0.25, 0.2, 0.15}.
Step 1: Sort the source alphabet based on probabilities in descending order.
Sorted alphabet: X1, X2, X3, X4
Sorted probabilities: 0.4, 0.25, 0.2, 0.15
Step 2: Divide the alphabet into two groups with total probabilities as close to equal as possible.
Group 1: {X1} (total probability: 0.4)
Group 2: {X2, X3, X4} (total probability: 0.25 + 0.2 + 0.15 = 0.6)
Splitting after X1 gives a difference of |0.4 - 0.6| = 0.2, while splitting after X2 would give |0.65 - 0.35| = 0.3, so the first split is the more balanced one.
Step 3: Assign '0' to the symbol in Group 1 and '1' to the symbols in Group 2.
X1: 0
X2: 1...
X3: 1...
X4: 1...
Step 4: Recursively repeat steps 2 and 3 for each group with more than one symbol.
For Group 2 ({X2, X3, X4}, total probability 0.6):
 Divide the group into two groups.
Group 2.1: {X2} (probability: 0.25)
Group 2.2: {X3, X4} (probability: 0.2 + 0.15 = 0.35)
 Assign '0' to Group 2.1 and '1' to Group 2.2.
X2: 10
X3: 11...
X4: 11...
For Group 2.2 ({X3, X4}):
 Divide the group into two groups.
Group 2.2.1: {X3} (probability: 0.2)
Group 2.2.2: {X4} (probability: 0.15)
 Assign '0' to Group 2.2.1 and '1' to Group 2.2.2.
X3: 110
X4: 111
Step 5: Concatenate the bits assigned at each level to form the final Shannon-Fano codes.
Final codes:
X1: 0
X2: 10
X3: 110
X4: 111
The average code length is 0.4(1) + 0.25(2) + 0.2(3) + 0.15(3) = 1.95 bits per symbol, slightly better than the 2 bits per symbol a fixed-length code would need.
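
The recursive procedure above can be written as a short sketch (shannon_fano is a hypothetical helper name; the input is assumed to be pre-sorted by descending probability, as step 1 requires). Running it on the example distribution reproduces the codes just derived.

```python
def shannon_fano(symbols):
    """symbols: list of (name, probability) sorted by descending probability.
    Returns a {name: code} mapping."""
    if len(symbols) == 1:
        return {symbols[0][0]: ""}
    total = sum(p for _, p in symbols)
    # Pick the split whose two halves have the most balanced probabilities.
    best_split, best_diff, running = 1, float("inf"), 0.0
    for i in range(1, len(symbols)):
        running += symbols[i - 1][1]
        diff = abs(running - (total - running))
        if diff < best_diff:
            best_diff, best_split = diff, i
    codes = {}
    for sym, code in shannon_fano(symbols[:best_split]).items():
        codes[sym] = "0" + code              # first group gets a leading '0'
    for sym, code in shannon_fano(symbols[best_split:]).items():
        codes[sym] = "1" + code              # second group gets a leading '1'
    return codes

source = [("X1", 0.40), ("X2", 0.25), ("X3", 0.20), ("X4", 0.15)]
print(shannon_fano(source))  # {'X1': '0', 'X2': '10', 'X3': '110', 'X4': '111'}
```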

E. Suppose five characters have the distribution AAAABBCCCCAAACCCCDDDBBDDDBBBDDDDHHH. Encode this text (show all your work) using the Huffman encoding algorithm and the dictionary-based encoding algorithm (LZW).

Answer

1. Huffman Encoding Algorithm:


Step 1: Calculate the frequency of each character in the given text (35 characters in total).
Character Frequency:
A: 7
B: 7
C: 8
D: 10
H: 3
Step 2: Create a forest of trees, where each tree contains a single character and its frequency.
Step 3: Merge the two trees with the lowest frequencies into a new tree with a combined frequency. Repeat this step until only one tree remains:
 Merge H (3) and A (7) into a node of weight 10.
 Merge B (7) and C (8) into a node of weight 15.
 Merge D (10) and the (H, A) node (10) into a node of weight 20.
 Merge the weight-15 and weight-20 nodes into the root of weight 35.
Step 4: Assign '0' to the left branch and '1' to the right branch at each node during the tree merging process.
Step 5: Traverse the final Huffman tree to assign binary codes to each character. (Huffman codes are not unique; any assignment with these code lengths is equally optimal.)
Character Huffman Code:
A: 111
B: 00
C: 01
D: 10
H: 110
The encoded text using Huffman encoding (shown grouped by the runs of the input for readability) is:
AAAA BB CCCC AAA CCCC DDD BB DDD BBB DDDD HHH
=> 111111111111 0000 01010101 111111111 01010101 101010 0000 101010 000000 10101010 110110110
This is 80 bits in total, compared with 35 × 8 = 280 bits for the uncompressed 8-bit text.
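
A small heap-based sketch (huffman_codes is our own helper name) can verify both the frequency counts and the 80-bit total. Tie-breaking between equal-weight nodes varies between implementations, so the exact bit patterns may differ from the table above; the code lengths, and therefore the total size, are what matter.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Return (frequencies, {symbol: code}) for one valid Huffman code."""
    freq = Counter(text)
    # Heap entries: (weight, tie-breaker, {symbol: code-so-far}).
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # two lowest-weight subtrees
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}          # left branch
        merged.update({s: "1" + c for s, c in right.items()})   # right branch
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return freq, heap[0][2]

text = "AAAABBCCCCAAACCCCDDDBBDDDBBBDDDDHHH"
freq, codes = huffman_codes(text)
print(dict(freq))  # {'A': 7, 'B': 7, 'C': 8, 'D': 10, 'H': 3}
print(codes)       # e.g. {'B': '00', 'C': '01', 'D': '10', 'H': '110', 'A': '111'}
print(sum(freq[s] * len(codes[s]) for s in freq))  # 80
```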
2. LZW (Dictionary-based) Encoding Algorithm:
Step 1: Initialize the dictionary with single-character entries for all possible characters in the source alphabet.
Dictionary:
0: A
1: B
2: C
3: D
4: H
Step 2: Keep a current phrase w, initially empty. Read the input one character c at a time: if w + c already exists in the dictionary, extend the phrase (w = w + c); otherwise output the code for w, add w + c to the dictionary under the next free index, and restart the phrase with w = c. When the input is exhausted, output the code for the final w.
Step 3: Apply this to AAAABBCCCCAAACCCCDDDBBDDDBBBDDDDHHH. The first character simply initializes w = A; the table below shows each subsequent step (a dash means the phrase was extended and nothing was output).

w     c   w+c in dict?    output      new entry
A     A   no              0 (A)       AA = 5
A     A   yes (AA)        -           -
AA    A   no              5 (AA)      AAA = 6
A     B   no              0 (A)       AB = 7
B     B   no              1 (B)       BB = 8
B     C   no              1 (B)       BC = 9
C     C   no              2 (C)       CC = 10
C     C   yes (CC)        -           -
CC    C   no              10 (CC)     CCC = 11
C     A   no              2 (C)       CA = 12
A     A   yes (AA)        -           -
AA    A   yes (AAA)       -           -
AAA   C   no              6 (AAA)     AAAC = 13
C     C   yes (CC)        -           -
CC    C   yes (CCC)       -           -
CCC   C   no              11 (CCC)    CCCC = 14
C     D   no              2 (C)       CD = 15
D     D   no              3 (D)       DD = 16
D     D   yes (DD)        -           -
DD    B   no              16 (DD)     DDB = 17
B     B   yes (BB)        -           -
BB    D   no              8 (BB)      BBD = 18
D     D   yes (DD)        -           -
DD    D   no              16 (DD)     DDD = 19
D     B   no              3 (D)       DB = 20
B     B   yes (BB)        -           -
BB    B   no              8 (BB)      BBB = 21
B     D   no              1 (B)       BD = 22
D     D   yes (DD)        -           -
DD    D   yes (DDD)       -           -
DDD   D   no              19 (DDD)    DDDD = 23
D     H   no              3 (D)       DH = 24
H     H   no              4 (H)       HH = 25
H     H   yes (HH)        -           -
HH    (end of input)      25 (HH)     -

The encoded text using LZW (Dictionary-based) encoding is:
0 5 0 1 1 2 10 2 6 11 2 3 16 8 16 3 8 1 19 3 4 25
That is 22 codes for 35 input characters. Please note that the representation of the encoded text may vary depending on the chosen format (e.g., space-separated numbers or fixed-width binary codes).
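
As a quick cross-check, the lzw_encode/lzw_decode sketch from question 1C reproduces this code sequence when initialized with a 0-based dictionary:

```python
text = "AAAABBCCCCAAACCCCDDDBBDDDBBBDDDDHHH"
codes = lzw_encode(text, "ABCDH", start=0)
print(codes)  # [0, 5, 0, 1, 1, 2, 10, 2, 6, 11, 2, 3, 16, 8, 16, 3, 8, 1, 19, 3, 4, 25]
assert lzw_decode(codes, "ABCDH", start=0) == text
```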

2. What are lossy and lossless data compression?

Answer

1. Lossless Data Compression:


Lossless data compression is a method that reduces the size of a file without losing any
information. The original data can be completely reconstructed from the compressed
data without any loss of quality or fidelity. In other words, when you compress a file
using lossless compression and then decompress it, you will obtain an exact replica of
the original file.
Lossless compression algorithms take advantage of patterns, redundancies, and
statistical properties within the data to eliminate or represent them more efficiently.
Examples of lossless compression algorithms include ZIP, GZIP, and PNG. Lossless
compression is commonly used for text documents, databases, executable files, and
other data where preserving every detail is essential.
2. Lossy Data Compression:

Lossy data compression is a method that reduces the size of a file by removing certain
information that is deemed less essential or perceptually less noticeable. Unlike lossless
compression, lossy compression permanently discards some data during the
compression process. When the compressed file is later decompressed, the
reconstructed data will not be an exact replica of the original, and some loss of quality
or fidelity will occur.
Lossy compression algorithms achieve higher compression ratios by exploiting the
limitations of human perception and the characteristics of the data. They discard less
important or redundant information, such as high-frequency components or visual
details that are difficult for humans to perceive. Lossy compression is commonly used
for multimedia files like images (JPEG), audio (MP3), and video (MPEG). The degree of compression, and the resulting loss of quality, can be tuned through the compression parameters.
The choice between lossless and lossy compression depends on the specific
requirements of the data and the intended use. If preserving all the details and data
integrity is crucial, lossless compression is preferred. However, if some loss of quality or
fidelity can be tolerated in exchange for significantly smaller file sizes, lossy compression
is more suitable.

3. Please state Image Compression Standards, the JPEG standards.

Answer

The JPEG (Joint Photographic Experts Group) standards define the image
compression methods used for digital images. Here are the commonly known
JPEG standards:
1. JPEG (1992):
The original JPEG standard, published in 1992, introduced the baseline sequential mode of JPEG compression. It uses the discrete cosine transform (DCT) algorithm to compress images. This standard is widely supported and compatible with most software and devices.
2. JPEG 2000 (2000):
JPEG 2000 is an improved version of the original JPEG standard, published in 2000. It introduced several advancements, including better compression efficiency, support for lossless compression, progressive transmission, and the ability to handle a wider range of image types. JPEG 2000 uses the wavelet transform for image compression.
3. JPEG-LS (1999):
JPEG-LS, also known as Lossless JPEG, is a standard published in 1999. It provides lossless compression for images, allowing for perfect reconstruction of the original image. JPEG-LS uses a predictive modeling approach combined with adaptive Golomb-Rice coding.
4. JPEG XR (2009):
JPEG XR, previously known as HD Photo or Windows Media Photo, is a standard published in 2009. It offers improved compression performance and extended features compared to the original JPEG standard. JPEG XR supports both lossy and lossless compression and provides better handling of high dynamic range (HDR) images.
5. JPEG XS (2018):
JPEG XS is a relatively new standard published in 2018. It is designed for low-latency and lightweight compression of high-quality images and videos. JPEG XS is intended for applications like professional video editing, virtual reality, and visual communication over networks.
These are some of the well-known standards within the JPEG family that have
been widely adopted and used for image compression in various domains.
Each standard has its own strengths and characteristics, catering to different
use cases and requirements.
