
UNIT-5

Entropy encoding

An entropy encoding is a coding scheme that assigns codes to symbols so that
code lengths match the probabilities of the symbols. Typically, entropy encoders
compress data by replacing fixed-length codes with codes whose lengths are
proportional to the negative logarithm of each symbol's probability, so the
most common symbols receive the shortest codes.
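
A small illustration of this relationship (not part of the original unit; the probability values below are assumed purely for demonstration): each symbol's ideal code length is -log2(p) bits, so frequent symbols get the shortest codes.

```python
import math

# Assumed example distribution (not from the notes).
probabilities = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}

# Ideal code length per symbol: -log2(p) bits.
for symbol, p in probabilities.items():
    print(f"{symbol}: p = {p:<6} ideal length = {-math.log2(p):.1f} bits")

# Entropy: the minimum achievable average code length for this source.
entropy = -sum(p * math.log2(p) for p in probabilities.values())
print(f"entropy = {entropy:.2f} bits/symbol")  # 1.75 for this distribution
```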

Repetitive character encoding


Repetitive character encoding is a technique used in computer science and data compression
where sequences of the same character are encoded to save space or simplify representation.
It is commonly associated with Run-Length Encoding (RLE), but the concept applies in
other areas as well.
Run-Length Encoding (RLE)
RLE compresses data by replacing each run of consecutive repeated characters with a
repetition count followed by a single copy of the character. For example:
● Input: AAAABBBCCDAA
● Encoded Output: 4A3B2C1D2A
This approach is most effective when the data contains many repeated characters.
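
The sketch below is a minimal Python illustration of this (not the notes' own code); it assumes plain character strings and reproduces the example above.

```python
import re

def rle_encode(text):
    """Replace each run of identical characters with <count><char>."""
    out = []
    i = 0
    while i < len(text):
        j = i
        while j < len(text) and text[j] == text[i]:
            j += 1
        out.append(f"{j - i}{text[i]}")
        i = j
    return "".join(out)

def rle_decode(encoded):
    """Expand <count><char> pairs back into the original string."""
    return "".join(ch * int(count)
                   for count, ch in re.findall(r"(\d+)(\D)", encoded))

coded = rle_encode("AAAABBBCCDAA")
print(coded)              # 4A3B2C1D2A
print(rle_decode(coded))  # AAAABBBCCDAA
```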

Other Applications of Repetitive Character Encoding


1. Text Processing:
o When analyzing or searching for patterns in text, repetitive encoding helps
standardize the representation of sequences.
o Example: Replace sequences of whitespace in text with a single space.
2. Error Detection and Correction:
o Repeating characters can add redundancy to help detect and correct
transmission errors.
o For instance, in telecommunication, the data stream 111000 could be
transmitted as 111111000000 to make errors easier to identify (see the sketch
after this list).
3. Bioinformatics:
o DNA sequences often contain repetitive patterns (e.g., AAAA or GGGG).
Encoding these sequences can help analyze genetic data efficiently.
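
The following is a tiny sketch of the repetition idea from item 2 above (an illustration only; real telecommunication systems use stronger error-detecting and error-correcting codes). Each bit is sent twice, so a mismatched pair signals an error.

```python
def repeat_encode(bits, n=2):
    """Transmit every bit n times, e.g. 111000 -> 111111000000."""
    return "".join(b * n for b in bits)

def find_errors(received, n=2):
    """Return indices of n-bit groups whose bits do not all agree."""
    groups = [received[i:i + n] for i in range(0, len(received), n)]
    return [i for i, g in enumerate(groups) if len(set(g)) != 1]

sent = repeat_encode("111000")            # "111111000000"
corrupted = sent[:4] + "0" + sent[5:]     # flip one transmitted bit
print(sent, find_errors(corrupted))       # error detected in group 2
```
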
Pros and Cons
Pros:
● Reduces storage requirements for repetitive data.
● Simplifies pattern recognition in certain datasets.
Cons:
● Ineffective for non-repetitive data (might even increase size).
● Can be computationally expensive for encoding/decoding large datasets.

Blank/Zero Encoding


Blank/Zero Encoding refers to methods of representing sequences of blank values (e.g.,
spaces or nulls) or zero-valued data in a compressed or efficient format. These
methods are commonly used in data compression, storage optimization, or
transmission efficiency. Here's an overview of the concept and its applications:

Key Concepts of Blank/Zero Encoding


1. Run-Length Encoding (RLE):
o Similar to repetitive character encoding, RLE is also used for blank/zero
encoding.
o Example:
▪ Input: 00000001000000
▪ Encoded: (7,0)(1,1)(6,0) (representing seven zeros, one one, and six
zeros).
2. Sparse Matrix Representation:
o Blank/zero encoding is often used in sparse matrices, where most of the
elements are zeros.
o Instead of storing all elements, only the non-zero elements and their positions
are recorded.
o Example:
▪ Matrix:
0 0 3
0 4 0
0 0 0
▪ Encoded as: [(0,2,3), (1,1,4)] (row, column, value); a small sketch of this
idea follows this list.
3. Bitmaps and Flags:
o Use a binary representation to encode blank or zero data.
o Example:
▪ Data: 0 0 0 1 0 0 0
▪ Encoded bitmap: 0001000.
4. Huffman Coding:
o Assign shorter codes to blanks/zeros if they appear frequently.
o Example:
▪ If 0 is the most common value in data, assign it a short Huffman code
(e.g., 1).
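
The sketch below (illustrative, not from the notes) shows the sparse-triple and bitmap ideas from items 2 and 3 and reproduces their examples.

```python
def sparse_triples(matrix):
    """Return [(row, col, value)] for the non-zero entries of a 2-D list."""
    return [(r, c, v)
            for r, row in enumerate(matrix)
            for c, v in enumerate(row)
            if v != 0]

def bitmap(values):
    """Mark non-zero positions with '1' and zeros/blanks with '0'."""
    return "".join("1" if v else "0" for v in values)

print(sparse_triples([[0, 0, 3],
                      [0, 4, 0],
                      [0, 0, 0]]))    # [(0, 2, 3), (1, 1, 4)]
print(bitmap([0, 0, 0, 1, 0, 0, 0]))  # 0001000
```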

Applications of Blank/Zero Encoding


1. Data Compression:
o Especially effective for datasets with a high frequency of zeros or blank spaces
(e.g., image compression, audio silences).
2. Transmission Efficiency:
o Reduces bandwidth requirements by omitting redundant zero data in
communication protocols.
3. Database Storage:
o Helps optimize storage for sparse tables or fields with many null/zero entries.
4. Machine Learning:
o In one-hot encoding or sparse feature representations, blank/zero encoding can
optimize memory usage.

Advantages
● Reduces memory and storage usage.
● Improves data transmission speed for repetitive or sparse datasets.
Disadvantages
● Encoding and decoding can add computational overhead.
● Not effective for dense datasets without a high prevalence of blanks/zeros.

Statistical Encoding

Statistical Encoding is a data compression technique that leverages the statistical
properties of data—such as the frequency of occurrence of symbols—to achieve
efficient encoding. The fundamental idea is to assign shorter codes to more frequently
occurring symbols and longer codes to less frequent ones, reducing the overall storage
or transmission cost.

Key Types of Statistical Encoding


1. Huffman Coding:
o Constructs a binary tree where frequently occurring symbols are assigned
shorter binary codes (a worked sketch follows this list).
o Example:
▪ Frequencies: A: 45, B: 13, C: 12, D: 16, E: 9, F: 5
▪ Encoded Output: A: 0, B: 101, C: 100, D: 111, E: 1101, F: 1100.
2. Arithmetic Coding:
o Encodes the entire message as a single fractional value in a range [0,1), based
on symbol probabilities.
o Example:
▪ Message: ABBA
▪ Probabilities: A=0.6, B=0.4
▪ Encoded value: a single fraction inside the final subinterval for the
message. With A mapped to [0, 0.6) and B to [0.6, 1), ABBA narrows the
interval to [0.504, 0.5616), so any value in it (e.g., 0.52) represents the
entire message.
3. Shannon-Fano Coding:
o Symbols are ordered by frequency, then hierarchically divided into groups,
and binary codes are assigned.
o Example:
▪ Frequencies: A: 8, B: 3, C: 1
▪ Encoded Output: A: 0, B: 10, C: 11.
4. Entropy Coding:
o Based on the concept of entropy, which measures the amount of information
or uncertainty in data.
o Typically used in combination with other statistical methods, e.g., Huffman or
Arithmetic coding.
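
As a worked sketch of the Huffman construction (an illustration, not a reference implementation), the code below builds a code table for the frequencies used in the example above; with this particular tie-breaking it reproduces the same codewords.

```python
import heapq

def huffman_codes(freqs):
    """Return {symbol: bit string} for a dict of symbol frequencies."""
    # Heap entries: (total frequency, tie-breaker, {symbol: code so far}).
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, codes1 = heapq.heappop(heap)  # two least-frequent subtrees
        f2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}        # left branch
        merged.update({s: "1" + c for s, c in codes2.items()})  # right branch
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]

print(huffman_codes({"A": 45, "B": 13, "C": 12, "D": 16, "E": 9, "F": 5}))
# {'A': '0', 'C': '100', 'B': '101', 'F': '1100', 'E': '1101', 'D': '111'}
```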

Applications of Statistical Encoding


1. File Compression:
o Algorithms like ZIP and GZIP use statistical encoding methods.
2. Multimedia Data:
o Formats like JPEG, MP3, and MPEG use Huffman coding or similar
techniques for image and audio compression.
3. Network Protocols:
o Statistical encoding is used to minimize bandwidth for transmitting data with
predictable patterns.
4. Natural Language Processing (NLP):
o Encoding words based on their frequency in text corpora.

Advantages
● Optimized for datasets with skewed distributions, reducing average code length.
● Can achieve near-optimal compression ratios for well-understood distributions.
Disadvantages
● Ineffective for uniformly distributed data (no significant frequency difference).
● Requires preprocessing to calculate symbol frequencies, which can be
computationally expensive.

Source Encoding
Source Encoding, also known as data compression or source coding, is the process
of converting data from its original format into a compressed representation to reduce
redundancy and optimize storage or transmission. It is widely used in
telecommunications, multimedia, and data processing.

Key Concepts in Source Encoding


1. Redundancy Reduction:
o Removes unnecessary or repetitive information from data.
o Example: Compressing a text file by removing extra spaces or repeated
characters.
2. Entropy:
o Based on Shannon's Information Theory, entropy measures the minimum
average number of bits needed to encode symbols based on their probabilities.
o Example:
▪ Symbol probabilities: A=0.5, B=0.25, C=0.25.
▪ Minimum average bits per symbol = 0.5×1 + 0.25×2 + 0.25×2 = 1.5 bits.
3. Lossless vs. Lossy Compression:
o Lossless Compression: No information is lost; the original data can be
perfectly reconstructed (e.g., ZIP, PNG).
o Lossy Compression: Irrelevant or less noticeable data is discarded to achieve
higher compression ratios (e.g., JPEG, MP3).

Methods of Source Encoding


1. Statistical Encoding:
o Encodes data based on symbol frequencies.
o Examples:
▪ Huffman Coding: Assigns shorter codes to frequent symbols.
▪ Arithmetic Coding: Encodes entire data as a fractional number based
on probabilities.
2. Dictionary-Based Encoding:
o Builds a dictionary of patterns (substrings) and replaces patterns with
references.
o Examples:
▪ Lempel-Ziv (LZ) Coding: Replaces repeated sequences with pointers.
▪ LZW Compression: A variant of LZ78 used in formats like GIF (a small
sketch of the dictionary idea follows this list).
3. Run-Length Encoding (RLE):
o Replaces consecutive identical symbols with a single symbol and a count.
o Example: AAAABBB → 4A3B.
4. Transform-Based Compression:
o Converts data to a different domain where it is easier to compress.
o Examples:
▪ Discrete Cosine Transform (DCT): Used in JPEG.
▪ Fourier-related transforms (e.g., the modified DCT): Used in audio compression.
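
A compact sketch of the dictionary-based idea in item 2 (illustrative only; the real LZW used in GIF also manages code widths and clear codes, which are omitted here): repeated substrings are replaced by indices into a growing dictionary.

```python
def lzw_encode(data):
    """Encode a string as a list of dictionary indices (simplified LZW)."""
    dictionary = {chr(i): i for i in range(256)}  # single-character seeds
    next_code = 256
    current = ""
    output = []
    for ch in data:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate                # keep extending the match
        else:
            output.append(dictionary[current])
            dictionary[candidate] = next_code  # remember the new pattern
            next_code += 1
            current = ch
    if current:
        output.append(dictionary[current])
    return output

print(lzw_encode("ABABABA"))  # [65, 66, 256, 258]
```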

Applications of Source Encoding


1. File Compression:
o Tools like ZIP, RAR, and 7z use source encoding to reduce file sizes.
2. Multimedia Formats:
o Used in MP3, JPEG, PNG, and MPEG for audio, image, and video
compression.
3. Data Transmission:
o Reduces bandwidth requirements in telecommunications (e.g., cellular data,
satellite communication).

Advantages
● Saves storage space.
● Reduces transmission time and bandwidth costs.
● Optimizes performance in systems with limited resources.
Disadvantages
● Encoding/decoding can be computationally expensive.
● Lossy methods may degrade data quality.

What is Vector Quantization?


Vector Quantization (VQ) is a technique used in signal processing, data compression,
and pattern recognition that involves quantizing continuous or discrete data into a
finite set of representative vectors, known as codebook vectors or centroids. The goal
of Vector Quantization is to minimize the distortion between the input data and the
codebook vectors, thereby achieving a compact representation of the data while
preserving as much information as possible.

What does Vector Quantization do?

Vector Quantization performs the following tasks:


● Clustering: Vector Quantization groups similar data points together based on a
similarity metric, such as Euclidean distance or cosine similarity, in order to create
clusters of similar data.
● Codebook generation: Vector Quantization creates a codebook, which is a set of
representative vectors (centroids) for each cluster. The codebook serves as a
compressed representation of the original data.
● Quantization: Vector Quantization replaces each data point with the index of the
closest codebook vector, effectively quantizing the data and reducing its size (see
the sketch below).
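
A minimal sketch of these three steps (an illustration under assumptions: NumPy and scikit-learn are available, k-means is used as one common way to build the codebook, and the data sizes are made up).

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 2))       # 1000 two-dimensional input vectors

# Clustering + codebook generation: k-means centroids act as the codebook.
kmeans = KMeans(n_clusters=16, n_init=10, random_state=0).fit(data)
codebook = kmeans.cluster_centers_      # 16 representative vectors

# Quantization: each vector is replaced by the index of its nearest centroid.
indices = kmeans.predict(data)

# Reconstruction and distortion (mean squared error of the quantized data).
reconstructed = codebook[indices]
distortion = np.mean(np.sum((data - reconstructed) ** 2, axis=1))
print(codebook.shape, indices[:10], round(float(distortion), 3))
```
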
Some benefits of using Vector Quantization
Vector Quantization offers several benefits for data analysis and compression tasks:
● Data compression: Vector Quantization can achieve significant data compression with
minimal loss of information, making it suitable for applications like image and audio
compression.
● Noise reduction: Vector Quantization can help reduce noise in the data by replacing
individual data points with representative codebook vectors, leading to smoother and
more robust representations.
● Pattern recognition: Vector Quantization can be used to identify patterns or structures
in the data, which can be useful for tasks like classification, clustering, and feature
extraction.
