
Synopsis On: Data Compression

This document discusses data compression techniques. It begins by defining data compression as representing information using the fewest number of bits while maintaining accuracy. Data compression is useful for reducing storage and transmission costs. Compression works by removing redundant information from data. Techniques can be lossless, allowing exact reconstruction, or lossy, which permanently eliminates some data. Examples of lossless techniques discussed include run-length encoding and static Huffman coding.

Uploaded by

luckshay
Copyright
© Attribution Non-Commercial (BY-NC)

SYNOPSIS ON

DATA COMPRESSION
Submitted To:
Mr. DEVENDER Hi-Tech Institute

Submitted By:
ANUJ BHARDWAJ

Gurgaon

Introduction to Data Compression


What is Data Compression?
Why Data Compression?
How is Data Compression possible?
Lossless and Lossy Data Compression.
Static, Adaptive, and Hybrid Compression.
Compression Utilities and Formats.
Run-length Encoding.
Static Huffman Coding.
Scope for Future Work.

What is Data Compression?


Data compression is the representation of an information source (e.g. a data file, a speech signal, an image, or a video signal) as accurately as possible using the fewest number of bits.
The technique is to identify redundancy in the data and to eliminate it.
Compressed data can only be understood if the decoding method is known by the receiver.

Why Data Compression?


Data storage and transmission cost money, and this cost increases with the amount of data.
Capacity, such as the bandwidth of an Internet connection, is limited.
The cost can be reduced by processing the data so that it takes less memory and less transmission time.
Many files can be combined into one compressed document, which makes sending easier, provided the combined file size is not huge.

How is data compression possible?

Compression is possible because information usually contains redundancy, that is, information that is repeated often. Examples include recurring letters, numbers or pixels. File compression programs remove this redundancy.
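
The effect of redundancy can be seen with a small sketch. It assumes Python and its standard `zlib` module, neither of which is part of the original synopsis, and the two inputs are made up for illustration: one is highly repetitive, the other is pseudo-random and so has very little redundancy to remove.

```python
import random
import zlib

# Highly redundant input: the same two letters repeated 500 times.
repetitive = b"AB" * 500

# Input with little redundancy: 1000 pseudo-random bytes (fixed seed).
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(1000))

repetitive_size = len(zlib.compress(repetitive))
noisy_size = len(zlib.compress(noisy))

print("repetitive:", len(repetitive), "->", repetitive_size, "bytes")
print("random:    ", len(noisy), "->", noisy_size, "bytes")
```

The repetitive input should shrink to a small fraction of its size, while the pseudo-random input should stay at roughly 1000 bytes, because there are no repeated patterns for the compressor to exploit.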

Lossless and Lossy Compression Techniques.


Lossless techniques enable exact reconstruction of the original document from the compressed information.
I. Exploit redundancy in data
II. Applied to general data
III. Examples: Run-length, Huffman

Lossy compression reduces a file by permanently eliminating certain redundant information.
I. Exploit redundancy and human perception
II. Applied to audio, image, and video
III. Examples: JPEG and MPEG

Lossy techniques usually achieve higher compression ratios than lossless ones, but only lossless techniques reproduce the original exactly.

Classification of Lossless Compression Techniques.


Lossless techniques are classified into static, adaptive (or dynamic), and hybrid.
1. In a static method the mapping from the set of messages to the set of codewords is fixed before transmission begins, so that a given message is represented by the same codeword every time it appears in the message being encoded. Static coding requires two passes: one pass to compute probabilities (or frequencies) and determine the mapping, and a second pass to encode. Example: Static Huffman Coding

2. In an adaptive method the mapping from the set of messages to the set of codewords changes over time. All of the adaptive methods are one-pass methods; only one scan of the message is required. Example: Adaptive Huffman Coding
3. An algorithm may also be a hybrid, neither completely static nor completely dynamic.

Compression Utilities and Formats.


Compression tool examples: WinZip, 7-Zip, WinRAR, PowerArc
General compression formats: .zip, .gz
Common image compression formats: JPEG, BMP, GIF, PNG, TGA, WMP
Common audio (sound) compression formats: MPEG-1 Layer III (known as MP3), RealAudio (RA, RAM, RP)
Common video (sound and image) compression formats: MPEG-1, MPEG-2, MPEG-4, DivX, QuickTime (MOV), RealVideo (RM), Video for Windows (AVI), Flash video (FLV)

Run-length encoding.
Run-length encoding (RLE) is a very simple form of data compression in which runs of data are stored as a single data value and a count, rather than as the original run.
For example, the string
BBBBHHDDXXXXKKKKWWZZZZ
can be encoded more compactly by replacing each run of a repeated character with a single instance of that character and a number that gives how many times it is repeated:
4B2H2D4X4K2W4Z
Here "4B" means four B's, "2H" means two H's, and so on. Compressing a string in this way is called run-length encoding.
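
A minimal sketch of this count-then-character scheme follows, assuming Python. The function names `rle_encode` and `rle_decode` are made up for illustration, and the decoder assumes the original text contains no digits, since a digit would be mistaken for part of a count.

```python
from itertools import groupby


def rle_encode(text):
    """Replace each run of a repeated character with <count><character>."""
    return "".join(f"{len(list(run))}{char}" for char, run in groupby(text))


def rle_decode(encoded):
    """Invert rle_encode; assumes the original text contained no digits."""
    out = []
    count = ""
    for ch in encoded:
        if ch.isdigit():
            count += ch          # counts may have more than one digit
        else:
            out.append(ch * int(count))
            count = ""
    return "".join(out)


original = "BBBBHHDDXXXXKKKKWWZZZZ"
encoded = rle_encode(original)
print(encoded)                   # 4B2H2D4X4K2W4Z
print(rle_decode(encoded) == original)
```

Here 22 characters become 14. On text with few repeated runs this scheme would make the output longer, since every single character would still need a count in front of it.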

As another example, consider the storage of a rectangular image as a single-color bitmap of 8 rows of 40 bits each. The image can be compressed with run-length encoding by counting identical bits in each row and writing each run as a bit value followed by its count:
0,40
0,40
0,10 1,20 0,10
0,10 1,1 0,18 1,1 0,10
0,10 1,1 0,18 1,1 0,10
0,10 1,1 0,18 1,1 0,10
0,10 1,20 0,10
0,40

The first line says that the first row of the bitmap consists of 40 0's. The third line says that the third row of the bitmap consists of 10 0's followed by 20 1's followed by 10 more 0's, and so on for the other lines.
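
The run lists above can be expanded back into the bitmap rows. The sketch below assumes Python and represents each run as a (bit, count) pair; `decode_row` and `encode_row` are hypothetical helper names, not part of the original slides.

```python
def decode_row(runs):
    """Expand a list of (bit, count) pairs into a string of bits."""
    return "".join(str(bit) * count for bit, count in runs)


def encode_row(bits):
    """Collapse a string of bits into a list of (bit, count) pairs."""
    runs = []
    for b in bits:
        bit = int(b)
        if runs and runs[-1][0] == bit:
            runs[-1] = (bit, runs[-1][1] + 1)   # extend the current run
        else:
            runs.append((bit, 1))               # start a new run
    return runs


# The eight rows of the rectangle, as listed in the text.
blank = [(0, 40)]
edge = [(0, 10), (1, 20), (0, 10)]
side = [(0, 10), (1, 1), (0, 18), (1, 1), (0, 10)]
rows = [blank, blank, edge, side, side, side, edge, blank]

bitmap = [decode_row(r) for r in rows]
for line in bitmap:
    print(line)
```

Printing the decoded rows draws the outline of the rectangle in 1's, and every row comes out 40 bits wide, which is a quick check that the run counts are consistent.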

Static Huffman Coding.


Static Huffman coding assigns variable-length codes to symbols based on their frequency of occurrence in the given message. Low-frequency symbols are encoded using many bits, and high-frequency symbols are encoded using fewer bits. The message to be transmitted is first analyzed to find the relative frequencies of its constituent characters. The coding process generates a binary tree, the Huffman code tree, with branches labeled with bits (0 and 1). The Huffman tree (or the character-codeword pairs) must be sent with the compressed information to enable the receiver to decode the message.

Huffman Coding example.


Example: Information to be transmitted over the internet contains the following characters with their associated frequencies:

Character   Frequency
a           45
e           65
l           13
n           45
o           18
s           22
t           53

Use the Huffman technique to:
1. Build the Huffman code tree for the message.
2. Use the Huffman tree to find the codeword for each character.
3. Verify that the computed Huffman codewords satisfy the prefix property.

Static Huffman Coding example (contd).


The sequences of zeros and ones on the arcs of the path from the root to each leaf node are the desired codes:

Character   Huffman codeword
a           110
e           10
l           0110
n           111
o           0111
s           010
t           00

Static Huffman Coding example (contd).


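If we assume the message consists of only the characters a, e, l, n, o, s, t, then the number of bits for the compressed message is the sum of frequency * codeword length over all characters:
45*3 + 65*2 + 13*4 + 45*3 + 18*4 + 22*3 + 53*2 = 696 bits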

If the message is sent uncompressed with 8-bit ASCII representation for the characters, we have 261*8 = 2088 bits.
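
The tree construction and the bit counts can be sketched as follows, assuming Python and its standard `heapq` module. The function name `huffman_codes` and the tuple representation of tree nodes are illustrative choices, not taken from the original slides. Which branch is labeled 0 or 1 is a convention, so other implementations may give different bit patterns with the same codeword lengths.

```python
import heapq
from itertools import count


def huffman_codes(freqs):
    """Build a Huffman tree from {symbol: frequency}; return {symbol: codeword}."""
    tie = count()  # tie-breaker so the heap never has to compare tree nodes
    heap = [(f, next(tie), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    # Repeatedly merge the two least frequent nodes until one tree remains.
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):        # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                              # leaf: a single character
            codes[node] = prefix or "0"

    walk(heap[0][2], "")
    return codes


freqs = {"a": 45, "e": 65, "l": 13, "n": 45, "o": 18, "s": 22, "t": 53}
codes = huffman_codes(freqs)
compressed_bits = sum(freqs[c] * len(codes[c]) for c in freqs)
uncompressed_bits = sum(freqs.values()) * 8

for sym in sorted(codes):
    print(sym, codes[sym])
print(compressed_bits, uncompressed_bits)   # 696 2088
```

With these frequencies the compressed message needs 696 bits against 2088 uncompressed, which is one third of the original size, before counting the cost of sending the tree itself.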

The Prefix Property


To see why the prefix property is essential, consider the codewords given below, in which e is encoded with 110, which is a prefix of the codeword for f (1100):

Character   Codeword
a           0
b           101
c           100
d           111
e           110
f           1100

The decoding of 11000100110 is ambiguous:

11000100110 => face   (1100 0 100 110)
11000100110 => eaace  (110 0 0 100 110)
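
The ambiguity can be checked mechanically. The sketch below assumes Python; `is_prefix_free` and `all_decodings` are hypothetical helper names. The first tests whether any codeword is a prefix of another, and the second lists every way a bit string can be split into codewords.

```python
def is_prefix_free(codes):
    """True if no codeword is a prefix of another codeword."""
    # After sorting, a word that is a prefix of any other word is also
    # a prefix of the word that immediately follows it.
    words = sorted(codes.values())
    return all(not nxt.startswith(cur) for cur, nxt in zip(words, words[1:]))


def all_decodings(bits, codes):
    """Return every message whose encoding is exactly `bits`."""
    if not bits:
        return [""]
    results = []
    for sym, word in codes.items():
        if bits.startswith(word):
            rest = all_decodings(bits[len(word):], codes)
            results += [sym + tail for tail in rest]
    return results


bad = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "110", "f": "1100"}
good = {"a": "110", "e": "10", "l": "0110", "n": "111",
        "o": "0111", "s": "010", "t": "00"}

print(is_prefix_free(bad))                 # False
print(is_prefix_free(good))                # True
print(all_decodings("11000100110", bad))   # ['eaace', 'face']
```

For a code with the prefix property, `all_decodings` never returns more than one message, which is why the receiver can decode bit by bit without separators.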

Encoding and decoding examples


Encode (compress) the message tenseas using the following codewords:

Character   Huffman codeword
a           110
e           10
l           0110
n           111
o           0111
s           010
t           00

Answer: Replace each character with its codeword: 001011101010110010

Decode (decompress) the following encoded message, if possible, using the Huffman codeword tree given: 001011101010110010

Technology Used

Microsoft .NET

Microsoft .NET includes a large library of coded solutions to common programming problems and a Common Language Infrastructure (CLI) that manages the execution of programs written specifically for the framework.

LINQ (Language Integrated Query, pronounced "link") is a technology with the capability to execute queries on objects; a query returns a collection object.

ADO.NET is a set of software components that programmers can use to access data and data services. It is part of the base class library that is included with the Microsoft .NET Framework.

Windows Presentation Foundation (WPF) is a graphical subsystem for rendering user interfaces in Windows-based applications.

Windows Communication Foundation (WCF) is a programming model for using managed code to build unified Web services and other distributed systems that can talk to each other.

Windows Workflow Foundation (WF): a workflow-based program is like a traditional program in that it allows us to coordinate work and perform operations.

The Common Language Runtime (CLR) is a core component of Microsoft's .NET initiative. It is Microsoft's implementation of the Common Language Infrastructure (CLI) standard, which defines an execution environment for program code.

Scope in Future
The acronyms and cryptic names for compressed file formats include:
Zip files for email and encryption.
JAR files for Java programs and libraries.
The next version of TIFF files for black-and-white images.
PNG, JPEG and GIF files for color images.
MP3 files for audio clips.
QuickTime and MPEG for video clips.

Growth in the speed and bandwidth of the internet.
