Synopsis On: Data Compression
DATA COMPRESSION
Submitted To:
Mr. DEVENDER Hi-Tech Institute
Submitted By:
ANUJ BHARDWAJ
Gurgaon
Data compression is the process of representing an information source (e.g. a data file, a speech signal, an image, or a video signal) as accurately as possible using the fewest bits.
The technique is to identify redundancy in the data and to eliminate it.
Compressed data can only be understood if the decoding method is known to the receiver.
Compression is possible because information usually contains redundancy, that is, information that is repeated. Examples include recurring letters, numbers, or pixels. File compression programs remove this redundancy.
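The effect of redundancy can be seen with a small Python sketch. It uses the standard `zlib` module as a stand-in for a file compression program; the sample inputs (`redundant`, `random_bytes`) are made up for illustration and are not part of the synopsis.

```python
import os
import zlib

# Highly redundant input: the same two bytes repeated 5000 times.
redundant = b"AB" * 5000
# Input with almost no redundancy: the same number of random bytes.
random_bytes = os.urandom(len(redundant))

packed_redundant = zlib.compress(redundant)
packed_random = zlib.compress(random_bytes)

# Print original size -> compressed size for both inputs.
print(len(redundant), "->", len(packed_redundant))
print(len(random_bytes), "->", len(packed_random))

# The compression is lossless: decompressing restores the input exactly.
restored = zlib.decompress(packed_redundant)
```

The repeated input shrinks to a small fraction of its size, while the random input, which has no repeated patterns to exploit, hardly shrinks at all.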
Lossy compression - reduces a file by permanently eliminating certain redundant information
I. Exploits redundancy and human perception
II. Applied to audio, image, and video
III. Examples: JPEG and MPEG
Lossy techniques usually achieve higher compression ratios than lossless ones, but only lossless techniques reproduce the original data exactly.
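Why lossy compression cannot restore the original can be illustrated with a minimal Python sketch of quantization. This is only an illustration of irreversibility, not how JPEG or MPEG work in detail; the function names, the step size of 16, and the sample values are assumptions chosen for the example.

```python
def quantize(samples, step=16):
    """Lossy step: keep only which bin of width `step` each sample falls in."""
    return [s // step for s in samples]

def dequantize(levels, step=16):
    """Reconstruct an approximation: the midpoint of each quantization bin."""
    return [q * step + step // 2 for q in levels]

samples = [3, 17, 18, 130, 131, 255]  # hypothetical 8-bit sample values
levels = quantize(samples)            # each level now fits in 4 bits
restored = dequantize(levels)         # close to, but not equal to, samples

print(levels)    # [0, 1, 1, 8, 8, 15]
print(restored)  # [8, 24, 24, 136, 136, 248]
```

Each value is stored in half the bits, but 17 and 18 both come back as 24: the difference between them has been permanently discarded.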
All adaptive methods are one-pass methods: only one scan of the message is required. Example: adaptive Huffman coding.
3. An algorithm may also be a hybrid, neither completely static nor completely dynamic.
WinZip, 7-Zip, WinRAR, PowerArc
General compression formats: .zip, .gz
Common image compression formats: JPEG, BMP, GIF, PNG, TGA, WMP
Common audio (sound) compression formats: MPEG-1 Layer III (known as MP3), RealAudio (RA, RAM, RP)
Common video (sound and image) compression formats: MPEG-1, MPEG-2, MPEG-4, DivX, QuickTime (MOV), RealVideo (RM), Video for Windows (AVI), Flash video (FLV)
Run-length encoding.
Run-length encoding (RLE) is a very simple form of data compression in which runs of data are stored as a single data value and a count, rather than as the original run. Consider the following string: BBBBHHDDXXXXKKKKWWZZZZ. It can be encoded more compactly by replacing each run of a repeated character with a single instance of that character and a number giving how many times it is repeated: 4B2H2D4X4K2W4Z. Here "4B" means four B's, "2H" means two H's, and so on. Compressing a string in this way is called run-length encoding.
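The string example can be reproduced with a short Python sketch. The function names `rle_encode` and `rle_decode` are chosen here for illustration, and the decoder assumes the original text contains no digits, since digits are used for the counts.

```python
import re
from itertools import groupby

def rle_encode(text):
    """Replace each run of a repeated character with '<count><char>'."""
    return "".join(f"{len(list(run))}{char}" for char, run in groupby(text))

def rle_decode(encoded):
    """Expand '<count><char>' pairs back into runs (text must not contain digits)."""
    pairs = re.findall(r"(\d+)(\D)", encoded)
    return "".join(char * int(count) for count, char in pairs)

print(rle_encode("BBBBHHDDXXXXKKKKWWZZZZ"))  # 4B2H2D4X4K2W4Z
print(rle_decode("4B2H2D4X4K2W4Z"))          # BBBBHHDDXXXXKKKKWWZZZZ
```

Run-length encoding only pays off when runs are common: a string without repeats, such as "ABC", grows to "1A1B1C".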
As another example, consider the storage of a rectangular image as a single-color bitmap, in which every pixel is one bit. The image can be compressed with run-length encoding by listing, for each row, the runs of identical bits as (bit, count) pairs:
0,40
0,40
0,10 1,20 0,10
0,10 1,1 0,18 1,1 0,10
0,10 1,1 0,18 1,1 0,10
0,10 1,1 0,18 1,1 0,10
0,10 1,20 0,10
0,40
The first line says that the first row of the bitmap consists of 40 0's. The third line says that the third row consists of 10 0's followed by 20 1's followed by 10 more 0's, and so on for the other lines.
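The bitmap itself can be recovered from the run lists. The following Python sketch expands the eight rows of (bit, count) pairs given above; the names `BLANK`, `EDGE`, `SIDE`, and `expand` are chosen for this illustration only.

```python
# Each row is a list of (bit, run_length) pairs, copied from the run lists above.
BLANK = [(0, 40)]
EDGE = [(0, 10), (1, 20), (0, 10)]
SIDE = [(0, 10), (1, 1), (0, 18), (1, 1), (0, 10)]
rows = [BLANK, BLANK, EDGE, SIDE, SIDE, SIDE, EDGE, BLANK]

def expand(row):
    """Turn one row of (bit, count) pairs back into a string of bits."""
    return "".join(str(bit) * count for bit, count in row)

bitmap = [expand(row) for row in rows]
for line in bitmap:
    print(line)

# Number of (bit, count) pairs needed to describe the whole image.
total_runs = sum(len(row) for row in rows)
```

Printing the rows shows the outline of a rectangle drawn in 1's. The 24 pairs stand in for 8 rows of 40 bits, that is, 320 bits.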
character frequency
a 45
e 65
l 13
n 45
o 18
s 22
t 53
Use the Huffman technique: build the Huffman code tree for the message.
Use the Huffman tree to find the codeword for each character.
Verify that computed Huffman codewords satisfy the Prefix property.
The sequences of zeros and ones labeling the arcs on the path from the root to each leaf node are the desired codes:
character Huffman codeword
a 110
e 10
l 0110
n 111
o 0111
s 010
t 00
If the message is sent uncompressed with 8-bit ASCII representation for the characters, we have 261*8 = 2088 bits.
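For comparison, the size of the Huffman-coded message can be computed from the frequency table. The Python sketch below rebuilds a Huffman code with a priority queue and totals the bits; the function name `huffman_codes` and the tie-breaking counter are assumptions of this sketch. Ties may be broken differently than in a hand-drawn tree, so individual 0/1 labels can differ from the codeword table, but for these frequencies the codeword lengths and the total size are the same.

```python
import heapq
from itertools import count

FREQ = {"a": 45, "e": 65, "l": 13, "n": 45, "o": 18, "s": 22, "t": 53}

def huffman_codes(freq):
    """Build a Huffman code by repeatedly merging the two lightest subtrees."""
    tiebreak = count()  # unique counter so equal weights never compare the dicts
    heap = [(weight, next(tiebreak), {symbol: ""}) for symbol, weight in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w0, _, left = heapq.heappop(heap)
        w1, _, right = heapq.heappop(heap)
        # Prefix '0' to every code in the lighter subtree and '1' in the other.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w0 + w1, next(tiebreak), merged))
    return heap[0][2]

codes = huffman_codes(FREQ)
compressed_bits = sum(FREQ[s] * len(codes[s]) for s in FREQ)
uncompressed_bits = sum(FREQ.values()) * 8

print(codes)
print(compressed_bits, "bits vs", uncompressed_bits, "bits")
```

For the frequencies above this gives 696 bits against 2088 bits, one third of the uncompressed size.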
Decode (decompress) the following encoded message, if possible, using the Huffman codeword tree given: 001011101010110010
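Decoding and the prefix-property check can both be sketched in Python using the codeword table above. The function names `is_prefix_free` and `huffman_decode` are chosen for this illustration; the decoder returns None when the bit string stops in the middle of a codeword, which is one way to read "if possible".

```python
CODES = {"a": "110", "e": "10", "l": "0110", "n": "111",
         "o": "0111", "s": "010", "t": "00"}

def is_prefix_free(codes):
    """True if no codeword is a prefix of a different codeword."""
    words = list(codes.values())
    return not any(a != b and b.startswith(a) for a in words for b in words)

def huffman_decode(bits, codes):
    """Decode a bit string; return None if it ends in the middle of a codeword."""
    lookup = {word: symbol for symbol, word in codes.items()}
    decoded, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in lookup:  # safe because the code is prefix-free
            decoded.append(lookup[buffer])
            buffer = ""
    return "".join(decoded) if buffer == "" else None

print(is_prefix_free(CODES))                         # True
print(huffman_decode("001011101010110010", CODES))   # tenseas
```

Because no codeword is a prefix of another, the decoder can emit a character as soon as the buffered bits match a codeword, without any separators between codewords.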
Technology Used
Microsoft .NET
Microsoft .NET includes a large library of coded solutions to common programming problems and a runtime implementing the Common Language Infrastructure (CLI) that manages the execution of programs written specifically for the framework.
Windows Communication Foundation (WCF) is a programming model for using managed code to build unified Web services and other distributed systems that can talk to each other.
The Common Language Runtime (CLR) is a core component of Microsoft's .NET initiative. It is Microsoft's implementation of the Common Language Infrastructure (CLI) standard, which defines an execution environment for program code.
Scope in Future
The acronyms and cryptic names for compressed file formats include Zip files for email and encryption; JAR files for Java programs and libraries; the next version of TIFF files for black-and-white images; PNG, JPEG, and GIF files for color images; MP3 files for audio clips; and QuickTime and MPEG for video clips.