Chapter 6 Lossy Compression Algorithms
Lossy compression algorithms selectively discard less important information to reduce file sizes. This type of compression is commonly used for multimedia like audio and video. For images, lossy compression creates approximations of originals that are close perceptually if not identical. Distortion measures quantify information lost during compression. Rate distortion theory establishes the minimum rate needed for a given distortion level, representing the tradeoff between rate and distortion. Popular lossy techniques include quantization, predictive coding, and transform coding like DCT. Video compression exploits temporal redundancy between frames using motion compensation and spatial redundancy reduction.
Lossy Compression Algorithms
Image, Video and Audio Compression Techniques
Chapter SIX

Introduction
• Lossy compression algorithms are data compression methods that selectively discard certain information in order to reduce the size of the data.
• The discarded information is typically chosen to be perceptually less important to the viewer or listener.
• This type of compression is commonly used in multimedia applications, where large files such as audio and video need to be compressed so that they can be transmitted or stored easily.
• For image compression in multimedia applications, where a higher compression ratio is required, lossy methods are usually adopted.
• In lossy compression, the compressed image is usually not the same as the original image but is meant to form a close perceptual approximation to it.
• To quantitatively describe how close the approximation is to the original data, some form of distortion measure is required.

Distortion Measures
• A distortion measure is a mathematical quantity that specifies how close an approximation is to its original, according to some distortion criterion.
• When looking at compressed data, it is natural to think of the distortion in terms of the numerical difference between the original data and the reconstructed data.
• However, when the data to be compressed is an image, such a measure may not yield the intended result.
• In lossy compression, distortion measures quantify the difference between the original uncompressed data and the version reconstructed from the compressed data.
• When data is compressed with lossy techniques, some of the information is lost, and the reconstructed version of the data is an approximation of the original.
• Distortion measures quantify how much of the original information has been lost in the compression process.

Rate-Distortion Theory
• The rate-distortion function is a mathematical expression that defines the minimum possible rate of compression for a given level of distortion.
• It can be used to determine the optimal compression rate for a given level of distortion, or the optimal level of distortion for a given compression rate.
• Lossy compression always involves a tradeoff between rate and distortion.
• Rate is the average number of bits required to represent each source symbol.
• Within this framework, the tradeoff between rate and distortion is represented by a rate-distortion function R(D).
• Intuitively, for a given source and a given distortion measure, if D is a tolerable amount of distortion, R(D) specifies the lowest rate at which the source data can be encoded while keeping the distortion bounded above by D.
• It is easy to see that when D = 0, we have lossless compression of the source.
• The rate-distortion function describes a fundamental limit on the performance of a coding algorithm, and so can be used to evaluate the performance of different algorithms.
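To make the distortion measures above concrete, here is a minimal sketch (not part of the original slides) of two widely used measures, mean squared error (MSE) and peak signal-to-noise ratio (PSNR). The function names and the assumption of 8-bit images stored as NumPy arrays are illustrative.

```python
import numpy as np

def mse(original: np.ndarray, reconstructed: np.ndarray) -> float:
    """Mean squared error between the original and the reconstructed data."""
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    return float(np.mean(diff ** 2))

def psnr(original: np.ndarray, reconstructed: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB (peak = 255 for 8-bit images)."""
    error = mse(original, reconstructed)
    if error == 0.0:
        return float("inf")  # identical data: no distortion at all
    return 10.0 * np.log10(peak ** 2 / error)
```

A smaller MSE (equivalently, a larger PSNR) means the reconstruction is closer to the original; in rate-distortion terms, R(D) asks how few bits per symbol suffice to keep such a measure at or below the tolerated distortion D.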
Quantization
• In practice, lossy compression algorithms use a variety of techniques to achieve the desired trade-off between compression rate and distortion.
• These techniques include quantization, predictive coding, and transform coding.
• Quantization involves mapping the continuous values in the input data to a finite set of discrete values, which reduces the amount of data that needs to be stored or transmitted.
• Predictive coding involves using a model to predict the values of the input data based on past values, which can further reduce the amount of data that needs to be stored or transmitted.
• Transform coding involves transforming the input data into a new representation that is more compressible, such as through a Fourier or wavelet transform.

Uniform quantization
• This is the simplest form of quantization, in which the input range is divided into equal intervals and each interval is assigned a representative value.
• The representative values are usually chosen to be the midpoints of the intervals.
• For example, if we have a range of input values from 0 to 255, we might divide this range into 256 intervals, each of width 1, and assign a representative value to each interval: 0, 1, 2, ..., 255.

Non-uniform quantization
• This type of quantization is used when the input values are not evenly distributed over their range.
• The input range is divided into intervals of varying widths, and a representative value is assigned to each interval.
• For example, in audio compression the human ear is more sensitive to changes in low-frequency sounds than in high-frequency sounds, so non-uniform quantization can allocate more bits to the low-frequency sounds to improve the overall quality of the compressed audio.

Vector quantization
• This type of quantization is used when the input data is in the form of vectors, such as blocks of an image or an audio signal.
• Vector quantization divides the input vector space into smaller subspaces and assigns a representative vector to each subspace.
• This is useful when there are correlations within the input data, as it allows the data to be compressed more efficiently.
• For example, in image compression, vector quantization can be used to compress blocks of pixels that have similar color values.

Transform coding
• Transform coding is a popular technique used in lossy compression algorithms to reduce the size of digital data while minimizing the loss of quality.
• It involves transforming the original data from the spatial domain to the frequency domain using a mathematical function known as a transform.
• The transformed data is then quantized, meaning the data values are reduced to a smaller range of values, which results in some loss of information.
• The most commonly used transform in lossy compression is the Discrete Cosine Transform (DCT), which is widely used in image and video compression standards such as JPEG and MPEG.
• The DCT transforms an image from the spatial domain (the pixel values of the image) to the frequency domain (the frequency components of the image).
• It is the discrete analog of the formula for the coefficients of a Fourier series.
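As a sketch of the transform-and-quantize step described above (assuming NumPy and SciPy are available, and using a single uniform quantization step in place of a full JPEG quantization table), one 8×8 block can be transformed with a 2D DCT and its coefficients quantized as follows.

```python
import numpy as np
from scipy.fft import dctn, idctn

def transform_and_quantize(block: np.ndarray, step: float = 16.0) -> np.ndarray:
    """Forward 2D DCT of an 8x8 block followed by uniform quantization."""
    coeffs = dctn(block - 128.0, norm="ortho")   # level-shift then DCT, as in JPEG
    return np.round(coeffs / step)               # coarse uniform quantizer (illustrative step size)

def dequantize_and_invert(qcoeffs: np.ndarray, step: float = 16.0) -> np.ndarray:
    """Approximate reconstruction: rescale the coefficients and apply the inverse DCT."""
    return idctn(qcoeffs * step, norm="ortho") + 128.0

# Example: a smooth 8x8 gradient block survives coarse quantization well,
# because most of its energy is concentrated in a few low-frequency coefficients.
block = np.tile(np.linspace(50, 200, 8), (8, 1))
reconstructed = dequantize_and_invert(transform_and_quantize(block))
```

Most of the energy of a smooth block ends up in a few low-frequency coefficients, which is why coarse quantization of the remaining coefficients costs little perceptual quality.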
How JPEG compression works
• This image compression technique was developed by the Joint Photographic Experts Group, which is why it is called JPEG.
• JPEG uses a lossy compression algorithm, so some information is removed from the image when it is compressed.
• The JPEG standard works by averaging color variation and discarding information that the human eye cannot see.
• JPEG compresses either full-color or grayscale images.
• In the case of color images, RGB is transformed into a luminance/chrominance color space.
• JPEG compression mainly works by identifying similar areas of color inside the image and converting them to the same color code.
• JPEG uses the DCT (Discrete Cosine Transform) method for the coding transformation.

Steps of Compression
1. The raw image is first converted to a different color model, which separates the color of a pixel from its brightness.
2. The image is divided into small blocks of 8×8 pixels.
3. RGB is converted into Y-Cb-Cr; JPEG uses a Y-Cb-Cr model instead of RGB.
4. DCT is applied to each block of pixels, converting the image from the spatial domain to the frequency domain. The DCT method follows the formula
   F(u, v) = (1/4) C(u) C(v) Σx=0..7 Σy=0..7 f(x, y) cos[(2x+1)uπ/16] cos[(2y+1)vπ/16],
   where C(0) = 1/√2 and C(u) = 1 for u > 0.
5. The resulting coefficients are quantized; because the human eye is not sensitive to fine high-frequency detail, the high-frequency coefficients can be quantized coarsely.
6. After quantization, a zigzag scan is performed on the quantized 8×8 blocks to group the low-frequency coefficients together.
7. The coefficients are then encoded with run-length and Huffman coding algorithms to produce the final compressed image.

JPEG Standard

Introduction to Video Compression
• Video is a collection of images taken closely together in time.
• Therefore, in most cases, the difference between adjacent images is not large.
• Video compression techniques take advantage of the repetition of portions of the picture from one image to another by concentrating on the changes between neighboring images.
• In other words, there is a lot of redundancy in video frames. There are two types of redundancy: spatial and temporal.
• Spatial redundancy: pixel-to-pixel or spectral correlation within the same frame.
• Temporal redundancy: similarity between two or more different frames.

Video compression based on motion compensation
• The MPEG video compression algorithm relies on two basic techniques: motion compensation for the reduction of temporal redundancy, and transform-domain (DCT-based) compression for the reduction of spatial redundancy.
• Motion-compensated techniques are the techniques that exploit the temporal redundancy of video signals.
• The concept of motion compensation is based on the estimation of motion between video frames: if all elements in a video scene are approximately spatially displaced, the motion between frames can be described by a limited number of motion parameters (motion vectors for translatory motion of pixels).
• The remaining signal (the prediction error) is further compressed with spatial redundancy reduction (DCT).
• The motion information is based on 16×16 blocks and is transmitted together with the spatial information.
• The motion information is compressed using variable-length codes to achieve maximum efficiency; a small block-matching sketch of the motion-vector search is shown below.
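The motion-vector search behind motion compensation can be illustrated with a minimal block-matching sketch. The full search over a ±8-pixel window and the use of the sum of absolute differences (SAD) are illustrative assumptions, not the MPEG specification; real encoders use faster search strategies.

```python
import numpy as np

def best_motion_vector(ref: np.ndarray, cur: np.ndarray,
                       top: int, left: int, block: int = 16, radius: int = 8):
    """Full-search block matching: find the displacement (dy, dx) in the
    reference frame that minimizes the SAD for one block of the current frame."""
    target = cur[top:top + block, left:left + block].astype(np.int32)
    best, best_sad = (0, 0), np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + block > ref.shape[0] or x + block > ref.shape[1]:
                continue  # candidate block falls outside the reference frame
            candidate = ref[y:y + block, x:x + block].astype(np.int32)
            sad = np.abs(target - candidate).sum()
            if sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best, best_sad
```

The prediction error (the target block minus the best-matching reference block) is what is then compressed with the DCT, while the chosen (dy, dx) is transmitted as the motion vector.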
Types of Frames
• Because of the importance of random access for stored video and the significant bit-rate reduction afforded by motion-compensated interpolation, four types of frames are defined in MPEG:
• Intraframes (I-frames),
• Predicted frames (P-frames),
• Interpolated frames (B-frames), and
• DC frames (D-frames).

I-Frames
• I-frames (intra-coded frames) are coded independently, with no reference to other frames.
• I-frames provide random access points in the compressed video data, since they can be decoded without referencing other frames.
• With I-frames, an MPEG bit-stream is more editable.
• Error propagation due to transmission errors in previous frames is also terminated by an I-frame, since an I-frame does not reference previous frames.
• Since I-frames use only transform coding, without motion-compensated predictive coding, they provide only moderate compression.

P-Frames
• P-frames (predictive-coded frames) are coded using forward motion-compensated prediction from the preceding I- or P-frame.
• P-frames provide more compression than I-frames by virtue of motion-compensated prediction.
• They also serve as references for B-frames and future P-frames.
• Transmission errors in I-frames and P-frames can propagate to the succeeding frames, since I-frames and P-frames are used to predict those succeeding frames.

B-Frames
• B-frames (bidirectionally coded frames) allow macroblocks to be coded using bidirectional motion-compensated prediction from both the past and the future reference I- or P-frames.
• In B-frames, each bidirectionally motion-compensated macroblock can have two motion vectors: a forward motion vector that references a best-matching block in the previous I- or P-frame, and a backward motion vector that references a best-matching block in the next I- or P-frame.
• The motion-compensated prediction can be formed as the average of the two referenced motion-compensated blocks; by averaging between the past and future reference blocks, the effect of noise can be decreased.
• B-frames provide the best compression compared with I- and P-frames.
• I- and P-frames are used as reference frames for predicting B-frames.
• To keep the structure simple, and since there is no apparent advantage in using B-frames to predict other B-frames, B-frames are not used as reference frames; hence, errors in B-frames do not propagate.

D-Frames
• D-frames (DC-frames) are low-resolution frames obtained by decoding only the DC coefficient of the Discrete Cosine Transform coefficients of each macroblock.
• They are not used in combination with I-, P-, or B-frames.
• D-frames are rarely used, but are defined to allow fast searches on sequential digital storage media.

Zig-Zag Scan for Entropy Encoding
Apply Huffman Encoding

MPEG Audio Compression
Psychoacoustics
• The range of human hearing is about 20 Hz to about 20 kHz.
• The frequency range of the voice is typically only from about 500 Hz to 4 kHz.
• The dynamic range, the ratio of the maximum sound amplitude to the quietest sound that humans can hear, is on the order of about 120 dB.

Fletcher-Munson Curves
• Equal-loudness curves that display the relationship between perceived loudness ("phons", in dB) and the stimulus sound volume ("sound pressure level", also in dB), as a function of frequency.

Frequency Masking
• Lossy audio compression methods, such as MPEG/Audio encoding, remove some sounds that are masked anyway.
• The general situation with regard to masking is as follows:
• 1. A lower tone can effectively mask (make us unable to hear) a higher tone.
• 2. The reverse is not true: a higher tone does not mask a lower tone well.
• 3. The greater the power in the masking tone, the wider its influence, i.e., the broader the range of frequencies it can mask.
• 4. As a consequence, if two tones are widely separated in frequency, then little masking occurs.

Bark Unit
• A Bark unit is defined as the width of one critical band, for any masking frequency.
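As a rough numerical companion to the Bark unit, the sketch below converts frequency to the critical-band (Bark) scale using the Zwicker-Terhardt approximation; this formula is an assumption for illustration and is not taken from these slides, and MPEG psychoacoustic models use their own tabulated values.

```python
import math

def hz_to_bark(f_hz: float) -> float:
    """Approximate critical-band rate (in Bark) for a frequency in Hz,
    using the Zwicker-Terhardt approximation (an illustrative choice)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

# Two tones within roughly one Bark of each other lie in the same critical band,
# so the stronger tone can mask the weaker one.
print(hz_to_bark(1000.0))   # about 8.5 Bark
print(hz_to_bark(1100.0))   # about 9.1 Bark: close enough to share a critical band
```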
MPEG Audio Compression
• MPEG audio compression takes advantage of psychoacoustic models, constructing a large multi-dimensional lookup table to transmit masked frequency components using fewer bits.

MPEG Audio Overview
1. A filter bank is applied to the input to break it into its frequency components.
2. In parallel, a psychoacoustic model is applied to the data for the bit-allocation block.
3. The number of bits allocated is used to quantize the information from the filter bank, providing the compression.

MPEG Audio Layers
• Layer 1 quality can be quite good provided a comparatively high bit rate is available; Digital Audio Tape typically uses Layer 1 at around 192 kbps.
• Layer 2 has more complexity and was proposed for use in Digital Audio Broadcasting.
• Layer 3 (MP3) is the most complex and was originally aimed at audio transmission over ISDN lines.
• Most of the complexity increase is at the encoder rather than the decoder, which accounts for the popularity of MP3 players.

MPEG Audio Strategy
• The MPEG approach to compression relies on:
• Quantization: the human auditory system is not accurate within the width of a critical band (perceived loudness and audibility of a frequency).
• A bank of filters: analyze the frequency ("spectral") components of the audio signal by calculating a frequency transform of a window of signal values, and decompose the signal into subbands by using a bank of filters (Layers 1 and 2 use a quadrature-mirror filter bank; Layer 3 adds a DCT; the psychoacoustic model uses a Fourier transform).
• Frequency masking: use a psychoacoustic model to estimate the just-noticeable noise level; the encoder balances the masking behavior and the available number of bits by discarding inaudible frequencies and by scaling quantization according to the sound level that is left over, above the masking levels.

The End!!!