1.3 Compression

1. File compression techniques can be either lossless or lossy. Lossless compression allows reconstruction of the original file, while lossy compression discards some data, compromising quality.
2. Common lossy formats are MP3 for audio and JPEG for images. They reduce file sizes significantly by removing imperceptible components like frequencies outside human hearing.
3. Lossless run-length encoding encodes repeated data as a count-value pair, reducing file sizes when there are long runs. It is reversible and preserves all information.

Uploaded by

Ahmed Irfan
Zain Merchant

It is often necessary to reduce the size of a file, either to save storage space or
to reduce the time taken to stream or transmit data from one device to another.
The two most common forms of file compression are lossless file compression and
lossy file compression.

Lossless file compression


With this technique, all the data from the original file can be reconstructed when
the file is uncompressed again. This is particularly important for files where loss of
any data would be disastrous (such as a spreadsheet file of important results).

Lossy file compression


With this technique, the file compression algorithm eliminates unnecessary data
(as with MP3 and JPEG formats, for example).
Lossless file compression is designed to lose none of the original detail from the
file (such as Run-Length Encoding (RLE), which is covered later in this chapter).
Lossy file compression usually results in some loss of detail when compared to the
original; it is usually impossible to reconstruct the original file. The algorithms used
in the lossy technique have to decide which parts of the file are important (and
need to be kept) and which parts can be discarded.
We will now consider file compression techniques applied to multimedia files.

File compression applications


MP3 and MPEG-4 (MP4)
MP3 (formally MPEG-1 Audio Layer III, often loosely called MPEG-3) uses
technology known as audio compression to convert
music and other sounds into an MP3 file format. Essentially, this compression
technology will reduce the size of a normal music file by about 90%. For example,
an 80 MB music file on a CD can be reduced to 8 MB using MP3 technology.
MP3 files are used in MP3 players, computers or mobile phones. Music files can be
downloaded or streamed from the internet in a compressed format, or CD files can
be converted to MP3 format. While streamed or MP3 music quality can never
match the ‘full’ version found on a CD, the quality is satisfactory for most purposes.
But how can the original music file be reduced by 90% while still retaining most of
the music quality? This is done using file compression algorithms that use
perceptual music shaping.
Perceptual music shaping removes certain sounds. For example:
• frequencies that are outside the human hearing range
• if two sounds are played at the same time, only the louder one can be heard by
the ear, so the softer sound is eliminated.
This means that certain parts of the music can be removed without affecting the
quality too much. MP3 files use what is known as a lossy format, since part of the
original file is lost following the compression algorithm. This means that the
original file cannot be put back together again. However, even the quality of MP3
files can be different, since it depends on the bit rate, which is
the number of bits per second used when creating the file. Bit rates are between
80 and 320 kilobits per second; usually 200 kilobits per second or higher gives a
sound quality close to a normal CD.
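The link between bit rate and file size can be checked with a little arithmetic. The sketch below is illustrative only: the function name, the 4-minute track length and the 128 kilobits per second MP3 setting are assumptions, and the CD figure uses the standard CD audio parameters (44,100 samples per second, 16 bits per sample, 2 channels).

```python
def audio_size_bytes(bit_rate_kbps: float, duration_s: float) -> float:
    """Approximate size of an audio stream: bits per second x seconds / 8."""
    return bit_rate_kbps * 1000 * duration_s / 8


# Uncompressed CD audio: 44,100 samples/s x 16 bits x 2 channels.
CD_BIT_RATE_KBPS = 44_100 * 16 * 2 / 1000  # 1411.2 kbit/s

track_seconds = 4 * 60  # hypothetical 4-minute track
cd_size = audio_size_bytes(CD_BIT_RATE_KBPS, track_seconds)
mp3_size = audio_size_bytes(128, track_seconds)  # assumed MP3 bit rate

print(f"CD audio:       {cd_size / 1e6:.1f} MB")
print(f"128 kbit/s MP3: {mp3_size / 1e6:.1f} MB")
print(f"Reduction:      {100 * (1 - mp3_size / cd_size):.0f}%")
```

Under these assumptions the reduction comes out at roughly 91%, which is consistent with the "about 90%" figure quoted above.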
MPEG-4 (MP4) files are slightly different to MP3 files. This format allows the storage
of multimedia files rather than just sound. Music, videos, photos and animation can
all be stored in the MP4 format. Videos, for example, could
be streamed over the internet using the MP4 format without losing any real
discernible quality.

Photographic (bit-map) images


When a photographic file is compressed, both the file size and quality of image are
reduced. A common file format for images is JPEG, which uses lossy file
compression. Once the image is subjected to the JPEG compression algorithm, a
new file is formed and the original file can no longer be reconstructed. A JPEG will
reduce the raw bit-map image by a factor of between 5 and 15, depending on the
quality of the original.
Vector graphics can also undergo some form of file compression. Scalable vector
graphics (.svg) are defined in XML text files, which allows them to be
compressed.
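Because an SVG is plain XML text, a general-purpose lossless compressor can shrink it. The sketch below uses Python's standard zlib module as one possible compressor; the notes do not name a method, so this choice is an assumption, and the SVG content is a made-up example.

```python
import zlib

# Hypothetical SVG: 50 similar <rect> elements, as a drawing tool might emit.
rects = "".join(
    f'<rect x="{10 * i}" y="0" width="8" height="8" fill="#00ff00"/>'
    for i in range(50)
)
svg_text = f'<svg xmlns="http://www.w3.org/2000/svg">{rects}</svg>'

original = svg_text.encode("utf-8")
compressed = zlib.compress(original)

print(len(original), "bytes before,", len(compressed), "bytes after")

# Lossless: decompressing gives back exactly the original bytes.
assert zlib.decompress(compressed) == original
```

The repeated tag and attribute names are what make XML text compress well.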

Run-length encoding (RLE)


Run-length encoding (RLE) can be used to compress a number of different file
formats.
It is a form of lossless/reversible file compression that reduces the size of a string
of adjacent, identical data (such as repeated colours in an image).
A repeating string is encoded into two values.
The first value represents the number of identical data items (such as characters) in
the run. The second value represents the code of the data item (such as ASCII code
if it is a keyboard character).
RLE is only effective where there is a long run of repeated units/bits.

Using RLE on text data


Consider the text string ‘aaaaabbbbccddddd’.
Assuming each character requires 1 byte, then this string needs 16 bytes. If we
assume ASCII code is being used, then the string can be coded as follows:

String:  aaaaa   bbbb    cc      ddddd
Code:    05 97   04 98   02 99   05 100

This means we have five characters with ASCII code 97, four characters with ASCII
code 98, two characters with ASCII code 99, and five characters with ASCII code
100. Assuming each number in the second row requires 1 byte of memory, the RLE
code will need 8 bytes. This is half the original file size.
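The count-and-code scheme described above can be sketched in a few lines. This is a minimal illustration, not a standard library routine: the function names are made up, and it assumes every run is at most 255 characters long so that each count fits in 1 byte.

```python
from itertools import groupby


def rle_encode(text):
    """Return a list of (run length, ASCII code) pairs, one per run."""
    pairs = []
    for char, run in groupby(text):
        pairs.append((len(list(run)), ord(char)))
    return pairs


def rle_decode(pairs):
    """Rebuild the original string from (run length, ASCII code) pairs."""
    return "".join(chr(code) * count for count, code in pairs)


pairs = rle_encode("aaaaabbbbccddddd")
print(pairs)           # [(5, 97), (4, 98), (2, 99), (5, 100)]
print(2 * len(pairs))  # 8 bytes: one for each count and one for each code
```

Decoding the pairs returns the original 16-character string, which is what makes the method lossless.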
One issue occurs with a string such as ‘cdcdcdcdcd’, where compression is not very
effective. To cope with this we use a flag. A flag preceding data indicates that what
follows is the number of repeating units (for example, 255 05 97 where 255 is the
flag and the other two numbers indicate that there are five items with ASCII code
97). When a flag is not used, the next byte(s) are taken with their face value and a
run of 1 (for example, 01 99 means one character with ASCII code 99 follows).
Consider this example:

String:  aaaaaaaabbbbbbbbbbcdcdcdeeeeeeee
Code:    08 97 10 98 01 99 01 100 01 99 01 100 01 99 01 100 08 101

The original string contains 32 characters and would occupy 32 bytes of storage.
The coded version contains 18 values and would require 18 bytes of storage.
Introducing a flag (255 in this case) produces:
255 08 97 255 10 98 99 100 99 100 99 100 255 08 101
This has 15 values and would, therefore, require 15 bytes of storage. This is a
reduction in file size of about 53%.
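The flag idea can be sketched as follows. Several details here are assumptions rather than part of the notes: the function names, the rule that only runs of four or more characters are flagged (a flagged run always costs 3 bytes, so shorter runs gain nothing), and the requirement that the value 255 never appears as a character code in the data. The input is the 32-character string that the flagged sequence above decodes to.

```python
FLAG = 255  # flag value used in the example above


def rle_encode_flagged(text, min_run=4):
    """Encode long runs as FLAG, count, code; copy short runs as plain codes."""
    out = []
    i = 0
    while i < len(text):
        j = i
        # Extend the run, capped at 255 so the count fits in 1 byte.
        while j < len(text) and text[j] == text[i] and j - i < 255:
            j += 1
        run, code = j - i, ord(text[i])
        if run >= min_run:
            out += [FLAG, run, code]
        else:
            out += [code] * run
        i = j
    return out


def rle_decode_flagged(values):
    """Reverse rle_encode_flagged."""
    chars = []
    i = 0
    while i < len(values):
        if values[i] == FLAG:
            chars.append(chr(values[i + 2]) * values[i + 1])
            i += 3
        else:
            chars.append(chr(values[i]))
            i += 1
    return "".join(chars)


text = "a" * 8 + "b" * 10 + "cdcdcd" + "e" * 8
encoded = rle_encode_flagged(text)
print(encoded)
print(len(text), "bytes before,", len(encoded), "bytes after")
```

With these assumptions the output matches the 15-value sequence shown above, and decoding it returns the original string.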

Using RLE with images


Black and white images
The figure below shows the letter F in a grid where each square requires 1 byte of
storage. A white square has a value of 1 and a black square a value of 0.

The 8 × 8 grid would need 64 bytes; the compressed RLE format has 30 values,
and therefore needs only 30 bytes to store the image.
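The same count-and-value idea applies to pixels. The sketch below uses a hypothetical 8 × 8 grid in the shape of an F; it is not the grid from the figure, so its value count will not necessarily match the 30 quoted above. The function names are also illustrative, and the pixels are read row by row as one continuous sequence.

```python
from itertools import groupby

# Hypothetical grid (1 = white, 0 = black); NOT the grid from the figure.
GRID = [
    "11111111",
    "10000011",
    "10111111",
    "10000111",
    "10111111",
    "10111111",
    "10111111",
    "11111111",
]
pixels = [int(p) for row in GRID for p in row]


def rle_encode_pixels(pixels):
    """Return a flat list: count, value, count, value, ..."""
    values = []
    for value, run in groupby(pixels):
        values += [len(list(run)), value]
    return values


def rle_decode_pixels(values):
    """Expand count, value pairs back into the pixel list."""
    pixels = []
    for count, value in zip(values[0::2], values[1::2]):
        pixels += [value] * count
    return pixels


encoded = rle_encode_pixels(pixels)
print(len(pixels), "values before,", len(encoded), "values after")
```

Long stretches of a single colour collapse into one pair, which is why simple black and white images suit RLE.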
Coloured images
Figure 1.8 shows an object in four colours. Each colour is made up of red, green
and blue (RGB) according to the code on the right.

This produces the following data:

The original image (8 × 8 square) would need 3 bytes per square (to include all
three RGB values). Therefore, the uncompressed file for this image is 8 × 8 × 3 =
192 bytes.
The RLE code has 92 values, which means the compressed file will be 92 bytes in
size. This gives a file reduction of about 52%. It should be noted that the file
reductions in reality will not be as large as this due to other data which needs to be
stored with the compressed file (such as a file header).
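The reduction percentages quoted in this section all follow from the same formula: the fraction of the original size that has been saved. A minimal sketch, with an illustrative function name and the byte counts taken from the examples above:

```python
def reduction_percent(original_bytes, compressed_bytes):
    """Percentage of the original size saved by compression."""
    return 100 * (1 - compressed_bytes / original_bytes)


# (original bytes, compressed bytes) from the worked examples in this section.
examples = {
    "text, no flag": (16, 8),
    "text, with flag": (32, 15),
    "black and white image": (64, 30),
    "coloured image": (192, 92),
}

for name, (original, compressed) in examples.items():
    print(f"{name}: {reduction_percent(original, compressed):.1f}%")
```

These figures ignore any header stored alongside the compressed data, so real savings would be somewhat smaller.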

General methods of compressing files
All the above file compression techniques are excellent for very specific types of
file. However, it is also worth considering some general methods to reduce the size
of a file without the need to use lossy or lossless file compression:
