Data Storage and File Compression

Space used by files (audio and bitmap) and different types of compression
1.3 Data storage and file compression

1.3.1 Measurement of data storage

A bit is the basic unit of all computing memory storage terms and is either 1 or 0. The word comes from binary digit. The byte is the smallest unit of memory in a computer; 1 byte is 8 bits. A 4-bit number is called a nibble, which is half a byte.

1 byte of memory wouldn't allow you to store very much information, so memory size is measured in the multiples shown in Table 1.4:

Table 1.4 Memory size using denary values

Name of memory size     Equivalent denary value
1 kilobyte (1 KB)       1000 bytes
1 megabyte (1 MB)       1 000 000 bytes
1 gigabyte (1 GB)       1 000 000 000 bytes
1 terabyte (1 TB)       1 000 000 000 000 bytes
1 petabyte (1 PB)       1 000 000 000 000 000 bytes
1 exabyte (1 EB)        1 000 000 000 000 000 000 bytes

This system of numbering is still used for some storage devices, but it is technically inaccurate. It is based on the SI (base 10) system of units, where 1 kilo is equal to 1000. A 1 TB hard disk drive would allow the storage of 1 × 10¹² bytes according to this system. However, since memory size is actually measured in terms of powers of 2, another system has been adopted by the IEC (International Electrotechnical Commission) that is based on the binary system (Table 1.5):

Table 1.5 IEC memory size system

Name of memory size     Number of bytes     Equivalent denary value
1 kibibyte (1 KiB)      2¹⁰                 1024 bytes
1 mebibyte (1 MiB)      2²⁰                 1 048 576 bytes
1 gibibyte (1 GiB)      2³⁰                 1 073 741 824 bytes
1 tebibyte (1 TiB)      2⁴⁰                 1 099 511 627 776 bytes
1 pebibyte (1 PiB)      2⁵⁰                 1 125 899 906 842 624 bytes
1 exbibyte (1 EiB)      2⁶⁰                 1 152 921 504 606 846 976 bytes

This system is more accurate. Internal memories (such as RAM and ROM) should be measured using the IEC system. A 64 GiB RAM could, therefore, store 64 × 2³⁰ bytes of data (68 719 476 736 bytes).

1.3.2 Calculation of file size

In this section we will look at the calculation of the file size required to hold a bitmap image and a sound sample.
The file size of an image is calculated as:

image resolution (in pixels) × colour depth (in bits)

The size of a mono sound file is calculated as:

sample rate (in Hz) × sample resolution (in bits) × length of sample (in seconds)

For a stereo sound file, you would then multiply the result by two. Both formulas give a size in bits; divide by 8 to convert to bytes.

Example 1

A photograph is 1024 × 1080 pixels and uses a colour depth of 32 bits. How many photographs of this size would fit onto a memory stick of 64 GiB?

1 Multiply the number of pixels in the vertical and horizontal directions to find the total number of pixels = (1024 × 1080) = 1 105 920 pixels
2 Now multiply the number of pixels by the colour depth, then divide by 8 to give the number of bytes = 1 105 920 × 32 = 35 389 440 bits; 35 389 440/8 = 4 423 680 bytes
3 64 GiB = 64 × 1024 × 1024 × 1024 = 68 719 476 736 bytes
4 Finally, divide the memory stick size by the file size and round down = 68 719 476 736/4 423 680 = 15 534 photos.

Example 2

A camera detector has an array of 2048 by 2048 pixels and uses a colour depth of 16 bits. Find the size of an image taken by this camera in MiB.

1 Multiply the number of pixels in the vertical and horizontal directions to find the total number of pixels = (2048 × 2048) = 4 194 304 pixels
2 Now multiply the number of pixels by the colour depth = 4 194 304 × 16 = 67 108 864 bits
3 Now divide the number of bits by 8 to find the number of bytes in the file = (67 108 864)/8 = 8 388 608 bytes
4 Now divide by 1024 × 1024 to convert to MiB = (8 388 608)/(1 048 576) = 8 MiB.

Example 3

An audio CD has a sample rate of 44 100 Hz and a sample resolution of 16 bits. The music being sampled uses two channels to allow for stereo recording. Calculate the file size for a 60-minute recording.

1 Size of file = sample rate (in Hz) × sample resolution (in bits) × length of sample (in seconds)
2 Size of sample = 44 100 × 16 × (60 × 60) = 2 540 160 000 bits
3 Multiply by 2 since there are two channels being used = 5 080 320 000 bits
4 Divide by 8 to find the number of bytes = (5 080 320 000)/8 = 635 040 000 bytes
5 Divide by 1024 × 1024 to convert to MiB = 635 040 000/1 048 576 ≈ 605.6 MiB.
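The three worked examples can be checked with a short Python sketch. The function names (`image_size_bytes`, `sound_size_bytes`) and the constants `MIB` and `GIB` are illustrative choices, not taken from the text, and the integer division by 8 assumes the total number of bits is a multiple of 8, as it is in every example here.

```python
MIB = 2**20  # bytes in 1 MiB (IEC system)
GIB = 2**30  # bytes in 1 GiB (IEC system)


def image_size_bytes(width_px, height_px, colour_depth_bits):
    """Image size = resolution (pixels) x colour depth (bits), converted to bytes."""
    return width_px * height_px * colour_depth_bits // 8


def sound_size_bytes(sample_rate_hz, resolution_bits, length_s, channels=1):
    """Sound size = sample rate x sample resolution x length x channels, in bytes."""
    return sample_rate_hz * resolution_bits * length_s * channels // 8


# Example 1: how many 1024 x 1080, 32-bit photographs fit on a 64 GiB memory stick?
photo_bytes = image_size_bytes(1024, 1080, 32)
photos_on_stick = (64 * GIB) // photo_bytes  # whole photographs only

# Example 2: size in MiB of a 2048 x 2048 image with a 16-bit colour depth
camera_mib = image_size_bytes(2048, 2048, 16) / MIB

# Example 3: 60 minutes of stereo audio at 44 100 Hz with 16-bit samples
cd_bytes = sound_size_bytes(44_100, 16, 60 * 60, channels=2)
cd_mib = cd_bytes / MIB

print(photo_bytes, photos_on_stick)  # 4423680 15534
print(camera_mib)                    # 8.0
print(cd_bytes, round(cd_mib, 1))    # 635040000 605.6
```

Keeping the results in bytes and dividing by `MIB` or `GIB` only at the end mirrors the order of the steps in the worked examples.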
Activity 1.16

1 A camera detector has an array of 1920 by 1536 pixels. A colour depth of 16 bits is used. Calculate the size of a photograph taken by this camera, giving your answer in MiB.
2 Photographs have been taken by a smartphone which uses a detector with a 1024 × 1536 pixel array. The software uses a colour depth of 24 bits. How many photographs could be stored on a 16 GiB memory card?
3 Audio is being sampled at the rate of 44.1 kHz using 8 bits. Two channels are being used. Calculate:
a the size of a one-second sample, in bits
b the size of a 30-second audio recording, in MiB.
4 The typical song stored on a music CD is 3 minutes and 30 seconds long. Assume each song is sampled at 44.1 kHz (44 100 samples per second) with 16 bits per sample, and that each song uses two channels. Calculate how many typical songs could be stored on a 740 MiB CD.

1.3.3 Data compression

The calculations in Section 1.3.2 show that sound and image files can be very large. It is therefore necessary to reduce (or compress) the size of a file for the following reasons:

» to save storage space on devices such as the hard disk drive or solid-state drive
» to reduce the time taken to stream a music or video file
» to reduce the time taken to upload, download or transfer a file across a network
» the download/upload process uses up network bandwidth, which is the maximum rate of transfer of data across a network, measured in bits per second. This occurs whenever a file is downloaded, for example, from a server. Compressed files contain fewer bits of data than uncompressed files and therefore use less bandwidth, which results in a faster transfer
» reduced file size also reduces costs. For example, when using cloud storage, the cost is based on the size of the files stored. Also, an internet service provider (ISP) may charge a user based on the amount of data downloaded.

1.3.4 Lossy and lossless file compression

File compression can either be lossless or lossy.
Lossy file compression

With this technique, the file compression algorithm eliminates unnecessary data from the file. This means the original file cannot be reconstructed once it has been compressed. Lossy file compression results in some loss of detail when compared to the original file. The algorithms used in the lossy technique have to decide which parts of the file need to be retained and which parts can be discarded.

For example, when applying a lossy file compression algorithm to:

» an image, it may reduce the resolution and/or the bit/colour depth
» a sound file, it may reduce the sampling rate and/or the resolution.

Files produced by lossy compression are smaller than those produced by lossless compression, which is of great benefit when considering storage and data transfer rate requirements.

Common lossy file compression algorithms are:

» MP3 (MPEG Audio Layer III, often loosely called MPEG-3) and MP4 (MPEG-4)
» JPEG.

MP3 and MP4

MP3 files are used for playing music on computers or mobile phones. This compression technology will reduce the size of a normal music file by about 90%. While MP3 music files can never match the sound quality found on a DVD or CD, the quality is satisfactory for most general purposes.

But how can the original music file be reduced by 90% while still retaining most of the music quality? Essentially, the algorithm removes sounds that the human ear can't hear properly. For example:

» removal of sounds outside the range of human hearing
» if two sounds are played at the same time, only the louder one can be heard by the ear, so the softer sound is eliminated. This is called perceptual music shaping.

MP4 files are slightly different to MP3 files. This format allows the storage of multimedia files rather than just sound: music, videos, photos and animation can all be stored in the MP4 format. As with MP3, this is a lossy file compression format, but it still retains an acceptable quality of sound and video.
Movies, for example, could be streamed over the internet using the MP4 format without losing any real discernible quality.

JPEG

When a camera takes a photograph, it produces a raw bitmap file which can be very large in size. These files are temporary in nature. JPEG is a lossy file compression algorithm used for bitmap images. As with MP3, once the image is subjected to the JPEG compression algorithm, a new file is formed and the original file can no longer be reconstructed.

The JPEG file reduction process is based on two key concepts:

» human eyes don't detect differences in colour shades quite as well as they detect differences in image brightness (the eye is less sensitive to colour variations than it is to variations in brightness)
» by separating pixel colour from brightness, images can be split into 8 × 8 pixel blocks, for example, which then allows certain 'information' to be discarded from the image without causing any real noticeable deterioration in quality.

Lossless file compression

With this technique, all the data from the original uncompressed file can be reconstructed. This is particularly important for files where any loss of data would be disastrous (e.g. when transferring a large and complex spreadsheet or when downloading a large computer application). Lossless file compression is designed so that none of the original detail from the file is lost.

Run-length encoding (RLE) can be used for lossless compression of a number of different file formats:

» it is a form of lossless/reversible file compression
» it reduces the size of a string of adjacent, identical data (e.g. repeated colours in an image)
» a repeating string is encoded into two values:
  - the first value represents the number of identical data items (e.g. characters) in the run
  - the second value represents the code of the data item (such as ASCII code if it is a keyboard character)
» RLE is only effective where there is a long run of repeated units/bits.
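The two-value encoding described in the list above can be sketched in Python. The function names `rle_encode` and `rle_decode` are illustrative, and the sketch assumes every run count fits in a single byte (no run longer than 255 items), which the text does not state explicitly.

```python
from itertools import groupby


def rle_encode(text):
    """Return a flat list [count, code, count, code, ...], one pair per run."""
    encoded = []
    for char, run in groupby(text):  # groupby yields each run of identical adjacent characters
        encoded.extend([len(list(run)), ord(char)])
    return encoded


def rle_decode(values):
    """Rebuild the original string from [count, code, ...] pairs."""
    pairs = zip(values[0::2], values[1::2])
    return "".join(chr(code) * count for count, code in pairs)


original = "aaaaabbbbccddddd"
encoded = rle_encode(original)
print(encoded)                          # [5, 97, 4, 98, 2, 99, 5, 100]
print(rle_decode(encoded) == original)  # True

# With no repeated characters, every character needs its own count,
# so the encoded form is twice as long as the input.
print(len(rle_encode("cdcdcd")))        # 12
```

Decoding returns exactly the original string, which is what makes the method lossless. The last line shows why RLE is only effective on long runs.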
Using RLE on text data

Consider the following text string: 'aaaaabbbbccddddd'. Assuming each character requires 1 byte, this string needs 16 bytes. If we assume ASCII code is being used, then the string can be coded as follows:

String   a a a a a   b b b b   c c     d d d d d
Code     05 97       04 98     02 99   05 100

This means we have five characters with ASCII code 97, four characters with ASCII code 98, two characters with ASCII code 99 and five characters with ASCII code 100. Assuming each number in the second row requires 1 byte of memory, the RLE code will need 8 bytes. This is half the original file size.

One issue occurs with a string such as 'cdedcdedcd', where RLE compression isn't very effective because no character repeats, so every character would need its own count. To cope with this, we use a flag. A flag preceding data indicates that what follows is the number of repeating units and then the item itself (for example, 255 05 97, where 255 is the flag and the other two numbers indicate that there are five items with ASCII code 97). When a flag is not used, the next byte is taken at face value with a run of 1 (for example, 99 on its own means that one character with ASCII code 99 follows; without the flag scheme this would be written as 01 99).

Consider this example:

String   aaaaaaaa   bbbbbbbbbb   c       d        c       d        c       d        eeeeeeee
Code     08 97      10 98        01 99   01 100   01 99   01 100   01 99   01 100   08 101

The original string contains 32 characters and would occupy 32 bytes of storage. The coded version contains 18 values and would require 18 bytes of storage. Introducing a flag (255 in this case) produces:

255 08 97 || 255 10 98 || 99 100 99 100 99 100 || 255 08 101

This has 15 values and would, therefore, require 15 bytes of storage. This is a reduction in file size of about 53% when compared to the original string.

Using RLE with images

Example 1: Black and white image

Figure 1.12 shows the letter 'F' in a grid where each square requires 1 byte of storage. A white square has a value of 1 and a black square a value of 0.

In compressed RLE format this becomes:

9W 6B 2W 1B 7W 1B 7W 5B 3W 1B 7W 1B 7W 1B 6W
Using W = 1 and B = 0 we get:

91 60 21 10 71 10 71 50 31 10 71 10 71 10 61

Figure 1.12 Using RLE with a black and white image

The 8 × 8 grid would need 64 bytes; the compressed RLE format has 30 values, and therefore needs only 30 bytes to store the image.

Example 2: Coloured images

Figure 1.13 shows an object in four colours. Each colour is made up of red, green and blue (RGB) components according to the following code:

Square colour   Red   Green   Blue
Black             0       0      0
White           255     255    255
Green             0     255      0
Red             255       0      0

Figure 1.13 Using RLE with a coloured image

This produces the following data, where each run is written as a count followed by its three RGB values:

2 0 0 0 | 4 0 255 0 | 3 0 0 0 | 6 255 255 255 | 1 0 0 0 | 2 0 255 0
4 255 0 0 | 4 0 255 0 | 1 255 255 255 | 2 255 0 0 | 1 255 255 255 | 4 0 255 0
4 255 0 0 | 4 0 255 0 | 4 255 255 255 | 2 0 255 0 | 1 0 0 0 | 2 255 255 255
2 255 0 0 | 2 255 255 255 | 3 0 0 0 | 4 0 255 0 | 2 0 0 0

The original image (8 × 8 square) would need 3 bytes per square (to include all three RGB values). Therefore, the uncompressed file for this image is 8 × 8 × 3 = 192 bytes. The RLE code has 92 values, which means the compressed file will be 92 bytes in size. This gives a file reduction of about 52%.

It should be noted that the file reductions in reality will not be as large as this due to other data which needs to be stored with the compressed file (e.g. a file header).
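The flag scheme and the two image examples can be reproduced with one more Python sketch. All names here (`runs`, `encode_flagged`, `encode_pixels`, `reduction_percent`) are illustrative. The `min_run` parameter is a hypothetical addition: the text does not say how short a run must be before the flag is dropped, so the sketch assumes any run of two or more is flagged. It also assumes the flag value 255 never occurs as a data item.

```python
from itertools import groupby

FLAG = 255  # flag value from the text example; assumed never to appear as data


def runs(items):
    """Return [(count, item), ...] for each run of identical adjacent items."""
    return [(len(list(group)), item) for item, group in groupby(items)]


def encode_flagged(text, flag=FLAG, min_run=2):
    """Runs of at least min_run become [flag, count, code]; shorter runs are stored as bare codes."""
    out = []
    for count, char in runs(text):
        if count >= min_run:
            out.extend([flag, count, ord(char)])
        else:
            out.extend([ord(char)] * count)
    return out


def encode_pixels(pixels):
    """Plain RLE for pixels: each run is its count followed by the pixel's value(s)."""
    out = []
    for count, pixel in runs(pixels):
        out.append(count)
        out.extend(pixel if isinstance(pixel, tuple) else (pixel,))
    return out


def reduction_percent(original_bytes, compressed_bytes):
    return 100 * (original_bytes - compressed_bytes) / original_bytes


# Text example with a flag: 32 characters compress to 15 values
text = "a" * 8 + "b" * 10 + "cdcdcd" + "e" * 8
flagged = encode_flagged(text)

# Example 1: rebuild the 64-square letter 'F' from its run list, then re-encode it
W, B = 1, 0
f_runs = [(9, W), (6, B), (2, W), (1, B), (7, W), (1, B), (7, W), (5, B),
          (3, W), (1, B), (7, W), (1, B), (7, W), (1, B), (6, W)]
letter_f = [value for count, value in f_runs for _ in range(count)]
f_encoded = encode_pixels(letter_f)

# Example 2: the four-colour image, with each square stored as an (R, G, B) tuple
BLACK, GREEN, WHITE, RED = (0, 0, 0), (0, 255, 0), (255, 255, 255), (255, 0, 0)
colour_runs = [(2, BLACK), (4, GREEN), (3, BLACK), (6, WHITE), (1, BLACK), (2, GREEN),
               (4, RED), (4, GREEN), (1, WHITE), (2, RED), (1, WHITE), (4, GREEN),
               (4, RED), (4, GREEN), (4, WHITE), (2, GREEN), (1, BLACK), (2, WHITE),
               (2, RED), (2, WHITE), (3, BLACK), (4, GREEN), (2, BLACK)]
colour_image = [pixel for count, pixel in colour_runs for _ in range(count)]
colour_encoded = encode_pixels(colour_image)

print(len(flagged), round(reduction_percent(len(text), len(flagged))))  # 15 53
print(len(letter_f), len(f_encoded))                                    # 64 30
print(len(colour_image) * 3, len(colour_encoded))                       # 192 92
print(round(reduction_percent(192, 92)))                                # 52
```

Flagging a run of only two items costs three values instead of two, so a real encoder would likely raise `min_run`; the strings and images in this section contain no such short runs, so the counts match the text either way.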
