COM 416 Multimedia
Introduction
What is Multimedia?
The term multimedia has different meanings to different people. While a PC vendor
may view the term in terms of a PC that has sound capability, a DVD-ROM drive and
perhaps a multimedia-enabled microprocessor, a consumer entertainment vendor may
take it to mean an interactive cable TV with hundreds of digital channels. A
computer science student would look at multimedia from the perspective of what it
consists of: applications that use multiple modalities to their advantage,
including text, graphics, animation, video, sound and, most likely, some level of
interactivity.
It is safe to say that multimedia is computer information represented through
audio, graphics, images, video and animation, in addition to text.
Components of Multimedia
The multiple modalities of text, audio, images, drawings, animation and video are the
components that, either alone or in combinations of two or more, give rise to
multimedia applications such as video conferencing, telemedicine and cooperative
work environments.
Multimedia and Hypermedia
While traditional media (like books) are linear, that is, read from beginning to
end, a hypertext system is meant to be read in a nonlinear fashion.
Hypermedia is not constrained to be text-based: it can include other media, such as
graphics and images, and continuous media such as sound and video. The World Wide
Web (WWW) is a good example of a hypermedia application.
We can therefore say that hypermedia is one particular application of multimedia.
World Wide Web
The World Wide Web is the largest and most commonly used hypermedia application.
Its popularity is due to the amount of information available from web servers, the
capacity to post such information and the ease of navigating through it with a web
browser.
HyperText Transfer Protocol (HTTP)
HTTP is a protocol for the transmission of hypermedia that supports any file type.
It is a stateless request/response protocol: a client opens a connection to the
HTTP server and requests information, the server responds, and the connection is
terminated.
The basic format is:
Method URI version
Additional Headers
Message body
The Uniform Resource Identifier (URI) identifies the resource accessed, such as the
host name, always preceded by the token http://. The URI could be a URL.
Two popular methods are used in HTTP: GET and POST. GET specifies that the
information requested is in the request string itself, while the POST method
specifies that the resource pointed to in the URI should consider the message
body. POST is generally used for submitting HTML forms.
The GET method has the following format:
GET URI HTTP/version
An example of a GET request is GET http://www.kanopoly.edu.ng/ HTTP/1.1
The basic response format is:
Version Status-Code Status-Phrase
Additional Headers
Message body
Status-Code is a number that identifies the response type (or error that occurs), and
Status-Phrase is a textual description of it. Two commonly seen status codes and
phrases are 200 OK when the request was processed successfully and 404 Not
Found when the URI does not exist.
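To make the exchange concrete, the following is a minimal sketch in Python using
the standard-library http.client module; the host name is taken from the GET
example above and is assumed to be reachable.

import http.client

# Open a connection to the HTTP server.
conn = http.client.HTTPConnection("www.kanopoly.edu.ng")

# Send the request line "GET / HTTP/1.1" (the Host header is added automatically).
conn.request("GET", "/")

# Read the response: Status-Code and Status-Phrase, then the message body.
resp = conn.getresponse()
print(resp.status, resp.reason)   # e.g. 200 OK, or 404 Not Found
body = resp.read()

# HTTP is stateless: the connection is closed once the exchange is over.
conn.close()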
HyperText Markup Language (HTML)
HTML is the language used to publish hypermedia on the World Wide Web. It is
defined using SGML (Standard Generalized Markup Language) and derives elements
that describe generic document structure and formatting. Since it uses ASCII, it
is portable to all different computer hardware, which allows for the global
exchange of information.
HTML uses tags to describe document elements. The tags are in the format <token
params> to define the start point of a document element and </token> to define the
end of the element. Some elements have only inline parameters and don't require
ending tags.
HTML divides the document into HEAD and BODY as follows:
<HTML>
<HEAD>
..
</HEAD>
<BODY>
.
</BODY>
</HTML>
The HEAD describes document definitions, which are parsed before any document
rendering is done. These include page title, resource links and meta-information the
author decides to specify. The BODY part describes the document structure and
content. Common structure elements are paragraphs, tables, forms, links, item lists
and buttons.
A simple HTML page is given as follows:
<!DOCTYPE HTML>
<HTML>
<HEAD>
<TITLE>
Web Page Demo
</TITLE>
</HEAD>
<BODY>
<P> This is the body of the page. Everything goes here
</P>
</BODY>
</HTML>
The current HTML standard is HTML Version 5.
Graphics and Image Data Representations
Graphics/Image Data Types
The number of file formats used in multimedia keeps on increasing. We shall be more
interested in GIF and JPG image file formats since these formats are distinguished by
the fact that most web browsers can decompress and display them.
1-Bit Images
Images consist of pixels, or pels (picture elements in digital images). A 1-bit
image consists of on and off values only and is thus the simplest type of image.
Each pixel is stored as a single bit (0 or 1). Hence, such an image is also
referred to as a binary image.
It is also called a 1-bit monochrome image, since it contains no colour.
Monochrome 1-bit images can be satisfactory for pictures containing only simple
graphics and text.
8-Bit Gray-Level Images
Every pixel in an 8-bit image has a gray value between 0 and 255. Each pixel is
represented by a byte; for example, a dark pixel might have a value of 10 and a
bright one might be 230.
The whole image can be viewed as a two-dimensional array of pixel values. Such an
array is referred to as a bitmap, a representation of the graphics/image data that
parallels the manner in which it is stored in video memory.
Image resolution refers to the number of pixels in a digital image (higher resolutions
yield better quality).
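As an illustration, the following Python sketch builds such a bitmap as a
two-dimensional array of byte values; it assumes the NumPy library is available,
and the 480 x 640 resolution is an assumption chosen for the example.

import numpy as np

# An 8-bit gray-level image as a 2-D array of bytes (a bitmap).
height, width = 480, 640
image = np.zeros((height, width), dtype=np.uint8)   # all pixels black (0)

image[100, 200] = 230   # a bright pixel
image[101, 200] = 10    # a dark pixel

# A 1-bit (binary) image can be derived by thresholding each pixel to 0 or 1.
binary = (image > 127).astype(np.uint8)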
24-Bit Colour Images
In a 24-bit colour image, each pixel is represented by three bytes, usually
representing RGB. Since each value is in the range 0 to 255, this format supports
256 × 256 × 256, or a total of 16,777,216, possible combined colours.
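The three bytes of a 24-bit pixel are commonly packed into a single integer. A
small illustrative sketch in Python (the variable names are assumptions):

# Pack one 24-bit RGB pixel into a single integer.
r, g, b = 255, 128, 0              # each component in the range 0 to 255
pixel = (r << 16) | (g << 8) | b   # 0xFF8000

# Unpack the three components again.
r2 = (pixel >> 16) & 0xFF
g2 = (pixel >> 8) & 0xFF
b2 = pixel & 0xFF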
8-Bit Colour Images
Many systems can make use of only 8 bits of colour information (the so-called 256
colours) in producing a screen image.
Popular File Formats
GIF
Graphics Interchange Format (GIF) was devised for transmitting graphical images
over phone lines via modem. The GIF standard uses the Lempel-Ziv-Welch algorithm,
modified slightly for image scanline packets to use the line grouping of pixels
effectively.
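The following is a minimal sketch of the core LZW encoding loop in Python; it
omits GIF's scanline-packet handling and variable-width code packing, and the
function name is an assumption.

def lzw_encode(data):
    # Start with a dictionary of all single-byte strings (codes 0-255).
    dictionary = {bytes([i]): i for i in range(256)}
    w = b""
    codes = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in dictionary:
            w = wc                             # keep extending the current string
        else:
            codes.append(dictionary[w])        # output the code for the known prefix
            dictionary[wc] = len(dictionary)   # add the new string to the dictionary
            w = bytes([byte])
    if w:
        codes.append(dictionary[w])
    return codes

# Repetitive data compresses well: later codes stand for multi-byte strings.
print(lzw_encode(b"ABABABABAB"))   # 6 codes for 10 bytes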
The GIF standard is limited to 8-bit colour images only. While this produces
acceptable colour, it is best suited for images with few distinctive colours.
The GIF image format has some interesting features. It allows for successive display of
pixels in widely spaced rows by a four-pass display process known as interlacing. It
also supports simple animation via a Graphic Control Extension block in the data.
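A small sketch in Python of the row order produced by this four-pass interlacing
scheme (the function name is an assumption):

def gif_interlace_rows(height):
    # Pass 1: every 8th row from row 0; pass 2: every 8th row from row 4;
    # pass 3: every 4th row from row 2; pass 4: every 2nd row from row 1.
    order = []
    for start, step in ((0, 8), (4, 8), (2, 4), (1, 2)):
        order.extend(range(start, height, step))
    return order

print(gif_interlace_rows(16))   # [0, 8, 4, 12, 2, 6, 10, 14, 1, 3, 5, ...]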
JPEG
JPEG is the most important current standard for image compression. It was created by
a working group of the International Organization for Standardization (ISO) popularly
known as the Joint Photographic Experts Group.
The human vision system has some limitations, which JPEG takes advantage of to
achieve high rates of compression. The eye-brain system cannot see extremely fine
detail. This limitation is even more conspicuous for colour vision than for grayscale
(black and white).
PNG
With the popularity of the Internet, there have been efforts toward more
system-independent image formats. Portable Network Graphics (PNG) is one such
format. It is meant to supersede GIF and extend it in important ways.
Special features of PNG files include support for up to 48 bits of colour information.
Files may also contain gamma-correction information.
Fundamental Concepts in Video
Type of Video Signals
Video signals can be organized in three different ways: component video, composite
video and S-video.
Component Video
Component video makes use of three separate video signals for the red, green and
blue image planes. This kind of system uses three wires (and connectors) to
connect the camera or other devices to a TV or monitor. Most computer systems use
component video, with separate signals for R, G and B.
For any colour separation scheme, component video gives the best colour
reproduction, since there is no crosstalk between the three different channels, unlike
composite video or S-video. Component video, however, requires more bandwidth and
good synchronization of the three components.
Composite Video
In composite video, colour (chrominance) and intensity (luminance) signals are
mixed into a single carrier wave. Chrominance is a composite of two colour
components. This type of signal is used by broadcast colour TVs; it is downward
compatible with black-and-white TV.
When connecting to TVs or VCRs, composite video uses only one wire (and hence one
connector), and video colour signals are mixed, not sent separately. The audio signal is
another addition to this signal. Since colour information is mixed and both colour and
intensity are wrapped into the same signal, some interference between the luminance
and chrominance signals is inevitable.
S-Video
As a compromise, S-video (separated video or super-video) uses two wires: one for
luminance and another for a composite chrominance signal. As a result there is less
crosstalk between the colour information and the crucial gray-scale information.
The reason for placing luminance into its own part of the signal is that black-and-
white information is crucial for visual perception.
Analog Video
Most TV signals are still sent and received as analog signals. An analog signal samples
a time-varying image. So-called progressive scanning traces through a complete
picture (frame) row-wise for each time interval. A high resolution computer monitor
typically uses a time interval of 1/72 second.
In TV and in some monitors and multimedia standards, another system, interlaced
scanning, is used. Here, odd-numbered lines are traced first, then the even-numbered
lines. This results in odd and even fields; two fields make up one frame.
NTSC Video
The NTSC TV standard is mostly used in North America and Japan. It uses a familiar
4:3 aspect ratio (i.e. the ratio of picture width to height) and 525 scan lines per frame
at 30 frames per second.
NTSC follows the interlaced scanning system and each frame is divided into two fields,
with 262.5 lines/field.
PAL Video
PAL (Phase Alternating Line) is a TV standard originally invented by German
scientists. It uses 625 lines per frame, at 25 frames per second, with a 4:3 aspect ratio
and interlaced fields. Its broadcast TV signals are also used in composite video.
SECAM Video
SECAM stands for Système Électronique Couleur Avec Mémoire. SECAM uses 625
scan lines per frame, at 25 frames per second, with a 4:3 aspect ratio and interlaced
fields.
SECAM and PAL are similar, differing slightly in their coding scheme.
Digital Video
The advantages of digital representation for video are many. It permits
o Storing video on digital devices or in memory, ready to be processed (noise
removal, cut and paste, and so on) and integrated into various multimedia
applications.
o Direct access, which makes nonlinear video editing simple.
o Repeated recording without degradation of image quality.
o Ease of encryption and better tolerance to channel noise.
High Definition TV (HDTV)
The main thrust of High Definition TV (HDTV) is not to increase the definition in
each unit area, but rather to increase the visual field, especially its width.
The salient difference between conventional TV and HDTV is that the latter has a
much wider aspect ratio of 16:9 instead of 4:3. Another feature of HDTV is its move
toward progressive scan. The rationale is that interlacing introduces serrated edges to
moving objects and flickers along horizontal edges.
Basics of Digital Audio
Digitization of sound
What is Sound?
Sound is a wave phenomenon that involves molecules of air being compressed and
expanded under the action of some physical device. For example, a speaker in an audio
system vibrates back and forth and produces a longitudinal pressure wave that we
perceive as sound.
Without air there is no sound. Since sound is a pressure wave, it takes on continuous
values, as opposed to digitized ones with a finite range. Nevertheless, if we wish to use
a digital version of sound waves, we must form digitized representations of audio
information.
Digitization
Values of a sound wave change over time in amplitude: the pressure increases or
decreases with time. The amplitude value is a continuous quantity. Since we are
interested in working with such data in computer storage, we must digitize the
analog signals produced by microphones. Digitization means conversion to a stream
of numbers, preferably integers for efficiency.
To fully digitize an analog signal we have to sample it both in time and in
amplitude. Sampling means measuring the quantity we are interested in, usually at
evenly spaced intervals. Sampling in the time dimension is simply called sampling,
and the rate at which it is performed is called the sampling frequency.
For audio, typical sampling rates are from 8 kHz to 48 kHz. The human ear can hear
from about 20 Hz to as much as 20 kHz. The human voice can reach approximately
4 kHz.
Sampling in the amplitude or voltage dimension is called quantization.
To decide how to digitize audio data, we need to answer the following questions:
1. What is the sampling rate?
2. How finely is the data to be quantized, and is the quantization uniform?
3. How is audio data formatted?
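As a small illustration of the first two questions, the following Python sketch
samples a pure tone at 8 kHz and quantizes each sample uniformly to 8 bits; the
tone frequency and duration are assumptions chosen for the example.

import math

sampling_rate = 8000    # samples per second (8 kHz, a typical speech rate)
frequency = 440.0       # a tone well below the ~4 kHz reach of the human voice
duration = 0.01         # seconds of audio

samples = []
for n in range(int(sampling_rate * duration)):
    t = n / sampling_rate                              # sampling in time
    amplitude = math.sin(2 * math.pi * frequency * t)  # continuous value in [-1, 1]
    level = round((amplitude + 1) / 2 * 255)           # uniform 8-bit quantization
    samples.append(level)

print(samples[:10])   # the first ten quantized sample values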
Nyquist Theorem
The Nyquist Theorem states that if a signal is band-limited, that is, if it has a
lower limit f1 and an upper limit f2 of frequency components in the signal, then
the sampling rate must be at least 2(f2 - f1).
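For example, for a baseband signal with f1 = 0 and f2 = 4 kHz (roughly the reach
of the human voice), a sampling rate of at least 2 × (4000 - 0) = 8000 Hz is
required, which is why 8 kHz is a standard rate for telephone speech.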
Signal-to-Noise Ratio (SNR)
In any analog system, random fluctuations produce noise added to the signal, and
the measured voltage is thus incorrect. The ratio of the power of the correct
signal to the noise is called the signal-to-noise ratio (SNR). Therefore, the SNR
is a measure of the quality of the signal.
The SNR is usually measured in decibels (dB). The SNR value, in units of dB, is
defined in terms of base-10 logarithms of squared voltages:

$$\text{SNR} = 10 \log_{10} \frac{V_{signal}^2}{V_{noise}^2} = 20 \log_{10} \frac{V_{signal}}{V_{noise}}$$
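For example, if the signal voltage is 10 times the noise voltage, the SNR is
20 log10 10 = 20 dB; if it is 100 times the noise voltage, the SNR is 40 dB.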
Audio Filtering
Prior to sampling, the audio signal is also usually filtered to remove unwanted
frequencies. The frequencies depend on the application. For speech, typically from
50 Hz to 10 kHz is retained. Other frequencies are blocked by a band-pass filter, also
called a band-limiting filter, which screens out lower and higher frequencies.
An audio music signal will typically contain from about 20 Hz up to 20 kHz. So the
band-pass filter for music will screen out frequencies outside this range.
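A minimal sketch of such a band-pass filter for speech in Python, assuming the
SciPy library is available; the 44.1 kHz sampling rate and the filter order are
assumptions chosen for the example.

import numpy as np
from scipy.signal import butter, lfilter

fs = 44100                                          # sampling rate in Hz (assumed)
b, a = butter(4, [50, 10000], btype="band", fs=fs)  # retain 50 Hz to 10 kHz

x = np.random.randn(fs)   # one second of noise as a stand-in for a speech signal
y = lfilter(b, a, x)      # frequencies outside the band are attenuated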
Multimedia Data Compression
Lossless Compression Algorithms
With the emergence of multimedia technologies, the quantum of data generated is
enormous. As a result, there is a need for techniques that reduce the number of
bits involved in storing and/or transmitting this data. This process is referred
to as compression.
In a general data compression scheme, an encoder compresses the input data, the
compressed output passes through an intermediate medium, and a decoder
reconstructs the data.
We call the output of the encoder codes or codewords. The intermediate medium could
either be data storage or a communication/computer network. If the compression and
decompression processes induce no information loss, the compression scheme is
lossless; otherwise, it is lossy.
If the total number of bits required to represent the data before compression is B0 and
the total number of bits required to represent the data after compression is B1, then we
define the compression ratio as
$$\text{compression ratio} = \frac{B_0}{B_1}$$
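For example, if an uncompressed image requires B0 = 65,536 bits and its compressed
form requires B1 = 16,384 bits, the compression ratio is 65,536/16,384 = 4.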
Basics of Information Theory
According to Claude E. Shannon, the entropy η of an information source with
alphabet S = {s1, s2, ..., sn} is defined as:

$$\eta = H(S) = \sum_{i=1}^{n} p_i \log_2 \frac{1}{p_i} = -\sum_{i=1}^{n} p_i \log_2 p_i$$

where pi is the probability that symbol si will occur in S.
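For example, a source that emits 256 symbols with equal probability pi = 1/256 has
entropy H(S) = 256 × (1/256) × log2 256 = 8 bits per symbol, exactly the 8 bits
needed to store one uncoded byte.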
The encoding steps of the Shannon-Fano algorithm can be presented in the following
top-down manner:
1. Sort the symbols according to the frequency count of their occurrences.
2. Recursively divide the symbols into two parts, each with approximately the same
number of counts, until all parts contain only one symbol.
A natural way of implementing the above procedure is to build a binary tree. As a
convention, let's assign bit 0 to its left branches and 1 to the right branches.
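A minimal sketch of this top-down procedure in Python; the function name and the
(symbol, count) input format are assumptions made for illustration.

def shannon_fano(freqs):
    # freqs: list of (symbol, count) pairs.  Returns {symbol: bit string}.
    # Step 1: sort the symbols by frequency count, highest first.
    freqs = sorted(freqs, key=lambda sc: sc[1], reverse=True)
    codes = {sym: "" for sym, _ in freqs}

    # Step 2: recursively divide into two parts of roughly equal total count.
    def divide(part):
        if len(part) <= 1:
            return
        total = sum(count for _, count in part)
        running, split = 0, 1
        for i, (_, count) in enumerate(part):
            if i > 0 and running + count > total / 2:
                break
            running += count
            split = i + 1
        left, right = part[:split], part[split:]
        for sym, _ in left:
            codes[sym] += "0"    # bit 0 for the left branch
        for sym, _ in right:
            codes[sym] += "1"    # bit 1 for the right branch
        divide(left)
        divide(right)

    divide(freqs)
    return codes

# For "HELLO": L occurs twice, the other symbols once each.
print(shannon_fano([("L", 2), ("H", 1), ("E", 1), ("O", 1)]))
# e.g. {'L': '0', 'H': '10', 'E': '110', 'O': '111'}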