CSC 101 2019 2020 Module 3

The document discusses different methods of representing data on computers including text, numbers, images, and audio. It describes how ASCII and Unicode encode text data and how binary representation is used to encode numeric values including integers and non-integers through signed-magnitude and floating-point representation.

DATA REPRESENTATION

1.0 DATA REPRESENTATION ON THE COMPUTER SYSTEM


The data we need to store and manage on a computer must be represented in a way that
captures the essence of the information, and it must do so in a form convenient for
computer processing. This module looks at how the various kinds of information a
computer manages are represented and stored.
Data representation is concerned with how information can be encoded as bit patterns.
This includes the popular methods for encoding text, numerical data, images, and sound.
In the not-so-distant past, computers dealt almost exclusively with numeric and textual
data, but now computers are truly multimedia devices, dealing with a vast array of
information categories. Computers store, present, and help us modify many different
types of data, including:
- Text
- Numbers
- Images and graphics
- Audio
1.1 Representing text
Information in the form of text is normally represented by means of a code in which
each of the different symbols in the text (such as the letters of the alphabet and
punctuation marks) is assigned a unique bit pattern. The text is then represented as a long
string of bits in which the successive patterns represent the successive symbols in the
original text. A character set is simply a list of characters and the codes used to represent
each one.
The American National Standards Institute (ANSI) adopted the American Standard Code
for Information Interchange (ASCII). This code uses bit patterns of length seven to
represent the upper and lower case letters of the English alphabet, punctuation symbols,
the digits 0 through 9, and certain control information such as line feed, carriage return,
and tab. ASCII is often extended to an eight-bit-per-symbol format by adding a 0 at the
most significant end of each of the seven-bit patterns. This technique provides 128
additional bit patterns (those with a 1 in the most significant position) that can represent
symbols excluded from the original ASCII, so the extended ASCII set allows for 256
characters and includes accented letters as well as several additional special symbols. It
also produces a code in which each pattern fits conveniently into a typical byte-size
memory cell.
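The encoding described above can be sketched in a few lines of Python. This is only an illustration: the function name to_ascii_bits and the sample text "Hi!" are assumptions made for the example, not part of the module.

```python
def to_ascii_bits(text, width=7):
    """Return one bit pattern per character, using the character's ASCII code."""
    patterns = []
    for ch in text:
        code = ord(ch)
        if code > 127:
            raise ValueError(f"{ch!r} is not a 7-bit ASCII character")
        # Pad with leading zeros so every pattern has the same length.
        patterns.append(format(code, f"0{width}b"))
    return patterns


seven_bit = to_ascii_bits("Hi!")
eight_bit = to_ascii_bits("Hi!", width=8)  # a 0 is added at the most significant end

print(seven_bit)           # ['1001000', '1101001', '0100001']
print(eight_bit)           # ['01001000', '01101001', '00100001']
print("".join(eight_bit))  # the text as one long string of bits
```

Each eight-bit pattern begins with 0, which is why the 128 patterns beginning with 1 remain free for additional symbols.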
Other methods of representing text include Unicode, which was developed
through the cooperation of several of the leading manufacturers of computer hardware.
In its original form, this code uses a pattern of 16 bits to represent each character, which
gives 65,536 different bit patterns, enough to allow text written in such languages as
Chinese, Japanese, and Hebrew to be represented.
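As a rough illustration of the 16-bit scheme, the sketch below prints the 16-bit pattern for a few characters. The helper name sixteen_bit_pattern and the sample characters are assumptions chosen for the example; Python's built-in "utf-16-be" codec is used only to show the two bytes that each of these characters occupies.

```python
def sixteen_bit_pattern(ch):
    """Return the 16-bit pattern for one character, or fail if 16 bits are not enough."""
    code_point = ord(ch)
    if code_point > 0xFFFF:
        raise ValueError(f"{ch!r} does not fit in a single 16-bit pattern")
    return format(code_point, "016b")


print(2 ** 16)  # 65536 distinct patterns

samples = [
    "A",       # Latin capital letter A
    "\u05d0",  # Hebrew letter alef
    "\u4e2d",  # a Chinese character
    "\u3042",  # Japanese hiragana letter a
]
for ch in samples:
    # Code point in hex, the 16-bit pattern, and the two stored bytes.
    print(hex(ord(ch)), sixteen_bit_pattern(ch), ch.encode("utf-16-be").hex())
```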
The International Organization for Standardization (ISO) has developed a standard for
a code that could compete with Unicode. Using patterns of 32 bits, this encoding
system has the potential of representing billions of symbols. A file consisting of
a long sequence of symbols encoded using ASCII or Unicode is often called a text file.
Note that the simple text files that are manipulated by utility programs called text editors
(or simply editors) contain only a character-by-character encoding of the text, whereas a
file produced by a word processor contains numerous proprietary codes representing changes
in fonts, alignment information, and other formatting. Such files are better referred to as
binary files.
Table 1-1 shows the ASCII characters and the hexadecimal (HEX) representation of
the 7-bit binary code for each character. The table contains alphabetic, numeric, and
punctuation characters. The remaining characters fall into two groups:
formatting characters and data-linking (or control) characters. Formatting characters are
responsible for how text appears on the page and include line feed (LNFD), carriage
return (CRET), horizontal and vertical tabs (HTAB, VTAB), form feed (FMFD), etc.
Data-linking or control characters are those used by protocols to establish and maintain a
data link. They include characters for indicating the beginning of a transmission (STX,
start of text), ending a transmission (EOT, end of transmission), acknowledge (ACK),
data link escape (DLE), device control (DC1–DC4), etc.
TABLE 1-1
ASCII Character Code
Binary codes are shown in their hexadecimal (HEX) equivalent

HEX ASCII HEX ASCII HEX ASCII HEX ASCII


00 NULL 20 Space 40 @ 60 `
01 SOH 21 ! 41 A 61 a
02 STX 22 " 42 B 62 b
03 ETX 23 # 43 C 63 c
04 EOT 24 $ 44 D 64 d
05 ENQ 25 % 45 E 65 e
06 ACK 26 & 46 F 66 f
07 BELL 27 ' 47 G 67 g
08 BKSP 28 ( 48 H 68 h
09 HTAB 29 ) 49 I 69 i
0A LNFD 2A * 4A J 6A j
0B VTAB 2B + 4B K 6B k
0C FMFD 2C , 4C L 6C l
0D CRET 2D - 4D M 6D m
0E SHOUT 2E . 4E N 6E n
0F SHIN 2F / 4F O 6F o
10 DLE 30 0 50 P 70 p
11 DC1 31 1 51 Q 71 q
12 DC2 32 2 52 R 72 r
13 DC3 33 3 53 S 73 s
14 DC4 34 4 54 T 74 t
15 NACK 35 5 55 U 75 u
16 SYNC 36 6 56 V 76 v
17 ETB 37 7 57 W 77 w
18 CAN 38 8 58 X 78 x
19 ENDM 39 9 59 Y 79 y
1A SUB 3A : 5A Z 7A z
1B ESC 3B ; 5B [ 7B {
1C FLSP 3C < 5C \ 7C |
1D GPSP 3D = 5D ] 7D }
1E RDSP 3E > 5E ^ 7E ~
1F UNSP 3F ? 5F _ 7F DEL
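Entries in a table like this can be checked with a short Python sketch. The function names hex_code and is_control are assumptions made for the example; the split into printing and non-printing codes follows the ASCII layout, where codes 00 to 1F and 7F are the formatting and control characters.

```python
def hex_code(ch):
    """Two-digit hexadecimal ASCII code of a single character."""
    return format(ord(ch), "02X")


def is_control(code):
    """True for the non-printing codes: 00 to 1F, and 7F (DEL)."""
    return code < 0x20 or code == 0x7F


for ch in "Aa0 ~":
    print(repr(ch), hex_code(ch))   # 41, 61, 30, 20, 7E

# Line feed, carriage return, and horizontal tab.
print(hex_code("\n"), hex_code("\r"), hex_code("\t"))   # 0A 0D 09

# Number of formatting and control codes among the 128 ASCII codes.
print(sum(is_control(code) for code in range(128)))     # 33
```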
1.2 Representing numeric values

Numeric values are the most prevalent type of data used in a computer system.
Since binary is a number system, there is a natural relationship between the numeric
information and the binary values that we store to represent them. Integers are the
starting point for numeric data, but negative and non-integer values can also be represented.

1.2.1 Representing Negative Values


Negative numbers are numbers written with a minus sign in front. This way of writing
numbers is known as signed-magnitude representation. In the traditional decimal system, a
sign (+ or -) is placed before a number value, though the positive sign is often assumed.
The sign represents the ordering, and the digits represent the magnitude of the number. On
the classic number line, a negative sign means that the number is to the left
of zero and a positive sign means that it is to the right of zero. Performing addition and
subtraction with signed integers can be described as moving a certain number of
units in one direction or another.
To add two numbers, you find the first number on the scale and move in the direction of
the sign of the second as many units as specified. Subtraction is done in a similar way,
moving along the number line as dictated by the sign and the operation.
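One way to carry the signed-magnitude idea over to bit patterns is to reserve the leftmost bit for the sign and use the remaining bits for the magnitude. The sketch below assumes an 8-bit pattern and the common convention that a sign bit of 1 means negative; neither detail is stated in the text, and the function names are made up for the example.

```python
def to_signed_magnitude(value, bits=8):
    """Encode an integer as a sign bit followed by the magnitude in binary."""
    magnitude = abs(value)
    if magnitude >= 2 ** (bits - 1):
        raise ValueError(f"{value} does not fit in {bits} bits")
    sign_bit = "1" if value < 0 else "0"   # assumed convention: 1 means negative
    return sign_bit + format(magnitude, f"0{bits - 1}b")


def from_signed_magnitude(pattern):
    """Decode a pattern produced by to_signed_magnitude."""
    magnitude = int(pattern[1:], 2)
    return -magnitude if pattern[0] == "1" else magnitude


print(to_signed_magnitude(5))             # 00000101
print(to_signed_magnitude(-5))            # 10000101
print(from_signed_magnitude("10000101"))  # -5
```

Only the first bit differs between the patterns for 5 and -5, which mirrors how the written forms differ only in their sign.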
Storing purely numeric information in terms of encoded characters is inefficient, as it
wastes storage and only a limited range of values can be represented. Using binary
notation to encode numeric data instead allows a much wider range of values to be stored
in the same space. Binary notation is a way of representing numeric values using only
the digits 0 and 1 rather than the digits 0 through 9 as in the traditional decimal system.
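The difference in efficiency can be made concrete with a small comparison. The sketch assumes one 8-bit character per decimal digit for the character encoding; the function names and the choice of 16 bits are illustrative assumptions.

```python
def largest_as_characters(total_bits, bits_per_char=8):
    """Largest value storable when each decimal digit takes one encoded character."""
    digits = total_bits // bits_per_char
    return 10 ** digits - 1


def largest_as_binary(total_bits):
    """Largest non-negative value storable in plain binary notation."""
    return 2 ** total_bits - 1


print(largest_as_characters(16))  # 99
print(largest_as_binary(16))      # 65535

# The value 25 stored both ways.
print(len("25") * 8)    # 16 bits as two encoded characters
print(format(25, "b"))  # 11001, only 5 bits in binary notation
```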

1.2.2 Representing Non-integer values


In computing, non-integer values are referred to as real values. A real number is a value
with a potential fractional part. That is, real numbers have a whole part and a fractional
part, either of which may be zero. For example, some real numbers in base 10 are 104.32,
0.999999, 357.0, and 3.14159. To the left of the decimal point, in base 10, we have the
1s position, the 10s position, the 100s position, etc. These position values come from
raising the base value to increasing powers (moving from the decimal point to the left).
The positions to the right of the decimal point work the same way, except the powers are
negative. So the positions to the right of the decimal point are the tenths position (10^-1 or
one tenth), the hundredths position (10^-2 or one hundredth), etc. We store the value as an
integer and include information showing where the radix point is. That is, any real value
can be described by three properties: the sign (positive or negative one), the mantissa,
which is made up of the digits in the value, and the exponent, which determines how the
radix point is shifted relative to the mantissa. A real value in base 10 can therefore be
defined by the following formula:
sign * mantissa * 10^exp
The representation is called floating point because the number of digits is fixed but the
radix point floats. When a value is in floating-point form, a positive exponent shifts the
decimal point to the right, and a negative exponent shifts the decimal point to the left.
We can easily convert a real number expressed in our usual decimal notation into floating
point. For example, consider the number 148.69. The sign is positive, and there are two
digits to the right of the decimal point. Thus, the exponent is -2, giving us 14869 * 10^-2.
We can convert a value in floating-point form back into decimal notation. The exponent
on the base tells us how many positions to move the radix point. If the exponent is
negative, we move the radix point to the left. If the exponent is positive, we move the
radix point to the right.
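The conversion in both directions can be sketched with Python's standard decimal module, which keeps decimal digits exactly. The function names below are assumptions made for the example; the formula is the base-10 one given above.

```python
from decimal import Decimal


def to_sign_mantissa_exp(text):
    """Split a decimal numeral into (sign, mantissa, exponent)."""
    sign_flag, digits, exp = Decimal(text).as_tuple()
    sign = -1 if sign_flag else 1
    mantissa = int("".join(str(d) for d in digits))
    return sign, mantissa, exp


def from_sign_mantissa_exp(sign, mantissa, exp):
    """Rebuild the value as sign * mantissa * 10^exp."""
    return sign * mantissa * Decimal(10) ** exp


print(to_sign_mantissa_exp("148.69"))        # (1, 14869, -2)
print(to_sign_mantissa_exp("-0.5"))          # (-1, 5, -1)
print(from_sign_mantissa_exp(1, 14869, -2))  # 148.69
```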

1.3 Representing images


A popular means of representing an image is to interpret the image as a collection
of dots, each of which is called a pixel, short for “picture element”. The appearance of
each pixel is then encoded and the entire image is represented as a collection of these
encoded pixels. Such a collection is called a bitmap. This approach is used by many
display devices, such as printers and computer monitors. Moreover images in bitmap
form are easily formatted for display.
The method of encoding the pixel in a bitmap varies among applications. In the case of a
simple black and white image, each pixel can be represented by a single bit whose value
depends on whether the corresponding pixel is black or white, as used by most facsimile
machines. For more elaborate black and white photographs, each pixel can be represented
by a collection of bits (usually eight), which allows a variety of shades of grayness to be
represented.
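A one-bit-per-pixel bitmap can be sketched as follows. The tiny 8-by-4 image, the convention that 1 means black, and the function name pack_rows are all assumptions made for the example; each row is exactly eight pixels wide so that it fits in one byte.

```python
# A tiny 8x4 black-and-white image: 1 = black pixel, 0 = white pixel (assumed convention).
image = [
    "00111100",
    "01000010",
    "01000010",
    "00111100",
]


def pack_rows(rows):
    """Store each 8-pixel row as a single byte, one bit per pixel."""
    return bytes(int(row, 2) for row in rows)


packed = pack_rows(image)
print(packed.hex())  # 3c42423c
print(len(packed))   # 4 bytes hold all 32 pixels

# With eight bits per pixel instead of one, each pixel can take one of 256 shades of gray.
print(2 ** 8)        # 256
```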
Color is our perception of the various frequencies of light that reach the retinas of our
eyes. Our retinas have three types of color photoreceptor cone cells that respond to
different sets of frequencies. These photoreceptor categories correspond to the colors of
red, green, and blue. All other colors perceptible by the human eye can be made by
combining various amounts of these three colors. Therefore, color is often expressed in a
computer as an RGB (red-green-blue) value, which is actually three numbers that indicate
the relative contribution of each of these three primary colors. If each number in the triple
is given on a scale of 0 to 255, then 0 means no contribution of that color, and 255 means
full contribution of that color. For example, an RGB value of (255, 255, 0) maximizes the
contribution of red and green, and minimizes the contribution of blue, which results in a
bright yellow.
The concept of RGB values gives rise to a three-dimensional color space. The amount of
data that is used to represent a color is called the color depth. It is usually expressed in
terms of the number of bits that are used to represent the color of each pixel.
- HiColor is a term that indicates a 16-bit color depth. Five bits are used for each
number in an RGB value and the extra bit is sometimes used to represent
transparency.
- TrueColor indicates a 24-bit color depth. Therefore, each number in an RGB
value gets eight bits, which gives the range of 0 to 255 for each. This results in the
ability to represent over 16.7 million unique colors.
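The relationship between color depth and the number of representable colors is a power of two, as the sketch below shows. Packing the three TrueColor numbers into one 24-bit value with red in the highest bits is an assumed layout chosen for the example, and the function names are illustrative.

```python
def colors_for_depth(bits):
    """Number of distinct colors representable with the given color depth."""
    return 2 ** bits


def pack_truecolor(red, green, blue):
    """Pack three 0-255 values into one 24-bit number (assumed order: red, green, blue)."""
    for value in (red, green, blue):
        if not 0 <= value <= 255:
            raise ValueError("each component must be between 0 and 255")
    return (red << 16) | (green << 8) | blue


print(colors_for_depth(24))                        # 16777216
print(format(pack_truecolor(255, 255, 0), "06X"))  # FFFF00, the bright yellow from the text
```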
For color images, each pixel is encoded by a more complex system. A common approach
is RGB encoding, in which each pixel is represented as three color
components (a red component, a green component, and a blue component) corresponding
to the three primary colors of light. One byte is normally used to represent the intensity
of each color component. In turn, three bytes of storage are required to represent a single
pixel in the original image.
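With three bytes per pixel, the storage needed for the pixel data of an image follows directly from its dimensions. The image size of 1024 by 768 pixels below is a hypothetical example, as is the function name.

```python
def pixel_data_bytes(width, height, bytes_per_pixel=3):
    """Bytes needed for the raw pixel data, ignoring any file header."""
    return width * height * bytes_per_pixel


print(pixel_data_bytes(1024, 768))  # 2359296 bytes, a little over 2 million
```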
An alternative encoding scheme uses a "brightness" component (called the pixel
luminance) and two color components. The pixel luminance is essentially the sum of
the red, green, and blue components. The other two components are called the blue
chrominance and the red chrominance.
A bitmap file is one of the most straightforward graphic representations. In addition to a
few administrative details, a bitmap file contains the pixel color values of the image from
left to right and top to bottom. A bitmap file supports 24-bit TrueColor, though usually
the color depth can be specified to reduce the file size.
The GIF format (Graphics Interchange Format), developed by CompuServe in 1987, uses
indexed color exclusively to reduce file size, which limits the number of available colors
to 256. If even fewer colors are required, the color depth can usually be specified to fewer
bits. GIF files are best used for graphics and images with few colors, and are therefore
considered optimal for line art.
The JPEG format is designed to exploit the nature of our eyes. Humans are more
sensitive to gradual changes of brightness and color over distance than we are to rapid
changes. Therefore, the JPEG format stores data that averages out the color over
short distances. The JPEG format is considered superior for photographic color images.
A disadvantage of representing images as bitmaps is that an image cannot be rescaled easily
to any arbitrary size. The only way to enlarge the image is to make the pixels bigger,
which leads to a grainy appearance.
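The effect of making the pixels bigger can be seen in a minimal sketch that enlarges a bitmap by repeating each pixel. The two-by-two image drawn with "#" and "." characters and the function name enlarge are assumptions made for the example.

```python
def enlarge(rows, factor):
    """Enlarge a bitmap by turning every pixel into a factor-by-factor block."""
    enlarged = []
    for row in rows:
        wide_row = "".join(pixel * factor for pixel in row)
        enlarged.extend([wide_row] * factor)
    return enlarged


small = ["#.", ".#"]
for line in enlarge(small, 3):
    print(line)
# ###...
# ###...
# ###...
# ...###
# ...###
# ...###
```

No new detail appears in the larger image; the original pixels simply become visible blocks.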
Another way of representing images that avoids the scaling problem is to describe
the image as a collection of geometric structures, such as lines and curves, that can be
encoded using techniques of analytic geometry. Such a description allows the device that
ultimately displays the image to decide how the geometric structures should be displayed
rather than insisting that the device reproduce a particular pixel pattern. This is the
approach used to produce the scalable fonts that are available via today's word
processing systems. The geometric means of representing images is also popular in
Computer Aided Design (CAD) systems in which drawings of three-dimensional objects
are displayed and manipulated on computer screens.
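A geometric description can be sketched as a list of line segments in a unit coordinate system that is scaled only when the image is displayed. The shape (a square with one diagonal) and the function name scale are hypothetical, chosen only to illustrate the idea.

```python
# A square with one diagonal, described by line segments with coordinates from 0 to 1.
SHAPE = [
    ((0, 0), (1, 0)),
    ((1, 0), (1, 1)),
    ((1, 1), (0, 1)),
    ((0, 1), (0, 0)),
    ((0, 0), (1, 1)),
]


def scale(segments, size):
    """Compute the segment endpoints for a display area of the given size."""
    return [
        ((x1 * size, y1 * size), (x2 * size, y2 * size))
        for (x1, y1), (x2, y2) in segments
    ]


print(scale(SHAPE, 10)[4])    # ((0, 0), (10, 10))
print(scale(SHAPE, 1000)[4])  # ((0, 0), (1000, 1000))
```

The stored description is the same five segments at every size, so the displaying device can draw each line as sharply as it is able to.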

1.4 Representing sound


We perceive sound when a series of air compressions vibrate a membrane in our ear,
which sends signals to our brain. Thus a sound is defined in nature by the wave of air that
interacts with our eardrum. To represent a sound, we must somehow represent the
appropriate sound wave. A stereo sends an electrical signal to a speaker to produce
sound. This signal is an analog representation of the sound wave. The voltage in the
signal varies in direct proportion to the sound wave. The speaker receives the signal and
causes a membrane to vibrate, which in turn vibrates the air (creating a sound wave),
which in turn vibrates the eardrum. The created sound wave is hopefully identical to the
one that was captured initially, or at least good enough to please the listener.
To represent audio information on a computer, we must digitize the sound wave,
somehow breaking it into discrete, manageable pieces. One way to accomplish this is to
actually digitize the analog representation of the sound. That is, take the electric signal
that represents the sound wave and represent it as a series of discrete numeric values. An
analog signal varies in voltage continuously. To digitize the signal we periodically
measure the voltage of the signal and record the appropriate numeric value. This process
is called sampling. Instead of a continuous signal, we end up with a series of numbers
representing distinct voltage levels.
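Sampling can be sketched by evaluating a signal at evenly spaced instants. In this illustration a 440 Hz sine wave stands in for the analog voltage, and the rate of 40,000 samples per second and the duration of one millisecond are example values; the names sample and tone are assumptions.

```python
import math


def sample(signal, sample_rate, duration):
    """Measure the signal sample_rate times per second for duration seconds."""
    count = round(sample_rate * duration)
    return [signal(n / sample_rate) for n in range(count)]


def tone(t):
    """A 440 Hz sine wave standing in for the analog voltage (hypothetical signal)."""
    return math.sin(2 * math.pi * 440 * t)


samples = sample(tone, 40_000, 0.001)
print(len(samples))                        # 40 measurements for one millisecond
print([round(v, 3) for v in samples[:5]])  # the first few recorded values
```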
To reproduce the sound, the stored voltage values are used to create a new continuous
electronic signal. In general, a sampling rate of around 40,000 times per second is
enough to create a reasonable sound reproduction. The most generic method of encoding
audio information for computer storage and manipulation is to sample the amplitude of
the sound wave at regular intervals and then record the series of values obtained. For
example, the series 0, 1.5, 2.0, 2.5, 1.5, 2.0, 3.0, 4.0, 5.0, 3.0, 2.0, 0, would represent a
sound wave that rises in amplitude, falls briefly, rises to a higher level, and then drops
back to 0 as shown below:

[Figure: the sampled amplitude values (Y-Values, vertical axis from 0 to 6) plotted against sample number (horizontal axis from 0 to 12)]

To obtain the quality of sound reproduction produced by today's musical CDs, a sample
rate of 44,100 samples per second is used. The data obtained from each sample are
represented in 16 bits (or 32 bits for stereo recordings). Consequently, each second of
music recorded in stereo requires more than a million bits.
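The figure of more than a million bits per second can be checked with simple arithmetic. The sketch assumes 16 bits per sample per channel, the value used by audio CDs, and two channels for stereo; the function name is illustrative.

```python
def bits_per_second(sample_rate, bits_per_sample, channels):
    """Bits needed to store one second of sampled audio."""
    return sample_rate * bits_per_sample * channels


stereo = bits_per_second(44_100, 16, 2)
print(stereo)       # 1411200 bits per second
print(stereo // 8)  # 176400 bytes per second
```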
An alternative encoding system known as Musical Instrument Digital Interface (MIDI) is
widely used in the music synthesizers found in electronic keyboards, for video game sound,
and for sound effects accompanying websites. By encoding directions for producing
music on a synthesizer rather than encoding the sound itself, MIDI avoids the large
storage requirements of the sampling technique. MIDI can be thought of as a way of
encoding the sheet music read by a performer rather than the performance itself, and in
turn, a MIDI recording can sound significantly different when performed on different
synthesizers. Currently, the dominant format for compressing audio data is MP3.
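To give a feel for how compact MIDI's directions are, the sketch below builds the two standard three-byte MIDI messages that start and stop a single note and compares them with one second of sampled stereo audio. The helper names, the velocity of 64, and the use of channel 0 are assumptions for the example, and a real MIDI file would also store a few bytes of timing information alongside each message.

```python
def note_on(note, velocity=64, channel=0):
    """MIDI note-on message: status byte 0x90 plus channel, note number, velocity."""
    return bytes([0x90 | channel, note, velocity])


def note_off(note, channel=0):
    """MIDI note-off message: status byte 0x80 plus channel, note number, velocity 0."""
    return bytes([0x80 | channel, note, 0])


MIDDLE_C = 60  # MIDI note number for middle C

message = note_on(MIDDLE_C) + note_off(MIDDLE_C)
print(message.hex())   # 903c40803c00
print(len(message))    # 6 bytes to describe the note

# One second of sampled stereo sound at 44,100 samples per second, 2 bytes per sample.
print(44_100 * 2 * 2)  # 176400 bytes
```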
