Character Sets and Encoding

Introduction to Character Sets

A character set is a collection of characters that a computer can recognize, store, and
manipulate. Each character is assigned a unique numeric code, and a character encoding
defines how those codes are stored as bits and bytes. Character sets are essential for
digital communication because they let computers store and transmit text in different
languages.

Common character sets include:

• ASCII (American Standard Code for Information Interchange)
• EBCDIC (Extended Binary Coded Decimal Interchange Code)
• Unicode (Universal Character Set)
• BCD (Binary-Coded Decimal)

Each encoding system assigns a unique numeric value (code point) to every character.
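
The mapping between characters and numeric codes can be sketched in a few lines of Python. This is only an illustration: `ord` and `chr` are Python built-ins, while the helper name `code_points` is made up for this example.

```python
def code_points(text):
    """Return the numeric code of every character in text."""
    return [ord(ch) for ch in text]

for ch in ["A", "a", "0", " "]:
    code = ord(ch)            # character -> numeric code
    assert chr(code) == ch    # numeric code -> character
    print(repr(ch), code)

print(code_points("Aa0 "))    # [65, 97, 48, 32]
```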

1. ASCII (American Standard Code for Information Interchange)

Definition

ASCII is a character encoding standard used in computers and communication devices. It
was developed in the 1960s by a committee of the American Standards Association (ASA),
the predecessor of the American National Standards Institute (ANSI), and is based on
English characters.

Types of ASCII

1. Standard ASCII (7-bit)
o Uses 7 bits per character, allowing for 128 characters (0–127).
o Includes:
   • Uppercase and lowercase English letters (A-Z, a-z)
   • Digits (0-9)
   • Punctuation marks (.,!?)
   • Control characters (e.g., line feed, written \n in many languages, and tab, written \t)
2. Extended ASCII (8-bit)
o Uses 8 bits per character, allowing for 256 characters (0–255).
o Is not a single standard: the name covers several 8-bit extensions, such as
ISO 8859-1 (Latin-1), that differ in how they assign codes 128–255.
o Includes additional symbols, accented characters, and graphical symbols.
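
The difference between the 7-bit and 8-bit ranges can be checked with a small sketch. It assumes Latin-1 (ISO 8859-1) as the 8-bit extension, which is only one of several possible choices; the helper name `is_standard_ascii` is hypothetical.

```python
def is_standard_ascii(text):
    """True if every character's code lies in the 7-bit range 0-127."""
    return all(ord(ch) < 128 for ch in text)

print(is_standard_ascii("Hello"))      # True
print(is_standard_ascii("caf\u00e9"))  # False, U+00E9 is outside 0-127

# Latin-1 is used here purely as an example of an 8-bit extension.
latin1_bytes = "caf\u00e9".encode("latin-1")
print(latin1_bytes)                    # b'caf\xe9'

try:
    "caf\u00e9".encode("ascii")
except UnicodeEncodeError:
    print("not representable in 7-bit ASCII")
```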

Examples

Decimal  Binary   Hex  Character  Description
65       1000001  41   A          Uppercase A
97       1100001  61   a          Lowercase a
48       0110000  30   0          Digit 0
32       0100000  20   (Space)    Space character
13       0001101  0D   CR         Carriage Return
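
Rows like these can be generated directly from the characters. The following is a minimal sketch; the function name `ascii_row` is invented for illustration.

```python
def ascii_row(ch):
    """Return (decimal, 7-bit binary, hex) for a standard ASCII character."""
    code = ord(ch)
    if code > 127:
        raise ValueError(f"{ch!r} is outside 7-bit ASCII")
    return code, format(code, "07b"), format(code, "02X")

for ch in ["A", "a", "0", " ", "\r"]:
    print(ascii_row(ch))
```

For example, `ascii_row("A")` gives `(65, '1000001', '41')`.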

Advantages of ASCII

• Simple and widely used in programming.
• Requires only 7 or 8 bits per character, saving storage.
• Compatible with most modern and legacy systems.

Limitations of ASCII

• Supports only English and a limited set of symbols.
• Cannot represent characters from other languages like Chinese, Arabic, or Hindi.

2. EBCDIC (Extended Binary Coded Decimal Interchange Code)

Definition

EBCDIC is an 8-bit character encoding used mainly in IBM mainframes and legacy systems.
It was developed by IBM in the 1960s, extending the company's earlier BCD-based
punched-card codes instead of adopting ASCII.

Structure

• Uses 8 bits per character, allowing for 256 characters.
• Unlike ASCII, the letters do not occupy one continuous range of codes: A-I, J-R, and
S-Z sit in three separate blocks (hex C1-C9, D1-D9, and E2-E9 for uppercase).
• Divided into groups based on control codes, printable characters, and special
symbols.
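
The non-sequential layout of the letters can be observed with Python's built-in `cp037` codec. This sketch assumes code page 037 (IBM US/Canada) as a representative EBCDIC variant; other EBCDIC code pages differ in places, and the helper name `letter_gaps` is hypothetical.

```python
import string

def letter_gaps(encoding):
    """Return pairs of consecutive uppercase letters whose codes are not consecutive."""
    letters = string.ascii_uppercase
    codes = [ch.encode(encoding)[0] for ch in letters]
    return [
        (letters[i], letters[i + 1])
        for i in range(len(letters) - 1)
        if codes[i + 1] - codes[i] != 1
    ]

print(letter_gaps("ascii"))   # []
print(letter_gaps("cp037"))   # [('I', 'J'), ('R', 'S')]
```

One practical consequence is that range checks such as "is this byte between the codes of A and Z" behave differently on EBCDIC data than on ASCII data.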

Examples

Decimal  Binary    Hex  Character  Description
193      11000001  C1   A          Uppercase A
129      10000001  81   a          Lowercase a
240      11110000  F0   0          Digit 0
64       01000000  40   (Space)    Space character

Advantages of EBCDIC

• Efficient for IBM mainframes and punched card systems.
• Backward compatible with older IBM machines.

Limitations of EBCDIC

• Not widely used outside IBM systems.
• Not compatible with ASCII, requiring conversion for communication with ASCII-based
systems.
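
The conversion between the two encodings might look like the sketch below. It assumes the `cp037` code page that ships with Python's codecs; a real system would have to use whichever EBCDIC code page its data was written in, and the function names are made up for this example.

```python
def ascii_to_ebcdic(data, codepage="cp037"):
    """Re-encode ASCII bytes as EBCDIC bytes (code page is an assumption)."""
    return data.decode("ascii").encode(codepage)

def ebcdic_to_ascii(data, codepage="cp037"):
    """Re-encode EBCDIC bytes as ASCII bytes."""
    return data.decode(codepage).encode("ascii")

ebcdic = ascii_to_ebcdic(b"A a0")
print(ebcdic.hex())             # c14081f0
print(ebcdic_to_ascii(ebcdic))  # b'A a0'
```

The same text produces entirely different bytes in the two encodings, which is why data exchanged between the two kinds of system has to be converted.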

3. Unicode (Universal Character Set, UCS)

Definition

Unicode is a universal character encoding standard designed to support all writing
systems, including symbols, emojis, and mathematical characters. It is maintained by the
Unicode Consortium, which published version 1.0 in 1991.

Unicode Encoding Formats

1. UTF-8 (8-bit code units)
o Uses 1 to 4 bytes per character.
o Backward compatible with ASCII: the 128 ASCII characters keep their single-byte codes.
o Most commonly used encoding on the web.
2. UTF-16 (16-bit code units)
o Uses 2 or 4 bytes per character.
o Efficient for languages like Chinese and Japanese, whose common characters take
2 bytes instead of the 3 needed in UTF-8.
3. UTF-32 (32-bit code units)
o Uses 4 bytes per character.
o Simple but consumes more storage.
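
The byte counts above can be verified with Python's standard codecs. In this sketch the little-endian codec variants are chosen so that no byte-order mark is added to the output; the helper name `encoded_sizes` is hypothetical.

```python
def encoded_sizes(ch):
    """Return the byte length of ch in (UTF-8, UTF-16, UTF-32)."""
    # The "-le" codecs are used so that no byte-order mark inflates the counts.
    return tuple(len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le"))

print(encoded_sizes("A"))            # (1, 2, 4)
print(encoded_sizes("\u20ac"))       # (3, 2, 4)  Euro sign
print(encoded_sizes("\U0001F600"))   # (4, 4, 4)  emoji outside the 16-bit range
```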

Unicode Character Sets

Unicode assigns a unique code point to each character. The notation used is U+xxxx, where
xxxx is a hexadecimal number of four to six digits.

Examples

Unicode   Character  Description
U+0041    A          Uppercase A
U+0061    a          Lowercase a
U+0030    0          Digit 0
U+20AC    €          Euro symbol
U+1F600   😀         Grinning face emoji
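
The U+ notation and the official character names can be produced with Python's standard `unicodedata` module. This is a minimal sketch, and the helper name `describe` is invented for illustration.

```python
import unicodedata

def describe(ch):
    """Return the U+ notation and the Unicode name of a single character."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch)

for ch in ["A", "a", "0", "\u20ac", "\U0001F600"]:
    print(describe(ch))
```

For example, `describe("\u20ac")` gives `('U+20AC', 'EURO SIGN')`.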

Advantages of Unicode

• Supports all writing systems (e.g., Arabic, Chinese, Devanagari).
• Compatible with modern operating systems and applications.
• Allows storage and exchange of multilingual text.

Limitations of Unicode

• Uses more storage space than ASCII for non-ASCII text (in UTF-8, ASCII characters
still take one byte each).
• Not all systems fully support all Unicode characters.

4. BCD (Binary-Coded Decimal)

Definition

BCD is a numeric encoding system that represents decimal digits (0-9) using a 4-bit binary
code. Unlike ASCII and Unicode, which encode characters, BCD is used primarily for
numerical data representation in computing and digital electronics.

How BCD Works

• Each decimal digit (0-9) is represented by a 4-bit binary equivalent.
• The remaining 6 combinations (1010 to 1111) are not used.

BCD Encoding Table

Decimal  BCD (Binary)  Hex
0        0000          0
1        0001          1
2        0010          2
3        0011          3
4        0100          4
5        0101          5
6        0110          6
7        0111          7
8        1000          8
9        1001          9

Example of BCD Representation

For the decimal number 275, the BCD equivalent is:

2 → 0010
7 → 0111
5 → 0101
BCD = 0010 0111 0101
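
The digit-by-digit conversion shown above can be sketched as follows. The function names are hypothetical, and the sketch handles only non-negative integers, with the BCD value written as space-separated 4-bit groups.

```python
def to_bcd(number):
    """Return the BCD form of a non-negative integer as space-separated 4-bit groups."""
    if number < 0:
        raise ValueError("only non-negative integers are handled in this sketch")
    return " ".join(format(int(digit), "04b") for digit in str(number))

def from_bcd(bits):
    """Decode space-separated 4-bit groups, rejecting the unused patterns 1010-1111."""
    digits = []
    for group in bits.split():
        value = int(group, 2)
        if len(group) != 4 or value > 9:
            raise ValueError(f"{group!r} is not a valid BCD digit")
        digits.append(str(value))
    return int("".join(digits))

print(to_bcd(275))                   # 0010 0111 0101
print(from_bcd("0010 0111 0101"))    # 275
```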

Advantages of BCD

• Simple conversion between decimal digits and their binary codes, since each digit is
encoded on its own.
• Used in financial and digital clock applications.

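Limitations of BCD

• Requires more storage space than pure binary (for example, 275 needs 12 bits in BCD
but only 9 bits in binary).
• Arithmetic operations are more complex than in pure binary, because any digit result
above 9 must be corrected.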
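
Both limitations can be illustrated with a short sketch. It uses the common "add 6" correction for digit sums above 9, which is one standard way of doing BCD addition and not necessarily how any particular device implements it; all function names are made up for this example.

```python
def bcd_add_digit(a, b, carry_in=0):
    """Add two BCD digits (0-9) and return (result_digit, carry_out)."""
    total = a + b + carry_in   # plain binary addition of the 4-bit values
    if total > 9:              # result is not a valid BCD digit
        total += 6             # skip over the six unused codes 1010-1111
    return total & 0xF, total >> 4

def bcd_add(x_digits, y_digits):
    """Add two equal-length digit lists, most significant digit first."""
    result, carry = [], 0
    for a, b in zip(reversed(x_digits), reversed(y_digits)):
        digit, carry = bcd_add_digit(a, b, carry)
        result.append(digit)
    if carry:
        result.append(carry)
    return result[::-1]

print(bcd_add([2, 7, 5], [1, 4, 8]))   # [4, 2, 3], i.e. 275 + 148 = 423

# Storage: three BCD digits take 12 bits, while pure binary needs fewer.
print(4 * len(str(275)), (275).bit_length())   # 12 9
```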

Comparison of ASCII, EBCDIC, Unicode, and BCD

Feature           ASCII (7-bit)    ASCII (8-bit)            EBCDIC (8-bit)   Unicode                     BCD
Bit Size          7-bit            8-bit                    8-bit            Variable (8, 16, 32-bit)    4-bit per digit
Characters        128              256                      256              143,000+                    10 (0-9 only)
Language Support  English only     Some European languages  IBM Systems      All languages               Only numerical
Compatibility     Most systems     Extended character set   IBM mainframes   Modern applications         Digital circuits
Storage Space     Small            Medium                   Medium           Large (varies by encoding)  Small
Use Case          Text processing  Extended symbols         IBM mainframes   Multilingual support        Numerical data

Conclusion

• ASCII is simple and efficient for English text.
• EBCDIC is mostly obsolete and is largely confined to IBM mainframes.
• Unicode is the modern standard, supporting all languages and symbols.
• BCD is used for numeric encoding in electronics and finance.

For text encoding, Unicode (UTF-8) is the most widely used. For numerical representation,
BCD is useful in digital systems.
