Character Sets and Encoding
A character set is a collection of characters that a computer can recognize, store, and
manipulate. Each character in the set is assigned a unique numeric value called a code
point, and a character encoding defines how those values are stored as bits or bytes.
Together they make digital communication possible, allowing computers to store and
transmit text in different languages.
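The mapping between characters and numeric values can be inspected directly. The following is a minimal Python sketch; the helper names code_point and character are illustrative only and simply wrap the built-in ord and chr functions.

```python
def code_point(ch: str) -> int:
    """Return the numeric code point assigned to a single character."""
    return ord(ch)


def character(cp: int) -> str:
    """Return the character assigned to a numeric code point."""
    return chr(cp)


if __name__ == "__main__":
    # Print each character with its code point in decimal and hexadecimal.
    for ch in ["A", "a", "0", "€"]:
        cp = code_point(ch)
        print(ch, cp, hex(cp))
```

The round trip character(code_point(ch)) returns the original character, which is what "unique" means here: one value per character and one character per value.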
Definition
Examples
Advantages of ASCII
Limitations of ASCII
Definition
EBCDIC (Extended Binary Coded Decimal Interchange Code) is an 8-bit character encoding
used mainly on IBM mainframes and legacy systems. IBM developed it in the early 1960s, in
parallel with ASCII, by extending its earlier 6-bit BCD-based codes.
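To see how EBCDIC differs from ASCII in practice, the sketch below compares the byte values the two encodings give to the same characters. It assumes the cp037 codec, which is one of several EBCDIC code pages shipped with Python (other code pages assign some characters differently); the helper names are illustrative.

```python
def ebcdic_byte(ch: str) -> int:
    """Byte value of a character in EBCDIC code page 037 (an assumed choice)."""
    return ch.encode("cp037")[0]


def ascii_byte(ch: str) -> int:
    """Byte value of the same character in ASCII."""
    return ch.encode("ascii")[0]


if __name__ == "__main__":
    for ch in "Aa0":
        print(f"{ch!r}: ASCII 0x{ascii_byte(ch):02X}, EBCDIC 0x{ebcdic_byte(ch):02X}")
```

In this code page the letters are not contiguous (for instance, I and J are eight values apart), so code that relies on consecutive letter values in ASCII may not carry over unchanged.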
Structure
Examples
Advantages of EBCDIC
Limitations of EBCDIC
Definition
Unicode is a universal character encoding standard designed to support all writing systems,
as well as symbols, emojis, and mathematical characters. It is maintained by the Unicode
Consortium, which was incorporated in 1991, the year version 1.0 of the standard was published.
1. UTF-8 (8-bit)
o Uses 1 to 4 bytes per character.
o Backward compatible with ASCII.
o Most commonly used encoding on the web.
2. UTF-16 (16-bit)
o Uses 2 or 4 bytes per character.
o Efficient for languages like Chinese and Japanese, where most characters take 2 bytes (versus 3 in UTF-8).
3. UTF-32 (32-bit)
o Uses exactly 4 bytes per character (fixed width).
o Simple to index but consumes more storage.
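The byte counts listed above can be checked with a short Python sketch. The function name encoded_sizes is illustrative, and the little-endian codec variants are used as an assumption so that no byte order mark is added to the count.

```python
def encoded_sizes(ch: str) -> dict:
    """Bytes needed to store one character in each encoding form (no BOM)."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
    }


if __name__ == "__main__":
    # An ASCII letter, an accented letter, a Chinese character, and an emoji.
    for ch in ["A", "é", "中", "😀"]:
        print(ch, encoded_sizes(ch))
```

The ASCII letter needs a single byte in UTF-8, identical to its ASCII byte, which is what backward compatibility with ASCII means in practice.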
Unicode assigns a unique code point to each character. The notation used is U+xxxx, where
xxxx is the code point written as four to six hexadecimal digits (for example, U+0041 for
the letter A).
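The U+ notation can be produced and parsed with a few lines of Python. This is a minimal sketch; u_notation and from_u_notation are hypothetical helper names, not part of any library.

```python
def u_notation(ch: str) -> str:
    """Format a character's code point as U+XXXX (at least four hex digits)."""
    return f"U+{ord(ch):04X}"


def from_u_notation(text: str) -> str:
    """Turn a string such as 'U+20AC' back into the character it names."""
    if not text.upper().startswith("U+"):
        raise ValueError("expected a string starting with 'U+'")
    return chr(int(text[2:], 16))


if __name__ == "__main__":
    for ch in ["A", "€", "😀"]:
        print(ch, u_notation(ch))
```

Code points above U+FFFF, such as most emojis, need five or six hexadecimal digits, which the format specifier handles by padding only up to four.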
Examples
Limitations of Unicode
Definition
BCD (Binary Coded Decimal) is a numeric encoding system that represents each decimal digit
(0-9) with its own 4-bit binary code. Unlike ASCII and Unicode, which encode characters, BCD
is used primarily for numerical data representation in computing and digital electronics.
Example: the decimal number 275 is encoded one digit at a time.
2 → 0010
7 → 0111
5 → 0101
BCD = 0010 0111 0101
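The digit-by-digit conversion above can be sketched in Python as follows. The function names to_bcd and from_bcd are illustrative, and the output is a readable string of 4-bit groups rather than packed bytes.

```python
def to_bcd(number: int) -> str:
    """Encode a non-negative integer as space-separated 4-bit BCD groups."""
    if number < 0:
        raise ValueError("this sketch handles non-negative integers only")
    # Each decimal digit is converted to binary on its own, padded to 4 bits.
    return " ".join(format(int(digit), "04b") for digit in str(number))


def from_bcd(bits: str) -> int:
    """Decode space-separated 4-bit BCD groups back into an integer."""
    digits = []
    for group in bits.split():
        value = int(group, 2)
        if len(group) != 4 or value > 9:
            raise ValueError(f"invalid BCD group: {group}")
        digits.append(str(value))
    return int("".join(digits))


if __name__ == "__main__":
    print(to_bcd(275))  # 0010 0111 0101
    print(bin(275))     # 0b100010011, the pure binary form for comparison
```

The comparison with bin(275) shows the trade-off: BCD needs 12 bits where pure binary needs 9, but every decimal digit stays individually readable.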
Advantages of BCD
Limitations of BCD
Conclusion
For text encoding, Unicode (UTF-8) is the most widely used. For numerical representation,
BCD is useful in digital systems that need to store or display exact decimal digits.