Character Sets and Encoding

The document provides an overview of character sets and encoding, detailing ASCII, EBCDIC, Unicode, and BCD. It explains the definitions, advantages, limitations, and examples of each encoding system, highlighting Unicode as the modern standard that supports all languages and symbols. Additionally, it compares the features of these encoding systems, emphasizing their use cases in text processing and numerical data representation.

Uploaded by

mukungurutsepearson

Introduction to Character Sets

A character set is a collection of characters that a computer can recognize, store, and
manipulate. Each character is assigned a unique number, called a code point, and a character
encoding defines how those numbers are stored as bits or bytes. Character sets are essential
for digital communication, allowing computers to store and transmit text in different languages.

Common character sets include:

• ASCII (American Standard Code for Information Interchange)
• EBCDIC (Extended Binary Coded Decimal Interchange Code)
• Unicode (Universal Character Set)
• BCD (Binary-Coded Decimal)

Each encoding system assigns a unique numeric value (code point) to every character.
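
The mapping between characters and numeric codes can be inspected directly. The following is a minimal sketch using Python's built-in `ord` and `chr`; the variable name `codes` is only illustrative.

```python
# Map a few characters to their numeric codes and back again.
codes = {ch: ord(ch) for ch in ["A", "a", "0", " "]}   # character -> code

for ch, code in codes.items():
    assert chr(code) == ch        # code -> character gives the original back
    print(repr(ch), code)
```

Each character prints next to its numeric code, and converting the code back returns the same character.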

1. ASCII (American Standard Code for Information Interchange)

Definition

ASCII is a character encoding standard used in computers and communication devices. It
was developed in the 1960s by the American Standards Association (ASA), the predecessor of
the American National Standards Institute (ANSI), and is based on English characters.

Types of ASCII

1. Standard ASCII (7-bit)
o Uses 7 bits per character, allowing for 128 characters (0–127).
o Includes:
▪ Uppercase and lowercase English letters (A-Z, a-z)
▪ Digits (0-9)
▪ Punctuation marks (.,!?)
▪ Control characters (e.g., line feed, written \n in many languages, and horizontal tab, written \t)
2. Extended ASCII (8-bit)
o Uses 8 bits per character, allowing for 256 characters (0–255).
o Is not a single standard: the upper 128 codes (128–255) differ between code pages such as ISO 8859-1 and the original IBM PC code page.
o Includes additional symbols, accented characters, and graphical symbols.
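
The difference between the 7-bit and 8-bit ranges can be checked in a few lines. This is a sketch only: the helper name `fits_in_ascii` is hypothetical, and `latin-1` and `cp437` are picked here as two example 8-bit code pages that ship with Python.

```python
def fits_in_ascii(text: str) -> bool:
    """Return True if every character has a code in the 7-bit range 0-127."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False

print(fits_in_ascii("cafe"))   # True
print(fits_in_ascii("café"))   # False: é lies outside 0-127

# One 8-bit value can stand for different characters in different code pages.
byte = b"\xe9"
print(byte.decode("latin-1"))  # é in ISO 8859-1
print(byte.decode("cp437"))    # a different character in the IBM PC code page
```

The last two lines show why text stored in an 8-bit extension is only readable when the reader knows which code page was used.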

Examples

Decimal  Binary   Hex  Character  Description
65       1000001  41   A          Uppercase A
97       1100001  61   a          Lowercase a
48       0110000  30   0          Digit 0
32       0100000  20   (Space)    Space character
13       0001101  0D   CR         Carriage Return
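
Rows like these can be generated rather than looked up. The sketch below assumes a hypothetical helper, `ascii_row`, that returns the decimal, 7-bit binary, and hexadecimal forms of one character.

```python
def ascii_row(ch: str) -> tuple[int, str, str]:
    """Return (decimal, 7-bit binary, 2-digit hex) for one ASCII character."""
    code = ord(ch)
    if code > 127:
        raise ValueError("not a 7-bit ASCII character")
    return code, format(code, "07b"), format(code, "02X")

for ch in ["A", "a", "0", " ", "\r"]:
    print(repr(ch), *ascii_row(ch))
```

Padding the binary form to seven digits makes the bit patterns line up, which helps when comparing related characters such as "A" and "a".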

Advantages of ASCII

• Simple and widely used in programming.
• Requires only 7 or 8 bits per character, saving storage.
• Compatible with most modern and legacy systems.

Limitations of ASCII

• Supports only English and a limited set of symbols.
• Cannot represent characters from other languages like Chinese, Arabic, or Hindi.

2. EBCDIC (Extended Binary Coded Decimal Interchange Code)

Definition

EBCDIC is an 8-bit character encoding used mainly in IBM mainframes and legacy systems.
It was developed by IBM in the early 1960s, around the same time as ASCII, and extends the
BCD-based codes IBM had used on punched cards.

Structure

• Uses 8 bits per character, allowing for 256 characters.
• Unlike ASCII, the letters do not occupy one contiguous range: A–I, J–R, and S–Z sit in separate blocks (hex C1–C9, D1–D9, and E2–E9 for uppercase).
• Divided into groups based on control codes, printable characters, and special symbols.

Examples

Decimal  Binary    Hex  Character  Description
193      11000001  C1   A          Uppercase A
129      10000001  81   a          Lowercase a
240      11110000  F0   0          Digit 0
64       01000000  40   (Space)    Space character
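
These values can be reproduced with Python's standard codecs. The text does not name a specific EBCDIC code page, so the sketch below assumes `cp037` (a common US/Canada variant that ships with Python); the helper name `ebcdic_code` is hypothetical.

```python
CODEC = "cp037"  # assumption: one common EBCDIC code page; the text names none

def ebcdic_code(ch: str) -> int:
    """Return the EBCDIC (cp037) byte value of a single character."""
    return ch.encode(CODEC)[0]

for ch in ["A", "a", "0", " "]:
    code = ebcdic_code(ch)
    print(repr(ch), code, format(code, "08b"), format(code, "02X"))

# The alphabet is split into blocks, so 'J' does not directly follow 'I'.
print(ebcdic_code("J") - ebcdic_code("I"))  # 8, whereas in ASCII the gap is 1
```

The gap between "I" and "J" is why code that assumes consecutive letter codes, which works in ASCII, can fail on EBCDIC data.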

Advantages of EBCDIC

• Efficient for IBM mainframes and punched card systems.
• Backward compatible with older IBM machines.

Limitations of EBCDIC

• Not widely used outside IBM systems.
• Not compatible with ASCII, requiring conversion for communication with ASCII-based systems.
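
As a rough illustration of the conversion step, the sketch below re-encodes bytes by decoding them to text and encoding again. It assumes the `cp037` EBCDIC code page from Python's standard library, and the function names are hypothetical; real systems may use a different code page or a dedicated translation table.

```python
def ebcdic_to_ascii(data: bytes, codec: str = "cp037") -> bytes:
    """Re-encode EBCDIC bytes as ASCII bytes by decoding to text first."""
    return data.decode(codec).encode("ascii")

def ascii_to_ebcdic(data: bytes, codec: str = "cp037") -> bytes:
    """Re-encode ASCII bytes as EBCDIC bytes."""
    return data.decode("ascii").encode(codec)

ebcdic = bytes([0xC8, 0xC5, 0xD3, 0xD3, 0xD6])  # "HELLO" in cp037
print(ebcdic_to_ascii(ebcdic))                  # b'HELLO'
print(ascii_to_ebcdic(b"HELLO").hex())          # c8c5d3d3d6
```

Encoding to ASCII raises an error for any EBCDIC character that has no ASCII equivalent, so a production converter would need a policy for such characters.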

3. Unicode (Universal Character Set - UCS)

Definition

Unicode is a universal character encoding standard designed to support all writing systems,
including symbols, emojis, and mathematical characters. It is maintained by the Unicode
Consortium, which published the first version in 1991.

Unicode Encoding Formats

1. UTF-8 (8-bit code units)
o Uses 1 to 4 bytes per character.
o Backward compatible with ASCII: the 128 ASCII characters keep their single-byte codes.
o Most commonly used encoding on the web.
2. UTF-16 (16-bit code units)
o Uses 2 or 4 bytes per character.
o Efficient for languages like Chinese and Japanese, whose common characters take 2 bytes instead of the 3 needed in UTF-8.
3. UTF-32 (32-bit code units)
o Uses 4 bytes per character.
o Simple but consumes more storage.
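
The byte counts for the three formats can be measured directly. In this sketch the helper name `encoded_sizes` is hypothetical, and the little-endian codec variants are used so that Python does not prepend a byte-order mark to the output.

```python
def encoded_sizes(ch: str) -> dict[str, int]:
    """Return the number of bytes one character occupies in each encoding form."""
    # The "-le" variants fix the byte order, so no byte-order mark is added.
    forms = [("UTF-8", "utf-8"), ("UTF-16", "utf-16-le"), ("UTF-32", "utf-32-le")]
    return {name: len(ch.encode(codec)) for name, codec in forms}

for ch in ["A", "€", "中", "😀"]:
    print(ch, encoded_sizes(ch))
```

"A" takes 1, 2, and 4 bytes respectively; "€" and "中" take 3, 2, and 4; the emoji takes 4 bytes in all three.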

Unicode Character Sets

Unicode assigns a unique code point to each character. The notation used is U+xxxx, where
xxxx is a hexadecimal number of four to six digits.

Examples

Unicode   Character  Description
U+0041    A          Uppercase A
U+0061    a          Lowercase a
U+0030    0          Digit 0
U+20AC    €          Euro symbol
U+1F600   😀         Smiley emoji
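
The U+ notation is simply the code point written in hexadecimal, padded to at least four digits. A minimal sketch, with `code_point` as a hypothetical helper name:

```python
def code_point(ch: str) -> str:
    """Format a character's code point in U+ notation (at least four hex digits)."""
    return f"U+{ord(ch):04X}"

for ch in ["A", "a", "0", "€", "😀"]:
    print(code_point(ch), ch)

# Going the other way: build a character from its code point.
print(chr(0x20AC))  # €
```
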

Advantages of Unicode

• Supports all writing systems (e.g., Arabic, Chinese, Devanagari).
• Compatible with modern operating systems and applications.
• Allows storage and exchange of multilingual text.

Limitations of Unicode

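• Can use more storage space than ASCII: in UTF-8, ASCII characters still take 1 byte each, but other characters take 2 to 4 bytes.
• Not all systems fully support all Unicode characters, for example when a font lacks the required glyph.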

4. BCD (Binary-Coded Decimal)

Definition

BCD is a numeric encoding system that represents decimal digits (0-9) using a 4-bit binary
code. Unlike ASCII and Unicode, which encode characters, BCD is used primarily for
numerical data representation in computing and digital electronics.

How BCD Works

• Each decimal digit (0-9) is represented by a 4-bit binary equivalent.
• The remaining 6 combinations (1010 to 1111) are not used.
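
Because six of the sixteen 4-bit patterns are unused, a decoder can reject them as invalid. The sketch below assumes BCD input written as space-separated 4-bit groups; `decode_bcd` is a hypothetical helper name.

```python
def decode_bcd(bits: str) -> int:
    """Decode space-separated 4-bit BCD groups, e.g. '0010 0111 0101', to an integer."""
    value = 0
    for group in bits.split():
        if len(group) != 4:
            raise ValueError(f"expected a 4-bit group, got: {group}")
        digit = int(group, 2)
        if digit > 9:
            # 1010 through 1111 have no decimal digit assigned to them
            raise ValueError(f"invalid BCD group: {group}")
        value = value * 10 + digit
    return value

print(decode_bcd("0010 0111 0101"))  # 275
```

Passing a group such as `1010` raises `ValueError`, which is how the unused combinations show up in practice.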

BCD Encoding Table

Decimal  BCD (Binary)  Hex
0        0000          0
1        0001          1
2        0010          2
3        0011          3
4        0100          4
5        0101          5
6        0110          6
7        0111          7
8        1000          8
9        1001          9

Example of BCD Representation

For the decimal number 275, the BCD equivalent is:

2 → 0010
7 → 0111
5 → 0101
BCD = 0010 0111 0101
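
The digit-by-digit procedure above can be written as a short function. This is a sketch for non-negative integers only, and `to_bcd` is a hypothetical name.

```python
def to_bcd(number: int) -> str:
    """Encode a non-negative integer as space-separated 4-bit BCD groups."""
    if number < 0:
        raise ValueError("this sketch handles non-negative integers only")
    # Each decimal digit is converted on its own, independent of the others.
    return " ".join(format(int(digit), "04b") for digit in str(number))

print(to_bcd(275))  # 0010 0111 0101
```
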

Advantages of BCD

• Simple conversion between decimal and binary, since each decimal digit maps to its own 4-bit group.
• Used in financial and digital clock applications.

Limitations of BCD

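• Requires more storage space compared to pure binary: 275 needs 12 bits in BCD but only 9 bits in binary.
• Arithmetic operations are more complex than in pure binary, because a digit result above 9 must be corrected.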
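
Both limitations can be made concrete. The sketch below compares storage sizes and adds two BCD digits using the common add-6 correction; the function names are hypothetical, and the correction shown is one standard technique rather than anything prescribed by the text.

```python
def bcd_bits(number: int) -> int:
    """Bits needed to store a non-negative integer in BCD (4 per decimal digit)."""
    return 4 * len(str(number))

def add_bcd_digits(a: int, b: int, carry_in: int = 0) -> tuple[int, int]:
    """Add two BCD digits: plain binary addition, then a correction by 6."""
    total = a + b + carry_in          # ordinary binary addition
    if total > 9:                     # result is not a valid BCD digit
        total += 6                    # skip the six unused codes 1010-1111
    return total & 0xF, total >> 4    # (result digit, carry out)

print(bcd_bits(275), (275).bit_length())  # 12 9
print(add_bcd_digits(7, 5))               # (2, 1), i.e. 7 + 5 = 12
```

The extra comparison and correction on every digit is the added complexity; pure binary addition needs neither.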

Comparison of ASCII, EBCDIC, Unicode, and BCD

Feature           ASCII (7-bit)    ASCII (8-bit)            EBCDIC (8-bit)  Unicode                     BCD
Bit Size          7-bit            8-bit                    8-bit           Variable (8, 16, 32-bit)    4-bit per digit
Characters        128              256                      256             143,000+                    10 (0-9 only)
Language Support  English only     Some European languages  IBM Systems     All languages               Only numerical
Compatibility     Most systems     Extended character set   IBM mainframes  Modern applications         Digital circuits
Storage Space     Small            Medium                   Medium          Large (varies by encoding)  Small
Use Case          Text processing  Extended symbols         IBM mainframes  Multilingual support        Numerical data

Conclusion

• ASCII is simple and efficient for English text.
• EBCDIC is mostly obsolete, used only in IBM mainframes.
• Unicode is the modern standard, supporting all languages and symbols.
• BCD is used for numeric encoding in electronics and finance.

For text encoding, Unicode (UTF-8) is the most widely used. For numerical representation,
BCD is useful in digital systems.
