
Uni Code

Unicode assigns a unique number, called a code point, to every character, regardless of language, platform, or program. Work on it began in 1987, with the goal of a universal character encoding standard able to represent text in all the world's languages. Unlike ASCII, which defines only 128 characters, Unicode has room for more than 1 million code points, and its encodings use up to 4 bytes per character. The most common encodings are UTF-8, which uses 1 to 4 bytes depending on the character, and UTF-16. Code points are integers starting at 0.

Uploaded by

jhon mark

UNICODE

Report by: Jhon Mark C. Palen


What is Unicode or Universal Code?

o Unicode provides a unique number for every character,
no matter what the platform,
no matter what the program,
no matter what the language.
Unicode is a universal character encoding standard. It
defines the way individual characters are represented in text
files, web pages, and other types of documents.
Unicode History
Unicode was started in 1987 by Joe Becker (Xerox), Lee
Collins (Apple), and Mark Davis (Apple). The idea was to
create a universal character set, as there were many
incompatible standards for encoding plain text at that time:
numerous 8-bit extensions of ASCII, Big Five (Traditional
Chinese), GB 2312 (Simplified Chinese), and more. Before
Unicode, no standard for multilingual plain text existed, but
there were rich-text systems (such as Apple's WorldScript) that
allowed you to combine multiple encodings.

The first Unicode draft proposal was published in 1988.
Work continued afterward and the working group expanded.
The Unicode Consortium was incorporated on January 3, 1991.
Cont.

Unlike ASCII, which was designed to represent only
basic English characters, Unicode was designed to support
characters from all languages around the world. The standard
ASCII character set only supports 128 characters, while
Unicode has room for 1,114,112 code points (U+0000 through
U+10FFFF). While ASCII fits each character in a single byte,
Unicode encodings use up to 4 bytes for each character.
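The size difference between the two character sets can be checked directly. The following is a minimal Python sketch, not part of the original report; the helper name fits_in_ascii is hypothetical and chosen only for illustration.

```python
import sys

# Highest code point Python accepts; the Unicode code space ends at U+10FFFF.
MAX_CODE_POINT = sys.maxunicode
TOTAL_CODE_POINTS = MAX_CODE_POINT + 1  # code points start at 0


def fits_in_ascii(text: str) -> bool:
    """Return True if every character has a code point below 128."""
    return all(ord(ch) < 128 for ch in text)


print(hex(MAX_CODE_POINT))
print(TOTAL_CODE_POINTS)
print(fits_in_ascii("hello"))
print(fits_in_ascii("h\u00e9llo"))  # contains U+00E9, outside ASCII
```

Any text for which fits_in_ascii returns False needs a Unicode encoding such as UTF-8 or UTF-16 to be stored.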
Cont.
There are several Unicode encodings, though UTF-8 and
UTF-16 are the most common.
UTF-8 has become the standard character encoding used on
the web and is also the default encoding used by many
software programs. While UTF-8 supports up to four bytes per
character, it would be inefficient to use four bytes to represent
frequently used characters. Therefore, UTF-8 uses only one
byte to represent ASCII characters, which cover basic English
text. Accented Latin, Hebrew, and Arabic characters are
represented with two bytes, while three bytes are used for most
Chinese, Japanese, Korean, and other Asian characters. The
remaining characters, such as emoji, are represented with four bytes.
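The byte counts described above can be observed by encoding one sample character from each group. This is a small Python sketch; the helper utf8_length and the choice of sample characters are assumptions made for illustration.

```python
def utf8_length(ch: str) -> int:
    """Return how many bytes UTF-8 needs for a single character."""
    return len(ch.encode("utf-8"))


# One sample per group, written as escapes so the file stays ASCII-only.
samples = {
    "Latin A": "A",                  # U+0041
    "Latin e-acute": "\u00e9",       # U+00E9
    "Hebrew alef": "\u05d0",         # U+05D0
    "Arabic ain": "\u0639",          # U+0639
    "Chinese zhong": "\u4e2d",       # U+4E2D
    "Japanese a": "\u3042",          # U+3042
    "Korean han": "\ud55c",          # U+D55C
    "Grinning face": "\U0001f600",   # U+1F600
}

for label, ch in samples.items():
    print(f"{label}: U+{ord(ch):04X} uses {utf8_length(ch)} byte(s)")
```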
Unicode's Character Set and Encoding Systems

Unicode is a standard maintained by the Unicode Consortium,
which was incorporated in 1991. Unicode primarily defines 2 things:
1. A character set, which includes the characters needed for
all the world's languages.
2. Several encoding systems, of which UTF-8 and UTF-16 are
the most popular.
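The split between character set and encoding system means the same code points can be stored as different byte sequences. The sketch below, in Python, compares UTF-8 with UTF-16; the helper encoded_sizes is hypothetical, and the "utf-16-le" codec is used as an assumption so that no byte order mark is added to the count.

```python
def encoded_sizes(text: str) -> dict:
    """Return the number of bytes the text occupies in UTF-8 and UTF-16."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # Little-endian UTF-16 without a byte order mark.
        "utf-16-le": len(text.encode("utf-16-le")),
    }


for text in ["Hi", "\u4e2d", "\U0001f600"]:
    print([f"U+{ord(ch):04X}" for ch in text], encoded_sizes(text))

# Decoding with the matching encoding recovers the same characters.
original = "Hi \u4e2d"
assert original.encode("utf-8").decode("utf-8") == original
assert original.encode("utf-16-le").decode("utf-16-le") == original
```

English text is smaller in UTF-8, while a character such as U+4E2D takes three bytes in UTF-8 but two in UTF-16.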
Code Point
Each character in Unicode is given a unique ID. This ID is a number
(integer), starting at 0, and is called the character's code point.
(You can think of a code point as a character ID. It is not called a
character ID because some code points are not visible characters,
such as space, return, tab, and the right-to-left mark.)
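As an illustration, Python exposes code points through the built-in ord and chr functions, and the standard unicodedata module gives the official character names. The helper describe below is a hypothetical name used only for this sketch.

```python
import unicodedata


def describe(ch: str) -> str:
    """Format a character as its code point plus its Unicode name."""
    # Control characters such as tab have no name, so supply a fallback.
    name = unicodedata.name(ch, "<no name>")
    return f"U+{ord(ch):04X} {name}"


print(ord("A"))            # character to code point
print(chr(65))             # code point back to character
print(describe("A"))
print(describe(" "))       # space is a code point too
print(describe("\t"))      # tab: a control character without a name
print(describe("\u200f"))  # the invisible right-to-left mark
```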