Unicode provides a unique number, called a code point, for every character, no matter the language, platform, or program. Work on it began in 1987, with the aim of creating a universal character encoding standard that could represent text in all the world's languages. Unlike ASCII, which defines only 128 characters, Unicode has room for over 1 million characters, and its encodings use up to 4 bytes per character. The most common encodings are UTF-8, which uses 1 to 4 bytes depending on the character, and UTF-16. Each character in Unicode is assigned a unique integer code point, starting at 0, to identify it.
UNICODE
Report by: Jhon Mark C. Palen
What is Unicode or Universal Code?
Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language. Unicode is a universal character encoding standard. It defines the way individual characters are represented in text files, web pages, and other types of documents.
Unicode History
Unicode was started in 1987 by Joe Becker (Xerox), Lee Collins (Apple), and Mark Davis (Apple). The idea was to create a universal character set, as there were many incompatible standards for encoding plain text at that time: numerous 8-bit extensions of ASCII, Big Five (Traditional Chinese), GB 2312 (Simplified Chinese), and more. Before Unicode, no standard for multilingual plain text existed, but there were rich-text systems (such as Apple's WorldScript) that allowed you to combine multiple encodings.
The first Unicode draft proposal was published in 1988. Work continued afterward and the working group expanded. The Unicode Consortium was incorporated on January 3, 1991.
Unlike ASCII, which was designed to represent only basic English characters, Unicode was designed to support characters from all languages around the world. The standard ASCII character set only supports 128 characters, while Unicode has room for 1,114,112 code points (U+0000 through U+10FFFF), or roughly 1,000,000 characters. While ASCII fits each character in a single byte, Unicode encodings use up to 4 bytes for each character.
There are several different Unicode encodings, though UTF-8 and UTF-16 are the most common. UTF-8 has become the standard character encoding used on the web and is also the default encoding used by many software programs. While UTF-8 supports up to four bytes per character, it would be inefficient to use four bytes to represent frequently used characters. Therefore, UTF-8 uses only one byte for each of the 128 ASCII characters, which cover common English text. Accented Latin letters and most Greek, Cyrillic, Hebrew, and Arabic characters are represented with two bytes, while three bytes are used for most Chinese, Japanese, and Korean characters and for many other Asian scripts. The remaining Unicode characters, such as most emoji, are represented with four bytes.
Unicode's Character Set and Encoding Systems
Unicode is a standard maintained by the Unicode Consortium, which published its first version in 1991. Unicode primarily defines 2 things:
1. A character set (which includes the characters needed for all the world's languages).
2. Several encoding systems (the most popular are UTF-8 and UTF-16).
Code Point
Each character in Unicode is given a unique ID. This ID is a number (integer), starting at 0, and is called the character's code point. (You can think of a code point as a character ID. It's not called a character ID because some code points are not really characters in the usual sense, such as space, return, tab, and the right-to-left mark.)
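The relationship between a character, its code point, and its encoded size can be checked with a short Python sketch. The helper name `describe` and the sample characters are illustrative choices, not part of the report; only Python built-ins (`ord`, `chr`, `str.encode`) are used. The sketch assumes UTF-16 is measured without a byte order mark, which is why the `utf-16-le` codec is chosen.

```python
def describe(ch: str) -> dict:
    """Return the code point and encoded sizes of a single character.

    `describe` is a hypothetical helper written for this illustration.
    """
    code_point = ord(ch)  # the character's integer ID
    return {
        "code_point": code_point,
        "notation": f"U+{code_point:04X}",           # conventional U+XXXX form
        "utf8_bytes": len(ch.encode("utf-8")),       # 1 to 4 bytes
        "utf16_bytes": len(ch.encode("utf-16-le")),  # 2 or 4 bytes, no BOM
    }


# Sample characters, written as escapes so the source file stays ASCII.
samples = {
    "Latin capital A": "A",
    "Latin e with acute": "\u00e9",
    "Hebrew alef": "\u05d0",
    "CJK ideograph": "\u4e2d",
    "Grinning face emoji": "\U0001F600",
    "Tab (a control code)": "\t",
}

for name, ch in samples.items():
    info = describe(ch)
    print(f"{name:22} {info['notation']:9} "
          f"UTF-8: {info['utf8_bytes']} bytes, "
          f"UTF-16: {info['utf16_bytes']} bytes")

# chr() is the inverse of ord(): it maps a code point back to a character.
assert chr(65) == "A"
```

Running it shows the byte counts growing from one byte for "A" to four bytes for the emoji in UTF-8, while the tab character still has a code point (U+0009) even though it is not a visible character.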