Unicode

Unicode, formally The Unicode Standard,[note 1][note 2] is an information technology standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. The standard, which is maintained by the Unicode Consortium, defines 149,186 characters[3][4] as of the current version (15.0), covering 161 modern and historic scripts, as well as symbols, 3664 emoji[5] (including in color), and non-visual control and formatting codes.
Unicode's success at unifying character sets has led to its widespread and predominant use in
the internationalization and localization of computer software. The standard has been implemented
in many recent technologies, including modern operating systems, XML, and most
modern programming languages.
The Unicode character repertoire is synchronized with ISO/IEC 10646, each being code-for-code
identical with the other. The Unicode Standard, however, includes more than just the
base code. Alongside the character encodings, the Consortium's official publication includes a wide
variety of details about the scripts and how to display them: normalization rules,
decomposition, collation, rendering, and bidirectional text display order for multilingual texts, and so
on.[6] The Standard also includes reference data files and visual charts to help developers and
designers correctly implement the repertoire.
Unicode can be stored using several different encodings, which translate the character codes into sequences of bytes. The Unicode Standard defines three such encodings, and several others exist; in practice, all are variable-length encodings. The most common encodings are the ASCII-compatible UTF-8, the ASCII-incompatible UTF-16 (compatible with the obsolete UCS-2), and the Chinese encoding standard GB18030, which is not an official Unicode standard but implements Unicode fully and is widely used in China.
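As a minimal illustrative sketch in Python (whose standard codecs include utf-8, utf-16, and gb18030), encoding the same string under each shows how the byte sequences differ:

    # Encode one string under three encodings that all cover the full
    # Unicode repertoire, and compare the resulting byte sequences.
    text = "A\u00e9\u4e2d"          # 'A', 'é', '中'

    for codec in ("utf-8", "utf-16", "gb18030"):
        data = text.encode(codec)
        print(f"{codec:8} {len(data):2} bytes: {data.hex(' ')}")

UTF-8 keeps 'A' as the single ASCII byte 0x41, which is why it is ASCII-compatible; the "utf-16" codec prepends a byte-order mark and uses two bytes even for ASCII letters; GB18030 reaches the same repertoire with its own byte layout.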

Origin and development


Unicode has the explicit aim of transcending the limitations of traditional character encodings, such
as those defined by the ISO/IEC 8859 standard, which find wide usage in various countries of the
world but remain largely incompatible with each other. Many traditional character encodings share a
common problem in that they allow bilingual computer processing (usually using Latin
characters and the local script), but not multilingual computer processing (computer processing of
arbitrary scripts mixed with each other).
Unicode, in intent, encodes the underlying characters—graphemes and grapheme-like units—rather
than the variant glyphs (renderings) for such characters. In the case of Chinese characters, this
sometimes leads to controversies over distinguishing the underlying character from its variant glyphs
(see Han unification).
In text processing, Unicode takes the role of providing a unique code point—a number, not a glyph—
for each character. In other words, Unicode represents a character in an abstract way and leaves
the visual rendering (size, shape, font, or style) to other software, such as a web browser or word
processor. This simple aim becomes complicated, however, because of concessions made by
Unicode's designers in the hope of encouraging a more rapid adoption of Unicode.
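As an illustrative sketch in Python, the built-ins ord and chr move between a character and its code point, with no font or style involved:

    # A code point is just a number; rendering is left to other software.
    print(hex(ord("A")))     # 0x41   -> U+0041 LATIN CAPITAL LETTER A
    print(hex(ord("中")))    # 0x4e2d -> U+4E2D
    print(chr(0x1F600))      # the character at U+1F600; how it looks
                             # depends entirely on the font that draws it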
The first 256 code points were made identical to the content of ISO/IEC 8859-1 so as to make it
trivial to convert existing western text. Many essentially identical characters were encoded multiple
times at different code points to preserve distinctions used by legacy encodings and therefore allow
conversion from those encodings to Unicode (and back) without losing any information. For
example, the "fullwidth forms" section of code points encompasses a full duplicate of the Latin
alphabet because Chinese, Japanese, and Korean (CJK) fonts contain two versions of these letters,
"fullwidth" matching the width of the CJK characters, and normal width. For other examples,
see duplicate characters in Unicode.
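A small Python sketch of one such duplicate: fullwidth 'Ａ' (U+FF21) is a distinct code point from ASCII 'A' (U+0041), though compatibility normalization (NFKC) folds it back to the ordinary letter:

    import unicodedata

    ascii_a = "\u0041"   # LATIN CAPITAL LETTER A
    full_a = "\uff21"    # FULLWIDTH LATIN CAPITAL LETTER A

    print(ascii_a == full_a)                   # False: distinct code points
    print(unicodedata.name(full_a))            # FULLWIDTH LATIN CAPITAL LETTER A
    print(unicodedata.normalize("NFKC", full_a) == ascii_a)   # True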
Recipients of the Unicode Bulldog Award include many names influential in the development of Unicode, among them Tatsuo Kobayashi, Thomas Milo, Roozbeh Pournader, Ken Lunde, and Michael Everson.[7]

History
Based on experiences with the Xerox Character Code Standard (XCCS) since 1980,[8] the origins of Unicode can be traced back to 1987, when Joe Becker from Xerox, with Lee Collins and Mark Davis from Apple, started investigating the practicalities of creating a universal character set.[9] With additional input from Peter Fenwick and Dave Opstad,[8] Joe Becker published a draft proposal for an "international/multilingual text character encoding system, tentatively called Unicode" in August 1988. He explained that "the name 'Unicode' is intended to suggest a unique, unified, universal encoding".[8]
In this document, entitled Unicode 88, Becker outlined a 16-bit character model:[8]
Unicode is intended to address the need for a workable, reliable world text encoding. Unicode could
be roughly described as "wide-body ASCII" that has been stretched to 16 bits to encompass the
characters of all the world's living languages. In a properly engineered design, 16 bits per character
are more than sufficient for this purpose.
His original 16-bit design was based on the assumption that only those scripts and characters in
modern use would need to be encoded:[8]
Unicode gives higher priority to ensuring utility for the future than to preserving past antiquities.
Unicode aims in the first instance at the characters published in modern text (e.g. in the union of all
newspapers and magazines printed in the world in 1988), whose number is undoubtedly far below
2¹⁴ = 16,384. Beyond those modern-use characters, all others may be defined to be obsolete or rare;
these are better candidates for private-use registration than for congesting the public list of generally
useful Unicodes.
In early 1989, the Unicode working group expanded to include Ken Whistler and Mike Kernaghan of
Metaphor, Karen Smith-Yoshimura and Joan Aliprand of RLG, and Glenn Wright of Sun
Microsystems, and in 1990, Michel Suignard and Asmus Freytag from Microsoft and Rick McGowan
of NeXT joined the group. By the end of 1990, most of the work on mapping existing character
encoding standards had been completed, and a final review draft of Unicode was ready.
The Unicode Consortium was incorporated in California on 3 January 1991,[10] and in October 1991,
the first volume of the Unicode standard was published. The second volume, covering Han
ideographs, was published in June 1992.
In 1996, a surrogate character mechanism was implemented in Unicode 2.0, so that Unicode was no
longer restricted to 16 bits. This increased the Unicode codespace to over a million code points,
which allowed for the encoding of many historic scripts (e.g., Egyptian hieroglyphs) and thousands of
rarely used or obsolete characters that had not been anticipated as needing encoding. Among the
characters not originally intended for Unicode are rarely used Kanji or Chinese characters, many of
which are part of personal and place names, making them much more essential than envisioned in
the original architecture of Unicode.[11]
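A hedged Python sketch of the surrogate mechanism: a character beyond U+FFFF, such as U+1F600, does not fit in one 16-bit code unit, so UTF-16 represents it as a pair of surrogates:

    import struct

    ch = "\U0001F600"                     # a code point above U+FFFF
    units = struct.unpack("<2H", ch.encode("utf-16-le"))
    print([hex(u) for u in units])        # ['0xd83d', '0xde00']

    # Recover the code point from the high and low surrogate:
    high, low = units
    assert 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00) == ord(ch)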
The Microsoft TrueType specification version 1.0 from 1992 used the name 'Apple Unicode' instead
of 'Unicode' for the Platform ID in the naming table.

Unicode Consortium
Main article: Unicode Consortium
The Unicode Consortium is a nonprofit organization that coordinates Unicode's development. Full
members include most of the main computer software and hardware companies with any interest in
text-processing standards, including Adobe, Apple, Facebook, Google, IBM, Microsoft, Netflix,
and SAP SE.[12]
Over the years several countries or government agencies have been members of the Unicode
Consortium. Presently only the Ministry of Endowments and Religious Affairs (Oman) is a full
member with voting rights.[12]
The Consortium has the ambitious goal of eventually replacing existing character encoding schemes
with Unicode and its standard Unicode Transformation Format (UTF) schemes, as many of the
existing schemes are limited in size and scope and are incompatible with multilingual environments.

Scripts covered
Main article: Script (Unicode)

[Image: a screenshot from the OpenOffice.org application, demonstrating that many modern applications can render a substantial subset of the scripts in Unicode.]
Unicode currently covers most major writing systems in use today.[13][better source needed]
As of 2022, a total of 161 scripts[14] are included in the latest version of Unicode (covering alphabets, abugidas, and syllabaries), although there are still scripts that are not yet encoded, particularly those mainly used in historical, liturgical, and academic contexts. Characters continue to be added to the already encoded scripts, as do symbols, in particular for mathematics and music (in the form of notes and rhythmic symbols).
The Unicode Roadmap Committee (Michael Everson, Rick McGowan, Ken Whistler, V.S. Umamaheswaran)[15] maintains the list of scripts that are candidates or potential candidates for encoding, along with their tentative code block assignments, on the Unicode Roadmap[16] page of the Unicode Consortium website. For some scripts on the Roadmap, such as Jurchen and Khitan small script, encoding proposals have been made and are working their way through the approval process. For other scripts, such as Mayan (besides numbers) and Rongorongo, no proposal has yet been made, and they await agreement on character repertoire and other details from the user communities involved.
Some modern invented scripts which have not yet been included in Unicode (e.g., Tengwar), or which do not qualify for inclusion in Unicode due to lack of real-world use (e.g., Klingon), are listed in the ConScript Unicode Registry, along with unofficial but widely used Private Use Area code assignments.
There is also a Medieval Unicode Font Initiative focused on special Latin medieval characters. Some of these proposals have already been included in Unicode.

Script Encoding Initiative


The Script Encoding Initiative,[17] a project run by Deborah Anderson at the University of California, Berkeley, was founded in 2002 with the goal of funding proposals for scripts not yet encoded in the standard. The project has become a major source of proposed additions to the standard in recent years.[18]

Versions
The Unicode Consortium and the International Organization for Standardization (ISO) have together
developed a shared repertoire following the initial publication of The Unicode Standard in 1991;
Unicode and the ISO's Universal Coded Character Set (UCS) use identical character names and
code points. However, the Unicode versions do differ from their ISO equivalents in two significant
ways.
While the UCS is a simple character map, Unicode specifies the rules, algorithms, and properties
necessary to achieve interoperability between different platforms and languages. Thus, The Unicode
Standard includes more information, covering—in depth—topics such as bitwise
encoding, collation and rendering. It also provides a comprehensive catalog of character properties,
including those needed for supporting bidirectional text, as well as visual charts and reference data
sets to aid implementers. Previously, The Unicode Standard was sold as a print volume containing
the complete core specification, standard annexes, and code charts. However, Unicode 5.0,
published in 2006, was the last version printed this way. Starting with version 5.2, only the core
specification, published as print-on-demand paperback, may be purchased.[19] The full text, on the
other hand, is published as a free PDF on the Unicode website.
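For a taste of those character properties, a minimal Python sketch: the standard library's unicodedata module (which tracks the Unicode Character Database, typically a version behind the newest standard) exposes a few of them, including the general category and the bidirectional class used for bidirectional text:

    import unicodedata

    for ch in ("A", "\u0664", "\u4e2d"):
        print(
            f"U+{ord(ch):04X}",
            unicodedata.name(ch),
            unicodedata.category(ch),        # general category, e.g. Lu, Nd, Lo
            unicodedata.bidirectional(ch),   # bidirectional class, e.g. L, AN
        )

    # U+0041 LATIN CAPITAL LETTER A Lu L
    # U+0664 ARABIC-INDIC DIGIT FOUR Nd AN
    # U+4E2D CJK UNIFIED IDEOGRAPH-4E2D Lo L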
A practical reason for this publication method highlights the second significant difference between
the UCS and Unicode—the frequency with which updated versions are released and new characters
added. Expanded versions of The Unicode Standard have been released regularly, roughly once a year, occasionally with more than one version in a calendar year and with rare cases where a scheduled release had to be postponed. For instance, in April 2020, only a month after version 13.0 was published, the Unicode Consortium announced that it had changed the intended release date for version 14.0, pushing it back six months from March 2021 to September 2021 due to the COVID-19 pandemic.
The latest version of Unicode, 15.0.0, was released on 13 September 2022. Several annexes were updated, including Unicode Security Mechanisms (UTS #39), and a total of 4489 new characters were encoded, including 20 new emoji characters (such as a "wireless" network symbol and hearts in additional colors such as pink), two new scripts, a CJK Unified Ideographs extension, and multiple additions to existing blocks.[20][21]
