XML FOR DUMMIES
Book authors: Lucinda Dykes and Ed Tittel
Slides prepared by Cong Tan
Part 2: XML and the Web
Chapter 6: Adding Character(s) to XML
Contents
1. About Character Encodings
2. Introducing Unicode
3. Character Sets, Fonts, Scripts, and Glyphs
4. For Each Character, a Code
5. Key Character Sets
6. Using Unicode Characters
7. Finding Character Entity Information
1. About Character Encodings
The trend is toward longer bit strings to encode character data, so size does matter when representing character data. Here's why:
- A 7-bit string can represent a maximum of 2^7, or 128, different characters…
- An 8-bit string can represent a maximum of 2^8, or 256, different characters, including everything a 7-bit encoding can handle, and leaves room for what some experts call higher-order characters.
- A 16-bit string can represent a maximum of 2^16, or 65,536, different characters.
Some modern computers still use 8-bit encodings to represent most character data. Windows NT, Windows 2000, and Windows XP, however, use 16-bit encoding for internal representations of text, and most global solutions use 16-bit encoding to support all possible languages and characters.
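The arithmetic behind these limits can be checked with a short sketch. The function name `code_space` is made up for this illustration; it simply computes 2 raised to the number of bits.

```python
def code_space(bits: int) -> int:
    """Return how many distinct values a bit string of the given width can hold."""
    return 2 ** bits


# Print the limits for the three widths discussed on this slide.
for bits in (7, 8, 16):
    print(f"{bits}-bit strings: {code_space(bits)} different characters")
```

Each extra bit doubles the number of available codes, which is why moving from 8 to 16 bits is such a large jump.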
2. Introducing Unicode
Today, Unicode defines just over 96,000 different character codes. It is also the default character set used to encode HTML documents on the Web. Many people, including numerous XML experts, refer to the XML character set as "Unicode". Note that XML 1.0, 2nd Edition references Unicode 2.0 and 3.0, and XML 1.1 references Unicode 4.0, whereas the 1st Edition of XML 1.0 references only Unicode 2.0… For more information about Unicode characters, symbols, history, and the current standard, you can find a plethora of information at the Unicode Consortium's Web site at www.unicode.org.
3. Character Sets, Fonts, Scripts, and Glyphs
To see what's in XML scripts that 7- or 8-bit character encodings can't cover (which means special symbols or non-Roman alphabets), you'll need a few extra local ingredients:
- A character set that matches the script you're trying to read and display.
- Software that understands the character set for the script.
- An electronic font that allows the character set to be displayed on screen.
All these ingredients are necessary to work with alternate character sets. Character sets represent a mapping from a script to a set of corresponding numeric character codes. Fonts represent a collection of glyphs for the numeric character codes in a character set. Finally, to create text to match the alphabets used in a script, you need an input tool, such as a text or XML editor, that can work with the character set and its corresponding font.
4. For Each Character, a Code
In the Unicode/ISO 10646 character set, individual characters correspond to specific 16-bit numbers. Numeric entities take one of two forms, decimal or hexadecimal. For example:
    &#4096; <!-- &# indicates a decimal number -->
    &#xF00; <!-- &#x indicates a hexadecimal number -->
Each XML document that uses numeric entities has an associated text encoding. If no specific encoding is declared, the default is an encoding called UTF-8, which stands for Unicode Transformation Format, 8-bit form. UTF and UCS are mechanisms for implementing Unicode:
- UTF versions include UTF-32, UTF-16, UTF-8, UTF-EBCDIC, and UTF-7.
- UCS versions include UCS-4 and UCS-2.
UTF-16 is used mainly for internal processing.
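As a rough illustration of the decimal and hexadecimal forms, the sketch below parses a tiny document with Python's standard `xml.etree.ElementTree` module. The element names (`sample`, `dec`, `hex`) and the chosen code point are made up for the example. It then shows how many bytes the resulting character occupies in three of the UTF forms listed above.

```python
import xml.etree.ElementTree as ET

# 4096 in decimal and 1000 in hexadecimal are the same number,
# so both references name the same character.
doc = "<sample><dec>&#4096;</dec><hex>&#x1000;</hex></sample>"
root = ET.fromstring(doc)
dec_char = root.find("dec").text
hex_char = root.find("hex").text


def encoded_lengths(ch: str) -> dict:
    """Return the byte length of one character in several UTF forms."""
    forms = ("utf-8", "utf-16-be", "utf-32-be")
    return {name: len(ch.encode(name)) for name in forms}


print(dec_char == hex_char)       # both references resolve to one character
print(ord(dec_char))              # the numeric code of that character
print(encoded_lengths(dec_char))  # bytes needed per encoding form
```

The character code stays the same in every case; only the number of bytes used to store it changes with the encoding form.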
5. Key Character Sets
Most computers today use some 8-bit variant of ASCII, a character set that handles the basic Roman alphabet used for English, along with punctuation, numbers, and simple symbols. Character sets for most European languages match standard ASCII values from 0 to 127 and go on from there to define alternate mappings between character codes and local script characters for values from 128 to 255. Non-Roman alphabets, such as Hebrew, Japanese, and Thai, depend on special character sets that include basic ASCII (0-127, or 0-255). A listing of character sets built around the ASCII framework appears in Table 6-1.
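A small sketch can make the shared 0-127 range and the diverging 128-255 range concrete. It decodes the same two bytes with three ISO-8859 variants using Python's built-in codecs. The helper name `decode_with` and the sample byte values are chosen only for illustration.

```python
def decode_with(raw: bytes, charsets) -> dict:
    """Decode the same bytes with each named character set."""
    return {name: raw.decode(name) for name in charsets}


# 0x41 lies in the ASCII range (0-127); 0xE9 lies in the 128-255 range.
raw = bytes([0x41, 0xE9])
result = decode_with(raw, ["iso-8859-1", "iso-8859-5", "iso-8859-7"])

for name, text in result.items():
    # The first character is the same everywhere; the second depends on the set.
    print(name, repr(text))
```

The byte below 128 decodes to the same letter in every variant, while the byte above 127 turns into a different local script character in each one.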
[Table 6-1: character sets built around the ASCII framework (slide image, contents not reproduced in this transcript)]
Table 6-1 shows that most character sets can render English and German, plus a collection of other languages. When choosing a variant of ISO-8859, remember that all the languages you want to include must be covered by that variant. XML goes beyond such idiosyncratic or customized character sets and uses Unicode.
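One way to test whether a given ISO-8859 variant can hold a piece of text is simply to try encoding it. The sketch below assumes Python's built-in codecs as the reference; the function name `variants_covering` and the default list of variants are made up for this example.

```python
def variants_covering(
    text: str,
    variants=("iso-8859-1", "iso-8859-2", "iso-8859-5", "iso-8859-7", "iso-8859-9"),
) -> list:
    """Return the variants that can encode every character in the text."""
    covering = []
    for name in variants:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            # At least one character has no code in this variant.
            continue
        covering.append(name)
    return covering


print(variants_covering("Grüße"))         # German text
print(variants_covering("Привет"))        # Russian text
print(variants_covering("Grüße Привет"))  # both languages in one document
```

Text that mixes scripts may fit none of the 8-bit variants, whereas a Unicode encoding such as UTF-8 can hold it without loss.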
6. Using Unicode Characters
Many modern word processors support Unicode; for instance, Word 97 and later versions support a format called encoded text that uses Unicode encoding. If you don't already have access to such tools and want to save an XML file in Unicode format, you must use a conversion tool. Several different tools, both freeware and commercial products, are available, depending on your OS. Widely used tools such as Netscape Navigator (version 4.1 or newer) and IE (version 5 or newer) can handle most ISO-8859 variants. If you want to use an alternate character encoding, you must identify that encoding in your XML document's prolog as follows:
    <?xml version="1.0" encoding="ISO-8859-9"?>
Note that XML parsers are required to support only UTF-8 and UTF-16 encodings, so the encoding attribute in an XML document prolog might not work with all such tools.
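The sketch below shows one possible way to read a document that declares ISO-8859-9 in its prolog and re-save it in a Unicode encoding, using Python's standard `xml.etree.ElementTree` module. The `city` element and its content are invented for the example, and support for this particular encoding depends on the parser, as noted above.

```python
import xml.etree.ElementTree as ET

# Build a small document whose bytes really are ISO-8859-9 encoded.
# The element name and text are illustrative only.
source = '<?xml version="1.0" encoding="ISO-8859-9"?>\n<city>İstanbul</city>'
xml_bytes = source.encode("iso-8859-9")

# The parser reads the encoding declaration in the prolog
# and decodes the bytes accordingly.
root = ET.fromstring(xml_bytes)
print(root.text)

# Re-serialize the same content as UTF-8 bytes.
utf8_bytes = ET.tostring(root, encoding="utf-8")
print(utf8_bytes)
```

Passing raw bytes rather than an already decoded string matters here: it is the parser, not the calling code, that acts on the declared encoding.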
7. Finding Character Entity Information
Resource: The Unicode Standard, version 4.0. You can also find plenty of encoding information online, for example:
- www.unicode.org/ucd/
You'll also find the XHTML entity lists useful in this context:
- Latin-1: www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent
- Special: www.w3.org/TR/xhtml1/DTD/xhtml-special.ent
- Symbols: www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent
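For quick lookups without opening those files, a sketch along the following lines may help. It relies on Python's standard `html.entities.name2codepoint` table, which covers the HTML 4.01 entity names and is assumed here to correspond to the XHTML lists above, plus `unicodedata` for the official character name. The function name `describe_entity` is made up for the example.

```python
import unicodedata
from html.entities import name2codepoint


def describe_entity(name: str) -> str:
    """Show a named entity with its decimal and hexadecimal numeric forms."""
    cp = name2codepoint[name]
    char_name = unicodedata.name(chr(cp))
    return f"&{name}; = &#{cp}; = &#x{cp:X}; ({char_name})"


# Look up a few entity names of the kind defined in the entity lists.
for entity in ("eacute", "euro", "copy"):
    print(describe_entity(entity))
```

Because the numeric forms refer directly to Unicode character codes, they can be used in an XML document even when the named entity itself has not been declared.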
THE END
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
TrsLabs Consultants - DeFi, WEb3, Token Listing
TrsLabs Consultants - DeFi, WEb3, Token ListingTrsLabs Consultants - DeFi, WEb3, Token Listing
TrsLabs Consultants - DeFi, WEb3, Token Listing
Trs Labs
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Aqusag Technologies
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
HCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser EnvironmentsHCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser Environments
panagenda
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
Semantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AISemantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AI
artmondano
 
Social Media App Development Company-EmizenTech
Social Media App Development Company-EmizenTechSocial Media App Development Company-EmizenTech
Social Media App Development Company-EmizenTech
Steve Jonas
 
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
Cyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of securityCyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of security
riccardosl1
 
MINDCTI revenue release Quarter 1 2025 PR
MINDCTI revenue release Quarter 1 2025 PRMINDCTI revenue release Quarter 1 2025 PR
MINDCTI revenue release Quarter 1 2025 PR
MIND CTI
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 

XML For Dummies, Chapter 6: Adding Character(s) to XML

  • 1. XML FOR DUMMIES. Book authors: Lucinda Dykes and Ed Tittel. Slides prepared by Cong Tan. Part 2: XML and the Web. Chapter 6: Adding Character(s) to XML.
  • 2. Contents: About Character Encodings. Introducing Unicode. Character Sets, Fonts, Scripts, and Glyphs. For Each Character, a Code. Key Character Sets. Using Unicode Characters. Finding Character Entity Information.
  • 3. 1. About Character Encodings. Clearly, the trend is toward longer bit strings to encode character data, so size does matter when representing character data. Here’s why: A 7-bit string can represent a maximum of 2^7, or 128, different characters… An 8-bit string can represent a maximum of 2^8, or 256, different characters, including everything a 7-bit encoding can handle, and leaves room for what some experts call higher-order characters. A 16-bit string can represent a maximum of 2^16, or 65,536, different characters. Some modern computers still use 8-bit encodings to represent most character data. Windows NT, Windows 2000, and Windows XP, however, use 16-bit encoding for internal representations of text, and most global solutions use 16-bit encoding to support all possible languages and characters.
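The arithmetic on this slide can be checked with a short Python sketch. The variable name `capacities` is only illustrative and does not come from the book or the slides.

```python
# An n-bit string can take 2 ** n distinct values, so that is the
# maximum number of characters an n-bit encoding can represent.
capacities = {bits: 2 ** bits for bits in (7, 8, 16)}

for bits, count in capacities.items():
    print(f"{bits}-bit: up to {count:,} different characters")
```

Each extra bit doubles the number of available codes, which is why moving from 8 to 16 bits raises the limit from 256 to 65,536.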
  • 4. 2. Introducing Unicode. Today, Unicode defines just over 96,000 different character codes. It is the default character set used to encode all HTML documents on the Web. Many people, including numerous XML experts, refer to the XML character set as “Unicode”. Note that XML 1.0, 2nd Edition references Unicode 2.0 and 3.0, and XML 1.1 references Unicode 4.0, whereas the 1st Edition of XML 1.0 references only Unicode 2.0… For more information about Unicode characters, symbols, history, and the current standard, see the Unicode Consortium’s Web site at www.unicode.org.
  • 5. 3. Character Sets, Fonts, Scripts, and Glyphs. To see what’s in XML scripts that 7- or 8-bit character encodings can’t cover (which means special symbols or non-Roman alphabets), you’ll need a few extra local ingredients: A character set that matches the script you’re trying to read and display. Software that understands the character set for the script. An electronic font that allows the character set to be displayed on screen. All these ingredients are necessary to work with alternate character sets. Character sets represent a mapping from a script to a set of corresponding numeric character codes. Fonts represent a collection of glyphs for the numeric character codes in a character set. Finally, to create text to match the alphabets used in a script, you need an input tool, such as a text or XML editor, that can work with the character set and its corresponding font.
  • 6. 4. For Each Character, a Code. In the Unicode/ISO 10646 character set, individual characters correspond to specific 16-bit numbers. Numeric entities take one of two forms, decimal or hexadecimal. For example: &#4096; <!-- &# indicates a decimal number --> &#x0F00; <!-- &#x indicates a hexadecimal number --> Each numeric entity in XML has an associated text encoding. If some specific encoding is not defined in a numeric entity’s definition, the default is an encoding called UTF-8, which stands for Unicode Transformation Format, 8-bit form. UTF and UCS are mechanisms for implementing Unicode. UTF versions include UTF-32, UTF-16, UTF-8, UTF-EBCDIC, and UTF-7. UCS versions include UCS-4 and UCS-2. UTF-16 is used mainly for internal processing.
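To make the decimal and hexadecimal forms concrete, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The `<sample>` element and all variable names are made up for illustration. It writes the same code point once in decimal (`&#4096;`) and once in hexadecimal (`&#x1000;`), then shows how that one character is stored under two of the UTF encodings named on the slide.

```python
import xml.etree.ElementTree as ET

# The same code point written as a decimal and as a hexadecimal
# numeric character reference (4096 decimal == 1000 hexadecimal).
doc = "<sample>&#4096;&#x1000;</sample>"
text = ET.fromstring(doc).text

decimal_char, hex_char = text[0], text[1]
same_character = decimal_char == hex_char  # both references resolve to U+1000
code_point = ord(decimal_char)

# The character's number stays the same; only the stored bytes differ.
utf8_bytes = decimal_char.encode("utf-8")
utf16_bytes = decimal_char.encode("utf-16-be")

print(same_character, hex(code_point), len(utf8_bytes), len(utf16_bytes))
```

The code point identifies the character, while UTF-8 and UTF-16 are two ways of serializing that number as bytes.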
  • 7. 5. Key Character Sets. Most computers today use some 8-bit variant of ASCII, a character set that handles the basic Roman alphabet used for English, along with punctuation, numbers, and simple symbols. Character sets for most European languages match standard ASCII values from 0 to 127 and go on from there to define alternate mappings between character codes and local script characters for values from 128 to 255. Non-Roman alphabets, such as Hebrew, Japanese, and Thai, depend on special character sets that include basic ASCII (0-127, or 0-255). A listing of character sets built around the ASCII framework appears in Table 6-1.
  • 9. Table 6-1 shows that most character sets can render English and German, plus a collection of other languages. When choosing a variant of ISO-8859, remember that all the languages you want to include must use Unicode. XML goes beyond such idiosyncratic or customized character sets and uses Unicode.
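The split between shared ASCII values (0 to 127) and variant-specific values (128 to 255) can be seen in a small Python sketch. The two variants used here, ISO-8859-1 and ISO-8859-7, are chosen only as an illustration and are not taken from Table 6-1.

```python
# One byte in the ASCII range and one byte in the 128-255 range.
ascii_byte = b"\x41"
high_byte = b"\xe4"

# Decode each byte under two ISO-8859 variants.
variants = ("iso-8859-1", "iso-8859-7")
ascii_results = {name: ascii_byte.decode(name) for name in variants}
high_results = {name: high_byte.decode(name) for name in variants}

print(ascii_results)  # the ASCII byte decodes identically in both variants
print(high_results)   # the high byte maps to a different character in each
```

This is why a document that uses values above 127 has to state which character set it was written in.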
  • 10. 6. Using Unicode Characters. Many modern word processors support Unicode; for instance, Word 97 and later versions support a format called encoded text that uses Unicode encoding. If you don’t already have access to such tools and want to save XML files in Unicode format, you must use a conversion tool. Several different tools, both freeware and commercial products, are available, depending on your OS. Widely used tools such as Netscape Navigator (version 4.1 or newer) and IE (version 5 or newer) can handle most ISO-8859 variants. If you want to use an alternate character encoding, you must identify that encoding in your XML document’s prolog as follows: <?xml version="1.0" encoding="ISO-8859-9"?> Note that XML parsers are required to support only UTF-8 and UTF-16 encodings, so the encoding attribute in an XML document prolog might not work with all such tools.
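As a rough illustration of the prolog declaration, the Python sketch below builds a tiny document that declares ISO-8859-9, stores it in that encoding, and parses it back. The `<word>` element and the sample word are hypothetical. The sketch assumes the parser bundled with Python accepts ISO-8859-9; as the slide notes, other parsers are only required to support UTF-8 and UTF-16.

```python
import xml.etree.ElementTree as ET

# A Turkish word containing letters outside ASCII, written with escapes
# so this source file itself stays plain ASCII.
word = "\u0131\u015f\u0131k"

# The prolog names the encoding the document bytes are actually stored in.
xml_text = f'<?xml version="1.0" encoding="ISO-8859-9"?><word>{word}</word>'
xml_bytes = xml_text.encode("iso-8859-9")

# The parser reads the declaration and decodes the bytes accordingly.
parsed = ET.fromstring(xml_bytes).text
round_trip_ok = parsed == word
print(round_trip_ok)
```

If the declared encoding did not match the bytes, the parser would either fail or return the wrong characters, so the declaration and the stored encoding have to agree.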
  • 11. 7. Finding Character Entity Information. Resource: The Unicode Standard, version 4.0. You can also find plenty of encoding information online (for example, www.unicode.org/ucd/). You’ll also find the XHTML entity lists useful in this context: Latin-1: www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent; Special: www.w3.org/TR/xhtml1/DTD/xhtml-special.ent; Symbols: www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent