XML What Is XML?: Ralph Mosely
What is XML?
XML is a meta-language, which can be used to store data & act as a mechanism to transfer
information between dissimilar systems.
XML stands for EXtensible Markup Language.
XML is a markup language much like HTML.
XML was designed to describe data.
XML tags are not predefined in XML. You must define your own tags.
XML is self describing.
XML can use a DTD (Document Type Definition) to formally describe the data.
<?xml version="1.0"?>
<Person>
<Firstname>Ralph</Firstname>
<Lastname>Mosely</Lastname>
</Person>
XML stands for Extensible Markup Language. It is a text-based markup language derived from Standard
Generalized Markup Language (SGML).
XML tags identify the data and are used to store and organize it, rather than specifying how to
display it, as HTML tags do. XML is not going to replace HTML in the
near future, but it introduces new possibilities by adopting many successful features of HTML.
There are three important characteristics of XML that make it useful in a variety of systems and solutions:
XML is extensible − XML allows you to create your own self-descriptive tags, or language, that
suits your application.
XML carries the data, does not present it − XML allows you to store the data irrespective of how
it will be presented.
XML is a public standard − XML was developed by an organization called the World Wide Web
Consortium (W3C) and is available as an open standard.
XML Usage
A short list of XML uses says it all:
XML can work behind the scenes to simplify the creation of HTML documents for large web sites.
XML can be used to exchange information between organizations and systems.
XML can be used for offloading and reloading of databases.
XML can be used to store and arrange data in a way that suits your data handling needs.
XML can easily be merged with style sheets to create almost any desired output.
Virtually any type of data can be expressed as an XML document.
What is Markup?
XML is a markup language that defines a set of rules for encoding documents in a format that is both
human-readable and machine-readable. So what exactly is a markup language? Markup is information
added to a document that enhances its meaning in certain ways, in that it identifies the parts and how
they relate to each other. More specifically, a markup language is a set of symbols that can be placed in
the text of a document to demarcate and label the parts of that document.
The following example shows how XML markup looks when embedded in a piece of text:
<message>
<text>Hello, world!</text>
</message>
This snippet includes the markup symbols, or the tags such as <message>...</message> and <text>...
</text>. The tags <message> and </message> mark the start and the end of the XML code fragment. The
tags <text> and </text> surround the text Hello, world!.
Difference between XML and HTML
XML | HTML
XML was designed to store and transfer data. | HTML was designed to display data.
XML focuses on what data is. | HTML focuses on how data looks.
In XML you can design your own tags. | HTML has predefined tags.
XML uses a parser (e.g. DOM, SAX) to check and read XML files. | HTML is read by the browser's built-in HTML parser; no separate XML parser is needed.
Use of XML
Used to exchange data between dissimilar systems.
Used to describe the content of a document.
XML can be used as a database to store data.
Features of XML
XML lets you define your own tags, so it is self-describing.
Language Independent: Any language is able to read & write XML.
OS Independent: It can work on any platform.
Readability: It’s a plain text file in a human-readable format, so you can edit or view it in a simple editor.
Hierarchical: It has hierarchical structure which is powerful to express complex data and
simple to understand.
<bookstore>
<book category="CHILDREN">
<title>Harry Potter</title>
<author>J. K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
</bookstore>
<file type="gif">computer.gif</file>
xmlns="http://www.mydomian.com/ns/animals/1.1"
Create an XML file that contains Book Information.
<?xml version="1.0"?>
<bookstore>
<book>
<title>Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
<book>
<title>WAD</title>
<author>Ralph Mosely</author>
<year>2001</year>
<price>395</price>
</book>
</bookstore>
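To show how such a file might be consumed, the sketch below loads the bookstore document with Python's standard `xml.etree.ElementTree` module and turns each book into a dictionary. The function name `load_books` and the choice to convert `year` and `price` to numbers are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# The book information document from above, held in a string for the example.
BOOKSTORE_XML = """<?xml version="1.0"?>
<bookstore>
<book>
<title>Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
<book>
<title>WAD</title>
<author>Ralph Mosely</author>
<year>2001</year>
<price>395</price>
</book>
</bookstore>"""

def load_books(xml_text):
    """Return one dictionary per <book> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    books = []
    for book in root.findall("book"):
        books.append({
            "title": book.findtext("title"),
            "author": book.findtext("author"),
            "year": int(book.findtext("year")),       # XML stores text; convert explicitly
            "price": float(book.findtext("price")),
        })
    return books

books = load_books(BOOKSTORE_XML)
for entry in books:
    print(entry["title"], entry["author"], entry["year"], entry["price"])
```

Every value in XML is text, so any numeric interpretation has to be done by the reading program or described by a schema.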
Internal DTD
o This is an XML document with a Document Type Definition:
<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
<to>Ravi</to>
<from>Ketan</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
External DTD
o This is an XML document that references an external Document Type Definition:
<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "note.dtd">
<note>
<to>Ravi</to>
<from>Narendra</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
o This is a copy of the file "note.dtd" containing the Document Type Definition:
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
DTD - Elements
In the DTD, XML elements are declared with an element declaration. An element
declaration has the following syntax:
<!ELEMENT element-name (#PCDATA)>
or
<!ELEMENT element-name EMPTY>
or
<!ELEMENT element-name ANY>
example:
<!ELEMENT note (#PCDATA)>
o #PCDATA (parsed character data) means that the element contains text that IS going to be
parsed by a parser, so entities and markup inside it are recognized.
o The keyword EMPTY declares an element with no content.
o The keyword ANY declares an element with any content.
o CDATA is not a valid content model for an element; it is an attribute type. Text that is not
supposed to be parsed is written in the document itself as a <![CDATA[ ... ]]> section.
o If an element mixes #PCDATA with child elements, these elements must also be declared.
Elements with children (sequences)
o Elements with one or more children are defined with the names of the child
elements inside the parentheses:
<!ELEMENT element-name (child-element-name)>
or
<!ELEMENT element-name (child-element-name,child-element-name,...)>
example:
<!ELEMENT note (to,from,heading,body)>
o When children are declared in a sequence separated by commas, the children must
appear in the same sequence in the document. In a full declaration, the children must
also be declared, and the children can also have children. The full declaration of the
note document will be:
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
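The sequence rule can be illustrated with a small hand-written check. This is only a sketch: Python's standard library does not validate documents against a DTD, so `children_in_sequence` below is a hypothetical helper that merely compares the order of the child tags with the order listed in the note declaration. It is not a substitute for a validating parser.

```python
import xml.etree.ElementTree as ET

# Order of children taken from <!ELEMENT note (to,from,heading,body)>.
NOTE_SEQUENCE = ["to", "from", "heading", "body"]

def children_in_sequence(xml_text, expected):
    """Return True if the root's child elements appear exactly in `expected` order."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root] == expected

valid_note = (
    "<note><to>Ravi</to><from>Ketan</from>"
    "<heading>Reminder</heading><body>Hi</body></note>"
)
# Same children, but <from> comes before <to>, which breaks the declared sequence.
swapped_note = (
    "<note><from>Ketan</from><to>Ravi</to>"
    "<heading>Reminder</heading><body>Hi</body></note>"
)

valid_ok = children_in_sequence(valid_note, NOTE_SEQUENCE)
swapped_ok = children_in_sequence(swapped_note, NOTE_SEQUENCE)
print(valid_ok, swapped_ok)
```

Both documents are well-formed XML; only the first one follows the order the DTD requires.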
Wrapping
o If the DTD is to be included in your XML source file, it should be wrapped in
a DOCTYPE definition with the following syntax:
<!DOCTYPE root-element [element-declarations]>
Declaring zero or more occurrences of the same element
<!ELEMENT note (message*)>
o The * sign in the example above declares that the child element message can occur
zero or more times inside the note element.
Declaring zero or one occurrences of the same element
<!ELEMENT note (message?)>
o The ? sign in the example above declares that the child element message can occur
zero or one times inside the note element.
DTD – Attributes
Attributes provide extra information about elements.
Attributes are placed inside the start tag of an element.
Declaring Attributes
o In the DTD, XML element attributes are declared with an ATTLIST declaration.
An attribute declaration has the following syntax:
<!ATTLIST element-name attribute-name attribute-type default-value>
o As you can see from the syntax above, the ATTLIST declaration defines the
element which can have the attribute, the name of the attribute, the type of the
attribute, and the default attribute value.
o The attribute-type can have the following values:
Value            Explanation
CDATA            The value is character data
(en1|en2|..)     The value must be one from an enumerated list
ID               The value is a unique id
IDREF            The value is the id of another element
IDREFS           The value is a list of other ids
NMTOKEN          The value is a valid XML name token
NMTOKENS         The value is a list of valid XML name tokens
ENTITY           The value is an entity
ENTITIES         The value is a list of entities
NOTATION         The value is a name of a notation
xml:             The value is a predefined xml value
DTD example:
<!ELEMENT square EMPTY>
<!ATTLIST square width CDATA "0">
XML example:
<square width="100"></square>
o In the above example the element square is defined to be an empty element with
the attribute width of type CDATA. The width attribute has a default value of 0.
XML Schema
An XML Schema describes the structure of an XML document.
XML Schema is an XML-based alternative to DTD.
The XML Schema language is also referred to as XML Schema Definition (XSD).
XML Schema is a W3C Recommendation.
XSD is used to describe and validate the structure and the content of XML data. An XML schema
defines the elements, attributes and data types of a document, and it supports namespaces.
It is similar to a database schema that describes the data in a database.
XSD Elements
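XML Schemas define the elements of your XML files. Elements are of two types:
o Simple
o Complex
o Simple XML elements, which contain only text and have no attributes or child elements:
<lastname>Refsnes</lastname>
<age>36</age>
<dateborn>1970-03-27</dateborn>
o A complex XML element, "product", which is empty:
<product pid="1345"/>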
o A complex XML element, "employee", which contains only other elements:
<employee>
<firstname>John</firstname>
<lastname>Smith</lastname>
</employee>
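o A complex XML element, "food", which contains only text:
<food type="dessert">Ice cream</food>
o A complex XML element, "description", which contains both elements and text:
<description>
It happened on <date lang="norwegian">03.03.99</date> ....
</description>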
XSD Attributes
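Simple elements cannot have attributes. If an element has attributes, it is considered to
be of a complex type. But the attribute itself is always declared as a simple type.
How to Define an Attribute?
o The syntax for defining an attribute is:
<xs:attribute name="xxx" type="yyy"/>
where xxx is the name of the attribute and yyy specifies the data type of the attribute.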
XML Parsers
An XML parser is a software library or package that provides interfaces for client applications to work
with an XML document. The XML Parser is designed to read the XML and create a way for programs to
use XML.
An XML parser checks that the document is well-formed; a validating parser also checks it against
its DTD or schema.
[Figure: how an XML parser works]
There are two main types of XML parsers, namely Simple API for XML (SAX) and Document Object Model (DOM):
SAX
DOM
SAX (Simple API for XML) is the most widely adopted API for XML in Java and is considered the de
facto standard. Although it started as a library exclusive to Java, it is now a well-known API available
in a variety of programming languages. It is an open-source project hosted on the SourceForge
project infrastructure, which makes it easier to track open SAX issues outside the high-volume
XML-dev list. As of 01/10/2018 the latest version is SAX 2.0. It uses an event-driven,
serial-access mechanism for accessing XML documents and is frequently used by applets that need to
access XML documents, because it is generally the fastest and least memory-consuming API available
for parsing XML documents. SAX keeps no record of the elements that came before the current event,
i.e. it is state-independent; any state has to be tracked by the application.
DOM stands for Document Object Model. The DOM API provides the classes to read and write an XML
file. DOM reads an entire document. It is useful when reading small to medium size XML files. It is a
tree-based parser and a little slow when compared to SAX and occupies more space when loaded into
memory. We can insert and delete nodes using the DOM API.
In short, the package that provides interfaces for client applications that work with an XML
document is called an XML parser. It is designed to read XML documents and make their content
available to programs.
SAX Parser
SAX stands for Simple API for XML, and the SAX API is implemented by a SAX parser. It is an
event-based API that provides handler interfaces. There are four handler interfaces:
ContentHandler, DTDHandler, EntityResolver, and ErrorHandler. A SAX parser does not create any
internal structure; instead, it treats the occurrences of components of an input document as events
and tells the client what it reads as it reads through the input document. It is suitable for large XML
files because it doesn’t require loading the whole XML file.
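The event-driven style described above can be sketched with Python's standard `xml.sax` module, which offers handler classes with the same names as the Java interfaces. The class name `EventLogger` and the `events` list are assumptions made for this example; the note document is the one used earlier in these notes.

```python
import xml.sax

# The note document from the DTD section, as bytes for the parser.
NOTE_XML = b"""<note>
<to>Ravi</to>
<from>Ketan</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>"""

class EventLogger(xml.sax.ContentHandler):
    """Records each parsing event instead of building a tree (hypothetical class)."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Called when the parser meets an opening tag.
        self.events.append(("start", name))

    def characters(self, content):
        # Called for character data; whitespace-only chunks are skipped here.
        if content.strip():
            self.events.append(("text", content))

    def endElement(self, name):
        # Called when the parser meets a closing tag.
        self.events.append(("end", name))

handler = EventLogger()
xml.sax.parseString(NOTE_XML, handler)  # the parser pushes events to the handler

for event in handler.events:
    print(event)
```

The parser never hands over the document as a whole. The handler sees one event at a time, in document order, and must remember anything it needs later.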
Features Of SAX Parser
The SAX parser does not create an internal structure.
Event-based SAX parsers work in the same way as event handlers in Java.
Advantages Of SAX Parser
It is simple to use and memory-efficient.
It runs fast and can handle large documents.
Disadvantages Of SAX Parser
The event-based API is less intuitive than a tree-based API, because the application must interpret the events itself.
The document is never available as a whole, because the data arrives in many small pieces.
DOM Parser
DOM stands for Document Object Model. A DOM parser builds an object that holds the information in
an XML document, organized as a tree structure. The DOM API is implemented by a DOM parser and is
easy and simple to use. The parser reads the whole XML document and creates an in-memory tree
representation in which each element is a branch of the tree; because the entire tree is held in
memory, more memory is required than with SAX.
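A minimal sketch of the tree-based approach, using Python's standard `xml.dom.minidom` module, is shown below. It loads a shortened version of the note document, navigates backward and upward from a node, then inserts and deletes nodes. The variable names are chosen for this example only.

```python
from xml.dom import minidom

# A shortened note document without whitespace between elements,
# so that every child node of <note> is an element.
doc = minidom.parseString(
    "<note><to>Ravi</to><from>Ketan</from><heading>Reminder</heading></note>"
)
note = doc.documentElement

# Random access: jump straight to <heading>, then move backward and upward.
heading = note.getElementsByTagName("heading")[0]
previous_tag = heading.previousSibling.tagName   # sibling before <heading>
parent_tag = heading.parentNode.tagName          # element that contains <heading>

# Insert a new <body> element with a text node.
body = doc.createElement("body")
body.appendChild(doc.createTextNode("Don't forget me this weekend!"))
note.appendChild(body)

# Delete the <heading> element from the tree.
note.removeChild(heading)

remaining = [child.tagName for child in note.childNodes]
print(previous_tag, parent_tag, remaining)
print(note.toxml())
```

Because the whole tree is in memory, the program can move in any direction and modify the document, which a forward-only event stream does not allow.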
Features Of DOM Parser
A DOM parser creates an internal structure (the tree) of the document.
Through this internal structure, the client can get information about the original XML document.
Advantages Of DOM Parser
The DOM API is easy to use and supports both read and write operations.
It is preferred when different parts of a document need to be accessed randomly.
Disadvantages Of DOM Parser
It is not memory-efficient: it needs more memory because the whole XML document must be loaded.
Compared to the SAX parser, it is slower.
Hence, the main differences between the SAX parser and the DOM parser in Java are as follows:
SAX Parser | DOM Parser
It stands for Simple API for XML. | It stands for Document Object Model.
SAX parser is faster than DOM parser. | DOM parser is slower than SAX parser.
Best for larger files. | Best for smaller files.
It is suitable for processing XML files in Java when memory is limited. | It is not well suited to processing XML files when memory is low.
It does not create an internal structure. | It creates an internal (tree) structure.
Backward navigation is not possible. | Backward and forward navigation is possible.