XML Introduction
I. Historical glance:
1. SGML: Very powerful, but complicated and expensive.
2. HTML: Easy, but not very powerful.
3. XML: Very powerful and very easy.
XML was produced by the W3C (World Wide Web Consortium) in the late 1990s.
XML is the abbreviation of eXtensible Markup Language.
II. Definition:
XML is a meta-markup language: it allows you to define your own custom markup languages.
III. W3C Goals for XML:
1. XML must be straightforward to use over the Internet.
2. XML must support a wide variety of applications (software).
3. XML must be compatible with SGML (Standard Generalized Markup Language).
4. It must be easy to write programs that process XML documents (parsers).
5. XML must keep the number of optional features to a minimum (example: closing tags are not optional).
6. XML documents must be readable and clear.
7. The XML design must be prepared quickly.
8. The XML design must be formal and concise (a small number of rules that are followed exactly).
9. XML documents must be easy to create (for example, using Notepad).
10. XML has no shortcuts: terseness of markup is of minimal importance (unlike SGML, which allows shortcuts such as omitted tags).
IV. Some rules used in XML documents:
1. Elements:
An element (also called a node) is delimited by tags.
There are several types of elements:
o The root element contains all the other elements. Each XML document must contain at least one element: the root.
o A child element is contained in another element.
o A parent element contains other (child) elements.
o Elements that are on the same level are called sibling nodes.
o Text elements contain only text and no other elements; the text must still be enclosed in tags.
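The element types above can be seen in a small parsed document. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the `library`, `book`, `title` and `year` element names are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: <library> is the root, each <book> is a child of it.
DOC = """
<library>
  <book>
    <title>XML Basics</title>
    <year>1999</year>
  </book>
  <book>
    <title>Advanced XML</title>
    <year>2001</year>
  </book>
</library>
"""

root = ET.fromstring(DOC)

# The root element contains all the other elements.
print(root.tag)  # library

# The <book> elements are children of <library> and siblings of each other.
books = list(root)
print([b.tag for b in books])  # ['book', 'book']

# <title> and <year> are siblings; both are text elements.
first = books[0]
print([(c.tag, c.text) for c in first])  # [('title', 'XML Basics'), ('year', '1999')]
```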
2. Closing tags:
Every tag (element) must be closed.
Example: <tagName>text</tagName>.
N.B.: If an element is empty, i.e. it has no text and no child elements, it can be closed directly in the opening tag.
Example: <tagName attributeName="value" />.
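As a quick check of the closing rule, a parser can be asked whether a fragment is accepted. This is only a sketch with Python's standard `xml.etree.ElementTree`; the `note` and `image` element names and the `source` attribute are made up for the example.

```python
import xml.etree.ElementTree as ET

def parses(text):
    """Return True if the parser accepts text, False on a parse error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(parses("<note>text</note>"))  # True: the tag is closed
print(parses("<note>text"))         # False: the closing tag is missing

# An empty element may be closed directly; both spellings give the same tree.
short = ET.fromstring('<image source="logo.png" />')
long_form = ET.fromstring('<image source="logo.png"></image>')
print(short.attrib == long_form.attrib)  # True
print(len(short), len(long_form))        # 0 0 (no child elements)
```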
3. Proper nesting:
Child elements must be closed before their parent elements. Examples:
Properly nested:
<a>
<b>
</b>
</a>
Not properly nested:
<a>
<b>
</a>
</b>
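The two nesting cases can be compared with a parser. The sketch below assumes Python's standard `xml.etree.ElementTree`; the helper name `nesting_error` is hypothetical, and the exact wording of the error message depends on the underlying parser.

```python
import xml.etree.ElementTree as ET

def nesting_error(text):
    """Return the parser's error message, or None if text is well-formed."""
    try:
        ET.fromstring(text)
        return None
    except ET.ParseError as err:
        return str(err)

good = "<a><b></b></a>"  # <b> is closed before its parent <a>
bad = "<a><b></a></b>"   # <a> is closed while <b> is still open

print(nesting_error(good))  # None
print(nesting_error(bad))   # a message reporting the mismatched tag
```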
4. Naming:
Names can begin with a letter, an underscore "_" or a colon ":".
After the first character, names can contain any number of letters, digits, underscores, hyphens (dashes) "-", dots "." or colons ":".
N.B.: XML is case-sensitive, i.e. the tag <A> is different from the tag <a>.
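The naming rules can be approximated with a regular expression. This is a simplified, ASCII-only sketch (an assumption: the XML specification also allows many non-ASCII letters), and `is_valid_name` is a hypothetical helper, not a library function.

```python
import re
import xml.etree.ElementTree as ET

# Simplified ASCII-only pattern: first character is a letter, "_" or ":";
# the rest may also include digits, "-" and ".".
NAME_RE = re.compile(r"[A-Za-z_:][A-Za-z0-9_.:\-]*")

def is_valid_name(name):
    return NAME_RE.fullmatch(name) is not None

print(is_valid_name("_title"))     # True
print(is_valid_name("book-list"))  # True
print(is_valid_name("2ndBook"))    # False: starts with a digit
print(is_valid_name("-item"))      # False: starts with a hyphen

# Case sensitivity: <A> cannot be closed by </a>.
try:
    ET.fromstring("<A>text</a>")
    case_mismatch_accepted = True
except ET.ParseError:
    case_mismatch_accepted = False
print(case_mismatch_accepted)  # False
```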
5.
6. Comments:
We write comments using the form <!-- comments here -->.
A comment does not need a closing tag.
N.B.: A comment must not end with three dashes; the form ---> is invalid.
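The comment rule can be checked in the same spirit. A small sketch with Python's standard `xml.etree.ElementTree` (which by default drops comments from the resulting tree); the `doc` and `item` element names are hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text, False on a parse error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

valid = "<doc><!-- a valid comment --><item/></doc>"
invalid = "<doc><!-- ends with three dashes ---><item/></doc>"

print(is_well_formed(valid))    # True
print(is_well_formed(invalid))  # False

# The comment is not an element: only <item> is a child of <doc>.
print([child.tag for child in ET.fromstring(valid)])  # ['item']
```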
7.
8. Attributes:
Elements can have an unlimited number of attributes. Attribute values must be enclosed in quotes. The order of attributes is not important: <el a="ab" b="cd"> is the same as <el b="cd" a="ab">.
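That attribute order does not matter can be confirmed by comparing the parsed attributes. This sketch uses Python's standard `xml.etree.ElementTree`; note that the parser requires attribute values to be quoted, so the elements are written with quotes here.

```python
import xml.etree.ElementTree as ET

first = ET.fromstring('<el a="ab" b="cd"/>')
second = ET.fromstring('<el b="cd" a="ab"/>')

# Attributes are exposed as a dictionary, so the written order is irrelevant.
print(first.attrib == second.attrib)  # True

# An unquoted attribute value is rejected by the parser.
try:
    ET.fromstring("<el a=ab/>")
    unquoted_accepted = True
except ET.ParseError:
    unquoted_accepted = False
print(unquoted_accepted)  # False
```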
9. Special symbols:
Five special symbols must be written using their predefined entities:
&lt;    <
&gt;    >
&quot;  "
&apos;  '
&amp;   &
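In practice the replacement is usually left to a library. The sketch below uses `escape` from Python's standard `xml.sax.saxutils`, which handles `&`, `<` and `>` by default; the quote characters are passed in an extra mapping. The sample text and the `m` element name are made up for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# escape() covers &, < and > by default; quotes are added through this mapping.
EXTRA = {'"': "&quot;", "'": "&apos;"}

raw = 'if a < b & b > c then say "it\'s ok"'
escaped = escape(raw, EXTRA)
print(escaped)
# if a &lt; b &amp; b &gt; c then say &quot;it&apos;s ok&quot;

# A parser turns the entities back into the original symbols.
parsed = ET.fromstring("<m>" + escaped + "</m>").text
print(parsed == raw)  # True
```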
DTD
Note: an element can contain:
- Other elements (child elements).
- Text.
- Mixed content (Other elements and text).
- Nothing (EMPTY, written without parentheses).
- Any content, not validated (ANY, written without parentheses).
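The five kinds of content can be sketched as DTD element declarations inside a document. The element names below (`library`, `book`, `note`, `em`, `separator`, `misc`) are hypothetical. Python's standard `xml.etree.ElementTree` is used only to confirm that the document is well-formed; it is not a validating parser, so it does not check the document against these declarations.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>
<!DOCTYPE library [
  <!ELEMENT library (book+, note, separator, misc)>  <!-- other elements -->
  <!ELEMENT book (#PCDATA)>                          <!-- text -->
  <!ELEMENT note (#PCDATA | em)*>                    <!-- mixed content -->
  <!ELEMENT em (#PCDATA)>
  <!ELEMENT separator EMPTY>                         <!-- nothing -->
  <!ELEMENT misc ANY>                                <!-- any content -->
]>
<library>
  <book>XML Basics</book>
  <note>Read <em>this</em> first.</note>
  <separator/>
  <misc>anything</misc>
</library>
"""

root = ET.fromstring(DOC)
print([child.tag for child in root])  # ['book', 'note', 'separator', 'misc']

# Mixed content: text before the child element, then the child itself.
note = root.find("note")
print(repr(note.text), note[0].tag)  # 'Read ' em
```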
I.
II. External DTD:
When an XML document requires an external DTD, the XML declaration in the document becomes <?xml version="1.0" standalone="no"?>.
Declaration of the external DTD in the XML document:
1. To reference a local DTD we write:
<!DOCTYPE rootName SYSTEM "here we put the URL (address)">
2. To reference a DTD from the Internet we write:
<!DOCTYPE rootName PUBLIC "here the public identifier" "here the URL">
The public identifier consists of 4 sections separated by "//".
a. The first section is a "-" if it is not registered, or a "+" if it is registered and
unique.
b. The second section is the organization name.
c. The third is the document format and the document name separated by a space.
d. The last section is the required language.
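The four sections can be pulled apart programmatically. The following sketch reads a DOCTYPE declaration with Python's standard `xml.dom.minidom` and splits its public identifier; `split_public_id` is a hypothetical helper, and the `booklist` root name is an assumption made for the example. The parser only records the identifiers; it does not download the DTD.

```python
from xml.dom import minidom

def split_public_id(public_id):
    """Split a public identifier into its four '//'-separated sections."""
    registered, organization, doc_info, language = public_id.split("//")
    # The third section holds the document format and name, separated by a space.
    doc_format, _, doc_name = doc_info.partition(" ")
    return {
        "registered": registered == "+",
        "organization": organization,
        "format": doc_format,
        "name": doc_name,
        "language": language,
    }

DOC = """<?xml version="1.0" standalone="no"?>
<!DOCTYPE booklist PUBLIC "-//wrox//TEXT booklist//EN" "http://www.wrox.com">
<booklist/>
"""

doctype = minidom.parseString(DOC).doctype
print(doctype.name)      # booklist
print(doctype.systemId)  # http://www.wrox.com

sections = split_public_id(doctype.publicId)
print(sections["registered"], sections["organization"], sections["language"])
# False wrox EN
```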
Example of a public identifier followed by a URL:
"-//wrox//TEXT booklist//EN" "http://www.wrox.com".