12 XML

This document provides an overview of XML and compares it to HTML and relational databases. It discusses:
- XML was designed to describe data, not to format it as HTML does. XML uses tags to describe information, and a separate stylesheet defines presentation.
- XML provides a common framework for structuring information and allows custom markup for any domain. A schema defines the structure.
- While HTML focuses on display, XML focuses on describing data. XML can exchange data over the Web more flexibly than traditional databases or EDI.


XML: An introduction

Overview

- HTML Model
- XML Model
- RDBMS vs. XML
- XML Schema
- XML Tools
- …

2
What is a Data Model?
- Structure of data
  - Mathematical representation of data
  - Examples: relational model = tables; semi-structured model = trees/graphs
- Operations on data
- Constraints

History of HTML
- HTML: HyperText Markup Language
- Invented by Tim Berners-Lee and Robert Cailliau at CERN in 1991
- What is hypertext?
  - A document that contains links to other documents (and text, sound, images...)
  - Conceived around 1945 by Vannevar Bush
- What is a markup language?
  - A notation for writing text with markup tags (<tag>)
  - Tags indicate the structure of the text
  - Tags have names and attributes
  - Tags may enclose a part of the text
  - Pioneered around 1970 by Charles F. Goldfarb (GML, later standardized as SGML)

4
HTML
- HTML was designed to display data and to focus on how data looks.

5
XML
- XML is a framework for defining markup languages:
  - XML was designed to describe data, not to format it
  - There is no fixed collection of markup tags: you define your own tags, tailored to your kind of information
  - Allows tailor-made markup for any imaginable application domain
  - XML uses a schema language (e.g., DTD, XML Schema) to formally describe the data
  - XML separates syntax from semantics to provide a common framework for structuring information
  - Browser rendering semantics are defined separately by stylesheets

6
XML
- XML is not a replacement for HTML:
  - HTML should ideally be just another XML language
  - in fact, XHTML is just that
  - XHTML is a (very popular) XML language for hypertext markup
- HTML is about displaying information, but XML is about describing information.

7
HTML vs. XML
HTML:
<center>
  <h1>SIGMOD</h1>
  <p><b>
    <u>ACM</u> <a href="sigmod02.org">SIGMOD Conference</a>,
    Madison, WI, 2002
  </b></p>
</center>

XML:
<event eID="sigmod02">
  <acronym>SIGMOD</acronym>
  <society>ACM</society>
  <url>www.sigmod02.org</url>
  <loc>
    <city>Madison</city>
    <state>WI</state>
  </loc>
  <year>2002</year>
</event>
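A minimal Python sketch of what "describing data" buys, using only the standard-library xml.etree.ElementTree module: each value in the XML version can be pulled out by the name of the tag that describes it. The function name read_event is illustrative, not part of any library. The same lookup against the HTML version would have to guess meaning from presentation tags such as <b> and <u>.

```python
import xml.etree.ElementTree as ET

# XML version of the SIGMOD 2002 event from the slide above.
EVENT_XML = """
<event eID="sigmod02">
  <acronym>SIGMOD</acronym>
  <society>ACM</society>
  <url>www.sigmod02.org</url>
  <loc>
    <city>Madison</city>
    <state>WI</state>
  </loc>
  <year>2002</year>
</event>
"""


def read_event(xml_text: str) -> dict:
    """Pull fields out by tag name: the tags say what each value means."""
    root = ET.fromstring(xml_text.strip())
    return {
        "id": root.get("eID"),                # attribute on the root element
        "acronym": root.findtext("acronym"),
        "society": root.findtext("society"),
        "city": root.findtext("loc/city"),    # path into the nested <loc> element
        "state": root.findtext("loc/state"),
        "year": int(root.findtext("year")),
    }


if __name__ == "__main__":
    print(read_event(EVENT_XML))
```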

8
HTML vs. XML
- Need a stylesheet to define browser presentation semantics

[Diagram: HTML => Browser; XML + CSS/XSL stylesheet => Browser]

9
HTML vs. XML
- Need a stylesheet to define browser presentation semantics

[Diagram: HTML carries data and format together => Browser; XML carries the data, a separate stylesheet carries the format => Browser]
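In practice the stylesheet is CSS or XSL; the Python sketch below only stands in for one, to show the separation. Two hypothetical "stylesheet" functions, banner_style and list_style (names invented for illustration), render the same XML event into two different HTML views. The data never changes when the presentation does.

```python
import xml.etree.ElementTree as ET
from html import escape

# The data: the same XML event as on the HTML vs. XML slide.
EVENT = ET.fromstring(
    '<event eID="sigmod02"><acronym>SIGMOD</acronym><society>ACM</society>'
    '<url>www.sigmod02.org</url><loc><city>Madison</city><state>WI</state></loc>'
    '<year>2002</year></event>'
)


def banner_style(ev: ET.Element) -> str:
    """Hypothetical 'stylesheet' #1: a heading plus a bold line, like the HTML slide."""
    line = ", ".join(
        escape(ev.findtext(path))
        for path in ("society", "loc/city", "loc/state", "year")
    )
    return f"<h1>{escape(ev.findtext('acronym'))}</h1><p><b>{line}</b></p>"


def list_style(ev: ET.Element) -> str:
    """Hypothetical 'stylesheet' #2: the same data as a bulleted list."""
    items = (ev.findtext(path) for path in ("acronym", "year", "url"))
    return "<ul>" + "".join(f"<li>{escape(x)}</li>" for x in items) + "</ul>"


if __name__ == "__main__":
    print(banner_style(EVENT))
    print(list_style(EVENT))
```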

10
Database Perspective
- A DB must support
  - Capture
  - Storage
  - Retrieval
  - Exchange
- XML was originally intended as the language for exchanging data over the Web
  - Replacing EDI (Electronic Data Interchange)
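As a rough illustration of XML as an exchange format, this Python sketch serializes relational rows into an XML document that any XML-aware receiver could parse, then reads them back. The <row> element layout and the helper names are hypothetical. Note that XML attribute values are text, so the receiver gets VID back as the string "100", not the integer 100; a real exchange would pair the document with a schema to recover types.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def rows_to_xml(table: str, rows: list[dict]) -> str:
    """Serialize rows as <table><row col="..."/>...</table> (hypothetical layout)."""
    root = ET.Element(table)
    for row in rows:
        ET.SubElement(root, "row", {col: str(val) for col, val in row.items()})
    return ET.tostring(root, encoding="unicode")


def xml_to_rows(text: str) -> tuple[str, list[dict]]:
    """Parse the document back into (table name, rows); every value comes back as a string."""
    root = ET.fromstring(text)
    return root.tag, [dict(r.attrib) for r in root.findall("row")]


if __name__ == "__main__":
    sends = [{"Name": "Jenny", "Phone": "564-3456", "VID": 100},
             {"Name": "Tom", "Phone": "123-4567", "VID": 200}]
    doc = rows_to_xml("Sends", sends)
    print(doc)
    print(xml_to_rows(doc))
```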

11
RDBMS vs. XML Example
- AFV receives 100+ videos every week
- Build a DB that can answer the following queries:
  - Who sent which videos?
  - Show me all videos in the Cat category
  - How many videos have entered the database since Jan 1, 2003?
  - Which video has the best rating for the 1st week of Jan?
  - Where does the sender James live? Phone? Gender?
  - How many videos has James sent so far?
  - Show me all the ghost videos (ones without sender information)

12
ER Model
[ER diagram: entity Videos (attributes VID, Category, Date, Rating) is linked through the relationship Sends to entity Owners (attributes Name, Phone, Address, Gender)]

13
ER => RDBMS
[Diagram: the ER model mapped to three relations: Videos (VID, Category, Date, Rating), Sends (the relationship) and Owners (Name, Phone, Address, Gender)]

14
RDBMS
Video
VID  Category  Date        Rating
100  Comedy    2005/1/1    5
200  Action    2005/1/10   4
300  SF        2004/12/31  5

Sends
Name   Phone     VID
Jenny  564-3456  100
Tom    123-4567  200
Tom    123-4567  300

Owners
Name   Phone     Address       Gender
Jenny  564-3456  1050 Harvard  F
Tom    123-4567  132 W 15th    M
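The three relations above can be tried out directly. This minimal sketch uses Python's built-in sqlite3 module (an assumption; the slides do not name a DBMS) to load the slide's rows and answer three of the AFV queries: who sent which videos, how many videos a sender has sent (Tom stands in for James, who is not in the sample data), and which videos are ghosts. Using Phone as the Owners key and the helper names are also assumptions.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Create the Video, Owners and Sends relations and load the slide's sample rows."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Video  (VID INTEGER PRIMARY KEY, Category TEXT, Date TEXT, Rating INTEGER);
        CREATE TABLE Owners (Name TEXT, Phone TEXT PRIMARY KEY, Address TEXT, Gender TEXT);
        CREATE TABLE Sends  (Name TEXT, Phone TEXT REFERENCES Owners(Phone),
                             VID INTEGER REFERENCES Video(VID));
    """)
    con.executemany("INSERT INTO Video VALUES (?, ?, ?, ?)", [
        (100, "Comedy", "2005/1/1", 5),
        (200, "Action", "2005/1/10", 4),
        (300, "SF", "2004/12/31", 5),
    ])
    con.executemany("INSERT INTO Owners VALUES (?, ?, ?, ?)", [
        ("Jenny", "564-3456", "1050 Harvard", "F"),
        ("Tom", "123-4567", "132 W 15th", "M"),
    ])
    con.executemany("INSERT INTO Sends VALUES (?, ?, ?)", [
        ("Jenny", "564-3456", 100),
        ("Tom", "123-4567", 200),
        ("Tom", "123-4567", 300),
    ])
    return con


def who_sent_what(con):
    """Who sent which videos? (join Sends with Video)"""
    return con.execute("""SELECT s.Name, v.VID, v.Category
                          FROM Sends s JOIN Video v ON s.VID = v.VID
                          ORDER BY v.VID""").fetchall()


def videos_sent_by(con, name):
    """How many videos has this sender sent so far?"""
    return con.execute("SELECT COUNT(*) FROM Sends WHERE Name = ?", (name,)).fetchone()[0]


def ghost_videos(con):
    """Videos with no matching Sends row, i.e. no sender information."""
    rows = con.execute("""SELECT v.VID FROM Video v
                          LEFT JOIN Sends s ON v.VID = s.VID
                          WHERE s.VID IS NULL ORDER BY v.VID""")
    return [vid for (vid,) in rows]


if __name__ == "__main__":
    db = build_db()
    print(who_sent_what(db))
    print(videos_sent_by(db, "Tom"))
    print(ghost_videos(db))
```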

15
Changes #1
- VHS video => VHS, CD, DVD
- 100+ videos => 1 million videos

16
RDBMS
Video
VID      Category  Format  Date        Rating
100      Comedy    VHS     2005/1/1    5
…
1000000  SF        DVD     2004/12/31  5

Sends
Name   Phone     VID
Jenny  564-3456  100
…
Tom    123-4567  1000000

Owners
Name   Phone     Address       Gender
Jenny  564-3456  1050 Harvard  F
Tom    123-4567  132 W 15th    M



17
Changes #2
- Arbitrary name formats for owners
  - E.g., J. Doe vs. Dr. "Jonny" John Jay Doe Jr
- 100+ different ways to capture owners' information, e.g.:
  - "1781 Louisiana St #200, Lawrence, KS, 66046"
  - adr1="1781 Louisiana St #200", adr2="Lawrence, KS, 66046"
  - street="1781 Louisiana St #200", city="Lawrence", state="KS", zip="66046"
- 100+ different video formats with varying properties => 1000+ attributes for Videos
18
RDBMS: Finest Granularity
Video
VID  Category  Date        Rating  Att1  Att2  …  Att1000
100  Comedy    2005/1/1    5             10    …  T1
200  Action    2005/1/10   4       20          …
300  SF        2004/12/31  5                   …  S20

Sends
Prefix  NN     FN     MN   LN   Suffix  Phone     VID
               Jenny                    564-3456  100
Dr.     Jonny  John   Jay  Doe  Jr      123-4567  200
Dr.     Jonny  John   Jay  Doe  Jr      123-4567  300

Owners
Prefix  NN     FN     MN   LN   Suffix  Phone     Street        City    State  Zip    Gender
               Jenny                    564-3456  1050 Harvard  Denver  CO     66049  F
Dr.     Jonny  John   Jay  Doe  Jr      123-4567  10 S. Beaver          CO            M

19
RDBMS: Coarsest Granularity
Video
VID  Category  Date        Rating  Att1to1000
100  Comedy    2005/1/1    5       10, T1
200  Action    2005/1/10   4       20
300  SF        2004/12/31  5       S20

Sends
Name                         Phone     VID
Jenny                        564-3456  100
Dr. "Jonny" John Jay Doe Jr  123-4567  200
Dr. "Jonny" John Jay Doe Jr  123-4567  300

Owners
Name                         Phone     Address                         Gender
Jenny                        564-3456  1050 Harvard, Denver, CO 66049  F
Dr. "Jonny" John Jay Doe Jr  123-4567  1322 W 15th, CO                 M

20
RDBMS: Ideal Case
Video
VID  Category  Date        Rating  Att1  Att2  …  Att1000
100  Comedy    2005/1/1    5             10    …  T1
200  Action    2005/1/10   4       20          …
300  SF        2004/12/31  5                   …  S20

Sends
Name                       Phone     VID
Jenny                      564-3456  100
Dr. Jonny John Jay Doe Jr  123-4567  200
Dr. Jonny John Jay Doe Jr  123-4567  300

[Annotation: Violation of 1NF: the Name and Address columns hold structured, non-atomic values]

Owners
Name                       Phone     Address                       Gender
Jenny                      564-3456  1050 Harvard Denver CO 66049  F
Dr. Jonny John Jay Doe Jr  123-4567  132 W 15th KS                 M

21
XML
Video
VID  Category  Date        Rating  Att1  Att2  …  Att1000
100  Comedy    2005/1/1    5             10    …  T1
200  Action    2005/1/10   4       20          …
300  SF        2004/12/31  5                   …  S20

<VideoTable>
  <Video vid="100" category="comedy" date="2005/1/1"
         rating="5" att2="10" att1000="T1" />
  <Video vid="200" category="action" date="2005/1/10"
         rating="4" att1="20" />
  <Video vid="300" category="SF" date="2004/12/31"
         rating="5" att1000="S20" />
</VideoTable>
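Because each Video element carries only the attributes it actually has, nothing like the 1000-column rows full of empty cells is needed. A short Python sketch (standard-library ElementTree; the function names are illustrative) that lists each video's optional att* attributes and finds the videos that define a given one:

```python
import xml.etree.ElementTree as ET

VIDEO_XML = """<VideoTable>
  <Video vid="100" category="comedy" date="2005/1/1" rating="5" att2="10" att1000="T1" />
  <Video vid="200" category="action" date="2005/1/10" rating="4" att1="20" />
  <Video vid="300" category="SF" date="2004/12/31" rating="5" att1000="S20" />
</VideoTable>"""


def optional_attributes(xml_text: str) -> dict:
    """Map each vid to the att* attributes that video actually carries."""
    root = ET.fromstring(xml_text)
    return {
        video.get("vid"): {k: v for k, v in video.attrib.items() if k.startswith("att")}
        for video in root.iter("Video")
    }


def videos_with(xml_text: str, attr: str) -> list:
    """vids of the videos that define attr; an absent attribute simply means 'not applicable'."""
    root = ET.fromstring(xml_text)
    return [video.get("vid") for video in root.iter("Video") if attr in video.attrib]


if __name__ == "__main__":
    print(optional_attributes(VIDEO_XML))
    print(videos_with(VIDEO_XML, "att1000"))
```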

22
XML
[Diagram: alternative Address structures: Address => Adr1, Adr2 (with State nested under Adr2); Address => Street, City, State, Zip]

Owners
Name                       Phone     Address                         Gender
Jenny                      564-3456  1050 Harvard Lawrence KS 66049  F
Dr. Jonny John Jay Doe Jr  123-4567  132 W 15th KS                   M

<OwnerTable>
  <Owner phone="564-3456" gender="F">
    <Name FN="Jenny" />
    <Address>
      <Street>1050 Harvard</Street><City>Denver</City>
      <State>CO</State><Zip>66049</Zip>
    </Address>
  </Owner>
  <Owner phone="123-4567" gender="M">
    <Name Prefix="Dr." NN="Jonny" FN="John" MN="Jay" LN="Doe" Suffix="Jr." />
    <Address><Adr1>1322 W 15th</Adr1><Adr2><State>CO</State></Adr2></Address>
  </Owner>
</OwnerTable>
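The two Owner elements use different name and address structures, yet one reader can handle both. A minimal Python sketch, assuming a comma-joined flattening is an acceptable display form (the helper names and the NAME_PARTS ordering are illustrative choices, not a standard):

```python
import xml.etree.ElementTree as ET

OWNER_XML = """<OwnerTable>
  <Owner phone="564-3456" gender="F">
    <Name FN="Jenny" />
    <Address>
      <Street>1050 Harvard</Street><City>Denver</City>
      <State>CO</State><Zip>66049</Zip>
    </Address>
  </Owner>
  <Owner phone="123-4567" gender="M">
    <Name Prefix="Dr." NN="Jonny" FN="John" MN="Jay" LN="Doe" Suffix="Jr." />
    <Address><Adr1>1322 W 15th</Adr1><Adr2><State>CO</State></Adr2></Address>
  </Owner>
</OwnerTable>"""

# Order in which the optional name parts are joined (an illustrative choice).
NAME_PARTS = ("Prefix", "NN", "FN", "MN", "LN", "Suffix")


def full_name(owner: ET.Element) -> str:
    """Join whichever name-part attributes are present; missing parts are skipped."""
    name = owner.find("Name")
    if name is None:
        return ""
    return " ".join(name.get(p) for p in NAME_PARTS if name.get(p))


def address_text(owner: ET.Element) -> str:
    """Flatten whatever Address structure is present into one comma-separated string."""
    addr = owner.find("Address")
    if addr is None:
        return ""
    return ", ".join(t.strip() for t in addr.itertext() if t.strip())


def read_owners(xml_text: str) -> list:
    root = ET.fromstring(xml_text)
    return [(full_name(o), o.get("phone"), address_text(o), o.get("gender"))
            for o in root.iter("Owner")]


if __name__ == "__main__":
    for row in read_owners(OWNER_XML):
        print(row)
```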
23
RDBMS vs. XML
RDBMS:
- Structured model
- Large-scale data
- Limited semantics
- Focus: how to handle large data sizes efficiently?

XML:
- Unstructured or semi-structured model
- Small- to medium-scale data
- Document vs. data
- Flexible and rich semantics
- Focus: how to handle a large number of small data items in various formats efficiently?

24
A conceptual view of XML
- An XML document is an ordered, labeled tree
- Character data leaf nodes contain the actual data (text strings)
- Element nodes are each labeled with
  - a name (often called the element type), and
  - a set of attributes, each consisting of a name and a value,
  - and can have child nodes
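The tree view can be made concrete with a small walk over a parsed document. This Python sketch (standard-library ElementTree; describe is an illustrative name) prints one line per node: element nodes with their name and attributes, character data as text leaves, children in document order. For brevity it skips text that follows a child element (ElementTree's tail), so mixed content is only partly shown.

```python
import xml.etree.ElementTree as ET


def describe(elem: ET.Element, depth: int = 0):
    """Yield one line per node of the ordered, labeled tree, children in document order.

    Element nodes show their name and attributes; non-blank character data is shown
    as a 'text' leaf. Text after a child element (its .tail) is skipped for brevity.
    """
    indent = "  " * depth
    attrs = " ".join(f'{k}="{v}"' for k, v in elem.attrib.items())
    yield f"{indent}element {elem.tag}" + (f" [{attrs}]" if attrs else "")
    if elem.text and elem.text.strip():
        yield f"{indent}  text {elem.text.strip()!r}"
    for child in elem:
        yield from describe(child, depth + 1)


if __name__ == "__main__":
    doc = ET.fromstring('<event eID="sigmod02"><acronym>SIGMOD</acronym>'
                        '<loc><city>Madison</city></loc></event>')
    print("\n".join(describe(doc)))
```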

25
A concrete view of XML
- An XML document is text with markup tags and other meta-information.
- Markup tags denote elements:
    ...<foo attr="val" ...>bar</foo>...
  - <foo ...> is an element start tag with name foo
  - attr="val" is an attribute with name attr and value val, enclosed by ' or "
  - bar is the contents of the element
  - </foo> is a matching element end tag
- An XML document must be well-formed:
  - start and end tags must match
  - element tags must be properly nested
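A quick way to see the parts of <foo attr="val">bar</foo> as a parser sees them: this Python sketch (standard-library ElementTree; anatomy is an illustrative name) returns the element name, its attributes and its contents, and shows that single and double quotes around attribute values parse identically.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def anatomy(xml_text: str) -> tuple[str, dict, str]:
    """Return (element name, attributes, contents) for a single element."""
    elem = ET.fromstring(xml_text)
    return elem.tag, dict(elem.attrib), elem.text


if __name__ == "__main__":
    # Attribute values may be enclosed by double or single quotes.
    print(anatomy('<foo attr="val">bar</foo>'))
    print(anatomy("<foo attr='val'>bar</foo>"))
```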
26
Example of XML document
<?xml version="1.0"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

- The XML declaration (1st line) should always be included (XML 1.0 recommends it, though it is technically optional); the root element is <note>
- In XML, all elements must have a closing tag:
  - <p>foo<p>bar => <p>foo</p><p>bar</p>
- XML tags are case sensitive:
  - <Message>This is incorrect</message>
- Elements must be properly nested within each other:
  - <b><i>This is incorrect</b></i>
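The rules above can be checked mechanically: a conforming XML parser must reject a document that is not well-formed. This Python sketch uses the standard-library ElementTree parser as the checker (is_well_formed is an illustrative name) and runs it on the note document and on the slide's three incorrect examples.

```python
import xml.etree.ElementTree as ET

NOTE = """<?xml version="1.0"?>
<note><to>Tove</to><from>Jani</from><heading>Reminder</heading>
<body>Don't forget me this weekend!</body></note>"""


def is_well_formed(text: str) -> bool:
    """True if the parser accepts the text as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    examples = {
        "note with declaration": NOTE,
        "note without declaration": NOTE.split("\n", 1)[1],
        "unclosed tags": "<p>foo<p>bar",
        "case mismatch": "<Message>This is incorrect</message>",
        "bad nesting": "<b><i>This is incorrect</b></i>",
    }
    for label, text in examples.items():
        print(f"{label}: {is_well_formed(text)}")
```

The declaration-free variant is still accepted, which matches the point that the declaration is recommended rather than required.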
27
Element vs. Attribute
- The same information can be captured by either elements or attributes in XML

Element form:
<event eID="sigmod02">
  <acronym>SIGMOD</acronym>
  <society>ACM</society>
  <url>www.sigmod02.org</url>
  <loc>
    <city>Madison</city>
    <state>WI</state>
  </loc>
  <year>2002</year>
</event>

Attribute form:
<event eID="sigmod02"
       acronym="SIGMOD"
       society="ACM"
       url="www.sigmod02.org"
       city="Madison"
       state="WI"
       year="2002"/>
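Moving from the element form to the attribute form can be automated. This Python sketch (standard-library ElementTree; to_attribute_form is an illustrative name) turns every text-bearing leaf element into an attribute on the root. It also shows the trade-off: the <loc> grouping is lost, and because attribute names must be unique, a repeated element name would silently overwrite an earlier value.

```python
import xml.etree.ElementTree as ET

ELEMENT_FORM = """<event eID="sigmod02">
  <acronym>SIGMOD</acronym>
  <society>ACM</society>
  <url>www.sigmod02.org</url>
  <loc>
    <city>Madison</city>
    <state>WI</state>
  </loc>
  <year>2002</year>
</event>"""


def to_attribute_form(elem: ET.Element) -> ET.Element:
    """Copy the root's attributes, then turn every text-bearing leaf element into an attribute."""
    flat = ET.Element(elem.tag, dict(elem.attrib))
    for node in elem.iter():
        if node is elem or len(node):  # skip the root and elements with children (e.g. <loc>)
            continue
        if node.text and node.text.strip():
            flat.set(node.tag, node.text.strip())  # a repeated tag name overwrites the earlier value
    return flat


if __name__ == "__main__":
    print(ET.tostring(to_attribute_form(ET.fromstring(ELEMENT_FORM)), encoding="unicode"))
```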

28
Applications of XML
- XML is a meta-language for creating other languages; the main application of XML is making new languages
  - XHTML: W3C's XMLization of HTML 4.0

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head><title>Hello world!</title></head>
  <body><p>foobar</p></body>
</html>
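Because XHTML is an XML language, a generic XML parser reads it; the one new detail is the namespace declared by xmlns. This Python sketch (standard-library ElementTree) looks elements up through a prefix map; the prefix "x" and the function name summarize are arbitrary choices for illustration. The xml:lang attribute lives in the reserved XML namespace, which ElementTree expands like any other.

```python
import xml.etree.ElementTree as ET

XHTML_DOC = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head><title>Hello world!</title></head>
<body><p>foobar</p></body>
</html>"""

# Prefix-to-URI map for ElementTree path lookups; the prefix "x" is arbitrary.
NS = {"x": "http://www.w3.org/1999/xhtml"}
# Expanded name of xml:lang (the reserved XML namespace).
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def summarize(doc: str) -> dict:
    """Return the root's expanded name, its language, the title and the paragraph text."""
    # Parse from bytes so the parser itself handles the encoding declaration.
    root = ET.fromstring(doc.encode("utf-8"))
    return {
        "root": root.tag,
        "lang": root.get(XML_LANG),
        "title": root.findtext("x:head/x:title", namespaces=NS),
        "body": root.findtext("x:body/x:p", namespaces=NS),
    }


if __name__ == "__main__":
    print(summarize(XHTML_DOC))
```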

29
Applications of XML (cont.)
- CML: Chemical Markup Language

<molecule id="METHANOL">
  <atomArray>
    <stringArray builtin="elementType">C O H H H H</stringArray>
    <floatArray builtin="x3" units="pm">-0.748 0.558 -1.293 -1.263
      -0.699 0.716</floatArray>
  </atomArray>
</molecule>

- There are 1000+ markup languages defined with XML (e.g., www.schema.net)
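The CML fragment above can be read with the same generic tools used for every other XML language. This Python sketch assumes, as the attribute names suggest, that builtin="elementType" lists the atoms and builtin="x3" gives each atom's x coordinate in picometres; the helper names are illustrative. It pairs the two arrays and counts atoms per element, which for methanol should give one C, one O and four H.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from collections import Counter

CML = """<molecule id="METHANOL">
  <atomArray>
    <stringArray builtin="elementType">C O H H H H</stringArray>
    <floatArray builtin="x3" units="pm">-0.748 0.558 -1.293 -1.263
      -0.699 0.716</floatArray>
  </atomArray>
</molecule>"""


def atoms(cml: str) -> list[tuple[str, float]]:
    """Pair each atom's element type with its x coordinate (assumed meaning of builtin="x3")."""
    mol = ET.fromstring(cml)
    types = mol.find("atomArray/stringArray[@builtin='elementType']").text.split()
    xs = [float(v) for v in mol.find("atomArray/floatArray[@builtin='x3']").text.split()]
    if len(types) != len(xs):
        raise ValueError("element types and coordinates differ in length")
    return list(zip(types, xs))


def formula_counts(cml: str) -> dict[str, int]:
    """Count atoms per element type."""
    return dict(Counter(t for t, _ in atoms(cml)))


if __name__ == "__main__":
    print(atoms(CML))
    print(formula_counts(CML))
```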
30
Why is XML important?
- Technically, … nothing; it is just the old, simple tree model…
- Non-technically, …
  - Hot ($$$)
  - The standard for representing Web information
  - The real force of XML is its generic languages and tools!
  - By building on XML, you get a massive (standard) infrastructure for free

31
32
Conclusion
- XML is an important language that one should learn
- Plenty of research issues for database researchers:
  - XML query languages
  - Conversion between XML and other (e.g., relational) models
  - Storage for native XML databases
  - Novel indexing
  - System design and implementation

33

Slides prepared by Dr. Bo Luo.

34


<End hasQuestion="Yes">
  <Thanks/>
</End>

35
