11 Complex Data and XML
Week 10 Review
• Trees and the places where tree structures are used in
computing and knowledge representation
• Nomenclature in trees
– Tree-based – root, node, branch, leaf
– Family-based – parent, child, sibling
• The Document Object Model as a tree
– DOM access in JavaScript
• An example of DOM manipulation
– DHTML – accessing and updating the DOM with JavaScript
– AJAX – server communication from JavaScript
Tree and questions
• Look at the tree in the diagram
• Answer the following questions:
– Name the root of the tree
– Name the leaves of the tree
– Using path notation in Windows or Unix
• What is the location of J relative to the root?
• What is the location of the root if you are at J?
• What is the location of K if you are at J?
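The path questions above can be tried out with a small sketch. The tree used here is hypothetical: the node names and edges are assumptions chosen for illustration and are not the tree in the diagram. `posixpath.relpath` from the Python standard library gives the Unix-style relative path.

```python
import posixpath

# Hypothetical tree written as absolute Unix-style paths from the root.
# These nodes and edges are illustrative only, not the diagram's tree.
ROOT = "/"
P = "/X/Y/P"   # P is a child of Y, which is a child of X
Q = "/X/Y/Q"   # Q is a sibling of P
S = "/X/S"     # S is a child of X


def relative(target: str, start: str) -> str:
    """Return the path to `target` as seen from `start`."""
    return posixpath.relpath(target, start)


print(relative(P, ROOT))  # location of P relative to the root
print(relative(ROOT, P))  # location of the root if you are at P
print(relative(Q, P))     # a sibling: up one level, then down
print(relative(S, P))     # up two levels, then down
```

Each `..` step moves from a child to its parent, so the number of `..` steps is the number of branches climbed before descending again.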
[Diagram: tree with nodes A; R, C; B, G, L, J; D, K, M, N (rows listed from top to bottom)]
The rationale for XML
• Complex data
– StudentsOnline
– Modules and Courses
• Export and import – ‘self-documenting’
data
– Flickr
– TransXChange
• XML-based tools
– XForms
Year 1 Oracle example
• Students on the Year 1 Database module
built a much larger data model
covering modules, students, staff and
courses.
• The Oracle SQL DDL was reverse-engineered
into QSEE, then DDL was regenerated for
MySQL, installed and executed.
• Query interface using PHP
[ER diagram: entities and their attributes]
• department: deptName:VARCHAR, phone:VARCHAR, faxno:VARCHAR, location:VARCHAR, mgr_startDate:DATE
• staff: staffNo:NUMERIC, fName:VARCHAR, lName:VARCHAR, address:VARCHAR, phone:VARCHAR, officeNo:VARCHAR, sex:VARCHAR, salary:NUMERIC, post:VARCHAR, computerId:VARCHAR
• course: cCode:VARCHAR, title:VARCHAR, duration:VARCHAR
• qualification: qualification:VARCHAR
• student: matricNo:NUMERIC, fName:VARCHAR, lName:VARCHAR, town:VARCHAR, street:VARCHAR, postcode:VARCHAR, dob:DATE, sex:VARCHAR, loan:NUMERIC
• module: mCode:VARCHAR, title:VARCHAR, startDate:DATE, endDate:DATE, coursework:VARCHAR, exam:VARCHAR
• teaches: hours:NUMERIC
• undertake: performance:VARCHAR
• text: text:VARCHAR
• nextOfKin: name:VARCHAR, phone:VARCHAR, relationship:VARCHAR
• Relationship labels: worksin, mgr, leader, identifies
• Diagram annotations: ‘Weak entity’, ‘Foreign key’
Lessons
• Problems with data types and data formats
• ER model as Conceptual Model
• Foreign keys in QSEE
• Many of the entities are ‘weak’
Data types
• Data types differ between the two
databases – QSEE uses a more general
set of data types and then outputs DDL
dependent on the target.
– Data typing is a big problem in databases
• Data types too small for the data stored, so data
might be lost
• Date formats differ: 2006-12-08 in MySQL,
08-dec-2006 in Oracle
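As a rough illustration of the date-format problem, the sketch below converts the Oracle-style date from the slide into the MySQL-style one. The function name is made up for this example, and it assumes English month abbreviations (`%b` depends on the current locale).

```python
from datetime import datetime


def oracle_to_mysql_date(text: str) -> str:
    """Convert a date such as '08-dec-2006' to '2006-12-08'.

    Assumes an English locale for the month abbreviation.
    """
    parsed = datetime.strptime(text, "%d-%b-%Y")
    return parsed.strftime("%Y-%m-%d")


print(oracle_to_mysql_date("08-dec-2006"))
```

A conversion like this has to be applied to every date column when data is moved between the two databases.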
Conceptual ER model
• No foreign keys in the ER model.
– foreign keys are the realisation (implementation) of the
relationships defined in the ER model in an RDBMS
Foreign Keys in QSEE
Foreign keys in RDBMS
• In MySQL, the behaviour depends on which storage engine is
being used
– ISAM/MyISAM (Indexed Sequential Access Method) does not
enforce foreign key integrity
– InnoDB does
• Loading data is a problem
– Need to load data in the right order to prevent foreign
key errors – roughly parent before child
– Not possible if the foreign keys form a cycle, so the RDBMS allows
checking to be turned off during loading
– Special programs for bulk loading from files such as
CSV – easier than using INSERT statements
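The load-order problem can be seen in a small sketch. It uses SQLite through Python's `sqlite3` module rather than MySQL, purely because it runs without a server. The two tables are cut-down versions of department and staff, and the `deptName` foreign key column on staff is an assumption about how the relationship would be implemented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off unless it is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE department (deptName TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE staff ("
    " staffNo INTEGER PRIMARY KEY,"
    " lName TEXT,"
    " deptName TEXT REFERENCES department(deptName))"
)


def try_insert_staff() -> str:
    """Try to insert one staff row and report whether the constraint allowed it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO staff VALUES (1, 'Johnson', 'Applied Sciences')"
            )
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


child_first = try_insert_staff()   # parent row is still missing
with conn:
    conn.execute("INSERT INTO department VALUES ('Applied Sciences')")
parent_first = try_insert_staff()  # parent row is now present

print(child_first, parent_first)
```

In SQLite the check is switched off again with `PRAGMA foreign_keys = OFF`; in MySQL the corresponding switch for bulk loading is `SET foreign_key_checks = 0`.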
Questions about the model
• What foreign keys would be added to the
Course table in an RDBMS? To the
Teaches table?
• Explain the optionality on the mgr
relationship between staff and
department? Is it on the right end?
• Which two tables would be impossible to
put in order to ensure foreign key
constraints are not violated?
Non-first normal form
• First normal form specifies that all fields must be atomic
– single valued
• Look at the qualification table. The bar indicates that it is
dependent on staff. If repeated fields were allowed in
tables then we could simply add
qualifications : set Varchar
to the staff table
• We would need to be able to search for individual
qualifications
• Tags in the photo album site are a similar problem.
• Which other table could be removed if non-first normal
form tables were allowed?
• MySQL has sets and operators to add, remove and
search (but these all add complexity)
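A minimal sketch of the set-valued field, using plain Python dictionaries and sets in place of database tables. The record layout and function names are assumptions for illustration. Only the 'PhD' for staff 1 comes from the lecture's data; 'MSc' is a made-up value.

```python
# Hypothetical staff records in non-first-normal form:
# the qualifications field holds a set instead of a single value.
staff = {
    1: {"lName": "Johnson", "qualifications": {"PhD"}},
    2: {"lName": "Ford", "qualifications": set()},
}


def add_qualification(staff_no: int, qualification: str) -> None:
    """Add one qualification to a member of staff."""
    staff[staff_no]["qualifications"].add(qualification)


def remove_qualification(staff_no: int, qualification: str) -> None:
    """Remove one qualification if it is present."""
    staff[staff_no]["qualifications"].discard(qualification)


def staff_with(qualification: str) -> list:
    """Search: return the staff numbers holding the given qualification."""
    return sorted(
        no for no, rec in staff.items() if qualification in rec["qualifications"]
    )


add_qualification(2, "MSc")  # 'MSc' is a made-up value
print(staff_with("PhD"), staff_with("MSc"))
```

The qualification table disappears, but every search now has to look inside the set held by each staff record.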
Harder cases
• What additional structure would be
required to get rid of the nextOfKin table?
• How could we get rid of the ‘teaches’ and
‘undertakes’ tables? How would this data
be handled in an Object-oriented model?
• If all the weak entities were removed, how
much would this simplify the model?
• NextOfKin
– Requires a sub-record of name, phone and
relationship which can be repeated
• Teaches
– Decide which side this data belongs to and
add a set of sub-records
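One way to picture repeated sub-records is with nested objects. The sketch below assumes the teaching data is placed on the staff side, which is only one of the two possible choices. The class and field names are illustrative; the module codes and hours are those used for staff 1 in the lecture's XML example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Teaches:
    """One repeated sub-record: a module and the hours taught on it."""
    module: str
    hours: int


@dataclass
class Staff:
    """A staff record that owns its teaching sub-records (an assumed design)."""
    staff_no: int
    l_name: str
    teaching: List[Teaches] = field(default_factory=list)


johnson = Staff(
    1,
    "Johnson",
    [Teaches("UGDFNP-20-1", 6), Teaches("UGDF58-20-1", 7)],
)
total_hours = sum(t.hours for t in johnson.teaching)
print(total_hours)
```

A nextOfKin sub-record of name, phone and relationship could be nested in the same way inside the record that owns it.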
1, 'David', 'Johnson', '109 Mount View Road, Kingswood, Bristol BS15 8UB',
'0117 9676505', '3A31', 'Male', 32000, 'Professor', 'dpjohnso', 'Applied Sciences'
2, 'Carrie', 'Ford', '15 Heron Way, Chipping Sodbury, Bristol BS37 6NT',
'01454 854089', '1C29', 'Female', 29000, 'Senior Lecturer', 'caford',
'Bristol Business School'
'BUWE B80 GF78', 'PgDIT', '3 Years', 13, 'CIS'
'BUWE B80 M221', 'Commercial Law', '3 Years', 7, 'Law'
'BUWE B80 B230', 'Pharmaceutical Sciences', '4 Years', 1, 'Applied Sciences'
'BUWE B80 G500', 'Business Decision Analysis', '4 Years', 9, 'Bristol Business School'
'BUWE B80 X300', 'Education Studies', '3 Years', 5, 'Education'
'BUWE B80 G451', 'Multimedia Computing', '4 Years', 10, 'Computing, Engineering and Mathema
'BUWE B80 L300', 'Sociology', '3 Years', 6, 'Humanities, Languages and Social Sciences'
Staff as XML
<?xml version="1.0" encoding="UTF-8"?>
<StaffList>
  <Staff staffNo="1">
    <fName>David</fName>
    <lName>Johnson</lName>
    <address>109 Mount View Road, Kingswood, Bristol BS15 8UB</address>
    <phone>0117 9676505</phone>
    <officeNo>3A31</officeNo>
    <sex>Male</sex>
    <salary>32000</salary>
    <post>Professor</post>
    <computerid>dpjohnso</computerid>
    <department>Applied Sciences</department>
    <teaching>
      <teaches module="UGDFNP-20-1" hours="6"/>
      <teaches module="UGDF58-20-1" hours="7"/>
    </teaching>
    <qualification>PhD</qualification>
  </Staff>
</StaffList>
• Optional XML declaration: the first line, written like a processing instruction
• Root node: <StaffList>
• Element: e.g. <officeNo>
• Element with children: e.g. <teaching>
• Value: e.g. PhD inside <qualification>
• Attribute: e.g. staffNo="1" or hours="6"
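To show how a program reads elements, attributes and values from such a document, here is a short sketch using Python's standard `xml.etree.ElementTree` module. The XML is an abbreviated copy of the staff document, and the choice to total the teaching hours is only an illustration.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the staff document (some elements left out).
STAFF_XML = """<?xml version="1.0" encoding="UTF-8"?>
<StaffList>
  <Staff staffNo="1">
    <fName>David</fName>
    <lName>Johnson</lName>
    <teaching>
      <teaches module="UGDFNP-20-1" hours="6"/>
      <teaches module="UGDF58-20-1" hours="7"/>
    </teaching>
    <qualification>PhD</qualification>
  </Staff>
</StaffList>
"""

root = ET.fromstring(STAFF_XML.encode("utf-8"))
summary = []
for staff in root.findall("Staff"):
    name = staff.findtext("fName") + " " + staff.findtext("lName")
    # Attribute values are always strings, so hours must be converted.
    hours = sum(int(t.get("hours")) for t in staff.iter("teaches"))
    summary.append((staff.get("staffNo"), name, hours))

print(root.tag, summary)
```

Element values are read with `findtext` and attributes with `get`, which mirrors the element and attribute distinction in the document.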
Processes required for XML
• Store XML (native XML database)
• Query XML (XQuery)
• Transform XML into other XML, RDF or
text (XQuery, XSLT)
• Transform structured text to XML (?)
• Define deeper semantics for an XML file
(XML Schema, Relax NG)
• User Interface (XForms)
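For the step marked with a question mark, turning structured text into XML, a hand-written script is one possible answer. The sketch below converts one of the quoted course rows into an element. The column names are assumptions based on the course entity (the last two columns are guessed to be the course leader's staff number and the department), and the `Course` element name is made up.

```python
import csv
import xml.etree.ElementTree as ET

# Assumed column names: the data rows themselves do not name their columns.
COLUMNS = ["cCode", "title", "duration", "leader", "department"]
ROW = "'BUWE B80 M221', 'Commercial Law', '3 Years', 7, 'Law'"


def row_to_element(line: str) -> ET.Element:
    """Parse one single-quoted, comma-separated row into a <Course> element."""
    fields = next(csv.reader([line], quotechar="'", skipinitialspace=True))
    course = ET.Element("Course")
    for name, value in zip(COLUMNS, fields):
        ET.SubElement(course, name).text = value
    return course


print(ET.tostring(row_to_element(ROW), encoding="unicode"))
```

Using the `csv` module rather than splitting on commas matters here, because a quoted value may itself contain a comma.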
Applications of XML
• XHTML – HTML written as well-formed XML
which conforms to a standard
• SVG – Scalable Vector Graphics
• KML – Google Earth overlay files
• XForms – UI definition
• Flickr API responses
• RSS news syndication
• ….
Well-formed XML
• ‘Well-formed’ – absolute property of an XML
document – must obey rules for well-formedness
• Check by
– Opening in Word 2003
– Submitting to an online validator
– Check using an XML tool
• XML Spy
• Stylus Studio
• Oxygen
• Check out the rules in the Wikipedia entry
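Any XML parser can serve as a well-formedness check, because a conforming parser must reject a document that breaks the rules. A minimal sketch with Python's standard library follows; the function name is made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<lName>Johnson</lName>"))  # start and end tags match
print(is_well_formed("<lName>Johnson</lname>"))  # end tag differs in case
```

The second call reports a failure because element names are case sensitive, so `</lname>` does not close `<lName>`.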
Well-formed XML documents
Element names are case sensitive - <NAME>, <name>, <Name> and <NaMe>
are four different element types.
No white space in element names - <First Name> not allowed; <First_Name>
OK.
Element names cannot start with the letters “XML” or “xml” – reserved terms.
Element names must start with a letter or an underscore. Element names
cannot start with a number but numbers may be embedded within an element
name - <2you> not allowed; <me2you> is OK.
Attribute names are constrained by the above rules for element names.
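The naming rules above can be written down as a rough check. This sketch is a deliberate simplification: it only covers the rules listed here, restricted to ASCII letters, whereas the XML specification allows many more characters in names. The function name and the regular expression are assumptions made for illustration.

```python
import re

# Simplified pattern: a letter or underscore first, then letters, digits,
# underscores, hyphens or full stops. Real XML names allow more characters.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def is_acceptable_name(name: str) -> bool:
    """Apply the slide's naming rules to an element or attribute name."""
    if name.lower().startswith("xml"):
        return False  # names beginning with 'xml' are reserved
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["me2you", "2you", "First_Name", "First Name", "xmlData"]:
    print(candidate, is_acceptable_name(candidate))
```

Because the check is case sensitive in what it accepts but not in what it reserves, `XMLdata` is rejected in the same way as `xmlData`.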
Entity references are used to substitute specific characters. There are five
predefined entities built into XML:
Entity   Char  Notes
&amp;    &     Do not use inside processing instructions.
&lt;     <     Use inside attribute values quoted with ".
&gt;     >     Use after ]] in normal text and inside processing instructions.
&quot;   "     Use inside attribute values quoted with ".
&apos;   '     Use inside attribute values quoted with '.
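In practice the substitution is usually left to a library. The sketch below uses `escape` and `unescape` from Python's standard `xml.sax.saxutils` module. These handle `&`, `<` and `>` by default, so the two quote characters are passed in as an extra mapping. The helper function names are made up.

```python
from xml.sax.saxutils import escape, unescape

# escape() covers &, < and > by default; the quote characters are added here.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}
QUOTE_CHARACTERS = {"&quot;": '"', "&apos;": "'"}


def to_entities(text: str) -> str:
    """Replace all five special characters with their predefined entities."""
    return escape(text, QUOTE_ENTITIES)


def from_entities(text: str) -> str:
    """Turn the five predefined entities back into characters."""
    return unescape(text, QUOTE_CHARACTERS)


encoded = to_entities('salary < 32000 & post = "Professor"')
print(encoded)
print(from_entities(encoded))
```

The ampersand is replaced first when escaping and last when unescaping, so that the `&` inside an entity such as `&lt;` is not processed twice.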
Spot the mistakes
<?xml version="1.0" encoding="UTF-8">
<StaffList>
<Staff staffNo="1“ temporary>
<fName>Davd</fName>
<lName>Johnson</lname>
<address>109 Mount View Road, Kingswood, Bristol BS15 8UB</addess>
<phone>0117 9676505</phone>
<officeNo>3A31</officeNo>
<sex=Male/>
<salary>32000</SALARY>
<POST>Profesor</POST>
<computerid>dpjohnso</computerid>
<department>Applied Sciences</department>
<teaching>
<teaches module=UGDFNP-20-1 hours="6"/>
<teaches module="UGDF58-20-1" hours="7">
</teaching>
<qualification>PhD</qualification>
</StaffList>
‘Valid’
• Validity is relative - valid with respect to a
specified schema which defines:
– The sequence and nesting of elements
– The allowable attributes of an element
– Whether elements are optional or repeated
– The type of data in an element or attribute
• Different languages for expressing these rules
– DTD – Document Type Definition
– XML Schema (xsd) – itself an XML vocabulary
– Relax NG
– Schematron – permissive
• Document can be linked to a specific schema
– Too inflexible
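The difference between well-formed and valid can be illustrated with a hand-written check. The rules below are hypothetical and are written as plain Python data; they stand in for a real schema language such as DTD, XML Schema or Relax NG, which would normally be processed by a validating tool rather than by code like this.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules for a <teaches> element, standing in for a real schema.
RULES = {
    "required_attributes": {"module", "hours"},
    "integer_attributes": {"hours"},
}


def validate_teaches(elem: ET.Element) -> list:
    """Return a list of rule violations; an empty list means 'valid'."""
    problems = []
    for name in sorted(RULES["required_attributes"] - set(elem.attrib)):
        problems.append(f"missing attribute {name}")
    for name in sorted(RULES["integer_attributes"] & set(elem.attrib)):
        if not elem.get(name).isdigit():
            problems.append(f"attribute {name} is not an integer")
    return problems


# Both fragments are well-formed; only the first obeys the rules.
good = ET.fromstring('<teaches module="UGDFNP-20-1" hours="6"/>')
bad = ET.fromstring('<teaches module="UGDFNP-20-1" hours="six"/>')
print(validate_teaches(good))
print(validate_teaches(bad))
```

Changing the rules changes the verdict without changing the document, which is what is meant by validity being relative.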
Tutorial
• Review the exercises from this lecture
• Take the module specification and work out how many tables would
be required to hold this data
• Take one of the TransXChange files and try to develop a data
model.
– Use a browser to view the file, don’t try to print it out.
– Use ‘find’ to follow references.
• Later
– Read the Wikipedia article on XML
– Work through the W3Schools tutorial
– XML dialects have developed in many fields of knowledge. Investigate
the XML dialects which have been created in one particular area of
interest e.g. Multimedia, Family History
Next Week
• Normalisation Exercise
• Processing Language
• Prize for the best Photo Album as judged
by you and us.