0% found this document useful (0 votes)
34 views

11 Complex Data and XML

The document discusses complex data modeling and the benefits of using XML, including handling complex data structures, facilitating data export and import through self-documenting formats, and utilizing XML-based tools. It provides an example ER model for a university database and discusses challenges with data types, foreign keys, and normalization of some tables into non-first normal form to reduce redundancy.
Copyright
© Attribution Non-Commercial (BY-NC)
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
34 views

11 Complex Data and XML

The document discusses complex data modeling and the benefits of using XML, including handling complex data structures, facilitating data export and import through self-documenting formats, and utilizing XML-based tools. It provides an example ER model for a university database and discusses challenges with data types, foreign keys, and normalization of some tables into non-first normal form to reduce redundancy.
Copyright
© Attribution Non-Commercial (BY-NC)
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 30

Complex Data and XML

1
Week 10 Review
• Trees and the places where tree structures are used in
computing and knowledge representation
• Nomenclature in trees
– Tree-based – root, node, branch, leaf
– Family-based – parent, child, sibling
• The Document Object Model as a tree
– DOM access in Javascript
• An example of DOM manipulation
– DHTML – accessing and updating the DOM with Javascript
– AJAX -
– server communication from JavaScript

2
Tree and questions
• Look at the tree in the diagram
• Answer the following questions:
– Name the root of the tree
– Name the leaves of the tree
– Using path notation in Windows or Unix
• What is the location of J relative to the root?
• What is the location of the root if you are at J?
• What is the location of K if you are at J?

3
A

R C

B G L J

D K M N

4
The rationale for XML
• Complex data
– StudentsOnline
– Modules and Courses
• Export and import – ‘self-documenting’
data
– Flickr
– TransXChange
• XML-based tools
– XForms
5
Year 1 Oracle example
• Students on year 1 Database module
modelled a much bigger data model
concerning modules, students, staff and
courses.
• Oracle SQL DDL was reverse-engineered
to QSEE, then the code regenerated to
MySQL, installed and executed.
• Query interface using PHP

6
department
deptName:VARCHAR
phone:VARCHAR
faxno:VARCHAR
location:VARCHAR
mgr_startDate:DATE

worksin

m gr

staff course
staffNo:NUMERIC cCode:VARCHAR
fName:VARCHAR leader title:VARCHAR
lName:VARCHAR duration:VARCHAR
address:VARCHAR
phone:VARCHAR
officeNo:VARCHAR
qualification sex:VARCHAR
qualification:VARCHAR salary:NUMERIC
post:VARCHAR
computerId:VARCHAR

student
matricNo:NUMERIC
Weak teaches
fName:VARCHAR
lName:VARCHAR
entity hours:NUMERIC
module
mCode:VARCHAR town:VARCHAR
title:VARCHAR street:VARCHAR
startDate:DATE postcode:VARCHAR
endDate:DATE dob:DATE
sex:VARCHAR
Foreign
coursework:VARCHAR
exam:VARCHAR loan:NUMERIC key
text
identifies
text:VARCHAR

undertake nextOfKin
performance:VARCHAR name:VARCHAR
phone:VARCHAR
relationship:VARCHAR
7
Lessons
• Problems with data types and data formats
• ER model as Conceptual Model
• Foreign keys in QSEE
• Many of the entities are ‘weak’

8
Data types
• Data types differ between the two
databases – QSEE uses a more general
set of datatypes and then outputs DDL
dependant on the target.
– Data typing is a big problem in databases
• Data types too small for the data stored, so data
might be lost
• Date formats differ 2006-12-08 in MySQL, 08-dec-
2006 in Oracle

9
Conceptual ER model
• No foreign keys in the ER model.
– foreign keys are the realisation (implementation) of the
relationships defined in the ER Model in a RDBMS

• Case tool can then


– removes redundancy - there is no point in defining both the
relationship AND the foreign keys [one thing in one place]
– it emphasises the conceptual nature of the model, which may be
implemented in other ways
– the data type of the primary key and all foreign keys must be the
same - but you may need to change it so better to change it in
one place and regenerate the schema
– all the necessary integrity constraints are also generated

10
Foreign Keys in QSEE

• The bar across a relationship indicates that the foreign key is to be


part of the primary key.
– As well as creating a foreign key, the primary key constraint will be
generated appropriately.
• A dotted boundary indicates that the whole primary key is made up
of foreign keys
• Any entity in which a foreign key is part of the primary key is 'weak' -
it depends for its existence on another entity.
• Foreign key naming
– default - fk1_ , fk2_ prefix
– My policy is to
• remove the prefix globally
• set the prefix on specific relationships where required (manager, leader)

11
Foreign keys in RDBMS
• In MySQL, action depends on which file model is
being used
– ISAM (Index Sequential access Method) does not
enforce foreign key integrity
– INNODB does
• Loading data is a problem
– Need to load data in the right order to prevent foreign
key errors – roughly parent before child
– Not possible if any cycles, so RDBMS allows it to be
turned off for the loading
– Special programs for bulk loading from files such as
CSV– easier than using INSERT statements

12
Questions about the model
• What foreign keys would be added to the
Course table in an RDBMS? To the
Teaches table?
• Explain the optionality on the mgr
relationship between staff and
department? Is it on the right end?
• Which two tables would be impossible to
put in order to ensure foreign key
constraints are not violated?
13
Non-first normal form
• First normal form specifies that all fields must be atomic
– single valued
• Look at the qualification table. The bar indicates that it is
dependant on staff. If repeated fields were allowed in
tables then we could simply add:
qualifications : set Varchar
• to the staff table
• We would need to be able to search for individual
qualifications
• Tags in the photo album site are a similar problem.
• Which other table could be removed if non-first normal
form tables were allowed?
• MySQL has sets and operators to add, remove and
search (but these all add complexity)
14
Harder cases
• What additional structure would be
required to get rid of the nextOfKin table?
• How could we get rid of the ‘teaches’ and
‘undertakes’ tables? How would this data
be handled in an Object-oriented model?
• If all the weak entities were removed, how
much would this simplify the model?

15
• NextOfKin
– Requires a sub-record of name, phone and
relationship which can be repeated
• Teaches
– Decide which side this data belongs to and
add a set of sub-records

• Which side would you place the data from


teaches and undertakes?
16
Complex data

• Real data is much more complex than this


• ISIS the student record system has over 200
tables, many them like the 5 weak tables in the
simpler model – so the same ratio might be
assumed to apply
• Consider an actual module specification
• Note the number of repeated sub-records?
• How many tables would be required to model
this data?
17
Communication
• Applications which stand alone and cannot interchange
data with other applications are called ‘silos’
• An RBMS must export data as files of INSERT
statements or as a comma separated variable file
• In both cases, you can’t make any sense of the data
without knowing the Database schema
INSERT INTO teaches VALUES ( 11, 'UFCEMQ-20-1', 5);
INSERT INTO teaches VALUES ( 11, 'UFEEHV-20-1', 5);
INSERT INTO teaches VALUES ( 13, 'UFQEGC-20-1', 6);
INSERT INTO teaches VALUES ( 8, 'UFCEKN-20-1', 10);

1, 'David', 'Johnson', '109 Mount View Road, Kingswood, Bristol BS15 8UB',
'0117 9676505', '3A31', 'Male', 32000, 'Professor', 'dpjohnso', 'Applied Sciences'
2, 'Carrie', 'Ford', '15 Heron Way, Chipping Sodbury, Bristol BS37 6NT',
'01454 854089', '1C29', 'Female', 29000, 'Senior Lecturer', 'caford',
'Bristol Business School' 18
'BUWE B80 GF78', 'PgDIT', '3 Years', 13, 'CIS'
'BUWE B80 M221', 'Commercial Law', '3 Years', 7, 'Law'
'BUWE B80 B230', 'Pharmaceutical Sciences', '4 Years', 1, 'Applied Sciences'
'BUWE B80 G500', 'Business Decision Analysis', '4 Years', 9, 'Bristol Business School'
'BUWE B80 X300', 'Education Studies', '3 Years', 5, 'Education'
'BUWE B80 G451', 'Multimedia Computing', '4 Years', 10, 'Computing, Engineering and Mathema
'BUWE B80 L300', 'Sociology', '3 Years', 6, 'Humanities, Languages and Social Sciences'

CREATE TABLE course(


cCode VARCHAR(30) NOT NULL,
title VARCHAR(50),
duration VARCHAR(20),
deptName VARCHAR(50) NOT NULL,
leader_staffNo NUMERIC(4,0) NOT NULL,

-- Specify the PRIMARY KEY constraint for table "course".


-- This indicates which attribute(s) uniquely identify each row of data.
CONSTRAINT pk_course PRIMARY KEY (cCode)
) TYPE=INNODB; 19
XML
• XML is tagged data following some simple rules
on the formation and relationship between tags –
‘well-formed’
• XML ‘documents’ are trees and thus support
non-first normal form data
• The tags in the document provide –some-
information about the meaning of the data
• The tags and the common tree structure allow
general tools to do much of the processing
without a separate schema.

20
Staff as XML Optional
processing
instruction
<?xml version="1.0" encoding="UTF-8"?>
<StaffList>
<Staff staffNo="1"> Root node
<fName>David</fName>
<lName>Johnson</lName>
<address>109 Mount View Road, Kingswood, Bristol BS15 8UB</address>
<phone>0117 9676505</phone>
<officeNo>3A31</officeNo> element
<sex>Male</sex>
<salary>32000</salary>
<post>Professor</post> Element with
<computerid>dpjohnso</computerid> children
<department>Applied Sciences</department>
<teaching>
<teaches module="UGDFNP-20-1" hours="6"/>
<teaches module="UGDF58-20-1" hours="7"/>
</teaching>
<qualification>PhD</qualification> value
</Staff> attribute
</StaffList> 21
Processes required for XML
• Store XML (native XML database)
• Query XML (XQuery)
• Transform XML into other XML, RDF or
text (XQuery, XSLT)
• Transform structured text to XML (?)
• Define deeper semantics for an XML file
(XSchema, RelaxNG)
• User Interface (XForms)
22
Applications of XML
• XHTML – HTML as well-formed XML
which conforms to standard
• SVG – Structured Vector Graphics
• kml – GoogleEarth overlay files
• XForms – UI definition
• Flickr API responses
• RSS news syndication
• ….
23
Well-formed XML
• ‘Well-formed’ – absolute property of an XML
document – must obey rules for well-formedness
• Check by
– Opening in Word 2003
– Submitting to an online validator
– Check using an XML tool
• XML Spy
• Stylus Studio
• Oxygen
• Check out the rules in the Wikipedia entry
24
Well-formed XML

Every XML document must be well-formed and must therefore adhere to


the following rules (among others):

1. Every start-tag must have a matching end tag.


2. Elements may nest but must not overlap.
<name>Anna<em>Coffey</em></name> - √
<name><em>Anna</name>Coffey</em> - ×
3. There must be exactly one root element.
4. Attribute values must be quoted.
5. An element must not be quoted.
6. Comments and processing instructions may not appear inside tags.
7. No unescaped < or & signs may occur in the character data of an
element.

25
Well-formed XML documents
Element names are case sensitive - <NAME>, <name>, <Name> & <NaMe>
are four different element types.
No white spaces in element name - <First Name> not allowed; <First_Name>
OK.
Element names cannot start with the letters “XML” or “xml” – reserved terms.
Element names must start with a letter or a underscore. Element names
cannot start with a number but numbers may be embedded within an element
name - <2you> not allowed; <me2you> is OK.
Attribute names are constrained by the above rules for element names.
Entity references are used to substitute specific characters. There are five
predefined entities built into XML:
Entity Char Notes
&amp; & Do not use inside processing instructions
&lt; < Use inside attribute values quoted with “.
&gt; > Use after ]] in normal text and inside processing instruction.
&quot; “ Use inside attribute values quoted with “.
&apos; ‘ Use inside attribute values quoted with ‘.
26
Spot the mistakes
<?xml version="1.0" encoding="UTF-8">
<StaffList>
<Staff staffNo="1“ temporary>
<fName>Davd</fName>
<lName>Johnson</lname>
<address>109 Mount View Road, Kingswood, Bristol BS15 8UB</addess>
<phone>0117 9676505</phone>
<officeNo>3A31</officeNo>
<sex=Male/>
<salary>32000</SALARY>
<POST>Profesor</POST>
<computerid>dpjohnso</computerid>
<department>Applied Sciences</department>
<teaching>
<teaches module=UGDFNP-20-1 hours="6"/>
<teaches module="UGDF58-20-1" hours="7">
</teaching>
<qualification>PhD</qualification>
</StaffList> 27
‘Valid’
• Validity is relative - valid with respect to a
specified schema which defines:
– The sequence and nesting of elements
– The allowable attributes of an element
– Whether elements are optional or repeated
– The type of data in an element or attribute
• Different languages for expressing these rules
– DTD – Document Type Definition
– XML Schema (xsd) – itself an XML vocabulary
– Relax NG
– Schematron – permissive
• Document can be linked to a specific schema
– Too inflexible
28
Tutorial
• Review the exercises from this lecture
• Take the module specification and work out how many tables would
be required to hold this data
• Take one of the TransXChange files and try to develop a data
model.
– Use a browser to view the file, don’t try to print it out.
– Use ‘find’ to follow references.

• Later
– Read the wikipedia article on XML
– Work through the w3.schools tutorial
– XML dialects have developed in many fields of knowledge. Investigate
the XML dialects which have been created in one particular area of
interest e.g. Multimedia, Family History

29
Next Week
• Normalisation Exercise
• Processing Language
• Prize for the best Photo Album as judged
by you and us.

30

You might also like