Unit 1 - Adbms - 2
DTD
XML Schema
XML Documents, DTD, and XML Schema
• Two types of XML
• Well-Formed XML
• Valid XML
• Well-Formed XML
• It should start with an XML declaration that indicates the version of XML being
used, as well as any other relevant attributes such as the encoding (XML 1.0
recommends the declaration; XML 1.1 requires it).
• It must follow the syntactic guidelines of the tree model.
• This means that there should be a single root element, and every element must include
a matching pair of start tag and end tag within the start and end tags of the parent
element.
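The well-formedness rules above can be checked with any generic XML parser. The following is a minimal sketch in Python, assuming the standard-library `xml.etree.ElementTree` parser; the helper name `is_well_formed` and the sample documents are illustrative and not part of the original slides.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# Single root, properly nested start/end tags: well-formed.
GOOD = '<?xml version="1.0"?><Projects><Project><Name>X</Name></Project></Projects>'

# Two root elements: violates the single-root rule.
TWO_ROOTS = "<Project/><Project/>"

# End tags closed in the wrong order: violates proper nesting.
BAD_NESTING = "<Project><Name>X</Project></Name>"

results = [is_well_formed(doc) for doc in (GOOD, TWO_ROOTS, BAD_NESTING)]
```

Note that this only tests syntax; it says nothing about whether the document follows a particular DTD or schema.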
• A well-formed XML document is syntactically correct
• This allows it to be processed by generic processors that traverse the document and
create an internal tree representation.
• DOM (Document Object Model) - Allows programs to manipulate the tree representation
built from a well-formed XML document. The whole document must be parsed before
processing can begin when using DOM.
• SAX (Simple API for XML) - Allows processing of XML documents on the fly by notifying
the processing program whenever a start or end tag is encountered.
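The difference between the two processing styles can be sketched as follows, assuming Python's standard-library `xml.dom.minidom` and `xml.sax` modules. The sample document, the `TagLogger` class, and the variable names are hypothetical and chosen only for illustration.

```python
import xml.dom.minidom
import xml.sax

# Hypothetical sample document.
DOC = (
    "<Projects>"
    "<Project><Name>ProductX</Name></Project>"
    "<Project><Name>ProductY</Name></Project>"
    "</Projects>"
)

# DOM: the whole document is parsed into a tree first, then queried.
dom = xml.dom.minidom.parseString(DOC)
dom_names = [node.firstChild.data for node in dom.getElementsByTagName("Name")]

# SAX: the parser calls back into the handler as each tag is encountered.
class TagLogger(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

handler = TagLogger()
xml.sax.parseString(DOC.encode("utf-8"), handler)
```

DOM is convenient when the program needs random access to the tree; SAX avoids holding the whole document in memory, which matters for large inputs.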
• Valid XML
• A stronger criterion is for an XML document to be valid.
• In this case, the document must be well-formed, and in addition the element
names used in the start and end tag pairs must follow the structure specified
in a separate XML DTD (Document Type Definition) file or XML schema file.
An XML DTD file called projects
• XML DTD Notation
• A * following the element name means that the element can be repeated zero or more
times in the document. This can be called an optional multivalued (repeating) element.
• A + following the element name means that the element can be repeated one or more
times in the document. This can be called a required multivalued (repeating) element.
• A ? following the element name means that the element can appear zero or one
time. This can be called an optional single-valued (non-repeating) element.
• An element appearing without any of the preceding three symbols must appear exactly
once in the document. This can be called a required single-valued (non-repeating)
element.
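The four occurrence rules above behave like regular-expression quantifiers over the sequence of child elements. The sketch below makes that analogy concrete in Python; it is not a DTD validator. The content model `(Name, Phone?, Email+, Note*)`, the `Person` element, and the helper names are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content model, equivalent to the DTD declaration
#   <!ELEMENT Person (Name, Phone?, Email+, Note*)>
# Each pair is (child element name, DTD occurrence symbol).
PERSON_MODEL = [("Name", ""), ("Phone", "?"), ("Email", "+"), ("Note", "*")]

def model_to_regex(model):
    """Build a regex over a string of space-terminated child tag names."""
    parts = ["(?:" + re.escape(name + " ") + ")" + quant for name, quant in model]
    return re.compile("".join(parts))

def children_conform(element, model):
    """Check the order and count of an element's direct children."""
    tags = "".join(child.tag + " " for child in element)
    return model_to_regex(model).fullmatch(tags) is not None

ok = ET.fromstring("<Person><Name/><Email/><Email/></Person>")
no_email = ET.fromstring("<Person><Name/></Person>")               # Email+ needs at least one
two_phones = ET.fromstring("<Person><Name/><Phone/><Phone/><Email/></Person>")  # Phone? allows at most one
```

Here `children_conform(ok, PERSON_MODEL)` accepts the first element, while the other two are rejected because they break the `+` and `?` rules respectively.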
• The type of the element is specified via parentheses following the element.
• If the parentheses include names of other elements, these would be the children of the
element in the tree structure.
• If the parentheses include the keyword #PCDATA or one of the other data types available
in XML DTD, the element is a leaf node. PCDATA stands for parsed character data, which
is roughly similar to a string data type.
• Parentheses can be nested when specifying elements.
• A bar symbol, as in (e1 | e2), specifies that either e1 or e2 can appear in the document.
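To see how a parser interprets parentheses, #PCDATA, nesting, and the bar symbol, the sketch below reads element declarations from a small internal DTD using the `ElementDeclHandler` callback of Python's standard-library `xml.parsers.expat`. The `Contact` DTD, the `render` helper, and `element_declarations` are assumptions made for illustration. Expat reports the declarations but does not validate the document against them.

```python
from xml.parsers import expat
from xml.parsers.expat import model as m

# Hypothetical document with an internal DTD subset.
DOC = """<?xml version="1.0"?>
<!DOCTYPE Contact [
  <!ELEMENT Contact (Name, (Phone | Email)+, Note?)>
  <!ELEMENT Name (#PCDATA)>
  <!ELEMENT Phone (#PCDATA)>
  <!ELEMENT Email (#PCDATA)>
  <!ELEMENT Note (#PCDATA)>
]>
<Contact><Name>Ada</Name><Email>ada@example.org</Email></Contact>
"""

QUANT = {
    m.XML_CQUANT_NONE: "",
    m.XML_CQUANT_OPT: "?",
    m.XML_CQUANT_REP: "*",
    m.XML_CQUANT_PLUS: "+",
}

def render(content_model):
    """Turn expat's nested (type, quantifier, name, children) tuple into DTD text."""
    ctype, quant, name, children = content_model
    if ctype == m.XML_CTYPE_NAME:          # a child element name
        return name + QUANT[quant]
    if ctype == m.XML_CTYPE_EMPTY:
        return "EMPTY"
    if ctype == m.XML_CTYPE_ANY:
        return "ANY"
    if ctype == m.XML_CTYPE_MIXED:         # leaf or mixed content with #PCDATA
        items = ["#PCDATA"] + [render(child) for child in children]
        return "(" + " | ".join(items) + ")" + QUANT[quant]
    # Remaining cases: a sequence (comma) or a choice (bar) of sub-models.
    sep = " | " if ctype == m.XML_CTYPE_CHOICE else ", "
    return "(" + sep.join(render(child) for child in children) + ")" + QUANT[quant]

def element_declarations(document):
    """Collect {element name: rendered content model} from the DTD."""
    found = {}
    parser = expat.ParserCreate()

    def on_element_decl(name, content_model):
        found[name] = render(content_model)

    parser.ElementDeclHandler = on_element_decl
    parser.Parse(document, True)
    return found

decls = element_declarations(DOC)
```

In this sketch `Contact` is an internal node whose children include a nested choice group, while `Name`, `Phone`, `Email`, and `Note` are leaf nodes holding parsed character data.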
• Limitations of XML DTD
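• First, the data types in DTD are not very general: element content is essentially
limited to character data (#PCDATA), so types such as numbers or dates cannot be declared.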
• Second, DTD has its own special syntax and so it requires specialized
processors.
• It would be advantageous to specify XML schema documents using the syntax rules of
XML itself so that the same processors for XML documents can process XML schema
descriptions.
• Third, all DTD elements are always forced to follow the specified ordering in the
document, so unordered elements are not permitted.
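XML Schema addresses these three limitations. As a rough illustration, the sketch below parses a small schema with the same generic Python parser (`xml.etree.ElementTree`) that handles ordinary XML documents. The `employee` schema is hypothetical and is not the company schema referred to in these slides. It uses built-in types such as `xsd:decimal` and `xsd:date`, and an `xsd:all` group, which lets the child elements appear in any order.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema: it is itself a well-formed XML document.
SCHEMA = """<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="employee">
    <xsd:complexType>
      <xsd:all>
        <xsd:element name="name" type="xsd:string"/>
        <xsd:element name="salary" type="xsd:decimal"/>
        <xsd:element name="birthDate" type="xsd:date"/>
      </xsd:all>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
"""

# No special-purpose processor is needed to read the schema.
root = ET.fromstring(SCHEMA)

# Map each declared element name to its declared type (None if the
# type is given inline rather than through a type attribute).
declared = {
    el.get("name"): el.get("type")
    for el in root.iter(f"{{{XSD_NS}}}element")
}

# xsd:all marks a group whose children may appear in any order.
allows_unordered = root.find(f".//{{{XSD_NS}}}all") is not None
```

Reading the schema this way does not validate any document against it; validation requires a schema-aware processor.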
An XML schema file called company