Stephen Cranefield
Number 2001/04
February 2001
ISSN 1177-455X
University of Otago
The Department of Information Science is one of six departments that make up the
School of Business at the University of Otago. The department offers courses of study
leading to a major in Information Science within the BCom, BA and BSc degrees. In
addition to undergraduate teaching, the department is also strongly involved in post-
graduate research programmes leading to MCom, MA, MSc and PhD degrees. Re-
search projects in spatial information processing, connectionist-based information sys-
tems, software engineering and software development, information engineering and
database, software metrics, distributed information systems, multimedia information
systems and information systems security are particularly well supported.
The views expressed in this paper are not necessarily those of the department as a
whole. The accuracy of the information presented in this paper is the sole responsibil-
ity of the authors.
Copyright
Copyright remains with the authors. Permission to copy for research or teaching pur-
poses is granted on the condition that the authors and the Series are given due ac-
knowledgment. Reproduction in any form for purposes other than research or teach-
ing is forbidden unless prior written permission has been obtained from the authors.
Correspondence
This paper represents work to date and may not necessarily form the basis for the au-
thors’ final conclusions relating to this topic. It is likely, however, that the paper will ap-
pear in some form in a journal or in conference proceedings in the near future. The au-
thors would be pleased to receive correspondence in connection with any of the issues
raised in this paper, or for subsequent publication details. Please write directly to the
authors at the address provided below. (Details of final journal/conference publication
venues for these papers are also provided on the Department’s publications web pages:
http://www.otago.ac.nz/informationscience/pubs/). Any other correspondence concerning
the Series should be sent to the DPS Coordinator.
Abstract
This paper discusses technology to support the use of UML for rep-
resenting ontologies and domain knowledge in the Semantic Web. Two
mappings have been defined and implemented using XSLT to produce Java
classes and an RDF schema from an ontology represented as a UML class
diagram and encoded using XMI. A Java application can encode domain
knowledge as an object diagram realised as a network of instances of the
generated classes. Support is provided for marshalling and unmarshalling
this object-oriented knowledge to and from an RDF/XML serialisation.
1 Introduction
The growth of the World Wide Web and its integration into business and everyday
life has been phenomenal. The number of Web servers has grown from 26 in 1992
[1] to an estimated six million in 2000 [2]. However, this growth has meant that the
wealth of available information on any subject is swamped by an overwhelming
mass of irrelevant information, and the complexity and dynamic nature of the
current Web structure means it is only possible for people to find, index and make
effective use of a tiny fraction of the Web.
The solution to this problem is to let computer software relieve us of much
of the burden of locating resources on the Web that are relevant to our needs and
extracting, integrating and indexing the information contained within. To enable
this, the Web must evolve from its current status as a network of resources in-
tended for human comprehension. The Semantic Web concept [3] requires that
Web content be interpretable by both humans and machines. In particular, re-
sources on the Web need to be encoded in, or annotated with, structured machine-
readable descriptions of their contents. To this end, the World Wide Web Con-
sortium (W3C) has introduced the Extensible Markup Language (XML) and the
Resource Description Framework (RDF) as standard mechanisms for embedding
structured data and metadata in Web documents. These languages both have an
associated schema language (XML Schema and RDF Schema respectively) allow-
ing communities to define their own vocabularies for indicating document struc-
ture and describing its content.
However, the concept of the Semantic Web involves more than just commu-
nities sharing information after agreeing on common data and metadata formats.
The aim is to allow information to be shared across communities by providing the
ability to translate information between different representations. This requires
that the information in a Web resource be encoded as knowledge expressed using
terms or structures that have been explicitly defined in a domain ontology. Ontolo-
gies are, in essence, schemas that are expressed in a high-level modelling language
suitable for modelling the concepts in the domain, the relationships between them,
and any logical constraints on their interpretation. They are not concerned with
the physical format of documents but focus on the conceptual content.
Currently, there is a lot of research effort underway to develop ontology rep-
resentation languages compatible with World Wide Web standards, particularly
in the Ontology Inference Layer (OIL [4]) and DARPA Agent Markup Language
(DAML [5]) projects. Derived from frame-based representation languages from
the artificial intelligence knowledge representation community, OIL and DAML
schema build on top of RDF Schema by adding modelling constructs from de-
scription logic [6]. This style of language has a well understood semantic ba-
sis but lacks both a wide user community outside AI research laboratories and a
standard graphical presentation—an important consideration for aiding the human
comprehension of ontologies.
This paper discusses Semantic Web technology based on an alternative paradigm
that also supports the modelling of concepts in a domain (an ontology) and the
expression of information in terms of those concepts. This is the paradigm of
object-oriented modelling from the software engineering community. In particu-
lar, there is an expressive and standardised modelling language, the Unified Mod-
eling Language (UML [7]), which has graphical and XML-based formats, a huge
user community, a high level of commercial tool support and an associated con-
straint language with the expressive power of first-order logic. Although devel-
oped to support analysis and design in software engineering, UML is beginning
to be used for other modelling problems, one notable example being its adoption
by the Meta Data Coalition [8] for representing metadata schemas for enterprise
data.
The proposed application of UML to the Semantic Web is based on the fol-
lowing three claims:
UML class diagrams provide a static modelling capability that is well suited
for representing ontologies [9].
Further discussion of these points is beyond the scope of this paper which focuses
on technology to support the use of object-oriented modelling for the Semantic
Web.
domain-specific encoding format for knowledge about objects in that domain, and
an application programmer interface (API) to allow convenient creation, import
and export of that knowledge.
[Figure: overview diagram. Recoverable labels: UML-based design tool, XSLT, Java source files, javac, Java class files, marshalling package, RDF API, applications; arrows labelled uses, loads and references.]
to represent knowledge about objects in the domain as in-memory data struc-
tures. The generated schema in RDF defines domain-specific concepts that an
application can reference when serialising this knowledge using RDF (in its XML
encoding). The marshalling and unmarshalling of object networks to and from
RDF/XML documents is performed by a pair of Java classes: MarshalHelper
and UnmarshalHelper. These delegate to the generated Java classes decisions
about the names and types of fields to be serialised and unserialised, but are then
called back to perform the translation to and from RDF making use of an existing
Java RDF application programmer’s interface [11].
Note that the generated RDF schema does not contain all the information
from the original UML model. If an application needs access to full ontologi-
cal information, it can use the original XMI document with the help of one of the
currently available or forthcoming Java APIs supporting the processing of UML
models. The purpose of the RDF schema is to define RDF resources correspond-
ing to all the classes, interfaces, attributes and associations in the ontology in
order to support RDF serialisation of instance information. For the sake of human
readers, the schema records additional information such as subclass relationships
and the domains and ranges of properties corresponding to attributes and asso-
ciations. However, this information is not required for processing RDF-encoded
instance information because each generated Java class contains specialised meth-
ods marshalFields and unmarshalFields containing hard-coded know-
ledge about the names and types of the class’s fields. This is a design decision
intended to avoid potentially expensive analysis of the schema during marshalling
and unmarshalling. Thus it should be possible to use this serialisation mechanism
in situations where optimised serialisation is important, such as in agent messag-
ing systems.
3 An example domain
This section presents an example ontology modelled as a UML class diagram and
some knowledge encoded as an object diagram. The ontology defines a subset
of the concepts included in the CIA World Factbook and is adapted from an OIL
representation of a similar subset [12].
3.1 An ontology in UML
Figure 2 presents the CIA World Factbook ontology represented as a UML class
diagram. The version shown here is not a direct translation from OIL: there is an
additional class (AdministrativeDivision), UML association classes are
used where appropriate, and instead of defining the classes City and Country
as specialised types of Region (GeographicalLocation in the OIL original), the
ontology represents these as optional roles that a region may have.
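[Figure 2: The CIA World Factbook ontology represented as a UML class diagram. Recoverable content: class Region with attribute name : String; association class AreaComparison with attribute proportion : String; multiplicities 1 and 0..*.]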
The boxes in the diagram represent classes, and contain their names and (where
applicable) their attributes. The lines between classes depict association relation-
ships between classes. A class A has an association with another class B if an
object of class A needs to maintain a reference to an object of class B. An associ-
ation may be bidirectional, or (if a single arrowhead is present) unidirectional. A
‘multiplicity’ expression at the end of an association specifies how many objects
of that class may take part in that relationship with a single object of the class
at the other end of the association. This may be expressed as a range of values,
with ‘*’ indicating no upper limit. Association ends may be optionally named.
In the absence of a name, the name of the adjacent class, with the initial letter in
lower case, is used as a default name. Associations can be explicitly represented
as classes by attaching a class box to an association (see LandBoundary and
AreaComparison in the figure). This is necessary when additional attributes
or further associations are required to clarify the relationship between two classes.
The dog-eared rectangle in the lower left corner of the figure contains a con-
straint in the Object Constraint Language (OCL). This language provides a way to
constrain the possible instances of a model in ways that cannot be expressed using
UML’s structural modelling elements alone. The constraint shown here states that
i) a country’s capital is a city in that country, and ii) if a country c has another
as a neighbour, then that neighbouring country has c as a neighbour. Finally, the
keyword “datatype” appearing in guillemets above the class Real indicates that
this is a pre-existing built-in datatype. OCL defines a minimal set of primitive
datatypes and it is currently assumed that the ontology designer has used these
primitive types.
[Figure: object diagram for the example domain. Recoverable content: Region objects named "Wellington", "New Zealand", "Otago", "Dunedin" and "Colorado"; City and Country objects; an AdministrativeDivision with type = "region"; an AreaComparison with proportion = "About the size of"; link labels as_city, as_country, as_admin_division and capital; coastline_in_km with the value 15134.]
schema corresponding to the ontology and the other produces a corresponding set
of Java classes and interfaces.
XSLT is a language for transforming XML documents into other documents.
An XSLT stylesheet consists of a set of templates that match nodes in the
input document (represented internally as a tree) and transform them (possibly
via the application of other templates) to produce an output tree. The output tree
can then be output as text or as an HTML or XML document.
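The template-matching idea can be illustrated with the standard javax.xml.transform API. This is only a minimal sketch: the toy input document and stylesheet below are invented for illustration and are far simpler than real XMI or the stylesheets described in this paper.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XsltSketch {
    // Toy stylesheet (an assumption, not the paper's): one template matches the
    // document element, another matches each <class> node and emits text.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/model'><xsl:apply-templates select='class'/></xsl:template>"
      + "<xsl:template match='class'>"
      + "<xsl:text>class </xsl:text><xsl:value-of select='@name'/><xsl:text> {}&#10;</xsl:text>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Toy input standing in for an XMI-encoded model.
    static final String MODEL = "<model><class name='Region'/><class name='Country'/></model>";

    /** Applies the stylesheet to the input and returns the output as a string. */
    static String transform(String stylesheet, String input) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(input)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(transform(STYLESHEET, MODEL));
    }
}
```

Each `class` element in the input is matched by the second template, which writes one line of text per class.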
The main issue common to both mappings is the problem of translating from
UML classes, which may have different types of features such as attributes, asso-
ciations and association classes, to a model where classes only have fields or (in
RDF) properties. It was also necessary to generate default names for fields where
association ends are not named in the UML model. The OCL conventions for writ-
ing navigation paths through object structures were used to resolve these issues.
Also, attributes and association ends with a multiplicity upper limit greater than
one are represented as set-valued fields (bags in RDF Schema) or, in the case of
association ends with a UML “ordered” constraint, list-valued fields (sequences in
RDF Schema). Further details about the mappings have been discussed elsewhere
[13] and are beyond the scope of this paper.
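The multiplicity and default-naming rules described above can be sketched as two small functions. The names FieldMapping, FieldKind, fieldKindFor and defaultEndName are hypothetical; the actual mappings are implemented in XSLT and are not reproduced here.

```java
// Hypothetical sketch of the mapping rules; not the author's stylesheets.
final class FieldMapping {
    enum FieldKind { SINGLE, SET, LIST }

    /**
     * upper is the multiplicity upper limit, with -1 standing for '*' (no upper limit).
     * An upper limit greater than one gives a set-valued field, or a list-valued
     * field when the association end carries the UML "ordered" constraint.
     */
    static FieldKind fieldKindFor(int upper, boolean ordered) {
        boolean many = upper == -1 || upper > 1;
        if (!many) {
            return FieldKind.SINGLE;
        }
        return ordered ? FieldKind.LIST : FieldKind.SET;
    }

    /** Default name for an unnamed association end: the class name with a lower-case initial. */
    static String defaultEndName(String className) {
        return Character.toLowerCase(className.charAt(0)) + className.substring(1);
    }

    public static void main(String[] args) {
        System.out.println(fieldKindFor(1, false));
        System.out.println(fieldKindFor(-1, false));
        System.out.println(fieldKindFor(-1, true));
        System.out.println(defaultEndName("AreaComparison"));
    }
}
```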
4.1 The generated RDF schema
The Resource Description Framework (RDF) is a simple resource–property–value
model designed for expressing metadata about resources on the Web. RDF has a
graphical syntax as well as an XML-based serialisation syntax. For readability,
examples in this paper are presented in the graphical syntax, although in practice
they are generated in the XML format.
RDF Schema is a set of predefined resources (entities with uniform resource
identifiers) and relationships between them that define a simple meta-model in-
cluding concepts of classes, properties, subclass and subproperty relationships, a
primitive type Literal, bag and sequence types, and domain and range con-
straints on properties. Domain schemas (i.e. ontologies) can then be expressed as
sets of RDF triples using the (meta)classes and properties defined in RDF Schema.
Schemas defined using RDF Schema are commonly called RDF schemas (small
‘s’).
The main issue in generating an RDF schema that corresponds to an object-
oriented model is that RDF properties are first-class objects and are not defined
within the context of a particular class. This can lead to conflicting range declara-
tions if the same property (e.g. head) is used to represent a field in two different
classes (e.g. Brew and Department). The solution chosen was to prefix each
property name representing a field with the name of the class. This has the disad-
vantage that in the presence of inheritance a class’s fields may be represented by
properties with different prefixes: some specifying the class itself and some nam-
ing a parent class. This might be confusing for a human reader but is not a problem
for the current purpose: to specify a machine-readable format for object-oriented
knowledge interchange.
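The effect of the class-name prefix can be shown with a small sketch. The '.' separator follows the property names visible in Figure 4; the class RangeConflicts, its methods, and the range names Foam and Person are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class RangeConflicts {
    /** Property name in the style of the generated schema: class name, '.', field name. */
    static String prefixed(String className, String fieldName) {
        return className + "." + fieldName;
    }

    /** Records a range declaration; returns false if it contradicts an earlier one. */
    static boolean declareRange(Map<String, String> ranges, String property, String range) {
        String previous = ranges.putIfAbsent(property, range);
        return previous == null || previous.equals(range);
    }

    /** Declares the field 'head' for two classes and reports whether the ranges stay consistent. */
    static boolean consistent(boolean usePrefix) {
        Map<String, String> ranges = new HashMap<>();
        String p1 = usePrefix ? prefixed("Brew", "head") : "head";
        String p2 = usePrefix ? prefixed("Department", "head") : "head";
        // Foam and Person are hypothetical range classes.
        boolean first = declareRange(ranges, p1, "Foam");
        boolean second = declareRange(ranges, p2, "Person");
        return first && second;
    }

    public static void main(String[] args) {
        System.out.println("unprefixed consistent: " + consistent(false));
        System.out.println("prefixed consistent: " + consistent(true));
    }
}
```

Without the prefix, both classes declare a range for the same property head and the declarations clash; with the prefix, Brew.head and Department.head are distinct properties.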
Figure 4 presents a subset of the generated RDF schema corresponding to the
UML model presented in Figure 2. Only the classes Country and Region and
the relationships between them are included here.
In the standard RDF graphical notation used in the figure, an ellipse represents
a resource with its qualified name shown inside as a namespace prefix followed
by a local name. A namespace prefix abbreviates a Uniform Resource Identifier
(URI) associated with a particular namespace, and the URI for the resource can
be constructed by appending the local name to the namespace URI. A property is
represented by an arc, with the qualified name for the property written beside the
arc (in this case the arcs are given labels with the corresponding URIs shown in
the table).
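Expanding a qualified name into a URI can be sketched as follows. The rdfs namespace URI is the standard one; the URI bound to the wfb prefix is a placeholder, since the paper's actual namespace is not given here, and the class QNames is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class QNames {
    /** Appends the local name to the namespace URI bound to the prefix. */
    static String expand(Map<String, String> namespaces, String qname) {
        int colon = qname.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("not a qualified name: " + qname);
        }
        String uri = namespaces.get(qname.substring(0, colon));
        if (uri == null) {
            throw new IllegalArgumentException("unknown prefix in: " + qname);
        }
        return uri + qname.substring(colon + 1);
    }

    static Map<String, String> exampleNamespaces() {
        Map<String, String> ns = new HashMap<>();
        ns.put("rdfs", "http://www.w3.org/2000/01/rdf-schema#");
        ns.put("wfb", "http://example.org/wfb#"); // placeholder URI, an assumption
        return ns;
    }

    public static void main(String[] args) {
        System.out.println(expand(exampleNamespaces(), "rdfs:Class"));
        System.out.println(expand(exampleNamespaces(), "wfb:Region.name"));
    }
}
```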
[Figure 4: A subset of the generated RDF schema corresponding to the UML model in Figure 2. Resources shown: wfb:Region, wfb:Country, wfb:AreaComparison, rdfs:Class, rdfs:Literal, rdf:Property and rdf:Bag; properties shown: wfb:Region.name, wfb:Region.as_country, wfb:AreaComparison.proportion, wfb:AreaComparison.region, wfb:AreaComparison.country, wfb:Country.region and wfb:Country.areaComparison; arc labels: t, d, r and et.]
Figure 4 includes one property that is not part of RDF Schema. There is no
mechanism in RDF Schema to parameterise a collection type (such as rdf:Bag)
by the class of elements it may contain. Therefore, the non-standard property
rdfsx:collectionElementType was introduced to represent this infor-
mation (this is abbreviated in the figure by the arc label et). The definition of this
property is shown in Figure 5. The object serialisation mechanism described in
this paper does not require this information but it is useful to people reading the
schema.
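[Figure 5: The definition of the rdfsx:collectionElementType property. Resources shown: rdfsx:collectionElementType, rdfs:ConstraintProperty, rdfs:Container and rdfs:Class; arc labels: t, d and r.]
[Figure 6: RDF serialisation of the knowledge comparing New Zealand's area to that of Colorado. Recoverable content: class wfb:Region; literals "New Zealand", "Colorado" and "About the size of"; resources labelled x and y; arc labels: t, n, r, p, ac, acc, acr and rac.]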
4.2 The generated Java classes and marshalling framework
The generated RDF schema described in the previous section defines a domain-
specific serialisation format for object-oriented representation of knowledge about
the domain. To facilitate the processing of knowledge communicated in this form,
a set of Java classes can also be generated from the ontology using XSLT. These
allow Java applications to create instances of the domain concepts. In addition,
the generated classes, along with some additional utility classes, allow these
in-memory structures to be marshalled and unmarshalled to and from the RDF
serialisation format defined by the generated RDF schema. The aim of the mar-
shalling code is to allow a Java application to maintain an internal representation
of object-oriented knowledge and to easily read and write parts of this knowledge
to and from a format suitable for transmission or publication on the Web.
Figure 7 presents a class diagram outlining the structure of the generated Java
classes and the marshalling framework. The class MarshalHelper is part of
a support package used by the generated classes. It contains a static method
marshalObjects that provides the entry point for an application to marshal
a network of objects. A similar class UnmarshalHelper is also provided, but
is not discussed here. The class DomainObject is an abstract base class that
all generated classes specialise (the specialisation relationship is represented by a
closed arrow pointing to the more general class). The class Region is shown as
an example of a generated class.
This diagram does not show all the fields and methods. In particular, the class
Region also contains fields and methods related to the as_city, as_country
and as_admin_division association ends from the ontology shown in Fig-
ure 2. There are some fields and methods depicted that are related to whether or
not a field value is “known”. This is discussed in Section 5.
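The division of labour between DomainObject, a generated class and MarshalHelper can be sketched as below. This is a simplified stand-in, not the generated code: the helper records statements as strings instead of calling an RDF API, and setCurrent, the statements list, the "notKnown" wording and the MarshalDemo class are all assumptions. Only the method names marshal, marshalInheritedFields, marshalFields and the call h.marshalString("name", name, nameKnown) are taken from Figure 7.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for MarshalHelper: records statements as strings (hypothetical).
class MarshalHelper {
    final List<String> statements = new ArrayList<>();
    private String current;

    void setCurrent(String resource) { current = resource; } // hypothetical

    // Either a property value or a statement that the value isn't known.
    void marshalString(String field, String value, boolean known) {
        if (known) {
            statements.add(current + " " + field + " \"" + value + "\"");
        } else {
            statements.add(current + " notKnown " + field);
        }
    }
}

abstract class DomainObject {
    private static int nextOid = 0;
    final int oid = nextOid++;

    // Inherited fields are marshalled first, then the class's own fields.
    void marshal(MarshalHelper h) {
        h.setCurrent("obj" + oid);
        marshalInheritedFields(h);
        marshalFields(h);
    }

    abstract void marshalInheritedFields(MarshalHelper h);

    void marshalFields(MarshalHelper h) { } // empty default definition
}

class Region extends DomainObject {
    private String name;
    private boolean nameKnown = false;

    Region() { }

    Region(String name) {
        this.name = name;
        this.nameKnown = true;
    }

    @Override
    void marshalInheritedFields(MarshalHelper h) { super.marshalFields(h); }

    @Override
    void marshalFields(MarshalHelper h) { h.marshalString("name", name, nameKnown); }
}

class MarshalDemo {
    static List<String> run() {
        MarshalHelper h = new MarshalHelper();
        new Region("Otago").marshal(h);
        new Region().marshal(h); // name never set, so it is reported as not known
        return h.statements;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Because each generated class hard-codes its own field names and types in marshalFields, the helper never needs to inspect the schema.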
[Figure 7: The structure of the generated classes and the marshalling methods. The class diagram is summarised below.]
MarshalHelper
  Notes in the figure: "Creates MarshalHelper object h and for each o in objects calls: o.marshal(h)";
  "Adds triple to RDF model, either property value or statement that property value isn't known"
DomainObject
  OID : int
  hashcode() : int
  equals(o : Object) : boolean
  compareTo(o : Object) : int
  marshal(h : MarshalHelper)
  «abstract» marshalInheritedFields(h : MarshalHelper)
  marshalFields(h : MarshalHelper)   (has empty default definition)
  Note in the figure: "... marshalInheritedFields(h); marshalFields(h); ..."
Region (specialises DomainObject)
  name : String
  nameKnown : boolean = false
  ...
  name() : String
  setName(name : String)
  nameKnown() : boolean
  setNameKnown(known : Boolean)
  Region()
  Region(name : String)
  marshalInheritedFields(h : MarshalHelper)
  marshalFields(h : MarshalHelper)   (... h.marshalString("name", name, nameKnown); ...)
  main(args[] : String)
  Note in the figure: "Defined in all generated classes as: super.marshalFields(h);"

There is a significant difference between knowledge represented propositionally
and knowledge represented in the form of an object diagram. Propositions are
self-contained statements of knowledge whereas object diagrams are networks of
objects. When serialising knowledge, an application may only wish to include
some of the information it knows about a domain. For example, Figure 6 compares
New Zealand’s area to that of Colorado, but doesn’t provide the information
that Colorado is an administrative division of the United States. To allow this
selectivity, the marshalObjects method takes a collection of objects as an
argument. Links to any objects outside this collection will not be serialised. To
allow a particular entry point into the knowledge structure to be identified, a root
object is specified and the method returns the qualified name of the RDF resource
in the serialised model that represents that object. A namespace for the serialised
information is also provided.
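The selective behaviour of marshalObjects can be sketched with a simplified stand-in. The signature below, the Node class and the "ex:" namespace are assumptions made for illustration; the real method works on generated domain classes and produces RDF rather than strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class SelectiveMarshal {
    /** Minimal stand-in for a domain object with named links to other objects. */
    static final class Node {
        final String id;
        final Map<String, Node> links = new LinkedHashMap<>();
        Node(String id) { this.id = id; }
    }

    /**
     * Adds one "subject property object" line to out for every link whose target
     * is also in objects, and returns the resource name chosen for root.
     */
    static String marshalObjects(Collection<Node> objects, Node root,
                                 String namespace, List<String> out) {
        Set<Node> included = new HashSet<>(objects); // identity-based membership
        for (Node o : objects) {
            for (Map.Entry<String, Node> link : o.links.entrySet()) {
                if (included.contains(link.getValue())) {
                    out.add(namespace + o.id + " " + link.getKey()
                            + " " + namespace + link.getValue().id);
                }
            }
        }
        return namespace + root.id;
    }

    static List<String> demo() {
        Node nz = new Node("nz");
        Node colorado = new Node("colorado");
        Node comparison = new Node("comparison");
        Node division = new Node("division"); // deliberately left out of the collection
        comparison.links.put("country", nz);
        comparison.links.put("region", colorado);
        colorado.links.put("as_admin_division", division);

        List<String> out = new ArrayList<>();
        String rootName = marshalObjects(Arrays.asList(nz, colorado, comparison), nz, "ex:", out);
        out.add("root = " + rootName);
        return out;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

The link from colorado to division is dropped because division is not in the collection passed to the method, mirroring the Colorado example in the text.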
Figure 8 shows the Java code that would produce the RDF serialisation in
Figure 6.
to distinguish between a statement that there are no values for a given property
and the omission or lack of knowledge about a given property. In other words,
the recipient of object-oriented information needs a way of knowing for which
objects and which properties a closed world assumption can safely be made. This
is achieved by including extra boolean fields in the generated Java classes that
record for each regular field whether or not its value is ‘known’ or, for set- or
list-valued fields, ‘closed’—meaning that the contents of the set or list provide
complete knowledge of that field. Setting the value of a single-valued field sets its
‘known’ field to true and all fields also have a method allowing the programmer
to explicitly specify the status of the field.
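A minimal sketch of the 'known' and 'closed' flags follows. The capitalKnown naming follows the nameKnown pattern shown in Figure 7; the class CountryFacts, the citiesClosed flag and the three-valued hasCity query are assumptions added to show how a recipient could use the flags.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical class illustrating the extra boolean fields.
final class CountryFacts {
    enum Answer { YES, NO, UNKNOWN }

    private String capital;
    private boolean capitalKnown = false;
    private final Set<String> cities = new LinkedHashSet<>();
    private boolean citiesClosed = false;

    // Setting a single-valued field also marks it as known.
    void setCapital(String capital) {
        this.capital = capital;
        this.capitalKnown = true;
    }

    String capital() { return capital; }
    boolean capitalKnown() { return capitalKnown; }
    void setCapitalKnown(boolean known) { this.capitalKnown = known; }

    void addCity(String city) { cities.add(city); }
    boolean citiesClosed() { return citiesClosed; }
    void setCitiesClosed(boolean closed) { this.citiesClosed = closed; }

    /** A closed world assumption is only safe when the set is marked closed. */
    Answer hasCity(String city) {
        if (cities.contains(city)) {
            return Answer.YES;
        }
        return citiesClosed ? Answer.NO : Answer.UNKNOWN;
    }

    public static void main(String[] args) {
        CountryFacts nz = new CountryFacts();
        nz.addCity("Dunedin");
        System.out.println(nz.hasCity("Wellington")); // set not closed yet
        nz.setCitiesClosed(true);
        System.out.println(nz.hasCity("Wellington")); // set now declared complete
    }
}
```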
When unmarshalling an object diagram from the RDF encoding it is assumed
that complete information about all properties is included unless otherwise specified
(although the opposite could equally well be implemented as the default assumption).
Incomplete information is indicated using a non-standard RDF property
notClosedOn that associates a property with a resource, meaning that
complete information is not provided for that property applied to that resource.
Figure 9 shows the declaration of the notClosedOn property.
[Figure 9: The declaration of the notClosedOn property. Resources shown: rdfx:notClosedOn, rdf:Property and rdfs:Resource; arc labels: t, d and r.]
x. Also, there is possibly missing information about the administrative division
property of the region represented by the resource labelled y.
[Figure: RDF graph. Recoverable content: resources labelled x and y; properties wfb:Country.capital, wfb:Country.city, wfb:Country.administrativeDivision and wfb:Region.as_admin_division; arcs labelled nco.]
form of these is not constrained and they are not semantically integrated with the
language.
It is therefore important to evaluate how well UML fares in this regard.
In fact, UML includes a powerful mechanism for expressing inference rules:
the Object Constraint Language. OCL is essentially a variant of first-order logic
with an object-oriented syntax. It is therefore sufficiently expressive to repre-
sent any first-order inference rules that an ontology designer may wish to specify.
However, this expressiveness also means that reasoning about unconstrained OCL
expressions will be undecidable in general.
The object-oriented syntax of OCL is also unlike any commonly used log-
ical language, and attempting to write rules in OCL can be frustrating for the
inexperienced. A constraint can often be expressed in several different ways and
the resulting expression can look quite unlike its counterpart in first-order logic.
Consider the constraint in Figure 2. The second conjunct specifies that the neighbourhood
relationship between countries is symmetric. The form of this constraint
might be immediately recognised as a standard pattern by an OCL expert but it is
not obvious to the uninitiated.
To enable tractable reasoning about ontologies in UML, and to avoid the awk-
ward syntax of OCL, it would be useful to define a macro language on top of
OCL comprising predicates such as reflexive(path-expression) which
are defined in terms of OCL. The set of macros could be chosen to ensure that
reasoning over these expressions is tractable. This would also help to allow the
translation of rules between UML-based and other representations of an ontology.
This is a subject for future research.
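As an illustration of what a single predicate of this kind might check, the second conjunct of the constraint in Figure 2 (if a country c has another as a neighbour, then that country has c as a neighbour) can be tested over a set of neighbour links as follows. This is a sketch in Java rather than OCL; the class SymmetryCheck and its methods are hypothetical, and the example data is illustrative only.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class SymmetryCheck {
    /** True if every link c -> d is matched by a link d -> c. */
    static <T> boolean isSymmetric(Map<T, Set<T>> neighbours) {
        for (Map.Entry<T, Set<T>> entry : neighbours.entrySet()) {
            for (T other : entry.getValue()) {
                Set<T> back = neighbours.getOrDefault(other, Collections.<T>emptySet());
                if (!back.contains(entry.getKey())) {
                    return false;
                }
            }
        }
        return true;
    }

    /** Example links; when complete is false the reverse link is left out. */
    static Map<String, Set<String>> example(boolean complete) {
        Map<String, Set<String>> n = new HashMap<>();
        n.put("Canada", new HashSet<>(Arrays.asList("United States")));
        if (complete) {
            n.put("United States", new HashSet<>(Arrays.asList("Canada")));
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(isSymmetric(example(true)));
        System.out.println(isSymmetric(example(false)));
    }
}
```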
7 Conclusion
This paper has described technology that facilitates the application of object-
oriented modelling, and the Unified Modeling Language in particular, to the Se-
mantic Web. From an ontology specified in UML, a corresponding RDF schema
and a set of Java classes can be automatically generated to facilitate the use of
object diagrams as internal knowledge representation structures and the import
and export of these as RDF documents. A mechanism was also introduced for
indicating when an object diagram has missing or incomplete knowledge.
Important areas for future work are the identification of tractable subsets of
OCL for encoding inference rules and the definition of mappings between object-
oriented representations of ontologies and knowledge and more traditional de-
scription logic-based formalisms. This would allow applications to choose the
style of modelling most suitable for their needs while retaining interoperability
with other subsets of the Semantic Web.
Acknowledgements
This work was done while visiting the Network Computing Group at the Insti-
tute for Information Technology, National Research Council of Canada, Ottawa,
Canada. Thanks are due to Larry Korba and the NRC for hosting me and to the
University of Otago for approving my research and study leave.
References
[1] Robert Cailliau. A little history of the World Wide Web.
http://www.w3.org/History.html, 1995.
[2] Laura Carr. 100 numbers you need to know. TheStandard.com, Standard Media
International, November 13 2000.
http://www.thestandard.com/article/display/0,1151,20128,00.html.
[8] Meta Data Coalition home page. http://www.mdcinfo.com/, 2000.
[12] M. Klein, D. Fensel, F. van Harmelen, and I. Horrocks. The relation between
ontologies and schema-languages: translating OIL-specifications in
XML-Schema. In Proceedings of the Workshop on Applications of Ontologies
and Problem solving Methods, 14th European Conference on Artificial
Intelligence (ECAI 2000), 2000.
http://delicias.dia.fi.upm.es/WORKSHOP/ECAI00/7.pdf.