
Saturday, July 26, 2014

Another OWL diagramming transform and some more thoughts on writing

With summer at full tilt and lots going on, I seem to have lost track of time and been delinquent in publishing new posts. I want to get back into writing slowly ... with a small post that builds on two of my previous ones.

First, I wrote a new XSL transform that outputs all NamedIndividuals specified in an ontology file. The purpose was to help with diagramming enumerations. (I made a simplifying assumption that you added individuals into a .owl file in order to create enumerated or exemplary individuals.) The transform is located on GitHub (check out http://purl.org/NinePts/graphing). And, details on how to use the transform (for example, with the graphical editor, yEd) are described in my post, Diagramming an RDF/XML ontology.
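
For reference, here is a minimal sketch (shown in Turtle for readability, though the transform itself reads RDF/XML) of the kind of enumerated individuals that the transform picks up. The ex: namespace and names below are just placeholders:

  @prefix owl: <http://www.w3.org/2002/07/owl#> .
  @prefix ex:  <http://example.org/colors#> .

  ex:Color a owl:Class .

  # each enumerated (or exemplary) individual is declared as a NamedIndividual
  ex:Red   a owl:NamedIndividual, ex:Color .
  ex:Green a owl:NamedIndividual, ex:Color .
  ex:Blue  a owl:NamedIndividual, ex:Color .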

If you don't want some individuals included, feel free to refine the transform, or just delete individuals after an initial layout with yEd.

Second, here are some more writing tips that build on the post, Words and writing .... Most of these I learned in high school (a very long time ago), as editor of the school paper. (And, yes, I still use them today.)
  • My teacher taught us to vary the first letter of each paragraph, and start the paragraphs with interesting words (e.g., not "the", "this", "a", ...). Her point was that people got an impression of the article from glancing at the page, and the first words of the paragraphs made the most impression. If the words were boring, then the article was boring. I don't know if this is true, but it seems like a reasonable thing.
  • Another good practice is to make sure your paragraphs are relatively short, so as not to seem overwhelming. (I try to keep my paragraphs under 5-6 sentences.) Also, each paragraph should have a clear focus and stick to it. It is difficult to read when the main subject of a paragraph wanders.
  • Lastly, use a good opening sentence for each paragraph. It should establish the contents of the paragraph - setting it up for more details to come in the following sentences.
You can check out more writing tips at "Hot 100 News Writing Tips".

Andrea

Tuesday, May 20, 2014

Diagramming an RDF/XML OWL ontology

Over the course of time (many times, in fact), I have been asked to "graph" my ontologies to help visualize the concepts. Anyone who has worked with Protege (or the Neon Toolkit or other tools) knows that none of the tools give you all the images that you really need to document your work. I have often resorted to hand-drawing the ontologies using UML diagrams. This is both painful and a huge time sink.

Recently, I was reading some emails on the Linked Data community distribution list about how they generate the LOD cloud diagram. Omnigraffle is used in the "official" workflow to create this diagram, but that tool costs money. One of the email replies discussed a different approach.

A gentleman from Freenet.de needed to draw a similar diagram for the data cloud for the Open Linguistics Working Group. His team could not use the same code and processing flow as the LOD cloud folks, since they didn't have many Mac users. So, they developed an alternative based on GraphML. To create the basic graph, they developed a Python script. And, ...
Using yed's "organic" layout, a reasonable representation can be achieved which is then manually brought in shape with yed (positioning) and XML (font adjustment). In yed, we augment it with a legend and text and then export it into the graphic format of choice.
Given my propensity to "reuse" good ideas, I decided to investigate GraphML and yEd. And, since GraphML is XML, ontologies can be defined in RDF/XML, and XSLT can be used to transform XML definitions, I used XSLT to generate various GraphML outputs of an ontology file. Once the GraphML outputs were in place, I used yEd to do the layout, as the Freenet.de team did. (It is important to note that the basic yEd tool is free. And, layout is the most difficult piece of doing a graphic.)

So, what did I find? You can be the judge. The XSLTs are found on GitHub (check out http://purl.org/NinePts/graphing). There are four files in the graphing directory:
  • AnnotationProperties.xsl - A transform of any annotation property definitions in an RDF/XML file, drawing them as rectangles connected to a central entity named "Annotation Properties".
  • ClassHierarchies.xsl - A transform of any class definitions in an RDF/XML file, drawing them in a class-superclass hierarchy.
  • ClassProperties.xsl - A transform of any data type and object property definitions in an RDF/XML file, drawing them as rectangles with their types (functional, transitive, etc.) and domains and ranges.
  • PropertyHierarchies.xsl - A transform of any data type and object property definitions in an RDF/XML file, drawing their property-super property relationships.
I executed the transforms using xsltproc. An example invocation is:
xsltproc -o result.graphml ../graphing/ClassProperties.xsl metadata-properties.owl
I then took the result.graphml and opened it in the yEd Graph Editor. (If you do the same, you will find that all the classes or properties lie on top of each other. I made no attempt at any kind of layout, since I planned to use yEd for that purpose.) For the class properties graph (from the above invocation), I used the Layout->Radial formatting, with the default settings. Here is the result:



I was impressed with how easy this was!

The really great thing is that if you don't like a layout, you can choose another format and even tweak the results. I did some tweaking for the "Property Hierarchies" diagram. In this case, I ran the PropertyHierarchies.xsl against the metadata-properties.owl file and used the Hierarchical Layout on the resulting GraphML file. Then, I selected all the data properties and moved them underneath the object properties. Here is the result:



Admittedly, the diagrams can get quite complex for a large ontology. But, you can easily change/combine/separate the XSLT transforms to include more or less content.

With about a day and a half's worth of work (and using standards and free tooling), I think that I saved myself many frustrating and boring hours of diagramming. Let me know if you find this useful, or if you have other suggestions for diagramming ontologies.

Andrea

Wednesday, May 7, 2014

Updated metadata ontology file (V0.6.0) and new metadata-properties ontology (V0.2.0) on GitHub

I've spent some time doing more work on the general metadata ontologies (metadata-annotations and metadata-properties). Metadata-annotations is now at version 0.6.0. In this release, I mainly corrected the SPARQL queries that were defined as the competency queries. SPARQL is straightforward, but it is easy to make mistakes. I made a few in my previous version (because I just wrote the queries by hand, without testing them - my bad). Anyway, that is all fixed now and the queries are correct. My apologies for the errors.

You can also see that there is a new addition to the metadata directory with the metadata-properties ontology. Metadata-properties takes some of the concepts from metadata-annotations, and redefines them as data and object properties. In addition, a few supporting classes are defined (specifically, Actor and Modification), where required to fully specify the semantics.

Actor is used as the subject of the object properties, contributedTo and created. Modification is designed to collect all the information related to a change or update to an individual, which matters when one wants to track the specifics of each change as a set of related data. That level of detail may not be needed - for example, if one only wants to track the date of last modification, or only a description of each change. In those cases, the data property, dateLastModified, or the annotation property, changeNote, can be the predicate of a triple involving the updated individual directly.
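
To make the two options concrete, here is a rough Turtle sketch. Modification, dateLastModified and changeNote come from the ontologies; the linking property (hasModification) and the detail properties on the Modification individual are placeholders that I made up for illustration:

  @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
  @prefix ex:  <http://example.org/metadata#> .

  # Option 1: track each change as a set of related data via a Modification individual
  ex:MyEntity ex:hasModification ex:Mod42 .           # linking property name is a placeholder
  ex:Mod42 a ex:Modification ;
      ex:changeDate "2014-05-07"^^xsd:date ;          # placeholder detail properties
      ex:changeDescription "Corrected the competency SPARQL queries." .

  # Option 2: lightweight tracking, directly on the updated individual
  ex:MyEntity ex:dateLastModified "2014-05-07"^^xsd:date ;
      ex:changeNote "Corrected the competency SPARQL queries." .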

It is important to understand that only a minimum amount of information is provided for Actor and Modification. They are defined, but are purposefully underspecified to allow application- or domain-specific details to be provided in another ontology. (In which case, the IRIs of the corresponding classes in the other ontology would be related to Actor and Modification using an owl:equivalentClass axiom. This was discussed in the post on modular ontologies, and tying together the pieces.)

Also in the metadata-properties ontology, an identifier property is defined. It is similar to the identifier property from Dublin Core, but is not equivalent since the metadata-properties' identifier is defined as a functional data property. (The Dublin Core property is "officially" defined as an annotation property.)
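
For illustration, here is a sketch of the two declarations in Turtle (a placeholder ex: prefix stands in for the purl.org/ninepts namespace):

  @prefix owl:  <http://www.w3.org/2002/07/owl#> .
  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
  @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
  @prefix dc:   <http://purl.org/dc/elements/1.1/> .
  @prefix ex:   <http://example.org/ninepts/metadata-properties#> .

  # metadata-properties' identifier: a functional data property, so an
  # individual can have at most one identifier value
  ex:identifier a owl:DatatypeProperty, owl:FunctionalProperty ;
      rdfs:range xsd:string .

  # Dublin Core's identifier, by contrast, is treated as an annotation property
  dc:identifier a owl:AnnotationProperty .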

To download the files, there is information in the blog post from Apr 17th.

Please let me know if you have any feedback or issues.

Andrea

Monday, April 28, 2014

General, Reusable Metadata Ontology - V0.2

This is just a short post that a newer version of the general metadata ontology is available. The ontology was originally discussed in a blog post on April 16th. And, if you have trouble downloading the files, there is help in the blog post from Apr 17th.

I have taken all the feedback, and reworked and simplified the ontology (I hope). All the changes are documented in the ontology's changeNote.

Important sidebar: I strongly recommend using something like a changeNote to track the evolution of every ontology and model.

As noted in the Apr 16th post, most of the concepts in the ontology are taken from the Dublin Core Elements vocabulary and the SKOS data model. In this version, the well-established properties from Dublin Core and SKOS use the namespaces/IRIs from those sources (http://purl.org/dc/elements/1.1/ and http://www.w3.org/2004/02/skos/core#, respectively). Some examples are dc:contributor, dc:description and skos:prefLabel. Where the semantics differ, or where more obvious names are defined (for example, names that provide "directions" for the skos:narrower and broader relations), the purl.org/ninepts namespace is used.
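
As a small illustration of this namespace reuse (the subject IRI and the literal values are made up, and a placeholder ex: prefix stands in for the purl.org/ninepts namespace):

  @prefix dc:   <http://purl.org/dc/elements/1.1/> .
  @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
  @prefix ex:   <http://example.org/ninepts/metadata#> .

  # well-established properties keep their original Dublin Core and SKOS IRIs
  ex:SampleOntology dc:contributor "A. Contributor" ;
      dc:description "A general, reusable metadata ontology." ;
      skos:prefLabel "Sample Metadata Ontology" .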

This release is getting much closer to a "finished" ontology. All of the properties have descriptions and examples, and most have scope/usage notes. The ontology's scope note describes what is not mapped from Dublin Core and SKOS, and why.

In addition, I have added two unique properties for the ontology. One is competencyQuestions and the other is competencyQuery. The concept of competency questions was originally defined in a 1995 paper by Gruninger and Fox as "requirements that are in the form of questions that [the] ontology must be able to answer." The questions help to define the scope of the ontology, and are [should be] translated to queries to validate the ontology. These queries are captured in the metadata ontology as SPARQL queries (and the corresponding competency question is included as a comment in the query, so that it can be tracked). This is a start at test-driven development for ontologies. :-)
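
As a rough sketch of how this might look (the competencyQuestions/competencyQuery property names follow the post, but the ontology IRI and the query itself are made up), the SPARQL is simply captured as a string-valued annotation, with the competency question repeated as a comment inside the query:

  @prefix owl: <http://www.w3.org/2002/07/owl#> .
  @prefix ex:  <http://example.org/ninepts/metadata#> .

  ex:MetadataOntology a owl:Ontology ;
      ex:competencyQuestions "Who contributed to the ontology, and when was it last updated?" ;
      ex:competencyQuery """
        # Competency question: Who contributed to the ontology,
        # and when was it last updated?
        PREFIX dc: <http://purl.org/dc/elements/1.1/>
        SELECT ?contributor ?date
        WHERE { ?ontology dc:contributor ?contributor ;
                          dc:date ?date . }
      """ .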

Please take a look at the ontology (even if you did before since it has evolved), and feel free to comment or (even better) contribute.

Andrea

Thursday, April 17, 2014

Downloading the Metadata Ontology Files from GitHub

Since I posted my ontology files to GitHub, and got some emails that the downloads were corrupted, I thought that I should clarify the download process.

You are certainly free to fork the repository and get a local copy. Or, you can just download the file(s) by following these instructions:
  • LEFT click on the file in the directory on GitHub
  • The file is displayed with several tabs across the top. Select the Raw tab.
  • The file is now displayed in your browser window as text. Save the file to your local disk using the "Save Page As ..." drop-down option, under File.
After you download the file(s), you can then load one of them into something like Protege. (It is only necessary to load one since they are all the same.) Note that there are NO classes, data or object properties defined in the ontology. There are only annotation properties that can be used on classes, data and object properties. Since I need this all to be usable in reasoning applications, I started with defining and documenting annotation properties.

I try to note this in a short comment on the ontology (but given the confusion, I should probably expand the comment). I am also working on a metadata-properties ontology which defines some of the annotation properties as data and object properties. This will allow (for example) validating dateTime values and referencing objects/individuals in relations (as opposed to using literal values). It is important to note, however, that you can only use data and object properties with individuals (and not with class or property declarations) - otherwise you end up in OWL Full, with no computational guarantees and no practical reasoning.

Lastly, for anyone who objects to using annotation properties for mappings (for example, where I map SKOS' exactMatch in the metadata-annotations ontology), no worries ... More is coming. As a place to start, I defined exactMatch, moreGeneralThan, moreSpecificThan, ... annotation properties for documentation and human consumption. (I have to start somewhere. :-) And, I tried to be more precise in my naming than SKOS, which names the latter two relations "broader" and "narrower", with no indication of whether the subject or the object is the broader or narrower one. (I always get this mixed up if I am away from the spec for more than a week. :-)
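
Here is a sketch of how these mapping annotations get used (the ninepts namespace is abbreviated with a placeholder ex: prefix, and the FOAF and "other" targets are just illustrative choices):

  @prefix owl: <http://www.w3.org/2002/07/owl#> .
  @prefix ex:  <http://example.org/ninepts/metadata#> .

  ex:exactMatch       a owl:AnnotationProperty .
  ex:moreGeneralThan  a owl:AnnotationProperty .
  ex:moreSpecificThan a owl:AnnotationProperty .

  # because these are annotation properties, they can be attached to class
  # declarations without pushing the ontology into OWL Full
  ex:Person a owl:Class ;
      ex:exactMatch <http://xmlns.com/foaf/0.1/Person> ;
      ex:moreSpecificThan <http://example.org/other#Agent> .   # placeholder target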

I want to unequivocally state that annotation properties are totally inadequate to do anything significant. But, they are a start, and something that another tool could query and use. Separately, I am working on a more formal approach to mapping but starting with documentation is where I am.

Obviously, there is a lot more work in the pipeline. I just wish I had more time (like everyone).

In the meantime, please let me know if you have more questions about the ontologies or any of my blog entries.

Andrea

Wednesday, April 16, 2014

General, Reusable, Metadata Ontology

I recently created a new ontology, following the principles discussed in Ontology Summit 2014's Track A. (If you are not familiar with the Summit, please check out some of my earlier posts.) My goal was to create a small, focused, general, reusable ontology (with usage and scope information, examples of each concept, and more). I must admit that it was a lot more time-consuming than I anticipated. It definitely takes time to create the documentation, validate and spell-check it, make sure that all the possible information is present, etc., etc.

I started with something relatively easy (I thought), which was a consolidation of basic Dublin Core and SKOS concepts into an OWL 2 ontology. The work is not yet finished (I have only been playing with the definition over the last few days). The "finished" pieces are the ontology metadata/documentation (including what I didn't map and why), and several of the properties (contributor, coverage, creator, date, language, mimeType, rights and their sub-properties). The rest is all still a work-in-progress.

It has been interesting creating and dog-fooding the ontology. I can definitely say that it was updated based on my experiences in using it!

You can check out the ontology definition on GitHub (http://purl.org/ninepts/metadata). My "master" definition is in the .ofn file (OWL functional syntax), and I used Protege to generate a Turtle encoding from it. My goals are to maintain the master definition in a version-control-friendly format (ofn), and also to provide a somewhat human-readable format (ttl). I also want to experiment with different natural language renderings that are more readable than Turtle (but I am getting ahead of myself).

I would appreciate feedback on this metadata work, and suggestions for other reusable ontologies (that would help to support industry and refine the development methodology). Some of the ontologies that I am contemplating are ontologies for collections, events (evaluating and bringing together concepts from several, existing event ontologies), actors, actions, policies, and a few others.

Please let me know what you think.

Andrea

Wednesday, February 5, 2014

More on modular ontologies and tying them together

There is a short email dialog on this topic on the Ontology Summit 2014 mail list. I thought that I would add the reply from Amanda Vizedom as a blog post (to keep everything in one place).

Amanda added:

The style of modularity you mention, with what another summit poster (forgive me for forgetting who at the moment) referred to as 'placeholder' concepts within modules, can be very effective. The most effective technique I've found to date, for some cases.

Two additional points are worth making about how to execute this for maximum effectiveness (they may match what you've done, in fact, but are sometimes missed & so worth calling out for others).

Point 1: lots of annotation on the placeholders. The location & connection of the well-defined concepts to link them to is often being saved for later, and possibly for someone else. In order to make sure the right external concept is connected, whatever is known or desired of the underspecified concept should be captured (in the location case, for example, it may be that it needs to support enough granularity to be used for the location at which a person can be contacted at the current time, or must be the kind of location that has a shipping address, or is only intended to be the place of business of the enterprise to which the Person is assigned & out of which they operate (e.g., embassy, business office, base, campus)). That's often known or easily elicitable without leaving the focus of a specialized module, and can be captured in an annotation for use in finding existing, well-defined ontology content and mapping.

Point 2: the advantages of modules, as you described, are best maintained when the import and mapping are done *not* in the specialized module, but in a "lower" mapping module that inherits the specialized module and the mapping-target ontologies. Spindles of ontologies, which can be more or less intricate, allow for independent development and reuse of specialized modules, with lower mapping and integration modules, and with a spindle-bottom that imports all in the spindle and effectively acts as the integrated query, testing, and application module for all the modules contained in that spindle, providing a simplified and integrated interface to a more complex and highly modular system of ontologies. Meanwhile, specialized modules can be developed with SMEs who don't know, care, or have time to think about the stuff they aren't experts about, like distinguishing kinds of location or temporal relations or the weather. Using placeholders and doing your mapping elsewhere may sound like extra work, but considering what it can enable, it can be an incredibly effective approach.

Indeed, the second point is exactly my "integrating" ontology, which imports the target ontologies and does the mapping. As to the first point, that is very much worth highlighting. I err on the side of over-documenting and use various kinds of notes and annotations. For a good example, take a look at the annotation properties in the FIBO Foundations ontology. It includes comment, description, directSource, keyword, definition, various kinds of notes, and much more.

Another set of annotation properties that I use (which I have not seen documented before, but which I think is valuable for future mapping exercises) is WordNet synset references - as direct references, or by designating them as hyponyms or hypernyms. (For those not familiar with WordNet, check out this page and a previous blog post.)

Andrea

Sunday, February 2, 2014

Creating a modular ontology and then tying the pieces together

In my previous post, I talked about creating small, focused "modules" of cohesive semantic content.  And, since these modules have to be small, they can't (and shouldn't) completely define everything that might be referenced.  Some concepts will be under-specified.  

So, how do we tie the modules together in an application?

In a recent project, I used OWL's equivalentClass axiom to do this. For example, in a Person ontology, I defined the Person concept with its relevant properties.  When it came to the Person's Location, that was just an under-specified (i.e., empty) Location class.  I then found a Location ontology, developed by another group, and opted to use that.  Lastly, I defined an "integrating" ontology that imported the Person and Location ontologies, and specified an equivalence between the relevant concepts.  So, PersonNamespace:Location was defined as an equivalentClass to LocationNamespace:Location. Obviously, the application covered up all of this for the users, and my triple store (with reasoner) handled the rest.
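
Here is a minimal sketch of that "integrating" ontology in Turtle, with made-up namespaces standing in for the real Person and Location ontologies:

  @prefix owl: <http://www.w3.org/2002/07/owl#> .
  @prefix per: <http://example.org/person#> .     # placeholder Person ontology namespace
  @prefix loc: <http://example.org/location#> .   # placeholder Location ontology namespace

  <http://example.org/integrating> a owl:Ontology ;
      owl:imports <http://example.org/person> , <http://example.org/location> .

  # tie the under-specified placeholder class to the fully defined one
  per:Location owl:equivalentClass loc:Location .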

This approach left me with a lot of flexibility for reuse and ontology evolution, and didn't force imports except in my "integrating" ontology.  And, a different application could bring in its own definition of Location and create its own "integrating" ontology.

But, what happens if you can't find a Location ontology that does everything that you need?  You can still integrate/reuse other work, perhaps defined in your integrating ontology as subclasses of the (under-specified) PersonNamespace:Location concept.

This approach also works well when developing and reusing ontologies across groups.  Different groups may use different names for the same semantic, may need to expand on some concept, or want to incorporate different semantics.  If you have a monolithic ontology, these differences will be impossible to overcome.  But, if you can say things like  "my concept X is equivalent to your concept Y" or "my concept X is a kind of your Y with some additional restrictions" - that is very valuable.  Now you get reuse instead of redefinition.

Andrea

Wednesday, January 29, 2014

Reuse of ontology and model concepts

Reuse is a big topic in this year's Ontology Summit.  In a Summit session last week, I discussed some experiences related to my recent work on a network management ontology.  The complete presentation is available from the Summit wiki.  And, I would encourage you to look at all the talks given that day since they were all very interesting! (The agenda, slides, chat transcript, etc. are accessible from the conference call page.)

But ... I know that you are busy.  So, here are some take-aways from my talk:

  • What were the candidates for reuse?  There were actually several ontologies and models that were looked at (and I will talk about them in later posts), but this talk was about two specific standards:  ISO 15926 for the process industry, and FIBO for the financial industry.
  • Why did we reuse these standards, given that the chosen domain models/ontologies did not overlap perfectly with network management?  Because there was good thought and insight put into the standards, and there was also tooling developed that we wanted to reuse.  Besides that, we have limited time and money - so jump-starting the development was "a good thing".
  • Did we find valuable concepts to reuse?  Definitely.  Details are in the talk but two examples are:
    • Defining individuals as possible versus actual.  For anyone that worries about network and capacity planning, inventory management, or staging of new equipment, the distinction between what you have now, what you will have, and what you might have is really important.
    • Ontology annotation properties.  Documentation of definitions, sources of information, keywords, notes, etc. are extremely valuable to understand semantics.  I have rarely seen good documentation in an ontology itself (it might be done in a specification that goes with the ontology).  The properties defined and used in FIBO were impressive.
  • Was reuse easy?  Not really.  It was difficult to pull apart sets of distinct concepts in ISO 15926, although we should have done (and will do) more with templates in the future.  Also, ISO 15926's use of OWL was a mapping from the original definition, which made it far less "natural"/native.  FIBO was much more modular and defined in OWL.  But, due to ontology imports, we pretty much ended up loading and working through the complete foundational ontology.

Given all this, what are some suggestions for getting more reuse?

  1. Create and publish more discrete, easily understood "modules" that:
    • Define a maximum of 12-15 core entities with their relationships (12-15 items is about the limit of what people can visually retain)
    • Document the assumptions made in the development (where perhaps short cuts were made, or could be made)
    • Capture the axioms (rules) that apply separately from the core entities (this could allow adjustments to the axioms or assumptions for different domains or problem spaces, without invalidating the core concepts and their semantics)
    • Encourage evolution and different renderings of the entities and relationships (for example, with and without short cuts)
  2. Focus on "necessary and sufficient" semantics when defining the core entities in a module and leave some things under-specified  
    • Don't completely define everything just because it touches your semantics (admittedly, you have to bring all the necessary semantics together to create a complete model or ontology, but more on that in the next post)
    • A contrived example is that physical hardware is located somewhere in time and space, but it is unlikely that everyone's requirements for spatial and temporal information will be consistent.  So, relate your Hardware entity to a Location and leave it at that.  Let another module (or set of modules) handle the idiosyncrasies of Location.  (A Turtle sketch of this pattern follows the list.)
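
Here is that contrived Hardware/Location pattern as a minimal Turtle sketch (all names are placeholders). Note the scope note on the placeholder class, documenting what a mapped Location ontology is expected to provide:

  @prefix owl:  <http://www.w3.org/2002/07/owl#> .
  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
  @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
  @prefix ex:   <http://example.org/hardware#> .

  ex:Hardware a owl:Class .

  # under-specified placeholder - no location semantics are defined in this module
  ex:Location a owl:Class ;
      skos:scopeNote "Placeholder. Must support a physical/shipping address; map to a full Location ontology in an integrating module." .

  ex:locatedAt a owl:ObjectProperty ;
      rdfs:domain ex:Hardware ;
      rdfs:range  ex:Location .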
In my next post, I promise to talk more about how to combine discrete "modules" with under-specified concepts to create a complete solution.

Andrea


Tuesday, January 21, 2014

Semantic Technologies and Ontologies Overview Presentation

Last year, I did a short talk on semantic technologies and ontologies, and thought that I would share it. The audience was mostly people who were new to the technologies, and needed to understand their basics and how/where they are used.

[Disclaimer] The presentation is pretty basic ...

But it seemed to work. It overviews key terms (like the "o-word", ontology :-) and standards (based on the ever-popular semantic "layer cake" image). In looking over the deck, I see that I should have talked about RIF (Rule Interchange Format). But, I was using SWRL at the time, and so gravitated to that. (My apologies for not being complete.)

Since the talk was meant to show that semantic technologies are not just an academic exercise, I spent most of the time highlighting how and where the technologies are used. IMHO, I think that the major uses are:
  • Semantic search and query expansion
  • Mapping and merging of data
  • Knowledge management
The last bullet might be a bit ambiguous. For me, it means organizing knowledge (my blog's namesake) and inferring new knowledge (via reasoning and logic).

There are also quite a few examples of real companies using ontologies and semantic technologies. It is kind of amazing when you look at what is being done.

So, take a look and let me know what you think.

And, as a teaser, I want to highlight that I will be presenting at the next Ontology Summit 2014 session on Thursday, January 23rd, on "Reuse of Content from ISO 15926 and FIBO". If you want to listen in, the details for the conference call are here.  Hopefully, you can join in.

Andrea

Friday, March 18, 2011

NIST and Access Control

I ran across an excellent paper from NIST (the US's National Institute of Standards and Technology), A Survey of Access Control Methods. The document is a component of the publication, "A Report on the Privilege (Access) Management Workshop". I highly recommend reading it, since the security landscape is evolving ... as the technology, online information, regulations/legislation, and "need to share" requirements of a modern, agile enterprise keep expanding.

Access control is discussed from the hard-core (and painfully detailed) ACL approach (access control lists) all the way through policy and risk-adaptive control (PBAC and RAdAC). Here is a useful image from the document, showing the evolution:



Reading the paper triggered some visceral reactions on my part ... For example, I strongly feel that role-based access control is no longer adequate for the real world. Yet, it is where most of us live today.

The problem is the need for agility. The world is no longer only about restricting access to specific, known-in-advance entities using a one-size-fits-all-conditions analysis ("need to protect" with predefined roles) - but also about granting the maximum access to information that is allowed ("need to share" considering the conditions under which sharing occurs).

Here are some examples ... Firefighters need the maximum data about the location and conditions of a fire that they can legally obtain (see my previous post, Using the Semantic Web to Fight Fires). Law enforcement personnel, at the federal, state or local levels, need all the data about suspicious activities that can be legally shared. An information worker needs to see and analyze all relevant data that is permitted (legally and within the corporate guidelines). *The word, "legally", comes up a lot here ... more on that in another post.

So, how do you accomplish this with simple roles? You can certainly build new roles that take various situational attributes into account. But how far can you go with this approach? At some point, the number of roles (variations on a theme) spirals out of control. You really need attribute based control. As the NIST paper points out, with attributes, you don't need to know all the requesters in advance. You just need to know about the conditions of the access.

But, simply adding attribute data (data about the information being accessed, the entity accessing it, the environment where the access occurs or is needed, ...) can get quite complex. The real problem is figuring out how to harmonize and evaluate the attribute information if it is accessed from several data stores or infrastructures. Then, closely associated with that problem is the need to be consistent across an enterprise - to not allow access (under the same conditions) through one infrastructure that is disallowed by another.

Policy-based access control, the next concept in the evolution, starts to address some of these concerns. NIST describes PBAC as "a harmonization and standardization of the ABAC model at an enterprise level in support of specific governance objectives." It concerns the creation and administration of organization-wide rule sets (policies) for access control, using attribute criteria that are also semantically consistent across the enterprise.

Wow, reading that last sentence made my head hurt. :-) Let me decompose the concepts. For policy-based access control to really work, we need (IMHO, in order of implementation):

  1. A well defined (dare I say "standard") policy/rule structure
  2. A well understood vocabulary for the actors, resources and attributes
  3. Ability to use #1 and #2 to define access control rules
  4. Ability to analyze the rules for consistency and completeness
  5. An infrastructure to support the evaluation and enforcement of the rules (at least by transforming between local data stores and infrastructures, and the well understood and defined vocabulary and policies/rules)

Some day, we will have best practices and standards for #1 and #2. Even better, we could have government-blessed renderings of the standard legislation (SOX, HIPAA, ...) using #1 and #2.

Can NIST also help with these activities? I hope that it can. In the meantime, there are some technologies like Semantic Web that can help.

As you can imagine, I have lots more things to discuss about the specifics of PBAC and RAdAC, in my next posts.

Andrea

Monday, June 8, 2009

PriceWaterhouseCoopers Spring Technology Forecast (Part 3)

This is the last in a series of posts summarizing the PriceWaterhouseCoopers Spring Technology Forecast. I spent a lot of time on the report, since it highlights many important concepts about the Semantic Web and business.

The last featured article in the report is entitled 'A CIO's strategy for rethinking "messy BI"'. The recommendation is to use Linked Data to bring together internal and external information - to help with the "information problem". How does PwC define the "information problem"? As follows ... "there's no way traditional information systems can handle all the sources [of data], many of which are structured differently or not structured at all." The recommendation boils down to creating a shared or upper ontology for information mediation, and then using it for analysis, for helping to create a business ecosystem, and to harmonize business logic and operating models. The two figures below illustrate these concepts.





The article includes a great quote on the information problem, why today's approaches (even metadata) are not enough, and the uses of Semantic Web technologies ... "Think of Linked Data as a type of database join that relies on contextual rules and pattern matching, not strict preset matches. As a user looks to mash up information from varied sources, Linked Data tools identify the semantics and ontologies to help the user fit the pieces together in the context of the exploration. ... Many organizations already recognize the importance of standards for metadata. What many don’t understand is that working to standardize metadata without an ontology is like teaching children to read without a dictionary. Using ontologies to organize the semantic rationalization of the data that flow between business partners is a process improvement over electronic data interchange (EDI) rationalization because it focuses on concepts and metadata, not individual data elements, such as columns in a relational database management system. The ontological approach also keeps the CIO’s office from being dragged into business-unit technical details and squabbling about terms. And linking your ontology to a business partner’s ontology exposes the context semantics that data definitions lack."

PwC suggests taking 2 (non-exclusive) approaches to "explore" the Semantic Web and Linked Data:

  • Add the dimension of semantics and ontologies to existing, internal data warehouses and data stores
  • Provide tools to help users get at both internal and external Linked Data
And, as with the previous posts, I want to finish with a quote from one of the interviews in the report. This quote comes from Frank Chum of Chevron, and discusses why they are now looking to the Semantic Web and ontologies to advance their business. "Four things are going on here. First, the Semantic Web lets you be more expressive in the business logic, to add more contextual meaning. Second, it lets you be more flexible, so that you don’t have to have everything fully specified before you start building. Then, third, it allows you to do inferencing, so that you can perform discovery on the basis of rules and axioms. Fourth, it improves the interoperability of systems, which allows you to share across the spectrum of the business ecosystem. With all of these, the Semantic Web becomes a very significant piece of technology so that we can probably solve some of the problems we couldn’t solve before. One could consider these enhanced capabilities [from Semantic Web technology] as a “souped up” BI [business intelligence]."

Wednesday, June 3, 2009

PriceWaterhouseCoopers Spring Technology Forecast (Part 2)

This post continues the review and summarization of PwC's Spring Technology Forecast, focused on the Semantic Web.

The second featured article is Making Semantic Web connections. It discusses the business value of using Linked Data, and includes interesting information from a CEO survey about information gaps (and how the Semantic Web can address these gaps). The article argues that to get adequate information, the business must better utilize its own internal data, as well as data from external sources (such as information from members of the business' ecosystem or the Web). This is depicted in the following two figures from the article ...


I also want to include some quotes from the article - especially since they support what I said in an earlier blog post from my days at Microsoft, Question on what "policy-based business" means ... :-)
  • Data aren’t created in a vacuum. Data are created or acquired as part of the business processes that define an enterprise. And business processes are driven by the enterprise business model and business strategy, goals, and objectives. These are expressed in natural language, which can be descriptive and persuasive but also can create ambiguities. The nomenclature comprising the natural language used to describe the business, to design and execute business processes, and to define data elements is often left out of enterprise discussions of performance management and performance improvement.
  • ... ontologies can become a vehicle for the deeper collaboration that needs to occur between business units and IT departments. In fact, the success of Linked Data within a business context will depend on the involvement of the business units. The people in the business units are the best people to describe the domain ontology they’re responsible for.
  • Traditional integration methods manage the data problem one piece at a time. It is expensive, prone to error, and doesn’t scale. Metadata management gets companies partway there by exploring the definitions, but it still doesn’t reach the level of shared semantics defined in the context of the extended virtual enterprise. Linked Data offers the most value. It creates a context that allows companies to compare their semantics, to decide where to agree on semantics, and to select where to retain distinctive semantics because it creates competitive advantage.
As in my last post, I want to reinforce the message and include a quote from one of the interviews. This one comes from Uche Ogbuji of Zepheira ... "... it’s not a matter of top down. It’s modeling from the bottom up. The method is that you want to record as much agreement as you can. You also record the disagreements, but you let them go as long as they’re recorded. You don’t try to hammer them down. In traditional modeling, global consistency of the model is paramount. The semantic technology idea turns that completely on its head, and basically the idea is that global consistency would be great. Everyone would love that, but the reality is that there’s not even global consistency in what people are carrying around in their brains, so there’s no way that that’s going to reflect into the computer. You’re always going to have difficulties and mismatches, and, again, it will turn into a war, because people will realize the political weight of the decisions that are being made. There’s no scope for disagreement in the traditional top-down model. With the bottom-up modeling approach you still have the disagreements, but what you do is you record them."

And, yes, I did say something similar to this in an earlier post on Semantic Web and Business. (Thumbs up :-)

Tuesday, June 2, 2009

PriceWaterhouseCoopers Spring Technology Forecast (Part 1)

In an earlier post, I mentioned PriceWaterhouseCoopers' spring technology forecast and its discussion of the Semantic Web in business. In this and the following post, I want to overview and highlight several of the articles. Let's start with the first featured article ...

Spinning a data Web overviewed the technologies of the Semantic Web, and discussed how businesses can benefit from developing domain ontologies and then mediating/integrating/querying them across both internal and external data. The value of mediation is summarized in the following figure ...


I like this, since I said something similar in my post on the Semantic Web and Business.

Backing up this thesis, Tom Scott of BBC Earth provided a supporting quote in his interview, Traversing the Giant Global Graph. "... when you start getting either very large volumes or very heterogeneous data sets, then for all intents and purposes, it is impossible for any one person to try to structure that information. It just becomes too big a problem. For one, you don’t have the domain knowledge to do that job. It’s intellectually too difficult. But you can say to each domain expert, model your domain of knowledge— the ontology—and publish the model in the way that both users and machine can interface with it. Once you do that, then you need a way to manage the shared vocabulary by which you describe things, so that when I say “chair,” you know what I mean. When you do that, then you have a way in which enterprises can join this information, without any one person being responsible for the entire model. After this is in place, anyone else can come across that information and follow the graph to extract the data they’re interested in. And that seems to me to be a sane, sensible, central way of handling it."

Monday, May 11, 2009

Going to School - Knowledge Management Style

In May 2001, Michael Earl wrote about three main categories and seven schools of knowledge management. His article was published in the Journal of Management Information Systems (Vol 18, Issue 1).

The three categories for capturing and sharing knowledge are:
  • Technocratic - involved with tooling and the use of technology for knowledge management
  • Economic - relating knowledge and income
  • Behavioral - dealing with how to organize to facilitate knowledge capture and exchange
Because these categories are so different, Earl pointed out that they are not mutually exclusive, and could be used in conjunction. In fact, doing so should better enable overall knowledge capture and use.

Within each of the categories, Earl posited that there are "schools" or focuses for knowledge management. Earl's seven schools are listed below (with some short descriptions):
  • Systems - Part of the technocratic category, focusing on the use of technology and the storing of explicit knowledge in databases and various systems and repositories. The knowledge is typically organized by domain.
  • Cartographic - Part of the technocratic category, focusing on who the "experts" are, in a company, and how to find and contact them. So, instead of explicit captured knowledge, the tacit knowledge held by individuals is paramount.
  • Engineering - Part of the technocratic category, focusing on capturing and sharing knowledge for process improvement. In addition, the details and outputs of various processes and knowledge flows are captured. The knowledge in this school is organized by activities with the goal of business process improvement.
  • Commercial - This is the only "economic" school and focuses on knowledge as a commercial asset. The emphasis is on income, which can be achieved in various ways ... such as limiting access to knowledge, based on payments or other exchanges, or rigorously managing a company's intellectual portfolio (individual know-how, patents, trademarks, etc.).
  • Organizational - Part of the behavioral category, focusing on building and enabling knowledge-sharing networks and communities of practice, for some business purpose. Earl defines it as a behavioral school "because the essential feature of communities is that they exchange and share knowledge interactively, often in nonroutine, personal, and unstructured ways". For those not familiar with the term "community of practice", it is defined by Etienne Wenger as “groups of people who share a concern or a passion for something they do and learn how to do it better as they interact regularly.”
  • Spatial - Part of the behavioral category, focusing on how space is used to facilitate socialization and the exchange of knowledge. This can be achieved by how office buildings are arranged, co-locating individuals working on the same project, etc.
  • Strategic - Part of the behavioral category, focusing on knowledge (according to Earl) as "the essence of a firm's strategy ... The aim is to build, nurture, and fully exploit knowledge assets through systems, processes, and people and convert them into value as knowledge-based products and services." This may seem like the strategic school rolls all the others into it, and it does. But, what distinguishes it, again according to Earl, "is that knowledge or intellectual capital are viewed as the key resource."
My personal focus is the strategic school, but with less interest in the spatial component and more in the systems aspects ... I believe that good collaboration needs to be (and can be) enabled, regardless of the physical environment or physical distances separating teams.

And, how do you do this? Via capturing, publishing and mapping each business group's/community's vocabularies (ontologies) and processes, and understanding that community's organizational structure.

Wednesday, April 15, 2009

"Top Down" or "Bottom Up" Ontologies

I received the following question from a colleague of mine... He asked about the benefits and risks of using a single standardized ontology (a “top down” approach) versus using local, private, or community ontologies (“bottom up”). Unfortunately, the benefits of one are the risks of the other! A single standardized ontology admits no errors of translation or omission. However, consensus ranges from difficult to impossible to obtain, and usually many concessions have to be made during its definition. Local or community ontologies are natural, and admit no frustrations or human errors due to learning new representations, or due to using concepts that have little semantic meaning in a community. However, you typically have lots of community ontologies and need to interoperate between them.

What is a possible answer? Take the local, private and community ontologies of your business and map them "up" to an existing "standardized ontology" - such as exists in medicine or even construction - see, for example, ISO 15926. (I already discussed the possibilities of ontology alignment provided by the Semantic Web in earlier posts, and will provide more details over the next few weeks.)

Or, if a standard ontology does not exist, create one from the local ontologies by mapping the local ones to one or more "upper" ontologies. At this point, some people will say "ughhh" another term - "upper" ontology - what the heck is that? Upper ontologies capture very general and reusable terms and definitions. Two examples that are both interesting and useful are:
  • SUMO (http://www.ontologyportal.org), the Suggested Upper Merged Ontology - SUMO incorporates much knowledge and broad content from a variety of sources. Its downside is that it is not directly importable into the Semantic Web infrastructure, as it is written in a different syntax (something called KIF). Its upsides are its vast, general coverage, its public domain IEEE licensing, and the many domain ontologies defined to extend it.
  • Proton (http://proton.semanticweb.org/D1_8_1.pdf), PROTo ONtology - PROTON takes a totally different approach to its ontology definition. Instead of theoretical analysis and hand-creation of the ontology, PROTON was derived from a corpus of general news sources, and hence addresses modern day, political, financial and sports concepts. It is encoded in OWL (OWL-Lite to be precise) for Semantic Web use, and was defined as part of the European Union's SEKT (Semantically Enabled Knowledge Technologies) project, http://www.sekt-project.com. (I will definitely be blogging more about SEKT in future posts. There is much interesting work there!)
Now, I must be clear that I do NOT advocate pushing the standard ontology down to the local communities - unless only small tweaks are needed to make the standard ontology work there. With ontology alignment technologies, you can have the best of all worlds - a standard ontology to use when unifying and analyzing the local ontologies, but all the naturalness of the local ontologies for the communities.

Thursday, April 9, 2009

Semantic Web and Business (Part 3)

In case anyone is confused by all the technical terms used in semantic computing, here is an explicit translation from ontology "language" to English:
  • Concept = class = noun = vocabulary word
  • Triple = subject-predicate-object (such as "John went to the library" - where "John" is the subject, "went-to" is the predicate, and "library" is the object)
  • Role = relation = association = the predicate in the triple = verb
  • Instance = a specific occurrence of a concept or relationship (can be manually defined or inferred)
  • Axiom = a statement of fact/truth that is taken for granted (i.e., is not proved)
  • Inference = deriving a logical conclusion from definitions and axioms
  • T-Box = a set of concepts and relationships (i.e., the definitions)
  • A-Box = a set of instances of the concepts and relationships
  • Hierarchy = arrangement of concepts or instances by some kind of classification/relationship mechanism - typical classification hierarchies are by type ("is-a" relationships - for example, "a tiger is a mammal") or by composition ("has-a" relationships - for example, "a person's name has the structure: personal or first name, zero or more middle names, and surname or last name")
  • Subsumption = is-a classification (determining the ordering of more general to more specific categories/concepts)
  • Consistency analysis = check to see that all specific instances make sense given the definitions, rules and axioms of an ontology
  • Satisfiability analysis = check to see that an instance of a concept can be created (i.e., that creating an instance will not produce an inconsistency/error)
  • Key = one or more properties that uniquely identify an individual instance of a concept/class
  • Monothetic classification = identifying a particular instance with a single key
  • Polythetic classification = identifying a particular instance by several possible keys which may not all exist for that instance
  • Surrogate key = an artificial key
  • Natural key = a key that has semantic meaning
  • CWA = Closed World Assumption (in databases) = anything not explicitly known to be true is assumed to be false (for example, if you know that John is the son of Mary but have a total of 3 children defined - John, Sue and Albert - and you ask who all the children of Mary are ... you get the answer "John" - 1 child)
  • OWA = Open World Assumption (in semantic computing) = anything not explicitly known is not assumed to be false - it is simply unknown (using the same scenario above and asking the same question ... you cannot conclude that John is Mary's only child, since Sue and Albert may or may not also be her children; the data are just treated as incomplete) (a small Turtle sketch of this scenario, including the T-Box/A-Box split, follows the list)
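
To make a few of these terms concrete, here is a small Turtle sketch of the Mary/children scenario (all names are made up):

  @prefix owl: <http://www.w3.org/2002/07/owl#> .
  @prefix ex:  <http://example.org/family#> .

  # T-Box: the definitions
  ex:Person a owl:Class .
  ex:sonOf  a owl:ObjectProperty .

  # A-Box: the instances
  ex:John   a ex:Person ; ex:sonOf ex:Mary .
  ex:Sue    a ex:Person .
  ex:Albert a ex:Person .

  # Under CWA (the database view), John is Mary's only child.
  # Under OWA, Sue and Albert are not assumed to NOT be Mary's children -
  # the data are simply treated as incomplete.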
These are the terms that come to the top of my mind, when I think about ontologies. But, if there are others, just send me email or leave me a comment.

Semantic Web and Business (Part 2)

In the last post, I talked about business' implicit ontologies and using semantic computing to help map and align different ontologies. In this post, I want to spend some time on the basics of ontology analysis and what a semantic (description logic) reasoner can do.

A description-logic reasoner (DL reasoner) takes concepts, individual instances of those concepts, roles (relationships between concepts and individuals) and sometimes constraints and rules - and then "reasons" over them to find inconsistencies (errors), infer new information, and determine classifications and hierarchies. Some basic relationships that are always present come from first-order logic - like intersections, unions, negations, etc. These are explicitly formalized in languages like OWL.
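
As a tiny, concrete example (with made-up classes and an individual), this is the kind of error a DL reasoner reports:

  @prefix owl: <http://www.w3.org/2002/07/owl#> .
  @prefix ex:  <http://example.org/vocab#> .

  ex:Hardware a owl:Class .
  ex:Software a owl:Class ;
      owl:disjointWith ex:Hardware .    # nothing can be both Hardware and Software

  # asserting an individual as both makes the ontology inconsistent,
  # and the reasoner flags the error
  ex:Router42 a ex:Hardware, ex:Software .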

The reasoner that I am now using is Pellet from Clark and Parsia (http://clarkparsia.com/pellet/). It is integrated with Protege (which I mentioned in an earlier post), but also operates standalone. The nice thing is that Pellet has both open-source and commercial licenses to accommodate any business model - and is doing some very cool research on data validation and probabilistic reasoning (which you can read about on their blog, http://clarkparsia.com/weblog/).

How cool is it when you can get a program to tell you when your vocabulary is inconsistent or incomplete? Or, when a program can infer new knowledge for you, when you align two different vocabularies and then reason over the whole? No more relying on humans and test cases to spot all the errors!

Wednesday, April 8, 2009

Semantic Web and Business (Part 1)

Most people think of Semantic Web as a "pie in the sky", impossible "field of dreams". But, that is being short-sighted. Semantic web technologies are here today and being used for some extremely interesting work.

Typically, you hear about semantic web as a way for computers to understand and operate over the data on the web, and not just exchange it via (mostly XML-based) syntaxes. However, to "understand" something, you must speak a common language and then have insight into the vocabulary and concepts used in that language. Well, the semantic web languages exist - they are standards like RDF (Resource Description Framework), RDF-S (RDF Schema), and OWL (Web Ontology Language). These syntaxes carry the details of the concepts, terms and relationships of the vocabulary. (Note that I provided only basic links to the specifications here. There is much more detail available!)

One problem is defining the syntax - and we are getting there via the work of the W3C. The next problem is getting agreement about the vocabulary. That is much harder - since every group has their own ideas about what the vocabulary should be. So, here again, the Semantic Web steps in. Semantic Web proponents are not just researching how to define and analyze vocabularies (you could also use the word, "ontology", here) - but how to merge and align them!

So, where does this intersect with business? Businesses have lots of implicit vocabularies/ontologies (for example, belonging to procurement, accounts payable, specific domain technologies integral to the organization, IT and other groups). And, business processes and data flows cross groups and therefore, cross vocabularies - and this leads to errors! Typically, lots of them!

Does this mean that everyone should adopt a single vocabulary? Usually that is not even possible ... People who have learned a vocabulary and use it to mean very specific things cannot easily change to a new, different word. Another problem is agreeing on what a term means - like "customer" (is that the entity that pays for something, an end user, or some other variant on this theme?).

Changing words will cause a slowdown in the operations of the business, due to the need to argue over terminology and representation. Then, if a standard vocabulary is ever put in place, there will be slowdowns and errors as people try to work the new vocabulary into their practices and processes. (BTW, I think that this is one reason that "standard" common models or a single enterprise information model are so difficult to achieve.)

How do we get around this? Enter the Semantic Web to help with the alignment of vocabularies/ontologies. But, first the vocabularies have to be captured. Certainly, no one expects people to write RDF, RDF-S or OWL. But, we all can write our natural languages - and that takes us back to "controlled languages" as I discussed in my previous post. I have a lot of ideas on how to achieve this ... but, this will come in later posts.

So, more on this in later weeks, but hopefully this post provides some reasons to be interested in the semantic web (more than just its benefits to search) ...