Publishing Data on the Semantic Web Peter Mika  Researcher, Data Architect Yahoo! Research
Intro to the Semantic Web
Vague, but exciting… Berners-Lee and the dawn of the Web
Semantic Web Publish information in a way that is easier to process for machines Web of Data instead of Web of Documents Two main architectural challenges A common format for sharing data Sharing the meaning of data Through social means (shared schemas) By using powerful schema languages Semantic Web standards from W3C Languages (RDF, OWL, RIF) Serializations (RDF/XML, RDFa) Protocols (SPARQL, HTTP) Semantic Web research into knowledge representation and reasoning, data integration, data quality and many other topics Community efforts to publish data and develop schemas
RDF (Resource Description Framework) The basic data model of the Semantic Web A universal model to capture all sorts of data: networks, relational, object-oriented… Basic unit of information is a triple  A tuple of (subject, predicate, object) Example: (Joe, loves, Mary) Each triple gives the value of a property for a given resource or relates two objects to one another Object is either a resource or a literal An RDF model is a set of triples Ordering of statements in an RDF document is irrelevant (unlike XML)
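A minimal sketch of the triple model described above, assuming plain Python strings stand in for resources and literals (the helper name `make_graph` is hypothetical, not part of any RDF library). It shows that a model is a set of triples, so the order of statements does not matter:

```python
# An RDF model sketched as a set of (subject, predicate, object) tuples.
# Plain strings stand in for URIs and literals; this is an illustration only.

def make_graph(triples):
    """Build an immutable set of triples; duplicates and ordering are discarded."""
    return frozenset(triples)

# The same two statements, written in a different order in two "documents".
doc_a = make_graph([("Joe", "loves", "Mary"), ("Joe", "name", "Joe A.")])
doc_b = make_graph([("Joe", "name", "Joe A."), ("Joe", "loves", "Mary")])

# Because a model is a set, both documents describe the same model.
same_model = doc_a == doc_b
```

Using a set rather than a list is the design choice that mirrors the slide: two documents listing the same statements in a different order compare equal.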
Resources vs. literals Resources are identified by a URI; otherwise they are called blank nodes URIs are a generalization of URLs Notation: <http://www.example.org/Person> or ex:Person Literals have an optional language or datatype (string, integer etc.) Literals cannot be subjects of statements Datatypes are identified by URIs, e.g. XML Schema datatypes Two literals are the same if their components are the same Notation: "Joe B.", "Joe"@en or "Joe"^^<http://…#string>
Advanced topic: Resources vs Literals
  Resources are objects, Literals are strings
  Resources are instances of classes, Literals have datatypes
  Whether something is a resource or literal sometimes depends on the detail of modeling
  <meta property="myvocab:knows">Paris Hilton</meta>
  <item rel="foaf:knows"> <meta property="foaf:name">Paris Hilton</meta> </item>
  You cannot make statements about literals (literals are always the object in a triple)
  Resources can carry a globally unique identifier, literals have no identity
  Web resources such as documents and images are resources
  <item rel="rdfs:seeAlso" resource="http://www.some.related.page.com/"/>
  <item rel="foaf:img" resource="http://photosite.example.org/photo.jpg"/>
  When in doubt: it’s a resource
Graphical and textual notation A number of ways to serialize an RDF model into an RDF document RDF/XML, Turtle, N3, N-Triples Example: http://www.cs.vu.nl/~pmika/foaf.rdf [Figure: graph of my:Joe with name "Joe A." and type foaf:Person]
Informational versus non-informational resources Informational resource: an HTML document, image, any other file on the Web Retrievable in its entirety from the Web Retrieving it can return a 200 OK Conceptual (non-informational) resource: a person, an event, a place, etc. A description of it may be retrievable from the Web When identified by a URL, retrieving it should return a 303 Redirect Never confuse a webpage with what it describes! You are not your Facebook profile: one is a document, the other is a person. A document has properties such as byte-size, media-type etc, a person has name, age, etc. Make sure you don’t use the URL of an existing webpage as the URI of a resource
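The distinction between documents and the things they describe can be sketched as a tiny request handler. This is only an illustration under assumptions: the paths, the two lookup tables and the function `respond` are hypothetical, and a real server would also handle content negotiation:

```python
# Hypothetical registry: URIs that name documents vs. URIs that name things.
DOCUMENTS = {"/people/joe.html": "<html>A page about Joe</html>"}
DESCRIBED_BY = {"/id/joe": "/people/joe.html"}  # the person -> a page describing him

def respond(path):
    """Return (status, headers, body) for a GET request on the given path."""
    if path in DOCUMENTS:
        # Informational resource: retrievable in its entirety.
        return 200, {}, DOCUMENTS[path]
    if path in DESCRIBED_BY:
        # Conceptual resource: redirect to a document that describes it.
        return 303, {"Location": DESCRIBED_BY[path]}, ""
    return 404, {}, ""

person_response = respond("/id/joe")
page_response = respond("/people/joe.html")
```

Note that the person and the page have different URIs, which is the point of the last bullet on the slide.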
Vocabularies (ontologies) Ontologies are collections of classes and properties used to describe objects in a particular domain OWL (the Web Ontology Language) is the standard ontology language OWL has an RDF serialization: ontologies are part of the Semantic Web Classes can be described by sub- and superclasses, required properties Class membership in RDF is expressed using the rdf:type property An instance can have multiple classes (types) A class can have multiple superclasses Properties can be described by their domain, range, cardinalities, etc.
RDF is designed for distributed systems URIs provide web-wide global identification across documents A resource may be described by multiple documents We know it’s the same resource because the same URI is used or through reasoning (advanced topic…) URIs are intended to be reused Unique, but not single identifiers: two URIs may denote the same thing URIs are dereferenceable (can be retrieved) A well-behaved URI returns a description of the resource Provides authority: the definition of foaf:Person lives at that URI Ontologies can be looked up as well Typically at the root of the URIs, also known as the namespace Example: http://xmlns.com/foaf/0.1/Person redirects to the specification
URIs implicitly link data together Joe’s homepage: (#joe, #name, “Joe A.”) (#joe, #email, mailto:joe@joe.com) Mary’s homepage: (#mary, #name, “Mary B.”) (#mary, #gender, “female”) A dating site: (#joe, #loves, #mary) Schema doc: (#name, #type, #Property) (#name, #domain, #Person)
Put together, triples form a single ‘global’ graph [Figure: #joe has #name “Joe A.” and #email “joe@joe.com”; #joe #loves #mary; #mary has #name “Mary B.” and #gender “female”]
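A small sketch of how triples from separate documents combine into one graph, assuming the abbreviated names on the slides (#joe, #mary, …) stand for full URIs that are identical across documents. The variable and function names are hypothetical:

```python
# Triples as they might be published by three independent sources.
joes_homepage = {("#joe", "#name", "Joe A."), ("#joe", "#email", "mailto:joe@joe.com")}
marys_homepage = {("#mary", "#name", "Mary B."), ("#mary", "#gender", "female")}
dating_site = {("#joe", "#loves", "#mary")}

# Merging is a plain set union: shared URIs make the pieces connect.
global_graph = joes_homepage | marys_homepage | dating_site

def names_of_loved_ones(graph, person):
    """Follow #loves from a person, then #name from each loved resource."""
    loved = {o for s, p, o in graph if s == person and p == "#loves"}
    return sorted(o for s, p, o in graph if s in loved and p == "#name")

result = names_of_loved_ones(global_graph, "#joe")
```

No single source can answer the question on its own; the answer only appears once the three sets are merged.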
Publishing for the Semantic Web
Motivation Why publish data on the (Semantic) Web? In a business context Increase the potential for linking, reuse and aggregation Drive traffic back from other sites on the Web Pre-competitive data integration (e.g. drug discovery) Make your data more easily findable Drive traffic from search engines In a non-profit context Increase industry or government transparency, accountability Support research and education by making data accessible
Publishing and consuming data on the Semantic Web Publishing data involves Deciding in which format to publish your data Deciding which schema (ontology, vocabulary) to use OR you can create a new schema and publish it as well Multiple ways of publishing RDF data: Linked Data Metadata in HTML SPARQL endpoints Feeds GRDDL Automated tools Note: you may implement more than one
Option 1: Linked Data A web of RDF documents in parallel to the current Web Most often implemented as wrappers around databases or APIs The four rules of Linked Data: (1) Use URIs to identify things. (2) Use HTTP URIs so that these things can be referred to and looked up ("dereferenced") by people and user agents. (3) Provide useful information about the thing when its URI is dereferenced, using standard formats such as RDF/XML. (4) Include links to other, related URIs in the exposed data to improve discovery of other related information on the Web. [Figure: example graph linking #PeterM, #Bud and #Hun via born, label, capital-of and population]
Option 1: Linked Data Advantages: No change to the publishing of the HTML documents Data can be published by a third party (e.g. DBpedia) Disadvantages: Web servers need to be configured to properly handle URIs that identify concepts instead of documents Not favored by search engines Lack of use cases Crawling needs to be changed Authority is difficult to determine Tools Triple stores (Virtuoso, Oracle etc.) and front-ends (Pubby) RDB-to-RDF mappers (e.g. D2RQ, Triplify) Validators (Vapour) Linked Data browsers (many)
Linked Data as a movement Rapidly growing community effort to (re)publish open datasets as Linked Data In particular, scientific and government datasets see  linkeddata.org
Option 2: Metadata in HTML Using microformats, RDFa, Microdata (more later) Advantages: Data and document are always in sync Browser plug-in friendly Search engine friendly Copy-paste friendly Tools: XML editors (e.g. Oxygen) Triplr RDFa Distiller RDFa bookmarklet Ubiquity RDFa plugin Optimus microformat parser Examples: many, including SlideShare, YouTube, LinkedIn, Digg, Myspace, Facebook… [Figure: the sentence “Peter Mika was born in Budapest.” shown alongside the example graph linking #PeterM, #Bud and #Hun]
Option 3: SPARQL endpoints An API for accessing RDF databases on the Web A query language and an HTTP protocol Advantages: Flexible access: make any query you want Also possible to expose a traditional RDBMS via a wrapper Disadvantages: For the publisher: cost of supporting arbitrary queries For the search engine: discovery of SPARQL servers is unsolved Tools: Triple stores (Oracle, Virtuoso, Sesame, Jena, OWLIM etc.) RDB-to-RDF mappers such as D2RQ and Triplify [Figure: example graph linking #PeterM, #Bud and #Hun via born, label, capital-of and population]
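To give a feel for the "make any query you want" flexibility, here is a toy triple-pattern matcher. It is not SPARQL and not a client for any real endpoint; the function names are hypothetical, and the edges of the graph are an assumed reading of the slide's figure:

```python
# Assumed reading of the example figure: which label belongs to which edge
# is a guess made for illustration.
GRAPH = {
    ("#PeterM", "born", "#Bud"),
    ("#PeterM", "label", "Peter Mika"),
    ("#Bud", "label", "Budapest"),
    ("#Bud", "capital-of", "#Hun"),
    ("#Bud", "population", "2,000,000"),
}

def match(graph, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard (like a variable)."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

def birthplace_labels(graph, person):
    """Join two patterns: (person, born, ?place) and (?place, label, ?label)."""
    places = [o for _, _, o in match(graph, s=person, p="born")]
    return [o for place in places for _, _, o in match(graph, s=place, p="label")]

labels = birthplace_labels(GRAPH, "#PeterM")
```

A real SPARQL engine evaluates such joins over patterns on the server, which is where the publisher's cost of supporting arbitrary queries comes from.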
Option 4: Feeds Advantages: Submit your data without making it public Disadvantages: No standard feed format for RDF: data needs to be formatted and often manually submitted for each search engine Competing and incompatible formats: DataRSS (Yahoo!), Google Data Protocol, Open Data Protocol (Microsoft) [Figure: example graph linking #PeterM, #Bud and #Hun via born, label, capital-of and population]
Option 5: Publishing a transformation of the data Publish the rule to transform the HTML to structured data GRDDL is a standard for linking an HTML page to a transformation that produces RDF data Advantages No change to the page Disadvantages Transformation needs to be executed to get to the data Not much support by search engines Tools Intel MashMaker Dapper Glue API from AdaptiveBlue [Figure: an XSLT transformation applied to a page]
Option 6: Automatic markup Web services that annotate HTML automatically Advantages No manual effort Disadvantages Limited to finding relevant entities in text Tools OpenCalais Zemanta API Peter Mika was born in Budapest. <person>Peter Mika</person> was born in <location>Budapest</location>.
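The before/after pair on this slide can be reproduced with a deliberately naive dictionary lookup. This is not how OpenCalais or Zemanta work internally (that is not described in the slides); the gazetteer and the function `annotate` are hypothetical:

```python
import re

# Hypothetical gazetteer mapping known entity names to entity types.
GAZETTEER = {"Peter Mika": "person", "Budapest": "location"}

def annotate(text, gazetteer=GAZETTEER):
    """Wrap every known entity name in a tag named after its type."""
    # Try longer names first so a long name wins over a shorter one it contains.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in names))

    def wrap(found):
        tag = gazetteer[found.group(0)]
        return f"<{tag}>{found.group(0)}</{tag}>"

    return pattern.sub(wrap, text)

annotated = annotate("Peter Mika was born in Budapest.")
```

The sketch also illustrates the stated disadvantage: it can only find entities it already knows about and says nothing about how they relate.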
Example:  Zemanta A personal writing assistant for bloggers Plugin for popular blogging platforms and web mail clients Analyzes text as you type and suggests hyperlinks, tags, categories, images and related articles API available with the same functionality
Choosing a vocabulary No vocabularies in many domains Books, movies, stuff people care about… Too many competing proposals in other domains Often versions of the same proposal Example: vocabularies for microformats Not maintained I cannot maintain your vocabulary for you Limited tool support Too many expert tools until now Many vocabularies are not designed for annotation Missing meeting point and social process An ontology is a  shared , formal representation of a domain
Choosing a vocabulary Search the Web or ask for advice on mailing lists [email_address] [email_address] Wikis semanticweb.org vocamp.org Beware of people who claim to have the vocabulary of everything Preferably you want something small and targeted Never a 100% fit: you will need to introduce vocabulary terms (classes and properties) Do not introduce new classes/properties in existing namespaces Example: the namespace http://xmlns.com/foaf/0.1/ is used by the FOAF project. Try not to introduce a new term without contacting the owner, i.e. the membership of the FOAF mailing list.
Advanced topic: creating a vocabulary Get advice on methodology vocamp.org and semanticweb.org Choose a namespace and a prefix Give sensible names, e.g. name it after your site, but don’t call it searchmonkey Namespace ends either with a slash or a hash Create   an RDF or OWL document describing your classes and properties Use an ontology editor such as Protégé 4.0 Follow naming conventions Publish your vocabulary Make sure the URIs of your properties and classes are resolvable E.g. myvocab:digicam should resolve to a document containing the definition of myvocab:digicam Convince others to adopt your vocabulary If you are in fishing, convince other fishing businesses
How do we build communities? www.vocamp.org
Metadata in HTML
Brief history of the Annotated Web 1995: HTML meta tags 1996: Simple HTML Ontology Extensions (SHOE) 1998: RDF/XML RDF/XML in HTML RDF linked from HTML 2003: Web 2.0 Tagging Microformats Metadata in Wikipedia Machine tags in Flickr 2005: eRDF  2008: RDFa 1.0 2011: RDFa 1.1 2012: Microdata?
HTML meta tags
  <HTML>
  <HEAD profile="http://dublincore.org/documents/dcq-html/">
  <META name="DC.author" content="Peter Mika">
  <LINK rel="DC.rights copyright" href="http://www.example.org/rights.html" />
  <LINK rel="meta" type="application/rdf+xml" title="FOAF" href="http://www.cs.vu.nl/~pmika/foaf.rdf">
  </HEAD>
  …
  </HTML>
SHOE example (Heflin & Hendler, 1996)
  <ONTOLOGY "our-ontology" VERSION="1.0">
  <ONTOLOGY-EXTENDS "organization-ontology" VERSION="2.1" PREFIX="org" URL="http://www.ont.org/orgont.html">
  <ONTDEF CATEGORY="Person" ISA="org.Thing">
  <ONTDEF RELATION="lastName" ARGS="Person STRING">
  <ONTDEF RELATION="firstName" ARGS="Person STRING">
  <ONTDEF RELATION="marriedTo" ARGS="Person Person">
  <ONTDEF RELATION="employee" ARGS="org.Organization Person">
  </ONTOLOGY>
  <HEAD>
  <META HTTP-EQUIV="Instance-Key" CONTENT="http://www.cs.umd.edu/~george">
  <USE-ONTOLOGY "our-ontology" VERSION="1.0" PREFIX="our" URL="http://ont.org/our-ont.html">
  </HEAD>
  <BODY>
  <CATEGORY "our.Person">
  <RELATION "our.marriedTo" TO="http://www.cs.umd.edu/~helena">
  <RELATION "our.employee" FROM="http://www.cs.umd.edu">
  My name is <ATTRIBUTE "our.firstName"> George </ATTRIBUTE> <ATTRIBUTE "our.lastName"> Cook </ATTRIBUTE> and I live at...
SHOE system
SHOE Text-based query interface
SHOE Graphical Query Interface
Example: Creative Commons
  Embedding CC license in HTML (now deprecated):
  <HTML> <HEAD>… </HEAD> <BODY> …
  <!--
  <rdf:RDF xmlns="http://creativecommons.org/ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <Work rdf:about="http://www.yergler.net/averages/">
  <dc:title>The Law of Averages</dc:title>
  <dc:description>...because eventually i'll be right...</dc:description>
  <license rdf:resource="http://creativecommons.org/licenses/by-nc/1.0/" />
  </Work>
  <License rdf:about="http://creativecommons.org/licenses/by-nc/1.0/">
  <requires rdf:resource="http://web.resource.org/cc/Notice" />
  <permits rdf:resource="http://web.resource.org/cc/Reproduction" />
  <permits rdf:resource="http://web.resource.org/cc/Distribution" />
  <prohibits rdf:resource="http://web.resource.org/cc/CommercialUse" />
  </License>
  </rdf:RDF>
  -->
Example: Creative Commons
  Current: rel attribute (HTML4)
  This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/3.0/us/">Creative Commons Attribution 3.0 United States License</a>.
  Use of the “rel” attribute for semantic annotation is the birth of the microformat…
Microformats (μf) Agreements on the way to encode certain kinds of metadata in HTML Reuse of semantic-bearing HTML elements Based on existing standards Minimality Microformats exist for a limited set of objects hCard (persons and organizations) hCalendar (events) hResume hProduct hRecipe Varying degrees of support and stability hCard and rel-tag are widely supported Community centered around microformats.org Specifications and discussions are hosted there
Microformats: limitations No shared syntax Each microformat has a separate syntax tailored to the vocabulary No formal schemas Limited reuse, extensibility of schemas Unclear which combinations are allowed No datatypes No namespaces or unique identifiers (URIs): no interlinking, so mapping between instances is required Always appears in the HTML <body>
Example: the hCard microformat
  <cite class="vcard"> <a class="fn url" rel="friend colleague met" href="http://meyerweb.com/">Eric Meyer</a> </cite>
  wrote a post (<cite> <a href="http://meyerweb.com/eric/thoughts/2005/12/16/tax-relief/">Tax Relief</a></cite>)
  about an unintentionally humorous letter he received from the
  <span class="vcard"> <a class="fn org url" href="http://irs.gov/">Internal Revenue Service</a> </span>.
  <div class="vcard">
  <a class="email fn" href="mailto:jfriday@host.com">Joe Friday</a>
  <div class="tel">+1-919-555-7878</div>
  <div class="title">Area Administrator, Assistant</div>
  </div>
RDFa W3C standard for embedding RDF data in HTML documents A set of new HTML attributes to be used in head or body A specification of how to extract the data from these attributes  RDFa is just a syntax, you have to choose a vocabulary separately RDFa 1.0 is a W3C Recommendation since October, 2008 RDFa Primer RDFa 1.1 is a small update on RDFa to make it easier to use Currently  Working Draft (March 31, 2011) Updated version of the  RDFa Primer (April 19, 2011) RDFa API for accessing RDFa data in a webpage in the browser from JavaScript Currently  Working Draft (April 19, 2011)
RDFa 1.1 Changes New vocab attribute to define the default namespace for the document or subtree Profile documents to define multiple namespace prefixes The prefix attribute as a recommended replacement of xmlns You can use URIs even where only CURIEs were allowed before RDFa 1.1 is backward compatible with RDFa 1.0 RDFa 1.1 is recommended if you want to use HTML5
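How prefix and vocab turn short names into full URIs can be sketched as follows. This is a simplified illustration, not the full RDFa 1.1 term and CURIE resolution rules; the function names are hypothetical:

```python
def parse_prefix_attr(value):
    """Parse a prefix attribute value such as 'og: http://ogp.me/ns#' into a dict."""
    tokens = value.split()
    # Tokens alternate between 'name:' and the namespace URI.
    return {name.rstrip(":"): uri for name, uri in zip(tokens[0::2], tokens[1::2])}

def expand(term, prefixes, vocab=None):
    """Expand a CURIE or bare term to a full URI (simplified)."""
    if "://" in term:
        return term  # already a full URI, allowed in RDFa 1.1
    if ":" in term:
        prefix, local = term.split(":", 1)
        if prefix in prefixes:
            return prefixes[prefix] + local
        return term  # unknown prefix: leave untouched (e.g. mailto:)
    if vocab is not None:
        return vocab + term  # bare term resolved against the default vocabulary
    return term

prefixes = parse_prefix_attr("og: http://ogp.me/ns# foaf: http://xmlns.com/foaf/0.1/")
og_title = expand("og:title", prefixes)
foaf_name = expand("name", {}, vocab="http://xmlns.com/foaf/0.1/")
```

The same expansion step is what makes `foaf:Person` and the full URI `http://xmlns.com/foaf/0.1/Person` interchangeable.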
When to use RDFa Choose microformats when you find a microformat that fits your needs and is supported by your consumers Microformats are the first option because they are simple Yahoo supports all major microformats, see the documentation It’s a common misconception that RDFa requires XHTML or that it’s incompatible with HTML5 It’s compatible with HTML4, HTML5, XHTML If you find none that perfectly fits your needs then you need RDFa Microformats have a fixed schema: you cannot add your own attributes Example: a social networking site with user profiles vCard is a good candidate, but for example it doesn’t have a way to express the user’s social connections You either live without this, or go with RDFa
RDFa intro: metadata in the header More info in the
  <html prefix="og: http://ogp.me/ns#">
  <head>
  <title>The Trouble with Bob</title>
  <meta property="og:title" content="The Trouble with Bob" />
  <meta property="og:type" content="text" />
  <meta property="og:image" content="http://example.com/alice/bob-ugly.jpg" />
  ...
  </head>
RDFa intro: links with a flavor More info in the
  All content on this site is licensed under <a rel="license" href="http://creativecommons.org/licenses/by/3.0/"> a Creative Commons License </a>.
RDFa links: talking about subjects other than the page More info in the
  The trouble with Bob is that he takes much better photos than me:
  <div about="http://example.com/bob/photos/sunset.jpg">
  <img src="http://example.com/bob/photos/sunset.jpg" />
  <span property="og:title">Beautiful Sunset</span> by <span property="dc:creator">Bob</span>.
  </div>
RDFa links: talking about subjects other than the page More info in the
  <div typeof="foaf:Person">
  <p property="foaf:name"> Alice Birpemswick </p>
  <p> Email: <a rel="foaf:mbox" href="mailto:alice@example.com"> alice@example.com </a> </p>
  <p> Phone: <a rel="foaf:phone" href="tel:+1-617-555-7332">+1 617.555.7332</a> </p>
  </div>
The process of annotating with RDFa Find a vocabulary that fits your needs and is supported by your consumers A vocabulary describes a set of types and attributes within a given domain If you don’t find a good candidate, extend an existing one or create a new one Annotate your page. Before you start, you might want to validate your page for (X)HTML conformance using the W3C’s (X)HTML Validator to reduce the chance of errors. Choose Document Type XHTML + RDFa. No specific tool support. If you have an HTML or XML editor that supports DTDs, you will have syntax checking and highlighting. Use the RDFa Distiller to check which data can be extracted from your page. If you fancy, use the RDF Validator to visualize the resulting RDF graph. Put the annotated page online The data will be extracted by Google/Bing/Yahoo the next time your page is crawled and indexed The data will be available to browser extensions, bookmarklets etc. See http://rdfa.info/rdfa-implementations for new tools and APIs
RDFa can be hard to get right…
  Validation problems can stop us from extracting data
  Use the W3C validator
  Use the right DOCTYPE declaration if using XHTML
  Set the encoding of your page properly (using HTTP headers or XML declaration)
  Prefixes need to be defined using the xmlns attribute
  Unless you are making statements about the document, set the subject using the about attribute
  Do not include HTML elements in literal values
  Incorrect: <div property="foaf:name"><b>Peter Mika</b></div>
  Use absolute URIs as the value of the resource attribute
  Or make sure you specify HTML base
RDFa can be hard to get right… II.
  Be careful when using rel and typeof in combination because of the precedence rules
  BAD example:
  <div about="#id">
    <span property="foaf:name">Peter Mika</span>
    <span rel="foaf:img" typeof="foaf:Image">
      <span property="dc:format">jpg</span> …
    </span>
  </div>
  To correct, move the typeof onto a node nested inside the <span> that carries rel="foaf:img"
RDFa can be hard to get right… III.
  Typeof does two things at once: it creates a new subject resource and assigns the type to it
  BAD example:
  <div about="#id">
    <span property="foaf:name">Peter Mika</span>
    <span rel="foaf:img" resource="http://www.example.org/photo.jpg">
      <span typeof="foaf:Image">
        <span property="dc:format">jpg</span>
      </span>
    </span>
  </div>
  To correct, you have to repeat the resource attribute on the span node with the typeof
RDFa can be hard to get right… IV.
  Marking up <h1>: <h1 property="dc:title">My homepage</h1>
  NOT: <h1><div property="dc:title">My homepage</div></h1>
  Marking up an image: <span rel="foaf:img"> <img alt="Alex" src="http://example.org/alex.jpg"/> </span>
  NOT: <img rel="foaf:img" src="photo.jpg"/>
  Header: <meta property="…" content="…">
  NOT: <meta name="…" content="…">
RDFa can be hard to get right… V.
  You cannot break up a description like this:
  <span rel="foaf:knows"> <span property="foaf:name">Peter Mika</span> </span>
  ….
  <span rel="foaf:knows"> <a rel="foaf:email" href="mailto:pmika@yahoo-inc.com" /> </span>
  This is not the same as:
  <span rel="foaf:knows">
    <span property="foaf:name">Peter Mika</span>
    <a rel="foaf:email" href="mailto:pmika@yahoo-inc.com" />
  </span>
  In the first case there are two related resources, with one attribute each; in the second case there is a single related resource with two attributes.
Tips Hiding information from being displayed Links without content will not be rendered Use <span property="foaf:name" content="Peter Mika"/> Use datatypes to provide the expected type of a literal. This helps validation because any tool can check whether the literal is indeed of that type.
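The validation benefit of datatypes can be sketched with a small checker. It is an assumption-laden simplification: only two XML Schema datatypes are handled, xsd:date is limited to the plain YYYY-MM-DD form without a timezone, and the function name is hypothetical:

```python
import re
from datetime import date

def is_valid_literal(lexical, datatype):
    """Return True if the lexical form fits the (simplified) datatype."""
    if datatype == "xsd:integer":
        # Optional sign followed by ASCII digits only.
        return re.fullmatch(r"[+-]?[0-9]+", lexical) is not None
    if datatype == "xsd:date":
        # Simplification: plain YYYY-MM-DD, no timezone suffix.
        if re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", lexical) is None:
            return False
        try:
            date.fromisoformat(lexical)  # rejects impossible dates such as Feb 30
        except ValueError:
            return False
        return True
    raise ValueError(f"unsupported datatype: {datatype}")

checks = [
    is_valid_literal("42", "xsd:integer"),
    is_valid_literal("4.2", "xsd:integer"),
    is_valid_literal("2009-05-10", "xsd:date"),
    is_valid_literal("2009-02-30", "xsd:date"),
]
```

Without a declared datatype, a consumer has no way to tell whether "4.2" was meant as a number, a version string or something else.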
Example: Facebook’s Like and the Open Graph Protocol The ‘Like’ button provides publishers with a way to promote their content on Facebook and build communities Shows up in profiles and news feed Site owners can later reach users who have liked an object Facebook Graph API allows third-party developers to access the data Open Graph Protocol is an RDFa-based format for describing the object that the user ‘Likes’
Example: Facebook’s Open Graph Protocol
  RDF vocabulary to be used in conjunction with RDFa
  Simplify the work of developers by restricting the freedom in RDFa
  Activities, Businesses, Groups, Organizations, People, Places, Products and Entertainment
  Only HTML <head> accepted
  http://opengraphprotocol.org/
  <html xmlns:og="http://opengraphprotocol.org/schema/">
  <head>
  <title>The Rock (1996)</title>
  <meta property="og:title" content="The Rock" />
  <meta property="og:type" content="movie" />
  <meta property="og:url" content="http://www.imdb.com/title/tt0117500/" />
  <meta property="og:image" content="http://ia.media-imdb.com/images/rock.jpg" />
  …
  </head>
  ...
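Because Open Graph restricts itself to meta elements with property and content attributes, a consumer can be sketched with Python's standard html.parser module. This is a minimal illustration, not a conforming RDFa processor and not Facebook's implementation; the class name is hypothetical:

```python
from html.parser import HTMLParser

class MetaPropertyParser(HTMLParser):
    """Collect property/content pairs from <meta> elements."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        # Also invoked for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if "property" in attributes and "content" in attributes:
            self.properties[attributes["property"]] = attributes["content"]

PAGE = """
<html xmlns:og="http://opengraphprotocol.org/schema/">
<head>
<title>The Rock (1996)</title>
<meta property="og:title" content="The Rock" />
<meta property="og:type" content="movie" />
<meta name="description" content="ignored: no property attribute" />
</head>
</html>
"""

parser = MetaPropertyParser()
parser.feed(PAGE)
og_data = parser.properties
```

The plain meta name element is skipped, which mirrors the earlier tip that RDFa in the header uses property rather than name.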
Example: Yahoo! Enhanced Results (was: SearchMonkey) Guide for publishers to mark-up their pages for common types of objects Product, Local, News, Video, Events, Documents, Discussion, Games Using popular microformats and RDF vocabularies Copy-paste code  Validator Yahoo as a consumer See later
Example: Google’s Rich Snippets Google accepts popular microformats and its own RDFa vocabulary Similar approach to RDFa as Facebook Validator  to check if the markup is correct Google displays enhanced results based on this metadata Rich Snippets
Microdata example
  <div itemscope itemid="http://www.yahoo.com/resource/person">
  <p>My name is <span itemprop="name">Neil</span>.</p>
  <p>My band is called <span itemprop="band">Four Parts Water</span>. I was born on <time itemprop="birthday" datetime="2009-05-10">May 10th 2009</time>. <img itemprop="image" src="me.png" alt="me"> </p>
  </div>
Microdata Currently under standardization at the W3C Originally part of the HTML5 spec, but now a separate document Similar to microformats, but with the extensibility of RDFa Introduce new terms using reverse domain names or full URIs HTML5 also has a number of “semantic” elements such as <time>, <video>, <article>…
RDFa on the rise Percentage of URLs with embedded metadata in various formats 510% increase between March, 2009 and October, 2010
The state of metadata in HTML 5-10% of webpages contain some explicit metadata Depending on how you count… Too many competing approaches Too many formats: microformats vs RDFa vs Microdata When using RDFa, publishers may need to use multiple different vocabularies to satisfy everyone
Introduction To RDF and RDFSIntroduction To RDF and RDFS
Introduction To RDF and RDFS
Nilesh Wagmare
 

Publishing data on the Semantic Web

  • 1. Publishing Data on the Semantic Web. Peter Mika, Researcher, Data Architect, Yahoo! Research
  • 2. Intro to the Semantic Web
  • 3. Vague, but exciting… Berners-Lee and the dawn of the Web
  • 4. Semantic Web
      • Publish information in a way that is easier for machines to process
      • Web of Data instead of Web of Documents
      • Two main architectural challenges
          • A common format for sharing data
          • Sharing the meaning of data
              • Through social means (shared schemas)
              • By using powerful schema languages
      • Semantic Web standards from W3C
          • Languages (RDF, OWL, RIF)
          • Serializations (RDF/XML, RDFa)
          • Protocols (SPARQL, HTTP)
      • Semantic Web research into knowledge representation and reasoning, data integration, data quality and many other topics
      • Community efforts to publish data and develop schemas
  • 5. RDF (Resource Description Framework)
      • The basic data model of the Semantic Web
      • A universal model to capture all sorts of data: networks, relational, object-oriented…
      • Basic unit of information is a triple
          • A tuple of (subject, predicate, object)
          • Example: (Joe, loves, Mary)
      • Each triple gives the value of a property for a given resource or relates two objects to one another
          • Object is either a resource or a literal
      • An RDF model is a set of triples
          • Ordering of statements in an RDF document is irrelevant (unlike XML)
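The "set of triples" idea on this slide can be sketched in a few lines of Python. This is only an illustration: the `ex:` identifiers are hypothetical stand-ins for full URIs, and the helper names `to_model` and `objects` are invented for the example.

```python
# A triple is a (subject, predicate, object) tuple; strings stand in for URIs.
Triple = tuple[str, str, str]

doc_a: list[Triple] = [
    ("ex:Joe", "ex:loves", "ex:Mary"),
    ("ex:Joe", "ex:name", "Joe A."),
]
# The same statements written in a different order, with one repeated.
doc_b: list[Triple] = list(reversed(doc_a)) + [doc_a[0]]


def to_model(statements):
    """An RDF model is a set of triples: order and repetition do not matter."""
    return frozenset(statements)


def objects(model, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return {o for (s, p, o) in model if s == subject and p == predicate}


same_model = to_model(doc_a) == to_model(doc_b)
loved_by_joe = objects(to_model(doc_a), "ex:Joe", "ex:loves")
```

Both documents yield the same model, which is the sense in which statement order is irrelevant.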
  • 6. Resources vs. literals
      • Resources are identified by a URI; otherwise they are called blank nodes
          • URIs are a generalization of URLs
          • Notation: <http://www.example.org/Person> or ex:Person
      • Literals have an optional language and datatype (string, integer etc.)
          • Literals cannot be subjects of statements
          • Datatypes are identified by URIs, e.g. XML Schema datatypes
          • Two literals are the same if their components are the same
          • Notation: “Joe B.” or Joe@en^^http://…#string
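A minimal sketch of the distinction, assuming a simplified model in which a literal is just a value plus optional language and datatype. The class and function names are hypothetical and not taken from any RDF library.

```python
from dataclasses import dataclass
from typing import Optional

# Namespace of the XML Schema datatypes mentioned on the slide.
XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass(frozen=True)
class Resource:
    uri: str  # identity comes from the URI


@dataclass(frozen=True)
class Literal:
    value: str
    language: Optional[str] = None
    datatype: Optional[str] = None
    # Dataclass equality compares all components, so two literals are
    # the same exactly when value, language and datatype all match.


def make_triple(subject, predicate, obj):
    """Build a triple, rejecting literals in the subject position."""
    if isinstance(subject, Literal):
        raise TypeError("a literal cannot be the subject of a statement")
    return (subject, predicate, obj)


joe = Resource("http://www.example.org/Joe")  # hypothetical URI
name = Resource("http://www.example.org/name")  # hypothetical URI
triple = make_triple(joe, name, Literal("Joe B.", language="en"))
```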
  • 7. Advanced topic: Resources vs Literals
      • Resources are objects, literals are strings
      • Resources are instances of classes, literals have datatypes
      • Whether something is a resource or literal sometimes depends on the detail of modeling
          • <meta property="myvocab:knows">Paris Hilton</meta>
          • <item rel="foaf:knows"> <meta property="foaf:name">Paris Hilton</meta> </item>
      • You cannot make statements about literals (literals are always the object in a triple)
      • Resources can carry a globally unique identifier, literals have no identity
      • Web resources such as documents and images are resources
          • <item rel="rdfs:seeAlso" resource="http://www.some.related.page.com/"/>
          • <item rel="foaf:img" resource="http://photosite.example.org/photo.jpg"/>
      • When in doubt: it’s a resource
  • 8. Graphical and textual notation
      • A number of ways to serialize an RDF model into an RDF document
          • RDF/XML, Turtle, N3, N-Triples
      • Example: http://www.cs.vu.nl/~pmika/foaf.rdf
      • [Figure: graph in which the node my:Joe has a name edge to “Joe A.” and a type edge to foaf:Person]
  • 9. Informational versus non-informational resources
      • Informational resource: an HTML document, image, any other file on the Web
          • Retrievable in its entirety from the Web
          • Retrieving it can return a 200 OK
      • Conceptual (non-informational) resource: a person, an event, a place, etc.
          • A description of it may be retrievable from the Web
          • When identified by a URL, retrieving it should return a 303 (See Other) redirect
      • Never confuse a webpage with what it describes!
          • You are not your Facebook profile: one is a document, the other is a person.
          • A document has properties such as byte size, media type, etc.; a person has a name, age, etc.
          • Make sure you don’t use the URL of an existing webpage as the URI of a resource
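The 200-versus-303 behaviour can be sketched as a pure function that decides how to answer a GET request. The URI layout (`/id/…` for things, `/doc/…` for documents about them) and the lookup tables are assumptions made for this illustration only.

```python
# Hypothetical layout: /doc/... paths are documents, /id/... paths name things.
DOCUMENTS = {"/doc/peter": "<html>A page describing Peter</html>"}
THINGS = {"/id/peter": "/doc/peter"}  # thing -> document that describes it


def respond(path):
    """Return (status, headers, body) for a GET request on `path`."""
    if path in DOCUMENTS:
        # Informational resource: it can be returned in its entirety.
        return 200, {"Content-Type": "text/html"}, DOCUMENTS[path]
    if path in THINGS:
        # Non-informational resource: redirect to a description of it.
        return 303, {"Location": THINGS[path]}, ""
    return 404, {}, ""


status, headers, _ = respond("/id/peter")
```

Keeping the two URI spaces separate is one way to avoid using a webpage's URL as the identifier of the person it describes.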
  • 10. Vocabularies (ontologies)
      • Ontologies are collections of classes and properties used to describe objects in a particular domain
      • OWL (the Web Ontology Language) is the standard ontology language
          • OWL has an RDF serialization: ontologies are part of the Semantic Web
      • Classes can be described by sub- and superclasses, required properties
          • Class membership in RDF is expressed using the rdf:type property
          • An instance can have multiple classes (types)
          • A class can have multiple superclasses
      • Properties can be described by their domain, range, cardinalities, etc.
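As a rough sketch of what "multiple types" and "multiple superclasses" imply, the following computes every class an instance belongs to by walking the subclass hierarchy. The `ex:` classes and the instance `ex:peter` are hypothetical, and real ontology reasoners do considerably more than this.

```python
# Hypothetical class hierarchy: class -> its direct superclasses.
SUBCLASS_OF = {
    "ex:Researcher": {"foaf:Person", "ex:Employee"},  # two superclasses
    "ex:Employee": {"foaf:Agent"},
    "foaf:Person": {"foaf:Agent"},
}
# rdf:type statements: instance -> directly asserted classes.
TYPES = {"ex:peter": {"ex:Researcher"}}


def superclasses(cls):
    """Return all direct and indirect superclasses of `cls`."""
    seen = set()
    stack = [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def inferred_types(instance):
    """Asserted types plus every superclass reachable from them."""
    result = set(TYPES.get(instance, ()))
    for cls in list(result):
        result |= superclasses(cls)
    return result


peter_types = inferred_types("ex:peter")
```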
  • 11. RDF is designed for distributed systems
      • URIs provide web-wide global identification across documents
          • A resource may be described by multiple documents
          • We know it’s the same resource because the same URI is used or through reasoning (advanced topic…)
          • URIs are intended to be reused
          • Unique, but not single identifiers: two URIs may denote the same thing
      • URIs are dereferenceable (can be retrieved)
          • A well-behaved URI returns a description of the resource
          • Provides authority: the definition of foaf:Person lives at that URI
      • Ontologies can be looked up as well
          • Typically at the root of the URIs, also known as the namespace
          • Example: http://xmlns.com/foaf/0.1/Person redirects to the specification
  • 12. URIs implicitly link data together
      • Joe’s homepage: (#joe, #name, “Joe A.”) (#joe, #email, mailto:[email protected])
      • Mary’s homepage: (#mary, name, “Mary B.”) (#mary, gender, “female”)
      • A dating site: (#joe, #loves, #mary)
      • Schema doc: (#name, #type, #Property) (#name, #domain, #Person)
  • 13. Put together, triples form a single ‘global’ graph
      • [Figure: one graph in which #joe has #name “Joe A.” and #email “[email protected]” and is linked by #loves to #mary, which has #name “Mary B.” and #gender “female”]
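Merging descriptions from several sources amounts to a set union, sketched below. The sketch assumes that `#joe`, `#mary` and the property names abbreviate the same full URIs in every source; the variable and function names are hypothetical.

```python
# Triples published by three independent sources.
joes_homepage = {("#joe", "#name", "Joe A.")}
marys_homepage = {("#mary", "#name", "Mary B."), ("#mary", "#gender", "female")}
dating_site = {("#joe", "#loves", "#mary")}

# Shared URIs are what connect the statements after the union.
global_graph = joes_homepage | marys_homepage | dating_site


def objects(graph, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


# A question no single source can answer: the name of the person Joe loves.
loved = objects(global_graph, "#joe", "#loves")
loved_names = {n for person in loved for n in objects(global_graph, person, "#name")}
```

The query spans the dating site and Mary's homepage, which only works because both use the same identifier for Mary.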
  • 14. Publishing for the Semantic Web
  • 15. Motivation
      • Why publish data on the (Semantic) Web?
      • In a business context
          • Increase the potential for linking, reuse and aggregation
              • Drive traffic back from other sites on the Web
              • Pre-competitive data integration (e.g. drug discovery)
          • Make your data more easily findable
              • Drive traffic from search engines
      • In a non-profit context
          • Increase industry or government transparency, accountability
          • Support research and education by making data accessible
  • 16. Publishing and consuming data on the Semantic Web
      • Publishing data involves
          • Deciding in which format to publish your data
          • Deciding which schema (ontology, vocabulary) to use
          • OR you can create a new schema and publish it as well
      • Multiple ways of publishing RDF data:
          • Linked Data
          • Metadata in HTML
          • SPARQL endpoints
          • Feeds
          • GRDDL
          • Automated tools
      • Note: you may implement more than one
  • 17. Option 1: Linked Data
      • A web of RDF documents in parallel to the current Web
      • Most often implemented as wrappers around databases or APIs
      • The four rules of Linked Data:
          • Use URIs to identify things.
          • Use HTTP URIs so that these things can be referred to and looked up ("dereferenced") by people and user agents.
          • Provide useful information about the thing when its URI is dereferenced, using standard formats such as RDF/XML.
          • Include links to other, related URIs in the exposed data to improve discovery of other related information on the Web.
      • [Figure: example RDF graph with nodes #PeterM, #Bud and #Hun, edges born, label, capital-of and population, and literals “Peter Mika”, “Budapest” and “2,000,000”]
  • 18. Option 1: Linked Data
      • Advantages:
          • No change to the publishing of the HTML documents
          • Data can be published by a third party (e.g. DBpedia)
      • Disadvantages:
          • Web servers need to be configured to properly handle URIs that identify concepts instead of documents
          • Not favored by search engines
          • Lack of use cases
          • Crawling needs to be changed
          • Authority is difficult to determine
      • Tools:
          • Triple stores (Virtuoso, Oracle etc.) and front-ends (Pubby)
          • RDB-to-RDF mappers (e.g. D2RQ, Triplify)
          • Validators (Vapour)
          • Linked Data browsers (many)
  • 19. Linked Data as a movement
      • Rapidly growing community effort to (re)publish open datasets as Linked Data
          • In particular, scientific and government datasets
      • See linkeddata.org
  • 20. Option 2: Metadata in HTML
      • Using microformats, RDFa, Microdata (more later)
      • Advantages:
          • Data and document are always in sync
          • Browser plug-in friendly
          • Search engine friendly
          • Copy-paste friendly
      • Tools:
          • XML editors (e.g. Oxygen)
          • Triplr
          • RDFa Distiller
          • RDFa bookmarklet
          • Ubiquity RDFa plugin
          • Optimus microformat parser
      • Examples: many, including SlideShare, YouTube, LinkedIn, Digg, Myspace, Facebook…
      • [Figure: the sentence “Peter Mika was born in Budapest.” shown alongside the example RDF graph about #PeterM, #Bud and #Hun]
  • 21. Option 3: SPARQL endpoints
      • An API for accessing RDF databases on the Web
          • A query language and an HTTP protocol
      • Advantages:
          • Flexible access: make any query you want
          • Also possible to expose a traditional RDBMS via a wrapper
      • Disadvantages:
          • For the publisher: cost of supporting arbitrary queries
          • For the search engine: discovery of SPARQL servers is unsolved
      • Tools:
          • Triple stores (Oracle, Virtuoso, Sesame, Jena, OWLIM etc.)
          • RDB-to-RDF mappers such as D2RQ and Triplify
      • [Figure: example RDF graph about #PeterM, #Bud and #Hun]
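To make "a query language and an HTTP protocol" concrete, here is a small sketch that builds the URL for a SPARQL query sent over HTTP GET, where the query text travels in the `query` parameter. The endpoint address and the `example.org` identifiers are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint address

# Ask for the label of the place where the (hypothetical) resource PeterM was born.
QUERY = """
SELECT ?label WHERE {
  <http://example.org/id/PeterM> <http://example.org/vocab/born> ?place .
  ?place <http://www.w3.org/2000/01/rdf-schema#label> ?label .
}
"""


def sparql_get_url(endpoint, query):
    """Encode a SPARQL query as the `query` parameter of a GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)
# Decoding the URL again recovers the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```

Because any query can be encoded this way, the publisher has to be prepared for arbitrary and possibly expensive requests, which is the cost noted under the disadvantages.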
  • 22. Option 4: Feeds Disadvantages: No standard feed format for RDF: data needs to be formatted and often manually submitted for each search engine. Advantages: Submit your data without making it public. Competing and incompatible formats: DataRSS (Yahoo!), Google Data Protocol, Open Data Protocol (Microsoft). [Diagram: example RDF graph with nodes #PeterM, #Bud and #Hun and edges born, label, capital-of and population]
  • 23. Option 5: Publishing a transformation of the data Publish the rule to transform the HTML to structured data. GRDDL is a standard for linking an HTML page to a transformation that produces RDF data. Advantages: No change to the page. Disadvantages: Transformation needs to be executed to get to the data. Not much support by search engines. Tools: Intel MashMaker, Dapper, Glue API from AdaptiveBlue. [Diagram: an XSLT transformation applied to a page]
  • 24. Option 6: Automatic markup Web services that annotate HTML automatically Advantages No manual effort Disadvantages Limited to finding relevant entities in text Tools OpenCalais Zemanta API Peter Mika was born in Budapest. <person>Peter Mika</person> was born in <location>Budapest</location>.
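The before/after sentence on this slide can be reproduced with a toy annotator. This is only a dictionary-lookup sketch under the assumption of a fixed, hand-made entity list; it is not how OpenCalais or Zemanta work internally, and the `GAZETTEER` contents are hypothetical.

```python
import re

# Hypothetical gazetteer mapping surface forms to entity types.
GAZETTEER = {"Peter Mika": "person", "Budapest": "location"}


def annotate(text: str, gazetteer: dict) -> str:
    """Wrap every gazetteer entry found in the text in a tag named after its type."""
    # Try longer names first so that a short name never splits a longer one.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in names))

    def wrap(match):
        tag = gazetteer[match.group(0)]
        return f"<{tag}>{match.group(0)}</{tag}>"

    return pattern.sub(wrap, text)


annotated = annotate("Peter Mika was born in Budapest.", GAZETTEER)
print(annotated)
# <person>Peter Mika</person> was born in <location>Budapest</location>.
```

The sketch also shows the limitation named on the slide: it can only find entities it already knows about, and it says nothing about how they relate.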
  • 25. Example: Zemanta A personal writing assistant for bloggers Plugin for popular blogging platforms and web mail clients Analyzes text as you type and suggests hyperlinks, tags, categories, images and related articles API available with the same functionality
  • 26. Choosing a vocabulary No vocabularies in many domains Books, movies, stuff people care about… Too many competing proposals in other domains Often versions of the same proposal Example: vocabularies for microformats Not maintained I cannot maintain your vocabulary for you Limited tool support Too many expert tools until now Many vocabularies are not designed for annotation Missing meeting point and social process An ontology is a shared , formal representation of a domain
  • 27. Choosing a vocabulary Search the Web or ask for advice on mailing lists [email_address] [email_address] Wikis: semanticweb.org, vocamp.org. Beware of people who claim to have the vocabulary of everything. Preferably you want something small and targeted. Never a 100% fit: you will need to introduce vocabulary terms (classes and properties). Do not introduce new classes/properties in existing namespaces. Example: the namespace http://xmlns.com/foaf/0.1/ is used by the FOAF project. Try not to introduce a new term without contacting the owner, i.e. the membership of the FOAF mailing list.
  • 28. Advanced topic: creating a vocabulary Get advice on methodology vocamp.org and semanticweb.org Choose a namespace and a prefix Give sensible names, e.g. name it after your site, but don’t call it searchmonkey Namespace ends either with a slash or a hash Create an RDF or OWL document describing your classes and properties Use an ontology editor such as Protégé 4.0 Follow naming conventions Publish your vocabulary Make sure the URIs of your properties and classes are resolvable E.g. myvocab:digicam should resolve to a document containing the definition of myvocab:digicam Convince others to adopt your vocabulary If you are in fishing, convince other fishing businesses
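As a small illustration of why the namespace should end with a slash or a hash, the sketch below expands prefixed names into full URIs by simple concatenation. The `myvocab` namespace URI is hypothetical; the FOAF namespace is the one used by the FOAF project.

```python
# Hypothetical prefix and namespace for 'myvocab'; note the trailing hash.
PREFIXES = {
    "myvocab": "http://www.example.org/myvocab#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(curie: str, prefixes: dict) -> str:
    """Expand a prefixed name such as 'myvocab:digicam' to a full URI."""
    prefix, _, local_name = curie.partition(":")
    namespace = prefixes[prefix]
    if not namespace.endswith(("/", "#")):
        raise ValueError(f"namespace for {prefix!r} should end with '/' or '#'")
    return namespace + local_name


digicam_uri = expand("myvocab:digicam", PREFIXES)
name_uri = expand("foaf:name", PREFIXES)
print(digicam_uri)  # http://www.example.org/myvocab#digicam
print(name_uri)     # http://xmlns.com/foaf/0.1/name
```

The expanded URI is what should resolve to the definition of the term once the vocabulary is published.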
  • 29. How do we build communities? www.vocamp.org
  • 31. Brief history of the Annotated Web 1995: HTML meta tags 1996: Simple HTML Ontology Extensions (SHOE) 1998: RDF/XML RDF/XML in HTML RDF linked from HTML 2003: Web 2.0 Tagging Microformats Metadata in Wikipedia Machine tags in Flickr 2005: eRDF 2008: RDFa 1.0 2011: RDFa 1.1 2012: Microdata?
  • 32. HTML meta tags <HTML> <HEAD profile="http://dublincore.org/documents/dcq-html/"> <META name="DC.author" content="Peter Mika"> <LINK rel="DC.rights copyright" href="http://www.example.org/rights.html" /> <LINK rel="meta" type="application/rdf+xml" title="FOAF" href="http://www.cs.vu.nl/~pmika/foaf.rdf"> </HEAD> … </HTML>
  • 33. SHOE example (Heflin & Hendler, 1996) <ONTOLOGY "our-ontology" VERSION="1.0"> <ONTOLOGY-EXTENDS "organization-ontology" VERSION="2.1" PREFIX="org" URL="http://www.ont.org/orgont.html"> <ONTDEF CATEGORY="Person" ISA="org.Thing"> <ONTDEF RELATION="lastName" ARGS="Person STRING"> <ONTDEF RELATION="firstName" ARGS="Person STRING"> <ONTDEF RELATION="marriedTo" ARGS="Person Person"> <ONTDEF RELATION="employee" ARGS="org.Organization Person"> </ONTOLOGY> <HEAD> <META HTTP-EQUIV="Instance-Key" CONTENT="http://www.cs.umd.edu/~george"> <USE-ONTOLOGY "our-ontology" VERSION="1.0" PREFIX="our" URL="http://ont.org/our-ont.html"> </HEAD> <BODY> <CATEGORY "our.Person"> <RELATION "our.marriedTo" TO="http://www.cs.umd.edu/~helena"> <RELATION "our.employee" FROM="http://www.cs.umd.edu"> My name is <ATTRIBUTE "our.firstName"> George </ATTRIBUTE> <ATTRIBUTE "our.lastName"> Cook </ATTRIBUTE> and I live at...
  • 36. SHOE Graphical Query Interface
  • 37. Example: Creative Commons Embedding CC license in HTML (now deprecated): <HTML> <HEAD>… </HEAD> <BODY> … <!-- <rdf:RDF xmlns="http://creativecommons.org/ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <Work rdf:about="http://www.yergler.net/averages/"> <dc:title>The Law of Averages</dc:title> <dc:description>...because eventually i'll be right...</dc:description> <license rdf:resource="http://creativecommons.org/licenses/by-nc/1.0/" /> </Work> <License rdf:about="http://creativecommons.org/licenses/by-nc/1.0/"> <requires rdf:resource="http://web.resource.org/cc/Notice" /> <permits rdf:resource="http://web.resource.org/cc/Reproduction" /> <permits rdf:resource="http://web.resource.org/cc/Distribution" /> <prohibits rdf:resource="http://web.resource.org/cc/CommercialUse" /> </License> </rdf:RDF> -->
  • 38. Example: Creative Commons Current: rel attribute (HTML4) This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/3.0/us/">Creative Commons Attribution 3.0 United States License</a>. Use of the “rel” attribute for semantic annotation is the birth of the microformat…
  • 39. Microformats (μf) Agreements on the way to encode certain kinds of metadata in HTML. Reuse of semantic-bearing HTML elements. Based on existing standards. Minimality. Microformats exist for a limited set of objects: hCard (persons and organizations), hCalendar (events), hResume, hProduct, hRecipe. Varying degrees of support and stability: hCard and rel-tag are widely supported. Community centered around microformats.org: specifications and discussions are hosted there.
  • 40. Microformats: limitations No shared syntax: each microformat has a separate syntax tailored to the vocabulary. No formal schemas: limited reuse and extensibility of schemas; unclear which combinations are allowed. No datatypes. No namespaces or unique identifiers (URIs), so no interlinking: mapping between instances is required. Always appears in the HTML <body>.
  • 41. Example: the hCard microformat <cite class="vcard"> <a class="fn url" rel="friend colleague met" href="http://meyerweb.com/"> Eric Meyer</a> </cite> wrote a post (<cite> <a href="http://meyerweb.com/eric/thoughts/2005/12/16/tax-relief/"> Tax Relief</a></cite>) about an unintentionally humorous letter he received from the <span class="vcard"> <a class="fn org url" href="http://irs.gov/"> Internal Revenue Service</a> </span>. <div class="vcard"> <a class="email fn" href="mailto:[email protected]">Joe Friday</a> <div class="tel">+1-919-555-7878</div> <div class="title">Area Administrator, Assistant</div> </div>
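A consumer reads hCard data by looking at class names. The sketch below collects the text of every element whose class list contains `fn` (the formatted name). It is a simplified illustration, not a conforming microformat parser: it does not check that the element sits inside a `vcard`, and it assumes no unclosed void elements such as `<img>` appear inside the name. The e-mail address in the sample is hypothetical.

```python
from html.parser import HTMLParser


class FormattedNameParser(HTMLParser):
    """Collect the text of every element whose class list contains 'fn'."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._depth = 0    # greater than 0 while inside an 'fn' element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._depth or "fn" in classes:
            self._depth += 1

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Normalize whitespace before storing the name.
                self.names.append(" ".join("".join(self._buffer).split()))
                self._buffer = []

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


SAMPLE = """
<div class="vcard">
  <a class="email fn" href="mailto:joe@example.org">Joe Friday</a>
  <div class="tel">+1-919-555-7878</div>
</div>
<span class="vcard"><a class="fn org url" href="http://irs.gov/"> Internal Revenue Service</a></span>
"""

parser = FormattedNameParser()
parser.feed(SAMPLE)
names = parser.names
print(names)  # ['Joe Friday', 'Internal Revenue Service']
```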
  • 42. RDFa W3C standard for embedding RDF data in HTML documents A set of new HTML attributes to be used in head or body A specification of how to extract the data from these attributes RDFa is just a syntax, you have to choose a vocabulary separately RDFa 1.0 is a W3C Recommendation since October, 2008 RDFa Primer RDFa 1.1 is a small update on RDFa to make it easier to use Currently Working Draft (March 31, 2011) Updated version of the RDFa Primer (April 19, 2011) RDFa API for accessing RDFa data in a webpage in the browser from JavaScript Currently Working Draft (April 19, 2011)
  • 43. RDFa 1.1 Changes New vocab attribute to define the default namespace for the document or subtree. Profile documents to define multiple namespace prefixes. The prefix attribute as a recommended replacement of xmlns. You can use URIs even where only CURIEs were allowed before. RDFa 1.1 is backward compatible with RDFa 1.0. RDFa 1.1 is recommended if you want to use HTML5.
  • 44. When to use RDFa Choose microformats when you find a microformat that fits your needs and is supported by your consumers. Microformats are the first option because they are simple. Yahoo supports all major microformats, see the documentation. It’s a common misconception that RDFa requires XHTML or that it’s incompatible with HTML5: it’s compatible with HTML4, HTML5 and XHTML. If you find no microformat that perfectly fits your needs, then you need RDFa. Microformats have a fixed schema: you cannot add your own attributes. Example: a social networking site with user profiles. vCard is a good candidate, but for example it doesn’t have a way to express the user’s social connections. You either live without this, or go with RDFa.
  • 45. RDFa intro: metadata in the header More info in the <html prefix="og: http://ogp.me/ns#"> <head> <title>The Trouble with Bob</title> <meta property="og:title" content="The Trouble with Bob" /> <meta property="og:type" content="text" /> <meta property="og:image" content="http://example.com/alice/bob-ugly.jpg" /> ... </head>
  • 46. RDFa intro: links with a flavor More info in the All content on this site is licensed under <a rel="license" href="http://creativecommons.org/licenses/by/3.0/"> a Creative Commons License </a>.
  • 47. RDFa links: talking about subjects other than the page More info in the The trouble with Bob is that he takes much better photos than me: <div about="http://example.com/bob/photos/sunset.jpg"> <img src="http://example.com/bob/photos/sunset.jpg" /> <span property="og:title">Beautiful Sunset</span> by <span property="dc:creator">Bob</span>. </div>
  • 48. RDFa links: talking about subjects other than the page More info in the <div typeof="foaf:Person"> <p property="foaf:name"> Alice Birpemswick </p> <p> Email: <a rel="foaf:mbox" href="mailto:[email protected]"> [email protected] </a> </p> <p> Phone: <a rel="foaf:phone" href="tel:+1-617-555-7332">+1 617.555.7332</a> </p> </div>
  • 49. The process of annotating with RDFa Find a vocabulary that fits your needs and is supported by your consumers. A vocabulary describes a set of types and attributes within a given domain. If you don’t find a good candidate, extend an existing one or create a new one. Annotate your page. Before you start, you might want to validate your page for (X)HTML conformance using the W3C’s (X)HTML Validator to reduce the chance of errors. Choose Document Type XHTML + RDFa. No specific tool support. If you have an HTML or XML editor that supports DTDs, you will have syntax checking and highlighting. Use the RDFa Distiller to check which data can be extracted from your page. If you fancy, use the RDF Validator to visualize the resulting RDF graph. Put the annotated page online. The data will be extracted by Google/Bing/Yahoo the next time your page is crawled and indexed. The data will be available to browser extensions, bookmarklets etc. See http://rdfa.info/rdfa-implementations for new tools and APIs.
  • 50. RDFa can be hard to get right… Validation problems can stop us from extracting data. Use the W3C validator. Use the right DOCTYPE declaration if using XHTML. Set the encoding of your page properly (using HTTP headers or XML declaration). Prefixes need to be defined using the xmlns attribute. Unless you are making statements about the document, set the subject using the about attribute. Do not include HTML elements in literal values. Incorrect: <div property="foaf:name"><b>Peter Mika</b></div> Use absolute URIs as the value of the resource attribute, or make sure you specify HTML base.
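The advice about absolute URIs and HTML base comes down to how relative references are resolved. A short sketch, assuming a hypothetical page address as the base URI, shows what a relative `resource` value turns into:

```python
from urllib.parse import urljoin

# Hypothetical page address, standing in for the document's base URI.
BASE = "http://www.example.org/people/index.html"


def resolve(reference: str, base: str = BASE) -> str:
    """Resolve a possibly relative URI reference against the base URI."""
    return urljoin(base, reference)


photo_uri = resolve("photo.jpg")
fragment_uri = resolve("#id")
absolute_uri = resolve("http://photosite.example.org/photo.jpg")
print(photo_uri)     # http://www.example.org/people/photo.jpg
print(fragment_uri)  # http://www.example.org/people/index.html#id
print(absolute_uri)  # http://photosite.example.org/photo.jpg
```

If the page is served from several addresses, the same relative value yields different resource URIs, which is why an absolute URI or an explicit base is the safer choice.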
  • 51. RDFa can be hard to get right… II. Be careful when using rel and typeof in combination because of the precedence rules. BAD example: <div about="#id"> <span property="foaf:name">Peter Mika</span> <span rel="foaf:img" typeof="foaf:Image"> <span property="dc:format">jpg</span> … </span> </div> To correct, you need to move the typeof onto an element nested inside the <span> node with rel="foaf:img".
  • 52. RDFa can be hard to get right… III. Typeof does two things at once: it creates a new subject resource and assigns the type to it. BAD example: <div about="#id"> <span property="foaf:name">Peter Mika</span> <span rel="foaf:img" resource="http://www.example.org/photo.jpg"> <span typeof="foaf:Image"> <span property="dc:format">jpg</span> </span> </span> </div> To correct, you have to repeat the resource attribute on the span node with the typeof.
  • 53. RDFa can be hard to get right… IV. Marking up <h1>: <h1 property="dc:title">My homepage</h1> NOT: <h1><div property="dc:title">My homepage</div></h1> Marking up an image: <span rel="foaf:img"> <img alt="Alex" src="http://example.org/alex.jpg"/> </span> NOT: <img rel="foaf:img" src="photo.jpg"/> Header: <meta property="…" content="…"> NOT: <meta name="…" content="…">
  • 54. RDFa can be hard to get right… V. You cannot break up a description like this: <span rel="foaf:knows"> <span property="foaf:name">Peter Mika</span> </span> …. <span rel="foaf:knows"> <a rel="foaf:email" href="mailto:[email protected]" /> </span> This is not the same as: <span rel="foaf:knows"> <span property="foaf:name">Peter Mika</span> <a rel="foaf:email" href="mailto:[email protected]" /> </span> In the first case there are two related resources, with one attribute each; in the second case there is a single related resource with two attributes.
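The difference between the two markups becomes visible once the triples are written out. The sketch below is a simplified model rather than an RDFa processor: each `rel="foaf:knows"` block is assumed to create one fresh blank node, and the e-mail address is a hypothetical stand-in for the redacted one.

```python
from itertools import count

_blank_ids = count(1)


def new_blank_node() -> str:
    """Return a fresh blank node label; each call stands for a new anonymous resource."""
    return f"_:b{next(_blank_ids)}"


def describe(subject, properties):
    """Mimic one rel="foaf:knows" block: link the subject to one new blank
    node and attach the given (predicate, object) pairs to that node."""
    node = new_blank_node()
    triples = {(subject, "foaf:knows", node)}
    triples |= {(node, predicate, obj) for predicate, obj in properties}
    return triples


def related_resources(triples):
    """Return the distinct resources the subject is said to know."""
    return {o for s, p, o in triples if p == "foaf:knows"}


PAGE = "<>"  # the current document as subject
NAME = ("foaf:name", "Peter Mika")
EMAIL = ("foaf:email", "mailto:peter@example.org")  # hypothetical address

split_up = describe(PAGE, [NAME]) | describe(PAGE, [EMAIL])
together = describe(PAGE, [NAME, EMAIL])

print(len(related_resources(split_up)))  # 2
print(len(related_resources(together)))  # 1
```

In the split version a consumer has no way to tell that the name and the e-mail address belong to the same person.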
  • 55. Tips Hiding information from being displayed: links without content will not be rendered. Use <span property="foaf:name" content="Peter Mika"/> Use datatypes to provide the expected type of a literal. This helps validation because any tool can check whether the literal is indeed of that type.
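A rough sketch of the kind of check a tool can run once a literal carries a datatype. It assumes a deliberately simplified reading of two XML Schema datatypes (no time zones, no negative years), so it illustrates the idea rather than implementing the full lexical rules.

```python
import re
from datetime import date


def is_valid_integer(lexical: str) -> bool:
    """Simplified xsd:integer check: optional sign followed by digits."""
    return re.fullmatch(r"[+-]?[0-9]+", lexical) is not None


def is_valid_date(lexical: str) -> bool:
    """Simplified xsd:date check: YYYY-MM-DD that names a real calendar day."""
    if re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", lexical) is None:
        return False
    try:
        date.fromisoformat(lexical)
    except ValueError:
        return False
    return True


CHECKS = {"xsd:integer": is_valid_integer, "xsd:date": is_valid_date}


def literal_matches_datatype(lexical: str, datatype: str) -> bool:
    """Check a literal's lexical form against its declared datatype."""
    return CHECKS[datatype](lexical)


print(literal_matches_datatype("2000000", "xsd:integer"))    # True
print(literal_matches_datatype("2,000,000", "xsd:integer"))  # False
print(literal_matches_datatype("2009-05-10", "xsd:date"))    # True
```

Without a declared datatype, a value like "2,000,000" would pass through unnoticed as a plain string.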
  • 56. Example: Facebook’s Like and the Open Graph Protocol The ‘Like’ button provides publishers with a way to promote their content on Facebook and build communities. Shows up in profiles and news feed. Site owners can later reach users who have liked an object. Facebook Graph API allows 3rd party developers to access the data. Open Graph Protocol is an RDFa-based format for describing the object that the user ‘Likes’.
  • 57. Example: Facebook’s Open Graph Protocol RDF vocabulary to be used in conjunction with RDFa. Simplify the work of developers by restricting the freedom in RDFa. Activities, Businesses, Groups, Organizations, People, Places, Products and Entertainment. Only HTML <head> accepted. http://opengraphprotocol.org/ <html xmlns:og="http://opengraphprotocol.org/schema/"> <head> <title>The Rock (1996)</title> <meta property="og:title" content="The Rock" /> <meta property="og:type" content="movie" /> <meta property="og:url" content="http://www.imdb.com/title/tt0117500/" /> <meta property="og:image" content="http://ia.media-imdb.com/images/rock.jpg" /> … </head> ...
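Because the Open Graph Protocol restricts itself to `<meta>` elements, a consumer can be very small. The sketch below gathers `og:` properties with Python's standard HTML parser. It is an illustration only: unlike a real RDFa processor it matches the literal `og:` prefix instead of resolving the declared namespace, and it does not check that the elements appear inside `<head>`.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("property") or ""
        if name.startswith("og:") and "content" in attributes:
            self.properties[name] = attributes["content"]


PAGE = """<html xmlns:og="http://opengraphprotocol.org/schema/">
<head>
<title>The Rock (1996)</title>
<meta property="og:title" content="The Rock" />
<meta property="og:type" content="movie" />
<meta property="og:url" content="http://www.imdb.com/title/tt0117500/" />
</head>
</html>"""

parser = OpenGraphParser()
parser.feed(PAGE)
og = parser.properties
print(og["og:title"], "/", og["og:type"])  # The Rock / movie
```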
  • 58. Example: Yahoo! Enhanced Results (was: SearchMonkey) Guide for publishers to mark-up their pages for common types of objects Product, Local, News, Video, Events, Documents, Discussion, Games Using popular microformats and RDF vocabularies Copy-paste code Validator Yahoo as a consumer See later
  • 59. Example: Google’s Rich Snippets Google accepts popular microformats and its own RDFa vocabulary Similar approach to RDFa as Facebook Validator to check if the markup is correct Google displays enhanced results based on this metadata Rich Snippets
  • 60. Microdata example <div itemscope itemid="http://www.yahoo.com/resource/person"> <p>My name is <span itemprop="name">Neil</span>.</p> <p>My band is called <span itemprop="band">Four Parts Water</span>. I was born on <time itemprop="birthday" datetime="2009-05-10">May 10th 2009</time>. <img itemprop="image" src="me.png" alt="me"> </p> </div>
  • 61. Microdata Currently under standardization at the W3C Originally part of the HTML5 spec, but now a separate document Similar to microformats, but with the extensibility of RDFa Introduce new terms using reverse domain names or full URIs HTML5 also has a number of “semantic” elements such as <time>, <video>, <article>…
  • 62. RDFa on the rise Percentage of URLs with embedded metadata in various formats 510% increase between March, 2009 and October, 2010
  • 63. The state of metadata in HTML 5-10% of webpages contain some explicit metadata Depending on how you count… Too many competing approaches Too many formats: microformats vs RDFa vs Microdata When using RDFa, publishers may need to use multiple different vocabularies to satisfy everyone

Editor's Notes

  • #40: Designed for humans first and machines second, microformats are a set of simple, open data formats built upon existing and widely adopted standards. Instead of throwing away what works today, microformats intend to solve simpler problems first by adapting to current behaviors and usage patterns