Making the Web Searchable Peter Mika  Researcher, Data Architect Yahoo! Research
Yahoo! Research (research.yahoo.com)
Yahoo! Research Barcelona Established January, 2006 Led by Ricardo Baeza-Yates Research areas Web Mining  content, structure, usage Distributed Web retrieval  Multimedia retrieval NLP and Semantics
Yahoo! by numbers (April, 2007)
There are approximately 500 million users of Yahoo! branded services, meaning we reach 50 percent, or 1 out of every 2 users, online, the largest audience on the Internet (Yahoo! Internal Data).
Yahoo! is the most visited site online with nearly 4 billion visits and an average of 30 visits per user per month in the U.S. and leads all competitors in audience reach, frequency and engagement (comScore Media Metrix, US, Feb. 2007).
Yahoo! accounts for the largest share of time Americans spend on the Internet with 12 percent (comScore Media Metrix, US, Feb. 2007) and approximately 8 percent of the world’s online time (comScore WorldMetrix, Feb. 2007).
Yahoo! is the #1 home page with 85 million average daily visitors on Yahoo! homepages around the world, an increase of nearly 5 million visitors in a month (comScore WorldMetrix, Feb. 2007).
Yahoo!’s social media properties (Flickr, delicious, Answers, 360, Video, MyBlogLog, Jumpcut and Bix) have 115 million unique visitors worldwide (comScore WorldMetrix, Feb. 2007).
Yahoo! Answers is the largest collection of human knowledge on the Web with more than 90 million unique users and 250 million answers worldwide (Yahoo! Internal Data).
There are more than 450 million photos in Flickr in total and 1 million photos are uploaded daily. 80 percent of the photos are public (Yahoo! Internal Data).
Yahoo! Mail is the #1 Web mail provider in the world with 243 million users (comScore WorldMetrix, Feb. 2007) and nearly 80 million users in the U.S. (comScore Media Metrix, US, Feb. 2007).
Interoperability between Yahoo! Messenger and Windows Live Messenger has formed the largest IM community, approaching 350 million user accounts (Yahoo! Internal Data).
Yahoo! Messenger is the most popular in time spent with an average of 50 minutes per user, per day (comScore WorldMetrix, Feb. 2007).
Nearly 1 in 10 Internet users is a member of a Yahoo! Group (Yahoo! Internal Data).
Yahoo! is one of only 26 companies to be on both the Fortune 500 list and Fortune’s “Best Place to Work” list (2006).
Agenda
Part 1: Publishing content on the Semantic Web
  Intro to RDF and the Semantic Web
  Six ways to publish data on the Semantic Web
  History of embedded metadata on the Web
  RDFa, best practices and tools
  Exercise
Part 2: Semantic Web in use
  SearchMonkey
  BOSS and YQL
  Semantic Search and Navigation
Part 3: Research in Semantic Search
Motivation Why publish data on the Semantic Web? Multiply the value of your data by increasing content agility The potential for reuse and aggregation with other datasets Make your data more easily findable Why develop applications using semantic technologies? Content agility means you can more rapidly develop applications by reusing and recombining data. Content agility leads to increased agility and robustness of your application.
Intro to the Semantic Web
Basic RDF
RDF has two basic types of entities: resources and literals. These correspond roughly to objects and built-in types in object-oriented programming.
Resources are identified by a URI; a resource without a URI is called a blank node.
URIs are a generalization of URLs.
Notation: <http://www.example.org/Person> or ex:Person
Literals have an optional language and datatype (string, integer, etc.).
Datatypes are identified by URIs, e.g. XML Schema datatypes.
Two literals are the same if their components are the same.
Notation: “Joe B.” or Joe@en^^http://…#string
RDF models
A triple, also called a statement, is a tuple of (subject, predicate, object).
Example: (Joe, loves, Mary)
Each triple gives the value of a property for a given resource or relates two objects to one another.
A predicate is always a resource with a URI.
An RDF model is a set of triples.
The ordering of statements in an RDF document is irrelevant (unlike XML).
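A minimal sketch of the "set of triples" idea, not tied to any RDF library. Triples are plain Python tuples, and the `ex:` identifiers are made up for illustration. It shows that statement order and repeated statements do not change the model.

```python
# Illustrative only: the "ex:" names are hypothetical identifiers.
Triple = tuple[str, str, str]  # (subject, predicate, object)


def make_model(statements: list[Triple]) -> frozenset[Triple]:
    """Build a model as a set: duplicates collapse and order is irrelevant."""
    return frozenset(statements)


model_a = make_model([
    ("ex:Joe", "ex:loves", "ex:Mary"),
    ("ex:Joe", "ex:name", "Joe B."),
])

model_b = make_model([
    ("ex:Joe", "ex:name", "Joe B."),
    ("ex:Joe", "ex:loves", "ex:Mary"),
    ("ex:Joe", "ex:loves", "ex:Mary"),  # a repeated statement adds nothing
])

same_model = model_a == model_b
```

Using a set rather than a list is the design choice that mirrors the slide: two documents listing the same statements in a different order describe the same model.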
Graphical and textual notation
A number of text-based interchange formats exist for RDF: RDF/XML, Turtle, N3, N-Triples
Example: http://www.cs.vu.nl/~pmika/foaf.rdf
[Figure: graph in which my:Joe has the name “Joe A.” and the type foaf:Person]
Ontologies Ontologies are collections of classes and properties used to describe objects in a particular domain Ontologies themselves are described in RDF or OWL (the Web Ontology Language), an extension of RDF Example: the Friend-Of-A-Friend (FOAF) ontology for personal profiles Classes can be described by sub- and superclasses, required properties Class membership in RDF is expressed using the rdf:type property An instance can have multiple classes (types) A class can have multiple superclasses Properties can be described by their domain, range, cardinalities, etc.
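The following is a rough sketch of how class membership and superclasses interact, under the assumption that types are inherited along the subclass hierarchy (as in RDFS-style reasoning). The class `ex:Student` and the resource `ex:joe` are hypothetical. That `foaf:Person` is a subclass of `foaf:Agent` comes from the FOAF vocabulary.

```python
# Hypothetical hierarchy for illustration; ex:Student is not a real class.
SUBCLASS_OF = {
    "ex:Student": {"foaf:Person"},
    "foaf:Person": {"foaf:Agent"},
}


def superclasses(cls):
    """Return cls together with all of its transitive superclasses."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SUBCLASS_OF.get(current, ()))
    return seen


def inferred_types(triples, resource):
    """Collect the rdf:type values of a resource plus inherited classes."""
    direct = {o for s, p, o in triples if s == resource and p == "rdf:type"}
    result = set()
    for cls in direct:
        result |= superclasses(cls)
    return result


data = [("ex:joe", "rdf:type", "ex:Student")]
types_of_joe = inferred_types(data, "ex:joe")
```

An instance can end up with several types this way even though only one `rdf:type` statement was written down.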
Advanced topic: Resources vs Literals
Resources are objects, Literals are strings
Resources are instances of classes, Literals have datatypes
Whether something is a resource or literal sometimes depends on the detail of modeling
  <meta property="myvocab:knows">Paris Hilton</meta>
  <item rel="foaf:knows">
    <meta property="foaf:name">Paris Hilton</meta>
  </item>
You cannot make statements about literals (literals are always the object in a triple)
Resources can carry a globally unique identifier, literals have no identity
Web resources such as documents and images are resources
  <item rel="rdfs:seeAlso" resource="http://www.some.related.page.com/"/>
  <item rel="foaf:img" resource="http://photosite.example.org/photo.jpg"/>
When in doubt: it’s a resource
Advanced Topic: Informational resources vs. Conceptual resources Informational resource: an HTML document, image, any other file on the Web Retrievable in its entirety from the Web Retrieving it can return a 200 OK Conceptual (non-informational) resource: a person, an event, a place, etc. A description of it may be retrievable from the Web When identified by a URL, retrieving it should return a 303 Redirect Never confuse a webpage with what it describes! You are not your Facebook profile: one is a document, the other is a person. A document has properties such as byte-size, media-type etc, a person has name, age, etc. Make sure you don’t use the URL of an existing webpage as the URI of a resource
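One way to picture the 200 versus 303 distinction is a tiny routing function. This is only a sketch under an assumed URI layout, where `/id/...` names things and `/doc/...` names documents about them. That layout and the function name `respond` are hypothetical, not something the slides prescribe.

```python
# Hypothetical URI layout: /id/<name> identifies a thing (person, place, ...),
# /doc/<name> identifies the web document that describes it.
def respond(path: str) -> tuple[int, dict[str, str]]:
    """Return (status code, headers) for a requested path."""
    if path.startswith("/id/"):
        # Conceptual resource: it cannot be returned itself, so redirect
        # the client to a document describing it (303 See Other).
        return 303, {"Location": "/doc/" + path[len("/id/"):]}
    if path.startswith("/doc/"):
        # Informational resource: the document is served directly.
        return 200, {"Content-Type": "text/html"}
    return 404, {}


person_response = respond("/id/peter")
document_response = respond("/doc/peter")
```

Keeping the two URI spaces apart is what prevents statements about the person (name, age) from being mixed up with statements about the page (byte size, media type).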
RDF is designed for distributed systems
URIs provide web-wide global identification across documents
A resource may be described by multiple documents
We know it’s the same resource because the same URI is used, or through reasoning (advanced topic…)
URIs are intended to be reused
Unique, but not single identifiers: two URIs may denote the same thing
URIs are dereferenceable (can be retrieved)
A well-behaved URI returns a description of the resource
Provides authority: the definition of foaf:Person lives at that URI
Ontologies can be looked up as well
Typically at the root of the URIs, also known as the namespace
Example: http://xmlns.com/foaf/0.1/Person redirects to the specification
URIs implicitly link data together
Joe’s homepage: (#joe, #name, “Joe A.”) (#joe, #email, mailto:joe@joe.com)
Mary’s homepage: (#mary, #name, “Mary B.”) (#mary, #gender, “female”)
A dating site: (#joe, #loves, #mary)
Schema doc: (#name, #type, #Property) (#name, #domain, #Person)
Linked Data: following links from one document to another allows a client to discover the entire graph (data and ontologies)
When put together, they form a single ‘global’ graph
[Figure: the merged graph. #joe has #name “Joe A.” and #email “joe@joe.com” and is connected to #mary by #loves; #mary has #name “Mary B.” and #gender “female”.]
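The merge can be sketched with plain Python sets. The triples are the ones from the homepage and dating-site example. Which document holds which triple is an assumption read off the slide, and the helper name `names_of_loved` is made up.

```python
# Three documents published independently; shared URIs are the only glue.
joes_homepage = {
    ("#joe", "#name", "Joe A."),
    ("#joe", "#email", "mailto:joe@joe.com"),
}
marys_homepage = {
    ("#mary", "#name", "Mary B."),
    ("#mary", "#gender", "female"),
}
dating_site = {
    ("#joe", "#loves", "#mary"),
}

# Merging RDF graphs is just set union.
global_graph = joes_homepage | dating_site | marys_homepage


def names_of_loved(graph, person):
    """Follow #loves links from a person and return the names found."""
    loved = {o for s, p, o in graph if s == person and p == "#loves"}
    return {o for s, p, o in graph if s in loved and p == "#name"}


answer = names_of_loved(global_graph, "#joe")
```

No single source document can answer "what is the name of the person Joe loves"; only the merged graph can, which is the point of the slide.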
The even larger picture: entire datasets connected
Publishing data on the Web
RDF on the Web II. Six ways of publishing RDF  Standalone files (static or dynamically generated) Metadata inside webpages SPARQL endpoints Feeds XSLT/GRDDL Automated tools Note: these are non-exclusive
Option 1: Standalone RDF documents
RDF documents linked to other RDF documents
Use rdfs:seeAlso to point to a related document
It says: go and look at that document if you want to know more
Advantages:
  No change to the publishing of the HTML documents
  Data can be published by a third party
Tools:
  RDB-to-RDF mappers such as D2RQ or Triplify
  Linked Data browsers
Examples: most datasets in the Linked Data cloud
[Figure: example graph with the nodes #PeterM, #Bud and #Hun and the literals “Peter Mika”, “Budapest” and “2,000,000”, connected by born, label, capital-of and population edges]
Option 1: cntd.
For discovery, the metadata is often linked from HTML pages
  <link rel="meta" type="application/rdf+xml" title="FOAF" href="http://www.cs.vu.nl/~pmika/foaf.rdf" />
Additional advantages:
  Discovery from the webpage
  It’s clear that the metadata is a machine representation of the human-targeted content of the page
Examples: FOAF profiles, BestBuy
[Figure: a web page reading “Peter Mika was born in Budapest.” linked to the example graph]
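A consumer could discover such linked metadata roughly as follows. This sketch uses only Python's standard `html.parser`. The class and function names are made up, and the sample page with its example.org URL is hypothetical.

```python
from html.parser import HTMLParser


class MetaLinkFinder(HTMLParser):
    """Collect href values of <link rel="meta" type="application/rdf+xml">."""

    def __init__(self):
        super().__init__()
        self.rdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        is_rdf = attributes.get("type") == "application/rdf+xml"
        if "meta" in rels and is_rdf and attributes.get("href"):
            self.rdf_links.append(attributes["href"])


def find_rdf_links(html):
    finder = MetaLinkFinder()
    finder.feed(html)
    return finder.rdf_links


# Hypothetical page used only to exercise the sketch.
PAGE = """<html><head>
<link rel="stylesheet" type="text/css" href="style.css" />
<link rel="meta" type="application/rdf+xml" title="FOAF"
      href="http://www.example.org/foaf.rdf" />
</head><body>Peter Mika was born in Budapest.</body></html>"""

links = find_rdf_links(PAGE)
```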
Option 2: Metadata inside web pages
Using microformats, RDFa, Microdata (more later)
Advantages:
  No separate database export required
  Browser plug-in friendly
  Search engine friendly
  Copy-paste friendly
Tools:
  XML editors (e.g. Oxygen)
  Triplr
  RDFa Distiller
  RDFa bookmarklet
  Ubiquity RDFa plugin
  Optimus microformat parser
Examples: many, including SlideShare, YouTube, LinkedIn, Digg, Myspace, Facebook…
[Figure: the example graph embedded in a web page reading “Peter Mika was born in Budapest.”]
Option 3: SPARQL endpoints
Query access to your RDF database
Similar to exposing your database on the Web and giving someone read-only SQL access
Advantages:
  Most flexible and best performing access from a consumer perspective
Tools:
  Triple stores (Oracle, Virtuoso, Sesame, Jena, OWLIM etc.)
  RDB-to-RDF mappers such as D2RQ and Triplify
[Figure: the example graph (#PeterM, #Bud, #Hun, “Peter Mika”, “Budapest”, “2,000,000”)]
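To give a feel for what query access offers, here is a toy triple-pattern matcher in Python. It is not SPARQL and not how any triple store is implemented. It only imitates matching patterns with wildcards. The edge directions in `GRAPH` are an assumption about the example figure (Peter Mika born in Budapest, Budapest the capital of Hungary).

```python
# Assumed reading of the example figure; edge directions are a guess.
GRAPH = {
    ("#PeterM", "born", "#Bud"),
    ("#PeterM", "label", "Peter Mika"),
    ("#Bud", "label", "Budapest"),
    ("#Bud", "capital-of", "#Hun"),
    ("#Bud", "population", "2,000,000"),
}


def match(graph, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# "Where was #PeterM born, and what is that place called?"
birthplaces = [o for _, _, o in match(GRAPH, s="#PeterM", p="born")]
birthplace_labels = [
    o for place in birthplaces for _, _, o in match(GRAPH, s=place, p="label")
]
```

A real endpoint would accept such a two-step question as a single query, which is why the slide calls it the most flexible option for consumers.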
Option 4: feeds
The equivalent of a database dump
No standard feed format for RDF
Advantages:
  Submit your data without making it public
Yahoo! consumes:
  DataRSS
  GoogleBase feeds
  NewsML
Submit your feed using SiteExplorer
[Figure: several copies of the example graph (#PeterM, #Bud, #Hun, “Peter Mika”, “Budapest”, “2,000,000”) delivered as a feed]
Option 5: XSLT
Publish the transformation from HTML to structured data
GRDDL is a standard for linking an HTML page to a transformation that produces RDF data
Advantages:
  No change to the page
Disadvantages:
  Transformation needs to be executed to get to the data
Tools:
  Intel MashMaker
  Dapper
  Glue API from AdaptiveBlue
[Figure: an XSLT transformation turning a page into structured data]
Option 6: Automatic markup Restricted mostly to tagging entities with identifiers Advantages Less manual effort Disadvantages Limited to finding relevant entities in text Tools OpenCalais Zemanta API Peter Mika was born in Budapest. <person>Peter Mika</person> was born in <location>Budapest</location>.
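The tagging step on the slide can be imitated with a fixed dictionary lookup. This is only a sketch of the output format. It makes no claim about how OpenCalais or the Zemanta API work internally, and the `GAZETTEER` table and function name are made up.

```python
import re

# Hypothetical lookup table mapping known names to entity types.
GAZETTEER = {
    "Peter Mika": "person",
    "Budapest": "location",
}


def tag_entities(text, gazetteer=GAZETTEER):
    """Wrap every known entity name in a tag named after its type."""
    if not gazetteer:
        return text
    # Try longer names first so a longer match wins over a shorter one.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in names))

    def wrap(found):
        name = found.group(0)
        kind = gazetteer[name]
        return f"<{kind}>{name}</{kind}>"

    return pattern.sub(wrap, text)


tagged = tag_entities("Peter Mika was born in Budapest.")
```

The sketch also shows the limitation named on the slide: it finds entities, but it says nothing about the relation between them (that one was born in the other).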
Example:  Zemanta A personal writing assistant for bloggers Plugin for popular blogging platforms and web mail clients Analyzes text as you type and suggests hyperlinks, tags, categories, images and related articles API available with the same functionality
Metadata in HTML
Brief history of the Annotated Web
1995: HTML meta tags
1996: Simple HTML Ontology Extensions (SHOE)
1998: RDF/XML
  RDF/XML in HTML
  RDF linked from HTML
2003: Web 2.0
  Tagging
  Microformats
  Metadata in Wikipedia
  Machine tags in Flickr
2005: eRDF
2008: RDFa
HTML meta tags
<HTML>
<HEAD profile="http://dublincore.org/documents/dcq-html/">
  <META name="DC.author" content="Peter Mika">
  <LINK rel="DC.rights copyright" href="http://www.example.org/rights.html" />
  <LINK rel="meta" type="application/rdf+xml" title="FOAF" href="http://www.cs.vu.nl/~pmika/foaf.rdf">
</HEAD>
…
</HTML>
SHOE example (Heflin & Hendler, 1996)
<ONTOLOGY "our-ontology" VERSION="1.0">
  <ONTOLOGY-EXTENDS "organization-ontology" VERSION="2.1" PREFIX="org" URL="http://www.ont.org/orgont.html">
  <ONTDEF CATEGORY="Person" ISA="org.Thing">
  <ONTDEF RELATION="lastName" ARGS="Person STRING">
  <ONTDEF RELATION="firstName" ARGS="Person STRING">
  <ONTDEF RELATION="marriedTo" ARGS="Person Person">
  <ONTDEF RELATION="employee" ARGS="org.Organization Person">
</ONTOLOGY>
<HEAD>
  <META HTTP-EQUIV="Instance-Key" CONTENT="http://www.cs.umd.edu/~george">
  <USE-ONTOLOGY "our-ontology" VERSION="1.0" PREFIX="our" URL="http://ont.org/our-ont.html">
</HEAD>
<BODY>
  <CATEGORY "our.Person">
  <RELATION "our.marriedTo" TO="http://www.cs.umd.edu/~helena">
  <RELATION "our.employee" FROM="http://www.cs.umd.edu">
  My name is <ATTRIBUTE "our.firstName">George</ATTRIBUTE> <ATTRIBUTE "our.lastName">Cook</ATTRIBUTE> and I live at...
SHOE system
SHOE Text-based query interface
SHOE Graphical Query Interface
Example: Creative Commons
Embedding CC license in HTML (now deprecated):
<HTML>
<HEAD>… </HEAD>
<BODY> …
<!--
<rdf:RDF xmlns="http://creativecommons.org/ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <Work rdf:about="http://www.yergler.net/averages/">
    <dc:title>The Law of Averages</dc:title>
    <dc:description>...because eventually i'll be right...</dc:description>
    <license rdf:resource="http://creativecommons.org/licenses/by-nc/1.0/" />
  </Work>
  <License rdf:about="http://creativecommons.org/licenses/by-nc/1.0/">
    <requires rdf:resource="http://web.resource.org/cc/Notice" />
    <permits rdf:resource="http://web.resource.org/cc/Reproduction" />
    <permits rdf:resource="http://web.resource.org/cc/Distribution" />
    <prohibits rdf:resource="http://web.resource.org/cc/CommercialUse" />
  </License>
</rdf:RDF>
-->
Example: Creative Commons
Current: rel attribute (HTML4)
  This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/3.0/us/">Creative Commons Attribution 3.0 United States License</a>.
Use of the “rel” attribute for semantic annotation is the birth of the microformat…
Microformats (μf)
Community centered around microformats.org
  Specifications and discussions are hosted there
Agreements on the way to encode certain kinds of metadata in HTML
  Reuse of semantic-bearing HTML elements
  Based on existing standards
  Minimality
Microformats exist for a limited set of objects
  hCard (persons and organizations)
  hCalendar (events)
  hResume
  hProduct
  hRecipe
Varying degrees of support and stability
  hCard and rel-tag are widely supported
Microformats: limitations
No shared syntax
  Each microformat has a separate syntax tailored to the vocabulary
No formal schemas
  Limited reuse, extensibility of schemas
  Unclear which combinations are allowed
No datatypes
No namespaces, no unique identifiers (URIs)
  No interlinking
  Mapping between instances is required
Relationship to page context is often unclear
Example: microformats
<cite class="vcard">
  <a class="fn url" rel="friend colleague met" href="http://meyerweb.com/">Eric Meyer</a>
</cite>
wrote a post
(<cite><a href="http://meyerweb.com/eric/thoughts/2005/12/16/tax-relief/">Tax Relief</a></cite>)
about an unintentionally humorous letter he received from the
<span class="vcard">
  <a class="fn org url" href="http://irs.gov/">Internal Revenue Service</a>
</span>.

<div class="vcard">
  <a class="email fn" href="mailto:jfriday@host.com">Joe Friday</a>
  <div class="tel">+1-919-555-7878</div>
  <div class="title">Area Administrator, Assistant</div>
</div>
Microformats vs. RDFa
Choose microformats when you find a microformat that fits your needs and is supported by Yahoo!
  Microformats are the first option because they are simple
  We support all major microformats, see the documentation
  It’s a common misconception that RDFa requires XHTML: it doesn’t
If you find none that perfectly fits your needs, then you need RDFa
  Microformats have a fixed schema: you cannot add your own attributes
  Example: a social networking site with user profiles
    hCard (vCard) is a good candidate, but it has no way to express the user’s social connections
    You either live without this, or go with RDFa
The rest of this presentation is about RDFa, which is more powerful but also more complex
  We will focus on the concepts that are hard to grasp
Keep an eye on HTML5 Currently under standardization at the W3C Last Call this fall, keep an eye on it Introduces Microdata Similar to microformats Some predefined vocabularies with central registration Some of the flexibility of RDFa Introduce new terms using reverse domain names or full URIs Semantic HTML elements such as <time>, <video>, <article>…
Microdata example
<div item>
  <p>My name is <span itemprop="name">Neil</span>.</p>
  <p>My band is called <span itemprop="band">Four Parts Water</span>.
  I was born on <time itemprop="birthday" datetime="2009-05-10">May 10th 2009</time>.
  <img itemprop="image" src="me.png" alt="me">
  </p>
</div>
Slides courtesy of Mark Birbeck Introduction to RDFa
What does RDFa look like? There are some metadata features in HTML already... ...so we give them an RDF interpretation... ...then we generalise them... ...and then we add a few more.
HTML's metadata features (1)
<html>
  <head>
    <title>RDFa: Now everyone can have an API</title>
    <meta name="author" content="Mark Birbeck" />
    <meta name="created" content="2009-05-09" />
    <link rel="license" href="http://creativecommons.org/licenses/by-sa/3.0/" />
  </head>
  ...
</html>

HTML's metadata features (2)
<a href="http://creativecommons.org/licenses/by-sa/3.0/">CC Attribution-ShareAlike</a>
<a rel="license" href="http://creativecommons.org/licenses/by-sa/3.0/">CC Attribution-ShareAlike</a>

RDFa extends @rel/@href to images
<img src="image01.png" rel="license" href="http://creativecommons.org/licenses/by-sa/3.0/" />
<img src="image02.png" rel="license" href="http://creativecommons.org/licenses/by-sa/3.0/" />

RDFa extends meta/@content to body
<html>
  <head>
    <title>RDFa: Now everyone can have an API</title>
    <meta name="author" content="Mark Birbeck" />
    <meta name="created" content="2009-05-09" />
  </head>
  <body>
    <h1>RDFa: Now everyone can have an API</h1>
    Author: <em>Mark Birbeck</em>
    Created: <em>May 9th, 2009</em>
  </body>
</html>

RDFa extends meta/@content to body
<html>
  <head>
    <title>RDFa: Now everyone can have an API</title>
  </head>
  <body>
    <h1>RDFa: Now everyone can have an API</h1>
    Author: <em property="author" content="Mark Birbeck">Mark Birbeck</em>
    Created: <em property="created" content="2009-05-09">May 9th, 2009</em>
  </body>
</html>

RDFa extends meta/@content to body
<html>
  <head>
    <title>RDFa: Now everyone can have an API</title>
  </head>
  <body>
    <h1>RDFa: Now everyone can have an API</h1>
    Author: <em property="author">Mark Birbeck</em>
    Created: <em property="created" content="2009-05-09">May 9th, 2009</em>
  </body>
</html>

Vocabularies use CURIEs
<html xmlns:dc="http://purl.org/dc/terms/">
  <head>
    <title>RDFa: Now everyone can have an API</title>
  </head>
  <body>
    <h1>RDFa: Now everyone can have an API</h1>
    Author: <em property="dc:creator">Mark Birbeck</em>
    Created: <em property="dc:created" content="2009-05-09">May 9th, 2009</em>
  </body>
</html>
CURIEs, or Compact URIs Named after Marie Curie, who was the first person to receive two Nobel prizes, one for physics and one for chemistry. CURIEs allow a full URI to be expressed in a simple prefix:suffix form. The 'suffix' part is looser than in XML namespaces, supporting formulations such as abc:123.
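Expanding a CURIE is a simple string operation, sketched here in Python. The `dc` and `foaf` namespace URIs are the ones that appear elsewhere in these slides. The function name `expand_curie` and the `PREFIXES` table are illustrative, and a full implementation would follow the CURIE specification in more detail.

```python
# Prefix table as it might be declared via xmlns:dc="..." on the page.
PREFIXES = {
    "dc": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand_curie(curie, prefixes=PREFIXES):
    """Turn prefix:suffix into a full URI by concatenation."""
    prefix, separator, suffix = curie.partition(":")
    if not separator or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return prefixes[prefix] + suffix


creator_uri = expand_curie("dc:creator")
person_uri = expand_curie("foaf:Person")
# The suffix may start with a digit, which an XML qualified name may not.
numeric_uri = expand_curie("abc:123", {"abc": "http://www.example.org/"})
```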
Properties can also apply to images
<img src="image01.png" rel="license" href="http://creativecommons.org/licenses/by-sa/3.0/" />
<img src="image02.png" rel="license" href="http://creativecommons.org/licenses/by-sa/3.0/" />

Properties can also apply to images
<img src="image01.png" rel="license" href="http://creativecommons.org/licenses/by-sa/3.0/"
  property="dc:creator" content="Mark Birbeck" />
<img src="image02.png" rel="license" href="http://creativecommons.org/licenses/by-sa/3.0/"
  property="dc:creator" content="Mark Birbeck" />
Relationships and properties on anything
<a href="http://www.slideshare.net/mark.birbeck/the-5-minute-guide-to-rdfain-only-6-minutes-40-seconds">The 5 minute guide to RDFa...in only 6 minutes and 40 seconds</a>

Relationships and properties on anything
<a rel="license" href="http://www.slideshare.net/mark.birbeck/the-5-minute-guide-to-rdfain-only-6-minutes-40-seconds">The 5 minute guide to RDFa...in only 6 minutes and 40 seconds</a>
Doesn't say what we want.

Relationships and properties on anything
<a href="http://www.slideshare.net/mark.birbeck/the-5-minute-guide-to-rdfain-only-6-minutes-40-seconds">The 5 minute guide to RDFa...in only 6 minutes and 40 seconds</a>
is licensed under
<a href="http://creativecommons.org/licenses/by-sa/2.5/">CC BY SA</a>.

Relationships and properties on anything
<a href="http://www.slideshare.net/mark.birbeck/the-5-minute-guide-to-rdfain-only-6-minutes-40-seconds">The 5 minute guide to RDFa...in only 6 minutes and 40 seconds</a>
is licensed under
<a about="http://www.slideshare.net/mark.birbeck/the-5-minute-guide-to-rdfain-only-6-minutes-40-seconds"
  rel="license" href="http://creativecommons.org/licenses/by-sa/2.5/">CC BY SA</a>.

Relationships and properties on anything
<a href="http://www.slideshare.net/mark.birbeck/the-5-minute-guide-to-rdfain-only-6-minutes-40-seconds">The 5 minute guide to RDFa...in only 6 minutes and 40 seconds</a>
is licensed under
<a about="http://www.slideshare.net/mark.birbeck/the-5-minute-guide-to-rdfain-only-6-minutes-40-seconds"
  rel="license" href="http://creativecommons.org/licenses/by-sa/2.5/"
  property="dc:creator" content="Mark Birbeck">CC BY SA</a>.

@about sets context
<div about="http://www.slideshare.net/mark.birbeck/the-5-minute-guide-to-rdfain-only-6-minutes-40-seconds">
  <h1>The 5 minute guide to RDFa...</h1>
  Author: <em property="dc:creator">Mark Birbeck</em>
  Created: <em property="dc:created" content="2009-05-09">May 9th, 2009</em>
</div>

@about sets context
<html xmlns:dc="http://purl.org/dc/terms/">
  <head>
    <title>RDFa: Now everyone can have an API</title>
  </head>
  <body>
    <h1>RDFa: Now everyone can have an API</h1>
    Author: <em property="dc:creator">Mark Birbeck</em>
    Created: <em property="dc:created" content="2009-05-09">May 9th, 2009</em>
  </body>
</html>
Basics of RDFa generalise HTML's existing semantic features; add support for CURIEs for property and relationship names; add @about.
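To make the basics concrete, here is a deliberately tiny extractor that understands only @about, @property and @content. It is a sketch built on Python's standard `html.parser`, not a conforming RDFa processor: it ignores @rel, @href, @typeof, CURIE expansion and nested property elements. The class name, the sample markup and the example.org URIs are all hypothetical.

```python
from html.parser import HTMLParser

VOID_ELEMENTS = {"meta", "link", "img", "br", "hr", "input"}


class TinyRDFa(HTMLParser):
    """Extract (subject, property, literal) triples from a small RDFa subset."""

    def __init__(self, base):
        super().__init__()
        self.subjects = [base]   # the document is the default subject
        self.open = []           # per open element: did it set @about?
        self.pending = None      # [subject, property, text parts, depth]
        self.triples = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        pushed = "about" in a
        if pushed:
            self.subjects.append(a["about"])   # @about sets the context
        subject = self.subjects[-1]
        if "property" in a:
            if "content" in a:
                # @content overrides the visible text of the element
                self.triples.append((subject, a["property"], a["content"]))
            elif tag not in VOID_ELEMENTS:
                self.pending = [subject, a["property"], [], len(self.open)]
        if tag in VOID_ELEMENTS:
            if pushed:
                self.subjects.pop()            # void elements have no children
        else:
            self.open.append(pushed)

    def handle_data(self, data):
        if self.pending:
            self.pending[2].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS or not self.open:
            return
        pushed = self.open.pop()
        if self.pending and self.pending[3] == len(self.open):
            subject, prop, parts, _ = self.pending
            self.triples.append((subject, prop, "".join(parts).strip()))
            self.pending = None
        if pushed:
            self.subjects.pop()


def extract(html, base):
    parser = TinyRDFa(base)
    parser.feed(html)
    return parser.triples


SAMPLE = """<div about="http://www.example.org/slides">
  <h1>The 5 minute guide to RDFa...</h1>
  Author: <em property="dc:creator">Mark Birbeck</em>
  Created: <em property="dc:created" content="2009-05-09">May 9th, 2009</em>
</div>"""

triples = extract(SAMPLE, "http://www.example.org/page.html")
```

In the result, the creator value comes from the element text while the creation date comes from @content, which matches the two cases shown on the slides. For real pages, a complete implementation such as the RDFa Distiller should be used instead.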
Advanced RDFa
use of @datatype to set the data type of @content;
use of @typeof to set rdf:type;
support for bnodes;
support for XML literals;
ability to chain statements together.
Note that since RDFa supports all of the features you'll find in RDF, you can even mark up OWL documents in HTML.
The process of annotating with RDFa
Invest in familiarizing yourself with the RDFa syntax by reading the RDFa Primer
  It is also highly recommended that you read the RDF Primer. RDF is the data model used by RDFa.
Choose a vocabulary from the SearchMonkey documentation that fits your needs
  A vocabulary describes a set of types and attributes within a given domain
  If you don’t find a good candidate, extend an existing one or create a new one
Annotate your page
  Before you start, you might want to validate your page for (X)HTML conformance using the W3C’s (X)HTML Validator to reduce the chance of errors. Choose Document Type XHTML + RDFa.
  No specific tool support. If you have an HTML or XML editor that supports DTDs, you will have syntax checking and highlighting.
  Use the RDFa Distiller to check which data can be extracted from your page.
  If you fancy, use the RDF Validator to visualize the resulting RDF graph.
Put the annotated page online
  The data will be extracted the next time your page is crawled
  No need to explicitly submit anything
  No notification when your site is crawled
See http://rdfa.info/rdfa-implementations for new tools and APIs
RDFa pitfalls
Validation problems can stop us from extracting data
  Use the W3C validator
  Use the right DOCTYPE declaration if using XHTML
  Set the encoding of your page properly (using HTTP headers or XML declaration)
Prefixes need to be defined using the xmlns attribute
Unless you are making statements about the document, set the subject using the about attribute
Do not include HTML elements in literal values
  Incorrect: <div property="foaf:name"><b>Peter Mika</b></div>
Use absolute URIs as the value of the resource attribute
  Or make sure you specify HTML base
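The last pitfall, relative values in the resource attribute, comes down to ordinary URI resolution. A small sketch using Python's standard `urllib.parse.urljoin` is shown below. The function name and the example.org URLs are hypothetical.

```python
from urllib.parse import urljoin


def resolve_resource(value, page_url, html_base=None):
    """Resolve an attribute value against the HTML base, or the page URL."""
    return urljoin(html_base or page_url, value)


PAGE_URL = "http://www.example.org/people/peter.html"

# A relative value silently depends on where the page lives ...
relative = resolve_resource("photo.jpg", PAGE_URL)
# ... or on the declared HTML base, if there is one.
with_base = resolve_resource("photo.jpg", PAGE_URL, "http://cdn.example.org/img/")
# An absolute URI means the same thing everywhere.
absolute = resolve_resource("http://photosite.example.org/photo.jpg", PAGE_URL)
```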
More pitfalls: precedence rules
Be careful when using rel and typeof in combination because of the precedence rules
BAD example:
<div about="#id">
  <span property="foaf:name">Peter Mika</span>
  <span rel="foaf:img" typeof="foaf:Image">
    <span property="dc:format">jpg</span> …
  </span>
</div>
To correct, move the typeof to a child element nested inside the <span> node with rel="foaf:img"
More pitfalls: the typeof attribute
Typeof does two things at once: it creates a new subject resource and assigns the type to it
BAD example:
<div about="#id">
  <span property="foaf:name">Peter Mika</span>
  <span rel="foaf:img" resource="http://www.example.org/photo.jpg">
    <span typeof="foaf:Image">
      <span property="dc:format">jpg</span>
    </span>
  </span>
</div>
To correct, you have to repeat the resource attribute on the span node with the typeof
HTML markup pitfalls
Marking up <h1>:
  <h1 property="dc:title">My homepage</h1>
  NOT: <h1><div property="dc:title">My homepage</h1>
Marking up an image:
  <a href="http://example.org/user/alex">
    <span about="#user1" rel="foaf:img media:image">
      <img alt="Alex" src="http://example.org/photos/alex.jpg"/>
    </span>
  </a>
  This doesn’t work: <img rel="foaf:img" src="photo.jpg"/>
In the header you need <meta property="…" content="…">
  NOT <meta name="…" content="…">
More pitfalls: breaking up descriptions
You cannot break up a description like this:
<span rel="foaf:knows">
  <span property="foaf:name">Peter Mika</span>
</span>
….
<span rel="foaf:knows">
  <a rel="foaf:email" href="mailto:pmika@yahoo-inc.com" />
</span>
This is not the same as:
<span rel="foaf:knows">
  <span property="foaf:name">Peter Mika</span>
  <a rel="foaf:email" href="mailto:pmika@yahoo-inc.com" />
</span>
In the first case there are two related resources, with one attribute each; in the second case there is a single related resource with two attributes.
Tips
Hiding information from being displayed
  Links without content will not be rendered
  Use <span property="foaf:name" content="Peter Mika"/>
Use datatypes to provide the expected type of a literal
  This helps validation because any tool can check whether the literal is indeed of that type
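The datatype tip can be illustrated with a small checker. This is a simplified sketch: it covers only two XML Schema datatypes and approximates their lexical rules (for example, it ignores the optional time zone that xsd:date allows). The function names and the `VALIDATORS` table are made up.

```python
import re
from datetime import date


def _is_date(value):
    # Approximation of xsd:date: YYYY-MM-DD naming a real calendar day.
    if not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def _is_integer(value):
    # Approximation of xsd:integer: optional sign followed by digits.
    return re.fullmatch(r"[+-]?[0-9]+", value) is not None


VALIDATORS = {
    "xsd:date": _is_date,
    "xsd:integer": _is_integer,
}


def is_valid_literal(value, datatype):
    """Check a literal against its declared datatype."""
    if datatype not in VALIDATORS:
        raise ValueError(f"no validator for {datatype}")
    return VALIDATORS[datatype](value)


machine_value_ok = is_valid_literal("2009-05-09", "xsd:date")
display_value_ok = is_valid_literal("May 9th, 2009", "xsd:date")
```

This is also why the slides put the machine-readable date in @content and keep the human-readable form as the element text.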
Choosing a vocabulary
Look at SearchMonkey objects
  Video, Games, Presentations, Events, News, Businesses, Products, Discussion
Search the Web or ask for advice on mailing lists
  [email_address]
  [email_address]
Beware of people who claim to have the vocabulary of everything
  Preferably you want something small and targeted
Never a 100% fit: you will need to introduce vocabulary terms (classes and properties)
Do not introduce new classes/properties in existing namespaces
  Example: the namespace http://xmlns.com/foaf/0.1/ is used by the FOAF project. Try not to introduce a new term without contacting the owner, i.e. the membership of the FOAF mailing list.
Advanced topic: creating a vocabulary Get advice on methodology vocamp.org and semanticweb.org Choose a namespace and a prefix Give sensible names, e.g. name it after your site, but don’t call it searchmonkey Namespace ends either with a slash or a hash Create   an RDF or OWL document describing your classes and properties Use an ontology editor such as Protégé 4.0 Follow naming conventions Publish your vocabulary Make sure the URIs of your properties and classes are resolvable E.g. myvocab:digicam should resolve to a document containing the definition of myvocab:digicam Convince others to adopt your vocabulary If you are in fishing, convince other fishing businesses
Exercise Explore data on the Web Microformats Search for pages on Yahoo using searchmonkey:com.yahoo.page.uf.hcard Try Operator Firefox Plug-in Try Optimus RDFa Search for pages on Yahoo using searchmonkey:com.yahoo.page.rdf.rdfa Try RDFa bookmarklet Try RDFa Distiller Mark up your webpage using RDFa See process on previous slides
Semantic Web in Use
Microsearch Metadata is out there Just how much data is out there? What is the quality? Idea: bring metadata to the surface of search How does it work? User enters query Metadata is extracted dynamically Entity reconciliation  Metadata is used to display rich abstracts,  related pages  spatial, temporal visualization Microsearch prototype
Example: ivan herman Related pages based on metadata Events from personal calendar, Conferences, and  bio from LinkedIn Geolocation Rich abstract
Example: peter site:flickr.com Flickr users named “Peter” by geography
Example: san francisco conference Conferences in San Francisco by date
Example: greater st. peter Save to  address book Call phone number (other actions)
Lessons More metadata than we expected: 53% of unique queries have at least one metadata-enabled page in the top 10 (n=7848). Performance is poor: metadata needs to come from the index for performance. 'Metacrap' does exist: users have to see metadata to spot mistakes in their markup and warn others. RDF templating (Fresnel) adds complexity: the abstract needs to be customized to the particular site and query.
Applications Yahoo’s SearchMonkey and Google’s Rich Snippets BOSS and YQL Semantic search and navigation
SearchMonkey Creating an ecosystem of publishers, developers and end-users: motivating and helping publishers to implement semantic annotation; providing tools for developers to create compelling applications; focusing on end-user experience. Rich abstracts as a first application. Addressing the long tail of query and content production. Standard Semantic Web technology: dataRSS = Atom + RDFa; industry standard vocabularies. http://developer.yahoo.com/searchmonkey/
What is SearchMonkey? An open platform for using structured data to build more useful and relevant search results. [Screenshots: Before, After]
Enhanced Result: image, deep links, name/value pairs or abstract
Infobar
SearchMonkey (1) Site owners/publishers share structured data with Yahoo!. (2) Site owners and third-party developers build SearchMonkey apps. (3) Consumers customize their search experience with Enhanced Results or Infobars. [Diagram labels: Acme.com’s web pages, Acme.com’s database, RDF/Microformat markup, DataRSS feed, Web Services, Page Extraction, Index]
Standard enhanced results Embed markup in your page and get enhanced results without any programming
Documentation Simple and advanced, examples, copy-paste code, validator
DataRSS An Atom extension for structured data Why a new format? A feed format is required by publishers Exclusive content (e.g. partnerships, paid inclusion) No changes necessary to the web page No standard named graph format for the Semantic Web Needed to capture meta-metadata such as source and timestamp of information  Not really a new format An Atom extension Use any RDFa parser to get the triples out cf. Google Base feeds
DataRSS (Atom 1.0 XML + RDFa)
<?profile http://search.yahoo.com/searchmonkey-profile ?>
<feed xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2005/Atom ../latest/xsd/datarss.xsd">
  <id>http://www.linkedin.com/datarss/</id>
  <author>
    <name>Peter Mika (pmika@yahoo-inc.com)</name>
  </author>
  <title>Example data feed for social</title>
  <updated>2007-11-14T04:05:06+07:00</updated>
  <entry>
    <!-- title field of entry is not used for anything -->
    <title>Peter Mika</title>
    <!-- URL of the webpage extracted from -->
    <id>http://www.linkedin.com/ppl/webprofile?id=5054019</id>
    <updated>2007-11-14T04:05:06+07:00</updated>
    <content type="application/xml">
      <y:adjunct version="1.0" name="social-simple" xmlns:y="http://search.yahoo.com/datarss/">
        <y:item rel="dc:subject">
          <y:type typeof="foaf:Person">
            <y:meta property="foaf:name">John Doe</y:meta>
            <y:meta property="foaf:gender">male</y:meta>
            <y:item rel="foaf:homepage" resource="http://www.joeisageek.com"/>
            <y:item rel="foaf:mbox" resource="mailto:johndoe@example.org"/>
            <y:item rel="foaf:weblog" resource="http://johnblog.example.org"/>
            <y:item rel="foaf:knows">
              <y:type typeof="foaf:Person">
                <y:meta property="foaf:name">Jane Doe</y:meta>
                <y:meta property="foaf:gender">female</y:meta>
                <y:item rel="foaf:mbox" resource="mailto:janedoe@example.org"/>
              </y:type>
            </y:item>
          </y:type>
        </y:item>
      </y:adjunct>
    </content>
  </entry>
</feed>
The data part
<adjunct version="1.0" id="com.yahoo.page.rdfa" xmlns="http://search.yahoo.com/datarss/" updated="2007-11-14T04:05:06+07:00">
  <item rel="dc:subject">
    <type typeof="foaf:Person">
      <meta property="foaf:name">John Doe</meta>
      <meta property="foaf:gender">male</meta>
      <item rel="foaf:homepage" resource="http://www.joeisageek.com"/>
      <item rel="foaf:mbox" resource="mailto:johndoe@example.org"/>
      <item rel="foaf:weblog" resource="http://johnblog.example.org"/>
      <item rel="foaf:knows">
        <type typeof="foaf:Person">
          <meta property="foaf:name">Jane Doe</meta>
          <meta property="foaf:gender">female</meta>
          <item rel="foaf:mbox" resource="mailto:janedoe@example.org"/>
        </type>
      </item>
    </type>
  </item>
</adjunct>
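To make the structure of the adjunct concrete, here is a rough sketch of how its nested item/type/meta elements map onto subject-predicate-object triples. It is a hand-rolled walker written for illustration, not a real RDFa parser and not Yahoo!'s implementation: it assumes only the element pattern shown on the slide, leaves prefixes such as foaf: unexpanded, and uses made-up blank node labels (_:b1, _:b2) and a placeholder page subject.

```python
import xml.etree.ElementTree as ET

# Namespace of the adjunct elements, as declared on the slide.
NS = "{http://search.yahoo.com/datarss/}"

# Shortened copy of the slide's example.
ADJUNCT = """
<adjunct version="1.0" id="com.yahoo.page.rdfa"
         xmlns="http://search.yahoo.com/datarss/">
  <item rel="dc:subject">
    <type typeof="foaf:Person">
      <meta property="foaf:name">John Doe</meta>
      <item rel="foaf:mbox" resource="mailto:johndoe@example.org"/>
      <item rel="foaf:knows">
        <type typeof="foaf:Person">
          <meta property="foaf:name">Jane Doe</meta>
        </type>
      </item>
    </type>
  </item>
</adjunct>
"""


def walk_type(type_elem, subject, triples, counter):
    """Emit triples for one <type> element describing `subject`."""
    for rdf_type in (type_elem.get("typeof") or "").split():
        triples.append((subject, "rdf:type", rdf_type))
    for child in type_elem:
        if child.tag == NS + "meta":
            value = (child.text or "").strip()
            triples.append((subject, child.get("property"), value))
        elif child.tag == NS + "item":
            add_item(child, subject, triples, counter)


def add_item(item, subject, triples, counter):
    """Emit triples for one <item>: either a link or a nested object."""
    resource = item.get("resource")
    if resource is not None:
        triples.append((subject, item.get("rel"), resource))
        return
    nested = item.find(NS + "type")
    if nested is not None:
        counter[0] += 1
        obj = f"_:b{counter[0]}"  # illustrative blank node label
        triples.append((subject, item.get("rel"), obj))
        walk_type(nested, obj, triples, counter)


def adjunct_to_triples(xml_text, page="<page>"):
    """Return the triples found in an adjunct, with `page` as root subject."""
    root = ET.fromstring(xml_text)
    triples, counter = [], [0]
    for item in root.findall(NS + "item"):
        add_item(item, page, triples, counter)
    return triples


for triple in adjunct_to_triples(ADJUNCT):
    print(triple)
```

In practice the slide's advice stands: a standards-compliant RDFa parser would also resolve prefixes and handle attributes this sketch ignores.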
Developer tool: create custom presentations
Developer tool
Developer tool
Developer tool
Developer tool
Gallery
Example apps  LinkedIn hCard plus feed data Creative Commons by Ben Adida CC in RDFa
Example apps. II. Other me by Dan Brickley Google Social Graph API wrapped using a Web Service
Google’s Rich Snippets Shares a subset of the features of SearchMonkey Encourages publishers to embed certain microformats and RDFa into webpages Currently reviews, people, products, business & organizations These are used to generate richer search results SearchMonkey is customizable Developers can develop applications themselves SearchMonkey is open Wide support for standard vocabularies API access
API access to metadata Yahoo BOSS & YQL
BOSS: Build your Own Search Service Ability to re-order results and blend in additional content. No restrictions on presentation. No branding or attribution. Access to multiple verticals (web search, image, news). 40+ supported language and region pairs. Pricing (BOSS): pay-by-usage; 10,000 queries a day still free. Serve any ads you want. For more info, see http://developer.yahoo.com/search/boss/
BOSS API to structured data Simple HTTP GET calls, no authentication. You need an Application ID: register at developer.yahoo.com/search/boss/. Request format: http://boss.yahooapis.com/ysearch/web/v1/{query}?appid={appid}&format=xml&view=searchmonkey_feed Restrict your query using special words: searchmonkey:com.yahoo.page.uf.{format}, where {format} is one of hcard, hcalendar, tag, adr, hresume etc., or searchmonkey:com.yahoo.page.rdf.rdfa
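As a small sketch of the request format above, the following builds a BOSS URL that restricts results to pages carrying a given kind of metadata. The function name, the keywords and the YOUR_APP_ID placeholder are assumptions for illustration; the endpoint and parameters are the ones quoted on the slide. The sketch only constructs the URL and does not send the request, since the service may no longer be available.

```python
from urllib.parse import quote, urlencode

# Endpoint as quoted on the slide.
BOSS_ENDPOINT = "http://boss.yahooapis.com/ysearch/web/v1/"


def boss_searchmonkey_url(keywords, appid, restriction=None):
    """Build a BOSS web search URL that asks for the SearchMonkey feed view.

    `restriction` is one of the special query words, e.g.
    "searchmonkey:com.yahoo.page.uf.hresume".
    """
    query = keywords if restriction is None else f"{keywords} {restriction}"
    params = urlencode(
        {"appid": appid, "format": "xml", "view": "searchmonkey_feed"}
    )
    # The query is part of the URL path, so it is percent-encoded in full.
    return f"{BOSS_ENDPOINT}{quote(query, safe='')}?{params}"


url = boss_searchmonkey_url(
    "java developer",                          # hypothetical keywords
    "YOUR_APP_ID",                             # placeholder Application ID
    "searchmonkey:com.yahoo.page.uf.hresume",  # restrict to hResume pages
)
print(url)
```

The response to such a request would be a DataRSS (XML) feed, which is what the resume search demo parses.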
Demo: resume search Search pages with resume data and given keywords   {keyword} searchmonkey:com.yahoo.page.uf.hresume Parse the results as DataRSS (XML) Extract information and display using YUI
Demo
Yahoo Query Language (YQL) Query web APIs as virtual tables  Mash-up data by joining tables Add an API by adding a table definition Example: select my friends and sort by nickname
PHP example: select the last 100 photos from Flickr with the word Austin
<?php
// Public YQL endpoint; the query statement is appended URL-encoded
$url = "http://query.yahooapis.com/v1/public/yql?q=";
$q   = "select * from flickr.photos.search(100) where text='Austin'";
$fmt = "xml";
$x = simplexml_load_file($url.urlencode($q)."&format=$fmt");
// Copy the namespaced attributes of the root element (e.g. count) into local variables
foreach($x->attributes('http://www.yahooapis.com/v1/base.rng') as $k=>$v) {
  $$k=(string)$v;
}
echo <<<EOB
$count photos fetched from {$x->diagnostics->url} in
{$x->diagnostics->url['execution-time']} seconds<br>
EOB;
// Print a small square thumbnail for every photo in the result
$flickr = "http://static.flickr.com/";
foreach($x->results->photo as $p) {
  echo "<img src=\"$flickr{$p['server']}/{$p['id']}_{$p['secret']}_s.jpg\"/>\n";
}
?>
YQL example  ( source )
That’s all there is to it!
<?php
$root = 'http://query.yahooapis.com/v1/public/yql?q=';
$city = 'Barcelona';
$loc = 'Barcelona';

// 1. First three paragraphs of the Wikipedia article, via the YQL html table
$yql = 'select * from html where url = \'http://en.wikipedia.org/wiki/'.$city.'\' and xpath="//div[@id=\'bodyContent\']/p" limit 3';
$url = $root . urlencode($yql) . '&format=xml';
$info = getstuff($url);
$info = preg_replace("/.*<results>|<\/results>.*/",'',$info);
$info = preg_replace("/<\?xml version=\"1\.0\"". " encoding=\"UTF-8\"\?>/",'',$info);
// The pattern on the next line is empty in this transcript, so the call is a no-op as written
$info = preg_replace("//",'',$info);
// Turn relative Wikipedia links into absolute ones
$info = preg_replace("/\"\/wiki/",'"http://en.wikipedia.org/wiki',$info);

// 2. Upcoming events in the city, joined with geo.places through the WOEID
$yql = 'select * from upcoming.events.bestinplace(5) where woeid in '.
       '(select woeid from geo.places where text="'.$loc.'")'.
       ' | unique(field="description")';
$url = $root . urlencode($yql) . '&format=json';
$events = getstuff($url);
$events = json_decode($events);
$evHTML = '';
foreach($events->query->results->event as $e){
  $evHTML.='<li><h3><a href="'.$e->ticket_url.'">'.$e->name.'</a></h3><p>'.
           substr($e->description,0,100).'&hellip;</p></li>';
}

// 3. Flickr photos taken in the city
$yql = 'select * from flickr.photos.info where photo_id in '.
       '(select id from flickr.photos.search where woe_id in '.
       '(select woeid from geo.places where text="'.$loc.'")) limit 16';
$url = $root . urlencode($yql) . '&format=json';
$photos = getstuff($url);
$photos = json_decode($photos);
$phHTML = '';
foreach($photos->query->results->photo as $s){
  $src = "http://farm{$s->farm}.static.flickr.com/{$s->server}/".
         "{$s->id}_{$s->secret}_s.jpg";
  $phHTML.='<li><a href="'.$s->urls->url->content.'"><img alt="'.
           $s->title.'" src="'.$src.'"></a></li>';
}

// 4. Weather forecast from the Yahoo! weather RSS feed
$yql='select description from rss where '.
     ' url="http://weather.yahooapis.com/forecastrss?p=SPXX0015&u=c"';
$url = $root . urlencode($yql) . '&format=json';
$weather = getstuff($url);
$weather = json_decode($weather);
$weHTML = $weather->query->results->item->description;

// Fetch a URL with cURL and return the response body
function getstuff($url){
  $curl_handle = curl_init();
  curl_setopt($curl_handle, CURLOPT_URL, $url);
  curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, 2);
  curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1);
  $buffer = curl_exec($curl_handle);
  curl_close($curl_handle);
  if (empty($buffer)){
    return 'Error retrieving data, please try later.';
  } else {
    return $buffer;
  }
}
?>
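The sub-select used in the events query of the PHP listing is what "mash-up data by joining tables" means in YQL: the inner select resolves a place name to WOEIDs, and the outer select filters a second table by them. A minimal sketch of composing such a statement and its request URL is shown here, assuming the same public endpoint and table names as the PHP listing. The helper names are made up, the place name is interpolated naively (no escaping of quotes), and no request is sent.

```python
from urllib.parse import urlencode

# Public endpoint, as used in the PHP listing.
YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"


def events_in_place_query(place, limit=5):
    """Compose a YQL statement that joins two tables through a sub-select."""
    # Inner select: place name -> WOEIDs. Outer select: events for those WOEIDs.
    return (
        f"select * from upcoming.events.bestinplace({limit}) where woeid in "
        f'(select woeid from geo.places where text="{place}")'
    )


def yql_url(statement, fmt="json"):
    """Return the GET URL for a YQL statement in the requested format."""
    params = urlencode({"q": statement, "format": fmt})
    return f"{YQL_ENDPOINT}?{params}"


statement = events_in_place_query("Barcelona")
print(statement)
print(yql_url(statement))
```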
Semantic Search and Navigation
Browsing Linked Data Browse the Linked Data graph by going from one URI to the next. Browsers: OpenLink’s Linked Data browser, OpenLink’s Data Explorer, Tabulator, Disco, Marbles
Semantic Search Engines Natural Language search engines Hakia Powerset  (now built into Bing) TrueKnowledge Structured data search engines Searching open web data Sindice Sigma Searching closed world data Wolfram Alpha  (closed structured data + computation)
Exercise Build a SearchMonkey application Try a few YQL queries Try Zemanta online  You don’t need to install the plugin  Find data on the Web using search or navigation
Research on Semantic Search
Semantic Search Def.  matching the user’s query with the Web’s content at a conceptual level, often with the help of world knowledge Related disciplines Semantic Web, IR, Databases, NLP, IE As a field ISWC/ESWC/ASWC, WWW, SIGIR Exploring Semantic Annotations in Information Retrieval (ECIR08, WSDM09) Semantic Search Workshop (ESWC08, WWW09) Future of Web Search (FoWS09)
Hard searches Ambiguous searches: Paris Hilton. Multimedia search: images of Paris Hilton. Imprecise or overly precise searches: publications by Jim Hendler; find images of strong and adventurous people (Lenat). Searches for descriptions: search for yourself without using your name; product search (ads!). Searches that require aggregation: size of the Eiffel Tower (Lenat); public opinion on Britney Spears. Queries that require a deeper understanding of the query, the content and/or the world at large. Note: some of these are so hard that users don’t even try them any more
Not just search…
In the best of cases… Matching the query intent with the document metadata can be trivial:
Query:
<adjunct id="com.yahoo.query.intent" version="0.5">
  <type typeof="fb:music.artist foaf:Person">
    <meta property="foaf:name">Madonna</meta>
  </type>
</adjunct>
dna_checksum:AF514FE45DD33BB7CD8DCCC89AA
Document metadata:
<adjunct id="com.yahoo.page.hcard" version="0.5">
  <type typeof="foaf:Person">
    <meta property="foaf:name">Madonna</meta>
  </type>
</adjunct>
dna_checksum:AF514FE45DD33BB7CD8DCCC89AA
Semantics at every step of the IR process [Diagram labels: The Web ("bla bla bla"), Document processing, The IR engine, Query processing ("bla bla bla?" becomes q="bla" * 3), Ranking θ(q,d), Result presentation]
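The trivial case can be sketched as a comparison of types and names: the query intent and the document describe the same thing if they share at least one type and carry the same name. The helper names and the normalisation (lower-casing, trimming) are assumptions made for this illustration, not a description of how the matching was actually implemented.

```python
def describe(typeof, name):
    """Normalise a (typeof, foaf:name) pair taken from an adjunct."""
    return {"types": set(typeof.split()), "name": name.strip().lower()}


def matches(query_intent, document):
    """True if both descriptions share a type and have the same name."""
    shared_types = query_intent["types"] & document["types"]
    return bool(shared_types) and query_intent["name"] == document["name"]


# Values taken from the two adjuncts on the slide.
query_intent = describe("fb:music.artist foaf:Person", " Madonna ")
document = describe("foaf:Person", " Madonna ")

print(matches(query_intent, document))
```

Real queries rarely come with an unambiguous identifier, which is why the remaining steps of the IR process need semantics as well.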
Study: improving text analysis using structured data Problems  Creating training data manually is expensive Existing taggers trained on financial-political news Idea:  Learn the correspondence between entities in text and metadata  Extend knowledge base and generate in-domain training data Learning to Tag and Tagging to Learn: A Case Study on Wikipedia  Peter Mika; Massimiliano Ciaramita; Hugo Zaragoza; Jordi Atserias, IEEE Intelligent Systems, 2008, 5
Study: processing metadata using cloud computing Question: Can we use Pig to effectively query and reason with large amounts of RDF data? Mapping SPARQL to PigLatin Forward-chaining RDF(S) reasoning Acknowledgement: Ben Reed Experimental results (LUBM) Useful for long running queries and reasoning Not useful for interactive queries (< 100 s) Co-organizing: Billion Triples Challenge at ISWC 2008, October 26-28, Karlsruhe, Germany Web Semantics in the Clouds  Peter Mika; Giovanni Tummarello, IEEE Intelligent Systems, 2008, 5
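For readers unfamiliar with forward-chaining RDF(S) reasoning, the idea can be shown on a handful of triples. The sketch below is a naive in-memory version in Python, not the Pig-based implementation from the study: it applies only two RDFS entailment rules (rdfs9, type inheritance, and rdfs11, transitivity of rdfs:subClassOf) until nothing new is derived. The class and instance names are illustrative, in the style of the LUBM benchmark.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def forward_chain(triples):
    """Apply rules rdfs9 and rdfs11 until no new triple is produced."""
    closure = set(triples)
    while True:
        derived = set()
        for s, p, o in closure:
            for s2, p2, o2 in closure:
                # Both rules join on "o is a subclass of o2".
                if o != s2 or p2 != SUBCLASS:
                    continue
                if p == SUBCLASS:   # rdfs11: subClassOf is transitive
                    derived.add((s, SUBCLASS, o2))
                elif p == TYPE:     # rdfs9: instances inherit superclasses
                    derived.add((s, TYPE, o2))
        if derived <= closure:      # fixpoint reached
            return closure
        closure |= derived


# Illustrative data only.
TRIPLES = {
    ("ub:FullProfessor", SUBCLASS, "ub:Professor"),
    ("ub:Professor", SUBCLASS, "ub:Faculty"),
    ("ex:alice", TYPE, "ub:FullProfessor"),
}

for triple in sorted(forward_chain(TRIPLES)):
    print(triple)
```

The nested loop is a self-join over the triple set, which is the kind of operation a batch system like Pig can distribute, and which fits the observation that the approach suits long-running jobs rather than interactive queries.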
Study: metadata analysis What vocabularies are being used? What microformats should we support? How much vocabulary reuse/extension is there? Is there a convergence? What is the quality of metadata? Datatype conformance; logical consistency; conformance to common use with respect to common attributes. How much spam is there? Distribution of spamicity scores. Do spamicity scores transfer to metadata? Are there new schemas emerging through the combination of existing vocabularies? What is the metadata coverage in terms of queries? What percentage of queries from query logs would result in metadata? How many would result in metadata that could answer the query (by some approximation)?
Study: Semantic Search Assist Observation: objects of the same type often have the same query context, i.e. users ask for the same aspect of the type. Could we make query suggestions based on the type of the entity? Improvement for infrequent queries. Example queries: apple ipod nano review; sony plasma tv review; jerry yang biography; biography tim berners lee; tim berners lee blog; peter mika yahoo; britney spears shaves her head
Study: evaluation of semantic search Analysis of user needs: how are these needs aligned with data on the Web? How do the vocabularies differ? Analysis of query types: object queries? Object-attribute queries? Relationship queries? What does it mean for an object or a set of triples to be relevant to a query? Show me the answer and only the answer. Put me near the answer in the graph. Show me the justification (or at least the source) of the answer. … Semantic Search evaluation campaign planned for 2010
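The observation can be illustrated with a toy model: count which context words accompany entities of each type, then suggest the most frequent contexts for another entity of the same type, even if that entity is rare in the log. The entity-to-type table, the substring matching and the function names are assumptions made for this sketch; the queries are the examples from the slide.

```python
from collections import Counter, defaultdict

# Hypothetical entity typing; in practice this would come from metadata.
ENTITY_TYPES = {
    "apple ipod nano": "product",
    "sony plasma tv": "product",
    "jerry yang": "person",
    "tim berners lee": "person",
    "peter mika": "person",  # infrequent entity with no queries in the log
}

# Example queries from the slide.
QUERY_LOG = [
    "apple ipod nano review",
    "sony plasma tv review",
    "jerry yang biography",
    "biography tim berners lee",
    "tim berners lee blog",
]


def context_by_type(query_log, entity_types):
    """Count the context words seen with entities of each type."""
    counts = defaultdict(Counter)
    for query in query_log:
        for entity, etype in entity_types.items():
            if entity in query:
                context = query.replace(entity, "").strip()
                if context:
                    counts[etype][context] += 1
    return counts


def suggest(entity, entity_types, counts, k=2):
    """Suggest queries for `entity` from the contexts common to its type."""
    etype = entity_types[entity]
    return [f"{entity} {ctx}" for ctx, _ in counts[etype].most_common(k)]


counts = context_by_type(QUERY_LOG, ENTITY_TYPES)
print(suggest("peter mika", ENTITY_TYPES, counts))
```

Because the statistics are pooled per type, the infrequent entity still receives suggestions learned from frequent entities of the same type.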
Challenges Future work in Semantic Web (Semi-)automated ways of metadata creation How do we go from 5% to 95%? Data quality We allow providing metadata for other people’s sites! Reasoning To what extent is reasoning useful? For example, how much would entity resolution or taxonomic reasoning help? Scale How do we exploit cluster computing techniques? What is between databases and IR engines? Fostering social agreements How do we get people to reuse vocabularies?
Challenges Future work in IR Query interpretation Ranking with metadata Evaluation of semantic search Personalization Semantic ads Constraints Users still want to see a document Keyword-based search cannot suffer Whole page relevance, monetization can only increase Established expectations  Query entry Result presentation
Contact Peter Mika [email_address] Come to Barcelona and stop by. SearchMonkey: developer.yahoo.com/searchmonkey/; mailing lists: [email_address] [email_address]; forums: http://suggestions.yahoo.com/searchmonkey; Semantic Web FAQ: http://devel.yahoo.com/searchmonkey/smguide/faq.html
the monkey is out!
Application: query intent Paris Hilton is a person!
Application: query intent #2 Hugo is a person!
Influence of Career Development on Retention of Employees in Private Univers...
publication11
 
Freeze-Dried Fruit Powder Market Trends & Growth
Freeze-Dried Fruit Powder Market Trends & GrowthFreeze-Dried Fruit Powder Market Trends & Growth
Freeze-Dried Fruit Powder Market Trends & Growth
chanderdeepseoexpert
 
TNR Gold Investor Summary - Building The Green Energy Metals Royalty and Gold...
TNR Gold Investor Summary - Building The Green Energy Metals Royalty and Gold...TNR Gold Investor Summary - Building The Green Energy Metals Royalty and Gold...
TNR Gold Investor Summary - Building The Green Energy Metals Royalty and Gold...
Kirill Klip
 
Web Design Creating User-Friendly and Visually Engaging Websites - April 2025...
Web Design Creating User-Friendly and Visually Engaging Websites - April 2025...Web Design Creating User-Friendly and Visually Engaging Websites - April 2025...
Web Design Creating User-Friendly and Visually Engaging Websites - April 2025...
TheoRuby
 
waterBeta white paper - 250202- two-column.docx
waterBeta white paper - 250202- two-column.docxwaterBeta white paper - 250202- two-column.docx
waterBeta white paper - 250202- two-column.docx
Peter Adriaens
 
Alan Stalcup - The Enterprising CEO
Alan  Stalcup  -  The  Enterprising  CEOAlan  Stalcup  -  The  Enterprising  CEO
Alan Stalcup - The Enterprising CEO
Alan Stalcup
 
www.visualmedia.com digital markiting (1).pptx
www.visualmedia.com digital markiting (1).pptxwww.visualmedia.com digital markiting (1).pptx
www.visualmedia.com digital markiting (1).pptx
Davinder Singh
 
Affinity.co Lifecycle Marketing Presentation
Affinity.co Lifecycle Marketing PresentationAffinity.co Lifecycle Marketing Presentation
Affinity.co Lifecycle Marketing Presentation
omiller199514
 
AlaskaSilver Corporate Presentation Apr 28 2025.pdf
AlaskaSilver Corporate Presentation Apr 28 2025.pdfAlaskaSilver Corporate Presentation Apr 28 2025.pdf
AlaskaSilver Corporate Presentation Apr 28 2025.pdf
Western Alaska Minerals Corp.
 
From Sunlight to Savings The Rise of Homegrown Solar Power.pdf
From Sunlight to Savings The Rise of Homegrown Solar Power.pdfFrom Sunlight to Savings The Rise of Homegrown Solar Power.pdf
From Sunlight to Savings The Rise of Homegrown Solar Power.pdf
Insolation Energy
 
NewBase 28 April 2025 Energy News issue - 1783 by Khaled Al Awadi_compressed...
NewBase 28 April 2025  Energy News issue - 1783 by Khaled Al Awadi_compressed...NewBase 28 April 2025  Energy News issue - 1783 by Khaled Al Awadi_compressed...
NewBase 28 April 2025 Energy News issue - 1783 by Khaled Al Awadi_compressed...
Khaled Al Awadi
 
Top 5 Mistakes to Avoid When Writing a Job Application
Top 5 Mistakes to Avoid When Writing a Job ApplicationTop 5 Mistakes to Avoid When Writing a Job Application
Top 5 Mistakes to Avoid When Writing a Job Application
Red Tape Busters
 
Accounting_Basics_Complete_Guide_By_CA_Suvidha_Chaplot (1).pdf
Accounting_Basics_Complete_Guide_By_CA_Suvidha_Chaplot (1).pdfAccounting_Basics_Complete_Guide_By_CA_Suvidha_Chaplot (1).pdf
Accounting_Basics_Complete_Guide_By_CA_Suvidha_Chaplot (1).pdf
CA Suvidha Chaplot
 
Solaris Resources Presentation - Corporate April 2025.pdf
Solaris Resources Presentation - Corporate April 2025.pdfSolaris Resources Presentation - Corporate April 2025.pdf
Solaris Resources Presentation - Corporate April 2025.pdf
pchambers2
 
Alec Lawler - A Passion For Building Brand Awareness
Alec Lawler - A Passion For Building Brand AwarenessAlec Lawler - A Passion For Building Brand Awareness
Alec Lawler - A Passion For Building Brand Awareness
Alec Lawler
 
Network Detection and Response (NDR): The Future of Intelligent Cybersecurity
Network Detection and Response (NDR): The Future of Intelligent CybersecurityNetwork Detection and Response (NDR): The Future of Intelligent Cybersecurity
Network Detection and Response (NDR): The Future of Intelligent Cybersecurity
GauriKale30
 
Petslify Turns Pet Photos into Hug-Worthy Memories
Petslify Turns Pet Photos into Hug-Worthy MemoriesPetslify Turns Pet Photos into Hug-Worthy Memories
Petslify Turns Pet Photos into Hug-Worthy Memories
Petslify
 
Disinformation in Society Report 2025 Key Findings
Disinformation in Society Report 2025 Key FindingsDisinformation in Society Report 2025 Key Findings
Disinformation in Society Report 2025 Key Findings
MariumAbdulhussein
 
India Advertising Market Size & Growth | Industry Trends
India Advertising Market Size & Growth | Industry TrendsIndia Advertising Market Size & Growth | Industry Trends
India Advertising Market Size & Growth | Industry Trends
Aman Bansal
 
Influence of Career Development on Retention of Employees in Private Univers...
Influence of Career Development on Retention of  Employees in Private Univers...Influence of Career Development on Retention of  Employees in Private Univers...
Influence of Career Development on Retention of Employees in Private Univers...
publication11
 
Freeze-Dried Fruit Powder Market Trends & Growth
Freeze-Dried Fruit Powder Market Trends & GrowthFreeze-Dried Fruit Powder Market Trends & Growth
Freeze-Dried Fruit Powder Market Trends & Growth
chanderdeepseoexpert
 
TNR Gold Investor Summary - Building The Green Energy Metals Royalty and Gold...
TNR Gold Investor Summary - Building The Green Energy Metals Royalty and Gold...TNR Gold Investor Summary - Building The Green Energy Metals Royalty and Gold...
TNR Gold Investor Summary - Building The Green Energy Metals Royalty and Gold...
Kirill Klip
 
Web Design Creating User-Friendly and Visually Engaging Websites - April 2025...
Web Design Creating User-Friendly and Visually Engaging Websites - April 2025...Web Design Creating User-Friendly and Visually Engaging Websites - April 2025...
Web Design Creating User-Friendly and Visually Engaging Websites - April 2025...
TheoRuby
 
waterBeta white paper - 250202- two-column.docx
waterBeta white paper - 250202- two-column.docxwaterBeta white paper - 250202- two-column.docx
waterBeta white paper - 250202- two-column.docx
Peter Adriaens
 
Alan Stalcup - The Enterprising CEO
Alan  Stalcup  -  The  Enterprising  CEOAlan  Stalcup  -  The  Enterprising  CEO
Alan Stalcup - The Enterprising CEO
Alan Stalcup
 
www.visualmedia.com digital markiting (1).pptx
www.visualmedia.com digital markiting (1).pptxwww.visualmedia.com digital markiting (1).pptx
www.visualmedia.com digital markiting (1).pptx
Davinder Singh
 
Affinity.co Lifecycle Marketing Presentation
Affinity.co Lifecycle Marketing PresentationAffinity.co Lifecycle Marketing Presentation
Affinity.co Lifecycle Marketing Presentation
omiller199514
 
From Sunlight to Savings The Rise of Homegrown Solar Power.pdf
From Sunlight to Savings The Rise of Homegrown Solar Power.pdfFrom Sunlight to Savings The Rise of Homegrown Solar Power.pdf
From Sunlight to Savings The Rise of Homegrown Solar Power.pdf
Insolation Energy
 
NewBase 28 April 2025 Energy News issue - 1783 by Khaled Al Awadi_compressed...
NewBase 28 April 2025  Energy News issue - 1783 by Khaled Al Awadi_compressed...NewBase 28 April 2025  Energy News issue - 1783 by Khaled Al Awadi_compressed...
NewBase 28 April 2025 Energy News issue - 1783 by Khaled Al Awadi_compressed...
Khaled Al Awadi
 
Top 5 Mistakes to Avoid When Writing a Job Application
Top 5 Mistakes to Avoid When Writing a Job ApplicationTop 5 Mistakes to Avoid When Writing a Job Application
Top 5 Mistakes to Avoid When Writing a Job Application
Red Tape Busters
 

Semantic Web Austin Yahoo

  • 1. Making the Web Searchable Peter Mika Researcher, Data Architect Yahoo! Research
  • 3. Yahoo! Research Barcelona Established January, 2006 Led by Ricardo Baeza-Yates Research areas Web Mining content, structure, usage Distributed Web retrieval Multimedia retrieval NLP and Semantics
  • 4. Yahoo! by numbers (April, 2007) There are approximately 500 million users of Yahoo! branded services, meaning we reach 50 percent, or 1 out of every 2 users, online, the largest audience on the Internet (Yahoo! Internal Data). Yahoo! is the most visited site online with nearly 4 billion visits and an average of 30 visits per user per month in the U.S. and leads all competitors in audience reach, frequency and engagement (comScore Media Metrix, US, Feb. 2007). Yahoo! accounts for the largest share of time Americans spend on the Internet with 12 percent (comScore Media Metrix, US, Feb. 2007) and approximately 8 percent of the world’s online time (comScore WorldMetrix, Feb. 2007). Yahoo! is the #1 home page with 85 million average daily visitors on Yahoo! homepages around the world, an increase of nearly 5 million visitors in a month (comScore WorldMetrix, Feb. 2007). Yahoo!’s social media properties (Flickr, delicious, Answers, 360, Video, MyBlogLog, Jumpcut and Bix) have 115 million unique visitors worldwide (comScore WorldMetrix, Feb. 2007). Yahoo! Answers is the largest collection of human knowledge on the Web with more than 90 million unique users and 250 million answers worldwide (Yahoo! Internal Data). There are more than 450 million photos in Flickr in total and 1 million photos are uploaded daily. 80 percent of the photos are public (Yahoo! Internal Data). Yahoo! Mail is the #1 Web mail provider in the world with 243 million users (comScore WorldMetrix, Feb. 2007) and nearly 80 million users in the U.S. (comScore Media Metrix, US, Feb. 2007). Interoperability between Yahoo! Messenger and Windows Live Messenger has formed the largest IM community, approaching 350 million user accounts (Yahoo! Internal Data). Yahoo! Messenger is the most popular in time spent with an average of 50 minutes per user, per day (comScore WorldMetrix, Feb. 2007). Nearly 1 in 10 Internet users is a member of a Yahoo! Group (Yahoo! Internal Data). Yahoo! is one of only 26 companies to be on both the Fortune 500 list and Fortune’s “Best Place to Work” list (2006).
  • 5. Agenda Part 1 Publishing content on the Semantic Web Intro to RDF and the Semantic Web Six ways to publish data on the Semantic Web History of embedded metadata on the Web RDFa, best practices and tools Exercise Part 2 Semantic Web in use SearchMonkey BOSS and YQL Semantic Search and Navigation Part 3 Research in Semantic Search
  • 6. Motivation Why publish data on the Semantic Web? Multiply the value of your data by increasing content agility The potential for reuse and aggregation with other datasets Make your data more easily findable Why develop applications using semantic technologies? Content agility means you can more rapidly develop applications by reusing and recombining data. Content agility leads to increased agility and robustness of your application.
  • 7. Intro to the Semantic Web
  • 8. Basic RDF RDF has two basic types of entities: resources and literals. These correspond roughly to objects and built-in types in object-oriented programming. Resources are identified by a URI; a resource without a URI is called a blank node. URIs are a generalization of URLs. Notation: <http://www.example.org/Person> or ex:Person. Literals have an optional language and datatype (string, integer etc.). Datatypes are identified by URIs, e.g. XML Schema datatypes. Two literals are the same if their components are the same. Notation: “Joe B.” or Joe@en^^http://…#string
  • 9. RDF models A triple, also called a statement, is a tuple of (subject, predicate, object). Example: (Joe, loves, Mary). Each triple gives the value of a property for a given resource or relates two objects to one another. A predicate is always a resource with a URI. An RDF model is a set of triples. Ordering of statements in an RDF document is irrelevant (unlike XML).
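The idea that an RDF model is a set of triples, so that statement order carries no meaning, can be sketched in a few lines of Python. This is only an illustration: the namespace `EX` and the names `model_a` and `model_b` are hypothetical, and plain strings stand in for both resources and literals, which a real RDF library would keep apart.

```python
# Hypothetical namespace used only for this illustration.
EX = "http://www.example.org/"

# A triple is a (subject, predicate, object) tuple; a model is a set of them.
model_a = {
    (EX + "Joe", EX + "loves", EX + "Mary"),
    (EX + "Joe", EX + "name", "Joe A."),
}

# The same statements written in a different order.
model_b = {
    (EX + "Joe", EX + "name", "Joe A."),
    (EX + "Joe", EX + "loves", EX + "Mary"),
}

# Sets ignore ordering, so both documents describe the same model.
same_model = model_a == model_b
```

Because the model is a set, stating a triple twice also leaves the model unchanged.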
  • 10. Graphical and textual notation A number of text-based interchange formats exist for RDF: RDF/XML, Turtle, N3, N-Triples. Example: http://www.cs.vu.nl/~pmika/foaf.rdf [Diagram labels: my:Joe, “Joe A.”, name, foaf:Person, type]
  • 11. Ontologies Ontologies are collections of classes and properties used to describe objects in a particular domain Ontologies themselves are described in RDF or OWL (the Web Ontology Language), an extension of RDF Example: the Friend-Of-A-Friend (FOAF) ontology for personal profiles Classes can be described by sub- and superclasses, required properties Class membership in RDF is expressed using the rdf:type property An instance can have multiple classes (types) A class can have multiple superclasses Properties can be described by their domain, range, cardinalities, etc.
  • 12. Advanced topic: Resources vs Literals Resources are objects, literals are strings. Resources are instances of classes, literals have datatypes. Whether something is a resource or a literal sometimes depends on the level of detail of the modeling: <meta property="myvocab:knows">Paris Hilton</meta> versus <item rel="foaf:knows"> <meta property="foaf:name">Paris Hilton</meta> </item> You cannot make statements about literals (literals are always the object in a triple). Resources can carry a globally unique identifier, literals have no identity. Web resources such as documents and images are resources: <item rel="rdfs:seeAlso" resource="http://www.some.related.page.com/"/> <item rel="foaf:img" resource="http://photosite.example.org/photo.jpg"/> When in doubt: it’s a resource.
  • 13. Advanced Topic: Informational resources vs. Conceptual resources Informational resource: an HTML document, image, any other file on the Web Retrievable in its entirety from the Web Retrieving it can return a 200 OK Conceptual (non-informational) resource: a person, an event, a place, etc. A description of it may be retrievable from the Web When identified by a URL, retrieving it should return a 303 Redirect Never confuse a webpage with what it describes! You are not your Facebook profile: one is a document, the other is a person. A document has properties such as byte-size, media-type etc, a person has name, age, etc. Make sure you don’t use the URL of an existing webpage as the URI of a resource
  • 14. RDF is designed for distributed systems URIs provide web-wide global identification across documents. A resource may be described by multiple documents. We know it’s the same resource because the same URI is used, or through reasoning (advanced topic…). URIs are intended to be reused. Unique, but not single identifiers: two URIs may denote the same thing. URIs are dereferenceable (can be retrieved). A well-behaved URI returns a description of the resource. Provides authority: the definition of foaf:Person lives at that URI. Ontologies can be looked up as well, typically at the root of the URIs, also known as the namespace. Example: http://xmlns.com/foaf/0.1/Person redirects to the specification.
  • 15. URIs implicitly link data together Joe’s homepage: (#joe, #name, “Joe A.”) (#joe, #email, mailto:[email protected]) Mary’s homepage: (#mary, name, “Mary B.”) (#mary, gender, “female”) A dating site: (#joe, #loves, #mary) Schema doc: (#name, #type, #Property) (#name, #domain, #Person) Linked Data: following links from one document to another allows the entire graph (data and ontologies) to be discovered.
  • 16. When put together, they form a single ‘global’ graph “ Joe A.” #joe #name “ [email protected]” #email #mary #loves “ Mary B.” “ female” #name #gender
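How separate documents combine into one graph can be sketched as a set union in Python. The assignment of triples to documents below is an assumption read off the slide, the predicates on Mary’s page are normalized to `#name` and `#gender` for the sake of the example, and the e-mail triple is left out. All variable names are hypothetical.

```python
# Triples published by three independent documents (assumed split).
joe_homepage = {("#joe", "#name", "Joe A.")}
mary_homepage = {
    ("#mary", "#name", "Mary B."),
    ("#mary", "#gender", "female"),
}
dating_site = {("#joe", "#loves", "#mary")}

# Shared identifiers mean that merging is just a set union.
global_graph = joe_homepage | mary_homepage | dating_site

# A question no single document can answer: what is the name of
# the person Joe loves?
loved = {o for (s, p, o) in global_graph if s == "#joe" and p == "#loves"}
names = sorted(o for (s, p, o) in global_graph if s in loved and p == "#name")
```

The answer only becomes available after the merge, because the `#loves` link lives on the dating site while the name lives on Mary’s homepage.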
  • 17. The even larger picture: entire datasets connected
  • 19. RDF on the Web II. Six ways of publishing RDF Standalone files (static or dynamically generated) Metadata inside webpages SPARQL endpoints Feeds XSLT/GRDDL Automated tools Note: these are non-exclusive
  • 20. Option 1: Standalone RDF documents RDF documents linked to other RDF documents. Use rdfs:seeAlso to point to a related document; it says: go and look at that document if you want to know more. Advantages: no change to the publishing of the HTML documents; data can be published by a third party. Tools: RDB-to-RDF mappers such as D2RQ or Triplify; Linked Data browsers. Examples: most datasets in the Linked Data cloud. [Diagram labels: #PeterM, #Bud, born, “Peter Mika”, label, “Budapest”, label, #Hun, capital-of, “2,000,000”, population]
  • 21. Option 1 (continued) For discovery, the metadata is often linked from HTML pages: <link rel="meta" type="application/rdf+xml" title="FOAF" href="http://www.cs.vu.nl/~pmika/foaf.rdf" /> Additional advantages: discovery from the webpage; it’s clear that the metadata is a machine representation of the human-targeted content of the page. Examples: FOAF profiles, BestBuy. [Diagram: the page text “Peter Mika was born in Budapest.” and the example graph from slide 20]
  • 22. Option 2: Metadata inside web pages Using microformats, RDFa, Microdata (more later). Advantages: no separate database export required; browser plug-in friendly; search engine friendly; copy-paste friendly. Tools: XML editors (e.g. Oxygen), Triplr, RDFa Distiller, RDFa bookmarklet, Ubiquity RDFa plugin, Optimus microformat parser. Examples: many, including SlideShare, YouTube, LinkedIn, Digg, Myspace, Facebook… [Diagram: the page text “Peter Mika was born in Budapest.” and the example graph from slide 20]
  • 23. Option 3: SPARQL endpoints Query access to your RDF database, similar to exposing your database on the Web and giving someone read-only SQL access. Advantages: most flexible and best performing access from a consumer perspective. Tools: triple stores (Oracle, Virtuoso, Sesame, Jena, OWLIM etc.); RDB-to-RDF mappers such as D2RQ and Triplify. [Diagram: the example graph from slide 20]
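To give a feel for what query access means, here is a minimal Python sketch of triple-pattern matching over an in-memory graph. It is not SPARQL and does not talk to an endpoint; `match` is a hypothetical helper, and the edges of the example graph are an assumed reading of the slide diagram.

```python
# Assumed reading of the example graph shown on the slides.
graph = {
    ("#PeterM", "born", "#Bud"),
    ("#PeterM", "label", "Peter Mika"),
    ("#Bud", "label", "Budapest"),
    ("#Bud", "capital-of", "#Hun"),
    ("#Bud", "population", "2,000,000"),
}

def match(graph, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

# "Where was #PeterM born?" and "What do we know about #Bud?"
birthplace = match(graph, s="#PeterM", p="born")
about_bud = match(graph, s="#Bud")
```

A real endpoint would accept a query combining several such patterns with shared variables; this sketch covers only the single-pattern case.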
  • 24. Option 4: feeds The equivalent of a database dump. No standard feed format for RDF. Advantages: submit your data without making it public. Yahoo! consumes: DataRSS, GoogleBase feeds, NewsML. Submit your feed using SiteExplorer. [Diagram: the example graph from slide 20]
  • 25. Option 5: XSLT Publish the transformation from HTML to structured data. GRDDL is a standard for linking an HTML page to a transformation that produces RDF data. Advantages: no change to the page. Disadvantages: the transformation needs to be executed to get to the data. Tools: Intel MashMaker, Dapper, Glue API from AdaptiveBlue.
  • 26. Option 6: Automatic markup Restricted mostly to tagging entities with identifiers Advantages Less manual effort Disadvantages Limited to finding relevant entities in text Tools OpenCalais Zemanta API Peter Mika was born in Budapest. <person>Peter Mika</person> was born in <location>Budapest</location>.
  • 27. Example: Zemanta A personal writing assistant for bloggers Plugin for popular blogging platforms and web mail clients Analyzes text as you type and suggests hyperlinks, tags, categories, images and related articles API available with the same functionality
  • 29. Brief history of the Annotated Web 1995: HTML meta tags 1996: Simple HTML Ontology Extensions (SHOE) 1998: RDF/XML RDF/XML in HTML RDF linked from HTML 2003: Web 2.0 Tagging Microformats Metadata in Wikipedia Machine tags in Flickr 2005: eRDF 2008: RDFa
  • 30. HTML meta tags <HTML> <HEAD profile="http://dublincore.org/documents/dcq-html/"> <META name="DC.author" content="Peter Mika"> <LINK rel="DC.rights copyright" href="http://www.example.org/rights.html" /> <LINK rel="meta" type="application/rdf+xml" title="FOAF" href="http://www.cs.vu.nl/~pmika/foaf.rdf"> </HEAD> … </HTML>
  • 31. SHOE example (Heflin & Hendler, 1996) <ONTOLOGY "our-ontology" VERSION="1.0"> <ONTOLOGY-EXTENDS "organization-ontology" VERSION="2.1" PREFIX="org" URL="http://www.ont.org/orgont.html"> <ONTDEF CATEGORY="Person" ISA="org.Thing"> <ONTDEF RELATION="lastName" ARGS="Person STRING"> <ONTDEF RELATION="firstName" ARGS="Person STRING"> <ONTDEF RELATION="marriedTo" ARGS="Person Person"> <ONTDEF RELATION="employee" ARGS="org.Organization Person"> </ONTOLOGY> <HEAD> <META HTTP-EQUIV="Instance-Key" CONTENT="http://www.cs.umd.edu/~george"> <USE-ONTOLOGY "our-ontology" VERSION="1.0" PREFIX="our" URL="http://ont.org/our-ont.html"> </HEAD> <BODY> <CATEGORY "our.Person"> <RELATION "our.marriedTo" TO="http://www.cs.umd.edu/~helena"> <RELATION "our.employee" FROM="http://www.cs.umd.edu"> My name is <ATTRIBUTE "our.firstName"> George </ATTRIBUTE> <ATTRIBUTE "our.lastName"> Cook </ATTRIBUTE> and I live at...
  • 34. SHOE Graphical Query Interface
  • 35. Example: Creative Commons Embedding CC license in HTML (now deprecated): <HTML> <HEAD>… </HEAD> <BODY> … <!-- <rdf:RDF xmlns="http://creativecommons.org/ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <Work rdf:about="http://www.yergler.net/averages/"> <dc:title>The Law of Averages</dc:title> <dc:description>...because eventually i'll be right...</dc:description> <license rdf:resource="http://creativecommons.org/licenses/by-nc/1.0/" /> </Work> <License rdf:about="http://creativecommons.org/licenses/by-nc/1.0/"> <requires rdf:resource="http://web.resource.org/cc/Notice" /> <permits rdf:resource="http://web.resource.org/cc/Reproduction" /> <permits rdf:resource="http://web.resource.org/cc/Distribution" /> <prohibits rdf:resource="http://web.resource.org/cc/CommercialUse" /> </License> </rdf:RDF> -->
  • 36. Example: Creative Commons Current: rel attribute (HTML4) This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/3.0/us/">Creative Commons Attribution 3.0 United States License</a>. Use of the “rel” attribute for semantic annotation is the birth of the microformat…
  • 37. Microformats (μf) Community centered around microformats.org; specifications and discussions are hosted there. Agreements on the way to encode certain kinds of metadata in HTML: reuse of semantic-bearing HTML elements; based on existing standards; minimality. Microformats exist for a limited set of objects: hCard (persons and organizations), hCalendar (events), hResume, hProduct, hRecipe. Varying degrees of support and stability: hCard and rel-tag are widely supported.
  • 38. Microformats: limitations No shared syntax: each microformat has a separate syntax tailored to the vocabulary. No formal schemas: limited reuse and extensibility of schemas; unclear which combinations are allowed. No datatypes. No namespaces or unique identifiers (URIs): no interlinking, so mapping between instances is required. Relationship to page context is often unclear.
  • 39. Example: microformats <cite class="vcard"> <a class="fn url" rel="friend colleague met" href="http://meyerweb.com/">Eric Meyer</a> </cite> wrote a post ( <cite> <a href="http://meyerweb.com/eric/thoughts/2005/12/16/tax-relief/">Tax Relief</a></cite> ) about an unintentionally humorous letter he received from the <span class="vcard"> <a class="fn org url" href="http://irs.gov/">Internal Revenue Service</a> </span>. <div class="vcard"> <a class="email fn" href="mailto:[email protected]">Joe Friday</a> <div class="tel">+1-919-555-7878</div> <div class="title">Area Administrator, Assistant</div> </div>
  • 40. Microformats vs. RDFa Choose microformats when you find a microformat that fits your needs and is supported by Yahoo!. Microformats are the first option because they are simple. We support all major microformats, see the documentation. It’s a common misconception that RDFa requires XHTML: it doesn’t. If you find none that perfectly fits your needs, then you need RDFa. Microformats have a fixed schema: you cannot add your own attributes. Example: a social networking site with user profiles. vCard is a good candidate, but it doesn’t have a way to express the user’s social connections, for example. You either live without this, or go with RDFa. The rest of this presentation is about RDFa, which is more powerful, but also more complex. We will focus on the concepts that are hard to grasp.
  • 41. Keep an eye on HTML5 Currently under standardization at the W3C Last Call this fall, keep an eye on it Introduces Microdata Similar to microformats Some predefined vocabularies with central registration Some of the flexibility of RDFa Introduce new terms using reverse domain names or full URIs Semantic HTML elements such as <time>, <video>, <article>…
  • 42. Microdata example <div item> <p>My name is <span itemprop="name">Neil</span>.</p> <p>My band is called <span itemprop="band">Four Parts Water</span>. I was born on <time itemprop="birthday" datetime="2009-05-10">May 10th 2009</time>. <img itemprop="image" src="me.png" alt="me"> </p> </div>
  • 43. Slides courtesy of Mark Birbeck Introduction to RDFa
  • 44. What does RDFa look like? There are some metadata features in HTML already... ...so we give them an RDF interpretation... ...then we generalise them... ...and then we add a few more.
  • 45. HTML's metadata features (1) <html> <head> <title>RDFa: Now everyone can have an API</title> <meta name="author" content="Mark Birbeck" /> <meta name="created" content="2009-05-09" /> <link rel="license" href="http://creativecommons.org/licenses/by-sa/3.0/" /> </head> . . . </html>
  • 46. HTML's metadata features (2) <a href="http://creativecommons.org/licenses/by-sa/3.0/">CC Attribution-ShareAlike</a> <a rel="license" href="http://creativecommons.org/licenses/by-sa/3.0/">CC Attribution-ShareAlike</a>
  • 47. RDFa extends @rel/@href to images <img src="image01.png" rel="license" href="http://creativecommons.org/licenses/by-sa/3.0/" /> <img src="image02.png" rel="license" href="http://creativecommons.org/licenses/by-sa/3.0/" />
  • 48. RDFa extends meta/@content to body <html> <head> <title>RDFa: Now everyone can have an API</title> <meta name="author" content="Mark Birbeck" /> <meta name="created" content="2009-05-09" /> </head> <body> <h1>RDFa: Now everyone can have an API</h1> Author: <em>Mark Birbeck</em> Created: <em>May 9th, 2009</em> </body> </html>
  • 49. RDFa extends meta/@content to body <html> <head> <title>RDFa: Now everyone can have an API</title> </head> <body> <h1>RDFa: Now everyone can have an API</h1> Author: <em property="author" content="Mark Birbeck">Mark Birbeck</em> Created: <em property="created" content="2009-05-09">May 9th, 2009</em> </body> </html>
  • 50. RDFa extends meta/@content to body <html> <head> <title>RDFa: Now everyone can have an API</title> </head> <body> <h1>RDFa: Now everyone can have an API</h1> Author: <em property="author">Mark Birbeck</em> Created: <em property="created" content="2009-05-09">May 9th, 2009</em> </body> </html>
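The rule illustrated on the last two slides (the value of a property is the @content attribute when present, otherwise the element's text) can be sketched with Python's standard `html.parser`. This is a toy illustration, not an RDFa processor: `PropertyExtractor` is a hypothetical class, it ignores subjects, prefixes and datatypes, and it does not handle nested elements with the same tag name.

```python
from html.parser import HTMLParser

class PropertyExtractor(HTMLParser):
    """Collects (property, value) pairs from elements carrying @property."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._open = None  # (property, tag, text parts) of the element being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property")
        if prop is None:
            return
        if "content" in attrs:
            # @content overrides the human-readable text.
            self.pairs.append((prop, attrs["content"]))
        else:
            # No @content: the value is the text inside the element.
            self._open = (prop, tag, [])

    def handle_data(self, data):
        if self._open is not None:
            self._open[2].append(data)

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open[1]:
            prop, _, parts = self._open
            self.pairs.append((prop, "".join(parts).strip()))
            self._open = None

page = """
<body>
  Author: <em property="author">Mark Birbeck</em>
  Created: <em property="created" content="2009-05-09">May 9th, 2009</em>
</body>
"""

extractor = PropertyExtractor()
extractor.feed(page)
extractor.close()
pairs = extractor.pairs
```

The author value comes from the visible text, while the creation date comes from the machine-readable @content rather than from “May 9th, 2009”.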
  • 51. Vocabularies use CURIEs <html xmlns:dc="http://purl.org/dc/terms/"> <head> <title>RDFa: Now everyone can have an API</title> </head> <body> <h1>RDFa: Now everyone can have an API</h1> Author: <em property="dc:creator">Mark Birbeck</em> Created: <em property="dc:created" content="2009-05-09">May 9th, 2009</em> </body> </html>
  • 52. CURIEs, or Compact URIs Named after Marie Curie, who was the first person to receive two Nobel prizes, one for physics and one for chemistry. CURIEs allow a full URI to be expressed in a simple prefix:suffix form. The 'suffix' part is looser than in XML namespaces, supporting formulations such as abc:123.
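Expanding a CURIE into a full URI amounts to a prefix lookup followed by concatenation. The sketch below assumes a hand-written prefix table; `PREFIXES` and `expand_curie` are hypothetical names, while the two namespace URIs are the ones used elsewhere in these slides.

```python
# Prefix table; in RDFa these mappings come from xmlns:* declarations.
PREFIXES = {
    "dc": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

def expand_curie(curie, prefixes=PREFIXES):
    """Expand 'prefix:suffix' into a full URI using the prefix table."""
    prefix, sep, suffix = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    # The suffix is appended as-is, so forms such as 'abc:123' also work.
    return prefixes[prefix] + suffix

creator_uri = expand_curie("dc:creator")
person_uri = expand_curie("foaf:Person")
```

Splitting on the first colon only keeps any later colons in the suffix intact.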
  • 53. Properties can also apply to images <img src=&quot;image01.png” rel=&quot;license&quot;   href=“http://creativecommons.org/licenses/by-sa/3.0/” /> <img src=&quot;image02.png” rel=&quot;license&quot;   href=&quot;http://creativecommons.org/licenses/by-sa/3.0/” />
  • 54. Properties can also apply to images <img src=&quot;image01.png&quot; rel=&quot;license&quot;   href=&quot;http://creativecommons.org/licenses/by-sa/3.0/&quot; property=&quot;dc:creator&quot; content=&quot;Mark Birbeck” /> <img src=&quot;image02.png&quot; rel=&quot;license&quot;   href=&quot;http://creativecommons.org/licenses/by-sa/3.0/&quot; property=&quot;dc:creator&quot; content=&quot;Mark Birbeck&quot; />
  • 55. Relationships and properties on anything <a href="http://www.slideshare.net/mark.birbeck/the-5-minute-guide-to-rdfain-only-6-minutes-40-seconds">The 5 minute guide to RDFa...in only 6 minutes and 40 seconds</a>
  • 56. Relationships and properties on anything <a rel="license" href="http://www.slideshare.net/mark.birbeck/the-5-minute-guide-to-rdfain-only-6-minutes-40-seconds">The 5 minute guide to RDFa...in only 6 minutes and 40 seconds</a> Doesn't say what we want.
  • 57. Relationships and properties on anything <a href="http://www.slideshare.net/mark.birbeck/the-5-minute-guide-to-rdfain-only-6-minutes-40-seconds">The 5 minute guide to RDFa...in only 6 minutes and 40 seconds</a> is licensed under <a href="http://creativecommons.org/licenses/by-sa/2.5/">CC BY SA</a>.
  • 58. Relationships and properties on anything <a href="http://www.slideshare.net/mark.birbeck/the-5-minute-guide-to-rdfain-only-6-minutes-40-seconds">The 5 minute guide to RDFa...in only 6 minutes and 40 seconds</a> is licensed under <a about="http://www.slideshare.net/mark.birbeck/the-5-minute-guide-to-rdfain-only-6-minutes-40-seconds" rel="license" href="http://creativecommons.org/licenses/by-sa/2.5/">CC BY SA</a>.
  • 59. Relationships and properties on anything <a href="http://www.slideshare.net/mark.birbeck/the-5-minute-guide-to-rdfain-only-6-minutes-40-seconds">The 5 minute guide to RDFa...in only 6 minutes and 40 seconds</a> is licensed under <a about="http://www.slideshare.net/mark.birbeck/the-5-minute-guide-to-rdfain-only-6-minutes-40-seconds" rel="license" href="http://creativecommons.org/licenses/by-sa/2.5/" property="dc:creator" content="Mark Birbeck">CC BY SA</a>.
  • 60. @about sets context <div about="http://www.slideshare.net/mark.birbeck/the-5-minute-guide-to-rdfain-only-6-minutes-40-seconds"> <h1>The 5 minute guide to RDFa...</h1> Author: <em property="dc:creator">Mark Birbeck</em> Created: <em property="dc:created" content="2009-05-09">May 9th, 2009</em> </div>
  • 61. @about sets context <html xmlns:dc="http://purl.org/dc/terms/"> <head> <title>RDFa: Now everyone can have an API</title> </head> <body> <h1>RDFa: Now everyone can have an API</h1> Author: <em property="dc:creator">Mark Birbeck</em> Created: <em property="dc:created" content="2009-05-09">May 9th, 2009</em> </body> </html>
  • 62. Basics of RDFa generalise HTML's existing semantic features; add support for CURIEs for property and relationship names; add @about.
  • 63. Advanced RDFa use of @datatype to set the data type of @content; use of @typeof to set rdf:type; support for bnodes; support for XML literals; ability to chain statements together. Because RDFa supports all of the features you'll find in RDF, you can even mark up OWL documents in HTML.
  • 64. The process of annotating with RDFa Familiarize yourself with the RDFa syntax by reading the RDFa Primer. It is also highly recommended that you read the RDF Primer; RDF is the data model used by RDFa. Choose a vocabulary from the SearchMonkey documentation that fits your needs. A vocabulary describes a set of types and attributes within a given domain. If you don’t find a good candidate, extend an existing one or create a new one. Annotate your page. Before you start, you might want to validate your page for (X)HTML conformance using the W3C’s (X)HTML Validator to reduce the chance of errors; choose Document Type XHTML + RDFa. There is no specific tool support, but if you have an HTML or XML editor that supports DTDs, you will have syntax checking and highlighting. Use the RDFa Distiller to check which data can be extracted from your page. If you like, use the RDF Validator to visualize the resulting RDF graph. Put the annotated page online. The data will be extracted the next time your page is crawled. There is no need to explicitly submit anything, and there is no notification when your site is crawled. See http://rdfa.info/rdfa-implementations for new tools and APIs.
  • 65. RDFa pitfalls Validation problems can stop us from extracting data: use the W3C validator; use the right DOCTYPE declaration if using XHTML; set the encoding of your page properly (using HTTP headers or the XML declaration). Prefixes need to be defined using the xmlns attribute. Unless you are making statements about the document, set the subject using the about attribute. Do not include HTML elements in literal values. Incorrect: <div property="foaf:name"><b>Peter Mika</b></div> Use absolute URIs as the value of the resource attribute, or make sure you specify the HTML base.
  • 66. More pitfalls: precedence rules Be careful when using rel and typeof in combination because of the precedence rules. BAD example: <div about="#id"> <span property="foaf:name">Peter Mika</span> <span rel="foaf:img" typeof="foaf:Image"> <span property="dc:format">jpg</span> … </span> </div> To correct, you need to put the typeof inside the <span> node with rel="foaf:img".
  • 67. More pitfalls: the typeof attribute Typeof does two things at once: it creates a new subject resource and assigns the type to it. BAD example: <div about="#id"> <span property="foaf:name">Peter Mika</span> <span rel="foaf:img" resource="http://www.example.org/photo.jpg"> <span typeof="foaf:Image"> <span property="dc:format">jpg</span> </span> </span> </div> To correct, you have to repeat the resource attribute on the span node with the typeof.
  • 68. HTML markup pitfalls Marking up <h1>: <h1 property="dc:title">My homepage</h1> NOT: <h1><div property="dc:title">My homepage</h1> Marking up an image: <a href="http://example.org/user/alex"> <span about="#user1" rel="foaf:img media:image"> <img alt="Alex" src="http://example.org/photos/alex.jpg"/> </span> </a> This doesn’t work: <img rel="foaf:img" src="photo.jpg"/> In the header you need <meta property="…" content="…"> NOT <meta name="…" content="…">
  • 69. More pitfalls: breaking up descriptions You cannot break up a description like this: <span rel="foaf:knows"> <span property="foaf:name">Peter Mika</span> </span> …. <span rel="foaf:knows"> <a rel="foaf:email" href="mailto:[email protected]" /> </span> This is not the same as: <span rel="foaf:knows"> <span property="foaf:name">Peter Mika</span> <a rel="foaf:email" href="mailto:[email protected]" /> </span> In the first case there are two related resources, with one attribute each; in the second case there is a single related resource with two attributes.
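The distinction can be made concrete by writing out the triples each form yields. The sketch below is a simplified model, not an RDFa processor: the helper `describe` is hypothetical, each rel element is represented as one block that receives its own blank node, and the e-mail address is a placeholder because the one on the slide is not recoverable.

```python
import itertools


def describe(subject, rel, blocks):
    """Model each rel element as one block; every block gets a fresh blank node."""
    counter = itertools.count(1)
    triples = []
    for block in blocks:
        node = f"_:b{next(counter)}"
        triples.append((subject, rel, node))
        for prop, value in block:
            triples.append((node, prop, value))
    return triples


NAME = ("foaf:name", "Peter Mika")
EMAIL = ("foaf:email", "mailto:someone@example.org")  # placeholder address

# Broken-up description: two rel="foaf:knows" elements, one attribute each.
split = describe("<>", "foaf:knows", [[NAME], [EMAIL]])
# Single description: one rel="foaf:knows" element holding both attributes.
joined = describe("<>", "foaf:knows", [[NAME, EMAIL]])

for label, triples in (("split", split), ("joined", joined)):
    print(label)
    for triple in triples:
        print("  ", triple)
```

In the split case the name and the e-mail hang off different blank nodes, so nothing says they belong to the same person.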
  • 70. Tips Hiding information from being displayed: links without content will not be rendered; use <span property="foaf:name" content="Peter Mika"/>. Use datatypes to provide the expected type of a literal. This helps validation because any tool can check whether the literal is indeed of that type.
  • 71. Choosing a vocabulary Look at SearchMonkey objects: Video, Games, Presentations, Events, News, Businesses, Products, Discussion. Search the Web or ask for advice on mailing lists [email_address] [email_address]. Beware of people who claim to have the vocabulary of everything; preferably you want something small and targeted. There is never a 100% fit, so you will need to introduce vocabulary terms (classes and properties). Do not introduce new classes/properties in existing namespaces. Example: the namespace http://xmlns.com/foaf/0.1/ is used by the FOAF project. Try not to introduce a new term without contacting the owner, i.e. the membership of the FOAF mailing list.
  • 72. Advanced topic: creating a vocabulary Get advice on methodology vocamp.org and semanticweb.org Choose a namespace and a prefix Give sensible names, e.g. name it after your site, but don’t call it searchmonkey Namespace ends either with a slash or a hash Create an RDF or OWL document describing your classes and properties Use an ontology editor such as Protégé 4.0 Follow naming conventions Publish your vocabulary Make sure the URIs of your properties and classes are resolvable E.g. myvocab:digicam should resolve to a document containing the definition of myvocab:digicam Convince others to adopt your vocabulary If you are in fishing, convince other fishing businesses
  • 73. Exercise Explore data on the Web Microformats Search for pages on Yahoo using searchmonkey:com.yahoo.page.uf.hcard Try Operator Firefox Plug-in Try Optimus RDFa Search for pages on Yahoo using searchmonkey:com.yahoo.page.rdf.rdfa Try RDFa bookmarklet Try RDFa Distiller Mark up your webpage using RDFa See process on previous slides
  • 75. Microsearch Metadata is out there Just how much data is out there? What is the quality? Idea: bring metadata to the surface of search How does it work? User enters query Metadata is extracted dynamically Entity reconciliation Metadata is used to display rich abstracts, related pages spatial, temporal visualization Microsearch prototype
  • 76. Example: ivan herman Related pages based on metadata Events from personal calendar, Conferences, and bio from LinkedIn Geolocation Rich abstract
  • 77. Example: peter site:flickr.com Flickr users named “Peter” by geography
  • 78. Example: san francisco conference Conferences in San Francisco by date
  • 79. Example: greater st. peter Save to address book Call phone number (other actions)
  • 80. Lessons More metadata than we expected 53% of unique queries have at least one metadata-enabled page in top 10 (n=7848) Performance is poor Metadata needs to come from the index for performance ‘Metacrap’ does exist Users have to see metadata to spot mistakes in their markup, warn others RDF templating (Fresnel) adds complexity Abstract needs to be customized to the particular site, query
  • 81. Applications Yahoo’s SearchMonkey and Google’s Rich Snippets BOSS and YQL Semantic search and navigation
  • 82. Creating an ecosystem of publishers, developers and end-users Motivating and helping publishers to implement semantic annotation Providing tools for developers to create compelling applications Focusing on end-user experience Rich abstracts as a first application Addressing the long tail of query and content production Standard Semantic Web technology dataRSS = Atom + RDFa Industry standard vocabularies http://developer.yahoo.com/searchmonkey/ SearchMonkey
  • 83. Before After an open platform for using structured data to build more useful and relevant search results What is SearchMonkey?
  • 84. image deep links name/value pairs or abstract Enhanced Result
  • 86. SearchMonkey Acme.com’s database Index RDF/Microformat Markup site owners/publishers share structured data with Yahoo!. 1 consumers customize their search experience with Enhanced Results or Infobars 3 site owners & third-party developers build SearchMonkey apps. 2 DataRSS feed Web Services Page Extraction Acme.com’s Web Pages
  • 87. Standard enhanced results Embed markup in your page, get enhanced results without any programming
  • 88. Documentation Simple and advanced, examples, copy-paste code, validator
  • 89. DataRSS An Atom extension for structured data Why a new format? A feed format is required by publishers Exclusive content (e.g. partnerships, paid inclusion) No changes necessary to the web page No standard named graph format for the Semantic Web Needed to capture meta-metadata such as source and timestamp of information Not really a new format An Atom extension Use any RDFa parser to get the triples out cf. Google Base feeds
  • 90. DataRSS <?profile http://search.yahoo.com/searchmonkey-profile ?> <feed xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2005/Atom ../latest/xsd/datarss.xsd"> <id>http://www.linkedin.com/datarss/</id> <author> <name>Peter Mika ([email protected])</name> </author> <title>Example data feed for social</title> <updated>2007-11-14T04:05:06+07:00</updated> <entry> <!-- title field of entry is not used for anything --> <title>Peter Mika</title> <!--URL of the webpage extracted from --> <id>http://www.linkedin.com/ppl/webprofile?id=5054019</id> <updated>2007-11-14T04:05:06+07:00</updated> <content type="application/xml"> <y:adjunct version="1.0" name="social-simple" xmlns:y="http://search.yahoo.com/datarss/"> <y:item rel="dc:subject"> <y:type typeof="foaf:Person"> <y:meta property="foaf:name">John Doe</y:meta> <y:meta property="foaf:gender">male</y:meta> <y:item rel="foaf:homepage" resource="http://www.joeisageek.com"/> <y:item rel="foaf:mbox" resource="mailto:[email protected]"/> <y:item rel="foaf:weblog" resource="http://johnblog.example.org"/> <y:item rel="foaf:knows"> <y:type typeof="foaf:Person"> <y:meta property="foaf:name">Jane Doe</y:meta> <y:meta property="foaf:gender">female</y:meta> <y:item rel="foaf:mbox" resource="mailto:[email protected]"/> </y:type> </y:item> </y:type> </y:item> </y:adjunct> </content> </entry> </feed> Atom 1.0 XML + RDFa
  • 91. The data part <adjunct version="1.0" id="com.yahoo.page.rdfa" xmlns="http://search.yahoo.com/datarss/" updated="2007-11-14T04:05:06+07:00"> <item rel="dc:subject"> <type typeof="foaf:Person"> <meta property="foaf:name">John Doe</meta> <meta property="foaf:gender">male</meta> <item rel="foaf:homepage" resource="http://www.joeisageek.com"/> <item rel="foaf:mbox" resource="mailto:[email protected]"/> <item rel="foaf:weblog" resource="http://johnblog.example.org"/> <item rel="foaf:knows"> <type typeof="foaf:Person"> <meta property="foaf:name">Jane Doe</meta> <meta property="foaf:gender">female</meta> <item rel="foaf:mbox" resource="mailto:[email protected]"/> </type> </item> </type> </item> </adjunct>
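To show how the nested item/type/meta structure maps to triples, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It reflects one reading of the example above rather than the DataRSS specification: `item` is treated as a relationship, `type` as a new typed blank node, and `meta` as a literal property. The function names, the `<page>` subject placeholder, and the trimmed-down input are assumptions made for the example.

```python
import itertools
import xml.etree.ElementTree as ET

NS = "{http://search.yahoo.com/datarss/}"

# Shortened version of the adjunct on the slide (e-mail items omitted).
ADJUNCT = """<adjunct version="1.0" xmlns="http://search.yahoo.com/datarss/">
  <item rel="dc:subject">
    <type typeof="foaf:Person">
      <meta property="foaf:name">John Doe</meta>
      <item rel="foaf:homepage" resource="http://www.joeisageek.com"/>
      <item rel="foaf:knows">
        <type typeof="foaf:Person">
          <meta property="foaf:name">Jane Doe</meta>
        </type>
      </item>
    </type>
  </item>
</adjunct>"""


def walk(item, subject, triples, ids):
    """Turn one <item> (and everything nested in it) into triples."""
    rel = item.get("rel")
    resource = item.get("resource")
    if resource is not None:
        triples.append((subject, rel, resource))
        return
    for typed in item.findall(NS + "type"):
        node = f"_:b{next(ids)}"  # each <type> introduces a new blank node
        triples.append((subject, rel, node))
        triples.append((node, "rdf:type", typed.get("typeof")))
        for meta in typed.findall(NS + "meta"):
            triples.append((node, meta.get("property"), meta.text))
        for child in typed.findall(NS + "item"):
            walk(child, node, triples, ids)


def adjunct_to_triples(xml_text, page="<page>"):
    root = ET.fromstring(xml_text)
    triples, ids = [], itertools.count(1)
    for item in root.findall(NS + "item"):
        walk(item, page, triples, ids)
    return triples


for triple in adjunct_to_triples(ADJUNCT):
    print(triple)
```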
  • 92. Developer tool: create custom presentations
  • 98. Example apps LinkedIn hCard plus feed data Creative Commons by Ben Adida CC in RDFa
  • 99. Example apps. II. Other me by Dan Brickley Google Social Graph API wrapped using a Web Service
  • 100. Google’s Rich Snippets Shares a subset of the features of SearchMonkey Encourages publishers to embed certain microformats and RDFa into webpages Currently reviews, people, products, business & organizations These are used to generate richer search results SearchMonkey is customizable Developers can develop applications themselves SearchMonkey is open Wide support for standard vocabularies API access
  • 101. API access to metadata Yahoo BOSS & YQL
  • 102. BOSS: Build your Own Search Service Ability to re-order results and blend in additional content No restrictions on presentation No branding or attribution Access to multiple verticals (web search, image, news) 40+ supported language and region pairs Pricing (BOSS) Pay-by-usage 10,000 queries a day still free Serve any ads you want For more info, http://developer.yahoo.com/search/boss/
  • 103. BOSS API to structured data Simple HTTP GET calls, no authentication You need an Application ID: register at developer.yahoo.com/search/boss/ http://boss.yahooapis.com/ysearch/web/v1/{query}?appid={appid}&format=xml&view=searchmonkey_feed Restrict your query using special words searchmonkey:com.yahoo.page.uf.{format} {format} is one of hcard, hcalendar, tag, adr, hresume etc. searchmonkey:com.yahoo.page.rdf.rdfa
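The URL template above can be filled in programmatically. The sketch below only constructs the request URL and makes no network call, since the endpoint may no longer be available; the function name `boss_url` and the `YOUR_APP_ID` value are placeholders, and the host is written as plain `http://boss.yahooapis.com`.

```python
from urllib.parse import quote, urlencode

BOSS_TEMPLATE = "http://boss.yahooapis.com/ysearch/web/v1/{query}?{params}"


def boss_url(keywords, appid, restriction=None):
    """Build a BOSS web-search URL that asks for the SearchMonkey feed view."""
    terms = keywords if restriction is None else f"{keywords} {restriction}"
    params = urlencode(
        {"appid": appid, "format": "xml", "view": "searchmonkey_feed"}
    )
    # The query is part of the URL path, so every reserved character is escaped.
    return BOSS_TEMPLATE.format(query=quote(terms, safe=""), params=params)


# Restrict results to pages carrying hResume data.
print(boss_url("semantic web", "YOUR_APP_ID",
               restriction="searchmonkey:com.yahoo.page.uf.hresume"))
```

The response would then be parsed as DataRSS (XML), as in the resume search demo.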
  • 104. Demo: resume search Search pages with resume data and given keywords {keyword} searchmonkey:com.yahoo.page.uf.hresume Parse the results as DataRSS (XML) Extract information and display using YUI
  • 105. Demo
  • 106. Yahoo Query Language (YQL) Query web APIs as virtual tables Mash-up data by joining tables Add an API by adding a table definition Example: select my friends and sort by nickname
  • 107. PHP example: select the last 100 photos from Flickr with the word Austin <?php $url = "http://query.yahooapis.com/v1/public/yql?q="; $q = "select * from flickr.photos.search(100) where text='Austin'"; $fmt = "xml"; $x = simplexml_load_file($url.urlencode($q)."&format=$fmt"); foreach($x->attributes('http://www.yahooapis.com/v1/base.rng') as $k=>$v) { $$k=(string)$v; } echo <<<EOB $count photos fetched from {$x->diagnostics->url} in {$x->diagnostics->url['execution-time']} seconds<br> EOB; $flickr = "http://static.flickr.com/"; foreach($x->results->photo as $p) { echo "<img src=\"$flickr{$p['server']}/{$p['id']}_{$p['secret']}_s.jpg\"/>\n"; } ?>
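For readers who do not use PHP, the request-building step of the example can be sketched in Python. Only the URL construction is shown and no request is sent; the function name `yql_url` is an assumption, and the endpoint is the one quoted in the PHP code.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"


def yql_url(query, fmt="xml"):
    """Build the request URL for a YQL statement; no network call is made."""
    return YQL_ENDPOINT + "?" + urlencode({"q": query, "format": fmt})


query = "select * from flickr.photos.search(100) where text='Austin'"
url = yql_url(query)
print(url)

# Round trip: decoding the query string gives back the original statement.
decoded = parse_qs(urlsplit(url).query)
print(decoded["q"][0] == query)
```

Using `urlencode` for both parameters avoids hand-concatenating `&format=...`, which is where quoting mistakes tend to creep in.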
  • 108. YQL example ( source )
  • 109. That’s all there is to it! <?php $root = 'http://query.yahooapis.com/v1/public/yql?q='; $city = 'Barcelona'; $loc = 'Barcelona'; $yql = 'select * from html where url = \'http://en.wikipedia.org/wiki/'.$city.'\' and xpath=&quot;//div[@id=\'bodyContent\']/p&quot; limit 3'; $url = $root . urlencode($yql) . '&format=xml'; $info = getstuff($url); $info = preg_replace(&quot;/.*<results>|<\/results>.*/&quot;,'',$info); $info = preg_replace(&quot;/<\?xml version=\&quot;1\.0\&quot;&quot;. &quot; encoding=\&quot;UTF-8\&quot;\?>/&quot;,'',$info); $info = preg_replace(&quot;//&quot;,'',$info); $info = preg_replace(&quot;/\&quot;\/wiki/&quot;,'&quot;http://en.wikipedia.org/wiki',$info); $yql = 'select * from upcoming.events.bestinplace(5) where woeid in '. '(select woeid from geo.places where text=&quot;'.$loc.'&quot;)'. ' | unique(field=&quot;description&quot;)'; $url = $root . urlencode($yql) . '&format=json'; $events = getstuff($url); $events = json_decode($events); foreach($events->query->results->event as $e){ $evHTML.='<li><h3><a href=&quot;'.$e->ticket_url.'&quot;>'.$e->name.'</a></h3><p>'. substr($e->description,0,100).'&hellip;</p></li>'; } $yql = 'select * from flickr.photos.info where photo_id in '. '(select id from flickr.photos.search where woe_id in '. '(select woeid from geo.places where text=&quot;'.$loc.'&quot;)) limit 16'; $url = $root . urlencode($yql) . '&format=json'; $photos = getstuff($url); $photos = json_decode($photos); foreach($photos->query->results->photo as $s){ $src = &quot;http://farm{$s->farm}.static.flickr.com/{$s->server}/&quot;. &quot;{$s->id}_{$s->secret}_s.jpg&quot;; $phHTML.='<li><a href=&quot;'.$s->urls->url->content.'&quot;><img alt=&quot;'. $s->title.'&quot; src=&quot;'.$src.'&quot;></a></li>'; } $yql='select description from rss where '. 
' url=&quot;http://weather.yahooapis.com/forecastrss?p=SPXX0015&u=c&quot;'; $url = $root . urlencode($yql) . '&format=json'; $weather = getstuff($url); $weather = json_decode($weather); $weHTML = $weather->query->results->item->description; function getstuff($url){ $curl_handle = curl_init(); curl_setopt($curl_handle, CURLOPT_URL, $url); curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, 2); curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1); $buffer = curl_exec($curl_handle); curl_close($curl_handle); if (empty($buffer)){ return 'Error retrieving data, please try later.'; } else { return $buffer; } }?>
  • 110. Semantic Search and Navigation
  • 111. Browsing Linked Data Browse the Linked Data graph by going from one URI to the next OpenLink’s Linked Data browser OpenLink’s Data Explorer Tabulator Disco Marbles
  • 112. Semantic Search Engines Natural Language search engines Hakia Powerset (now built into Bing) TrueKnowledge Structured data search engines Searching open web data Sindice Sigma Searching closed world data Wolfram Alpha (closed structured data + computation)
  • 113. Exercise Build a SearchMonkey application Try a few YQL queries Try Zemanta online You don’t need to install the plugin Find data on the Web using search or navigation
  • 115. Semantic Search Def. matching the user’s query with the Web’s content at a conceptual level, often with the help of world knowledge Related disciplines Semantic Web, IR, Databases, NLP, IE As a field ISWC/ESWC/ASWC, WWW, SIGIR Exploring Semantic Annotations in Information Retrieval (ECIR08, WSDM09) Semantic Search Workshop (ESWC08, WWW09) Future of Web Search (FoWS09)
  • 116. Hard searches Ambiguous searches Paris Hilton Multimedia search Images of Paris Hilton Imprecise or overly precise searches Publications by Jim Hendler Find images of strong and adventurous people (Lenat) Searches for descriptions Search for yourself without using your name Product search (ads!) Searches that require aggregation Size of the Eiffel Tower (Lenat) Public opinion on Britney Spears Queries that require a deeper understanding of the query, the content and/or the world at large Note: some of these are so hard that users don’t even try them any more
  • 118. In the best of cases… Matching the query intent with the document metadata can be trivial: <adjunct id="com.yahoo.query.intent" version="0.5"> <type typeof="fb:music.artist foaf:Person"> <meta property="foaf:name">Madonna</meta> </type> </adjunct> <adjunct id="com.yahoo.page.hcard" version="0.5"> <type typeof="foaf:Person"> <meta property="foaf:name">Madonna</meta> </type> </adjunct> Query: Document metadata: dna_checksum:AF514FE45DD33BB7CD8DCCC89AA dna_checksum:AF514FE45DD33BB7CD8DCCC89AA
  • 119. Semantics at every step of the IR process bla bla bla? q=“bla” * 3 Document processing bla bla bla Ranking Query processing Result presentation The IR engine The Web bla bla bla bla bla bla “ bla” θ (q,d)
  • 120. Study: improving text analysis using structured data Problems Creating training data manually is expensive Existing taggers trained on financial-political news Idea: Learn the correspondence between entities in text and metadata Extend knowledge base and generate in-domain training data Learning to Tag and Tagging to Learn: A Case Study on Wikipedia Peter Mika; Massimiliano Ciaramita; Hugo Zaragoza; Jordi Atserias, IEEE Intelligent Systems, 2008, 5
  • 121. Study: processing metadata using cloud computing Question: Can we use Pig to effectively query and reason with large amounts of RDF data? Mapping SPARQL to PigLatin Forward-chaining RDF(S) reasoning Acknowledgement: Ben Reed Experimental results (LUBM) Useful for long running queries and reasoning Not useful for interactive queries (< 100 s) Co-organizing: Billion Triples Challenge at ISWC 2008, October 26-28, Karlsruhe, Germany Web Semantics in the Clouds Peter Mika; Giovanni Tummarello, IEEE Intelligent Systems, 2008, 5
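Forward-chaining reasoning repeatedly applies rules to the known triples until nothing new can be derived. The sketch below shows the idea in plain in-memory Python for two RDFS rules (transitivity of rdfs:subClassOf and type propagation); it is not the Pig Latin implementation from the study, and the `ex:` class and instance names are invented for illustration.

```python
def forward_chain(triples):
    """Apply two RDFS rules until a fixpoint is reached.

    Rule 1: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    Rule 2: (x type A), (A subClassOf B)       => (x type B)
    """
    facts = set(triples)
    while True:
        sub = [(s, o) for s, p, o in facts if p == "rdfs:subClassOf"]
        new = set()
        for a, b in sub:
            for c, d in sub:
                if b == c:
                    new.add((a, "rdfs:subClassOf", d))
        for s, p, o in facts:
            if p == "rdf:type":
                for c, d in sub:
                    if o == c:
                        new.add((s, "rdf:type", d))
        if new <= facts:  # nothing new was derived: fixpoint
            return facts
        facts |= new


DATA = [
    ("ex:GraduateStudent", "rdfs:subClassOf", "ex:Student"),
    ("ex:Student", "rdfs:subClassOf", "ex:Person"),
    ("ex:alice", "rdf:type", "ex:GraduateStudent"),
]

closure = forward_chain(DATA)
for triple in sorted(closure - set(DATA)):
    print("inferred:", triple)
```

Each pass is essentially a self-join on the triple set, which suggests why the approach suits batch systems and long-running jobs more than interactive queries.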
  • 122. Study: metadata analysis What vocabularies are being used? What microformats should we support? How much vocabulary reuse/extension is there? Is there a convergence? What is the quality of metadata? Datatype conformance Logical consistency Conformance to common use wrt common attributes How much spam is there? Distribution of spamicity scores Do spamicity scores transfer to metadata? Are there new schemas emerging through the combination of existing vocabularies? What is the metadata coverage in terms of queries? What percentage of queries from query logs would result in metadata? How many would result in metadata that could answer the query? (by some approximation)
  • 123. Study: Semantic Search Assist Observation: the same type of objects often have the same query context Users asking for the same aspect of the type Could we make query suggestions based on the type of the entity? Improvement for infrequent queries apple ipod nano review sony plasma tv review jerry yang biography biography tim berners lee tim berners lee blog peter mika yahoo britney spears shaves her head
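One way to picture the idea is to count, per entity type, the context words that accompany known entities in a query log, and then offer the most frequent ones for a new entity of the same type. The sketch below is purely illustrative and not the method used in the study: the entity-to-type table is hand-made, the log reuses some of the example queries from the slide, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hand-made entity typing; a real system would need entity recognition.
ENTITY_TYPES = {
    "apple ipod nano": "product",
    "sony plasma tv": "product",
    "jerry yang": "person",
    "tim berners lee": "person",
    "peter mika": "person",
}

LOG = [
    "apple ipod nano review",
    "sony plasma tv review",
    "jerry yang biography",
    "biography tim berners lee",
    "tim berners lee blog",
    "peter mika yahoo",
]


def context_by_type(log, entity_types):
    """Count the words that surround each entity, grouped by entity type."""
    contexts = defaultdict(Counter)
    for query in log:
        for entity, etype in entity_types.items():
            if entity in query:
                rest = query.replace(entity, "").strip()
                if rest:
                    contexts[etype][rest] += 1
    return contexts


def suggest(entity, etype, contexts, k=1):
    """Suggest queries for an entity using the top contexts of its type."""
    return [f"{entity} {word}" for word, _ in contexts[etype].most_common(k)]


contexts = context_by_type(LOG, ENTITY_TYPES)
print(suggest("ivan herman", "person", contexts))
print(suggest("canon digital camera", "product", contexts))
```

Because suggestions come from the type rather than the individual entity, an infrequent query about a rarely searched person can still receive them.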
  • 124. Study: evaluation of semantic search Analysis of user needs How are these needs aligned with data on the Web? How do the vocabularies differ? Analysis of query types Object queries? Object-attribute queries? Relationship queries? What does it mean for an object or a set of triples to be relevant to a query? Show me the answer and only the answer Put me near the answer in the graph Show me the justification (or at least the source) of the answer … Semantic Search evaluation campaign planned for 2010
  • 125. Challenges Future work in Semantic Web (Semi-)automated ways of metadata creation How do we go from 5% to 95%? Data quality We allow providing metadata for other people’s sites! Reasoning To what extent is reasoning useful? For example, how much would entity resolution or taxonomic reasoning help? Scale How do we exploit cluster computing techniques? What is between databases and IR engines? Fostering social agreements How do we get people to reuse vocabularies?
  • 126. Challenges Future work in IR Query interpretation Ranking with metadata Evaluation of semantic search Personalization Semantic ads Constraints Users still want to see a document Keyword-based search cannot suffer Whole page relevance, monetization can only increase Established expectations Query entry Result presentation
  • 127. Contact Peter Mika [email_address] Come to Barcelona and stop by SearchMonkey developer.yahoo.com/searchmonkey/ mailing lists [email_address] [email_address] forums http://suggestions.yahoo.com/searchmonkey Semantic Web FAQ http://devel.yahoo.com/searchmonkey/smguide/faq.html
  • 128. the monkey is out!
  • 129. Application: query intent Paris Hilton is a person!
  • 130. Application: query intent #2 Hugo is a person!

Editor's Notes

  • #38: Designed for humans first and machines second, microformats are a set of simple, open data formats built upon existing and widely adopted standards. Instead of throwing away what works today, microformats intend to solve simpler problems first by adapting to current behaviors and usage patterns
  • #46: HTML allows us to place metadata in the head of the document. The metadata can be both properties (as a string) and relationships to other documents.
  • #47: HTML also allows us to put metadata in the body of the document, using @rel and @rev on anchors.
  • #48: RDFa extends the @rel/@href technique to allow licenses to be attached to images. Say we have a list of images -- perhaps from a Flickr search -- here we see that we can attach a license to each of them.
  • #49: HTML allows relationships (the @rel/@href combination) to be used in both the head and the body, but text properties can only be added in the head (via @content on <meta>).
  • #50: RDFa extends the use of @content to the body. Note a small twist -- we have to use @property instead of @name, since the latter attribute is already used for other stuff. Key thing here is that we've moved the machine-readable data closer to its human-readable version, which makes it a lot easier to publish.
  • #51: Why would we do this? Well, first of all it's much easier to control the generation of the machine-readable data if it's close to the human-readable data. But second, once you put it close to the human-readable data, there are many situations where the human-readable version will also suffice for the machine-readable one, and so we can avoid duplication. Note that using @content for the date illustrates a different point; in that case we preserve the distinction between the human- and machine-readable forms, because the machine-readable version is very precise.
  • #52: Actually I cheated a little in the last slide. There is no such property as 'author' or 'created', they just happen to have been used in <head> over the years by a sort of convention. @rel="license" does exist, however, and there are a few other relationship values ('next', 'prev', and so on). But essentially, for other relationship values, and all property values, we need to use CURIEs. The advantage of this is that there are many pre-existing vocabularies that can immediately be used. Also, anyone can create a new vocabulary without having to ask anyone. Commontags was devised a few weeks ago, for example, and they didn't have to ask anyone's permission.
  • #54: Recall that we added the relationship attributes to an image, so that we can specify license information...
  • #55: ...we can also add properties to the image.
  • #56: HTML already supported relationships and properties that apply to the document, and we've seen how RDFa adds relationships and properties for images. Now let's look at how RDFa lets us add relationships and properties for anything. Let's say we have a link to a SlideShare presentation.
  • #57: We know that if we put the @rel attribute onto the <a> tag as normal, it implies that the current document has a license, and that the presentation itself is the license. So this is no good.
  • #58: The answer is to firstly create a link to the desired license...
  • #59: ...and then to indicate that this license is attached to the presentation. We still use @rel, but now we're using it with the new attribute that RDFa adds -- @about.
  • #60: And of course, we can also add properties.
  • #61: Using @about sets the context for any further RDFa, not just on the current element.
  • #62: Once you are in the new context, then everything works exactly as normal, so compare this to the previous slide; the only difference is that the previous slide uses @about to set the context, whilst this example has the 'current document' to set the context.
  • #63: We've gone into a lot of detail on the basics of RDFa to show how it builds upon HTML's existing semantic features, but there are many more features. The main thing to emphasise is that HTML already had some useful semantic features, but what they meant was never formalised; RDFa did that. RDFa also adds to these features, but does so by applying the same approach.
  • #64: There is much more we could have said, but suggest that interested readers look at the RDFa Primer, and other tutorials and articles. In passing, would say though that RDFa supports all of RDF's more advanced features too, such as datatype of literals, rdf:type, bnodes, XML literals, and so on. Advanced RDFa also allows quite elaborate chaining of statements allowing people to be connected to companies, reviews to businesses, and so on.
  • #84: As Vish discussed, SearchMonkey is all about building richer, more useful search results. Here are a few example Enhanced Results.
  • #86: And it allows the user to add the movie directly to their online movie rental queue
  • #87: [will be animated]
  • #116: SW: Representing and reasoning with structured data on the Web; both a relational and graph view on information. IR: Aggregating information at a document level based on ad-hoc information needs. DB: Representing and querying information in a relational model. NLP: From text to information.
  • #118: Results are good, but consider the ads: First ad says: Virgins. Looking for virgins? Find exactly what you want today. Ebay.com Second ad: Virgins. …Find cheap tickets for Virgins. Third ad: Adspam… these people buy Yahoo! traffic and sell it to Google.