Here are some options for completing your query:
- Freddie Mercury was the lead singer of Queen
- Brian May was the guitarist for Queen
- Queen was a British rock band formed in 1970
- Freddie Mercury died in 1991 from complications due to AIDS
This document discusses semantic search and how it can improve traditional information retrieval systems. It provides examples of how semantic search uses structured data and schemas to better understand user intent and content meaning. This allows semantic search to enhance various stages of the information retrieval process from query interpretation to result presentation. The document also outlines the growing adoption of semantic web standards like RDFa and schema.org to expose structured data on webpages.
The document discusses semantic search capabilities at Yahoo. It describes how Yahoo has developed techniques to extract structured data and metadata from webpages to power enhanced search results. This includes information extraction, data fusion, and curating knowledge in a graph. Yahoo uses this knowledge to better understand search queries and present relevant entities and attributes in results. Semantic search remains an active area of research.
Semantic Search tutorial at SemTech 2012
Peter Mika
This document provides an introduction to a semantic search tutorial given by Peter Mika and Tran Duc Thanh. The agenda covers semantic web data, including the RDF data model and publishing RDF data. It also covers query processing, ranking, result presentation, evaluation, and a question period. The document discusses why semantic search is needed to address poorly solved queries and enable novel search tasks using structured data and background knowledge.
This document describes related entity finding on the web and semantic search. It discusses using the structure of semantic data and ontologies to better understand user intent and the meaning of queries and content. This can help improve search accuracy and enable new types of searches beyond traditional keyword matching. The document provides examples of related entity recommendations during web searches and outlines the workflow used to extract features from query and interaction data to identify and rank related entities.
Semantic search: from document retrieval to virtual assistants
Peter Mika
This document summarizes a presentation on semantic search given by Peter Mika, a senior research scientist at Yahoo Labs. It discusses the history and goals of semantic search, including improving query understanding and bridging the semantic gap. It also describes Yahoo's research into semantic search applications for web search, including enhancing search results, entity retrieval and recommendations, and question answering. Semantic representations of queries and documents are key to these applications.
Talk at the 2nd Summer Workshop of the Center for Semantic Web Research (January 16, 2016, Santiago, Chile) about the construction of Yahoo's Knowledge Graph and associated research challenges.
This document discusses semantic search and how thesauri can improve search experiences. It describes different types of semantic searches and demands for smarter searches. PoolParty Semantic Search is presented as a solution that leverages thesauri to provide auto-complete, query expansion, faceted search, and integration of linked data from multiple sources. A live demo of PoolParty Semantic Search is available online.
The document summarizes the history and impact of the Semantic Web. It discusses how the Semantic Web was originally envisioned as a way to make information on the web more machine-readable through semantic annotations. While early work showed promise, widespread adoption lagged behind expectations. Key impacts included positive but limited effects on web search through knowledge graphs, the rise of centralized social networks rather than distributed semantic social media, and limited use in e-commerce. Ongoing work continues on standards and applications while addressing challenges around centralization.
Semantic search uses language processing to analyze the meaning of content and search queries to return more relevant results. It involves classifying content using taxonomies, identifying named entities, extracting relationships between entities, and matching these based on meaning. Implementing semantic search requires preparing content through classification, metadata, and information architecture, as well as technologies for semantic tagging, entity extraction, triple stores, and integrating these capabilities with existing search and content management systems.
The document discusses the evolution of search engines from basic keyword search to semantic search using knowledge graphs and structured data. It provides examples of how search engines like Google are now able to provide direct answers to queries by searching structured data rather than just documents. It emphasizes the importance of representing web content as structured data using schemas like schema.org to be discoverable in semantic search and knowledge graphs.
Multimedia Data Navigation and the Semantic Web (SemTech 2006)
Bradley Allen
The document describes a system for faceted navigation of multimedia content using semantic web technologies. It discusses using ontologies expressed in RDF(S) and OWL to represent metadata, BBC rush footage used as a case study, and visual facets for color, texture and combinations that were generated through MPEG-7 feature extraction and self-organizing map clustering. The system allows retrieval of clips and shots based on textual and visual facet filtering of the RDF represented multimedia data.
The document discusses semantic search and summarizes some key points:
1. Semantic search aims to improve search by exploiting structured data and metadata to better understand user intent and content meaning.
2. It can make use of information extraction techniques to extract implicit metadata from unstructured web pages, or rely on publishers exposing structured data using semantic web formats.
3. Semantic search can enhance different stages of the information retrieval process like query interpretation, indexing, ranking, and evaluation.
Reflected Intelligence: Real world AI in Digital Transformation
Trey Grainger
The goal of most digital transformations is to create competitive advantage by enhancing customer experience and employee success, so giving these stakeholders the ability to find the right information at their moment of need is paramount. Employees and customers increasingly expect an intuitive, interactive experience where they can simply type or speak their questions or keywords into a search box, their intent will be understood, and the best answers and content are then immediately presented.
Providing this compelling experience, however, requires a deep understanding of your content, your unique business domain, and the collective and personalized needs of each of your users. Modern artificial intelligence (AI) approaches are able to continuously learn from both your content and the ongoing stream of user interactions with your applications, and to automatically reflect back that learned intelligence in order to instantly and scalably deliver contextually-relevant answers to employees and customers.
In this talk, we'll discuss how AI is currently being deployed across the Fortune 1000 to accomplish these goals, both in the digital workplace (helping employees more efficiently get answers and make decisions) and in digital commerce (understanding customer intent and connecting them with the best information and products). We'll separate fact from fiction as we break down the hype around AI and show how it is being practically implemented today to power many real-world digital transformations for the next generation of employees and customers.
The document summarizes a presentation given by Bill Slawski at the Semantic Technology & Business Conference in San Jose. The presentation discussed how adding semantic information and structuring content around entities can help websites better optimize for search engines and provide more relevant experiences for users. It also provided several examples of how search engines are using entities and knowledge graphs to enhance search results and anticipate related queries.
The document "From queries to answers in the Web" discusses:
- How web search has evolved from primarily returning links to now attempting to directly answer queries.
- Future trends in search include more personalized, social, contextual and anticipatory search capabilities.
- Semantic search aims to understand user intent and resources using semantic models to improve matching and ranking.
Presentation of the Semantic Knowledge Graph research paper at the 2016 IEEE 3rd International Conference on Data Science and Advanced Analytics (Montreal, Canada - October 18th, 2016)
Abstract: This paper describes a new kind of knowledge representation and mining system which we are calling the Semantic Knowledge Graph. At its heart, the Semantic Knowledge Graph leverages an inverted index, along with a complementary uninverted index, to represent nodes (terms) and edges (the documents within intersecting postings lists for multiple terms/nodes). This provides a layer of indirection between each pair of nodes and their corresponding edge, enabling edges to materialize dynamically from underlying corpus statistics. As a result, any combination of nodes can have edges to any other nodes materialize and be scored to reveal latent relationships between the nodes. This provides numerous benefits: the knowledge graph can be built automatically from a real-world corpus of data, new nodes - along with their combined edges - can be instantly materialized from any arbitrary combination of preexisting nodes (using set operations), and a full model of the semantic relationships between all entities within a domain can be represented and dynamically traversed using a highly compact representation of the graph. Such a system has widespread applications in areas as diverse as knowledge modeling and reasoning, natural language processing, anomaly detection, data cleansing, semantic search, analytics, data classification, root cause analysis, and recommendation systems. The main contribution of this paper is the introduction of a novel system - the Semantic Knowledge Graph - which is able to dynamically discover and score interesting relationships between any arbitrary combination of entities (words, phrases, or extracted concepts) through dynamically materializing nodes and edges from a compact graphical representation built automatically from a corpus of data representative of a knowledge domain.
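The core idea in the abstract above, terms as nodes and shared documents as edges, can be illustrated with a small sketch. This is not the paper's implementation: the function names (`build_inverted_index`, `edge`, `lift`), the toy corpus, and the lift-style score are all assumptions chosen for illustration, and the paper's own scoring function is not reproduced here.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in set(text.lower().split()):
            index[term].add(doc_id)
    return index


def edge(index, *terms):
    """Materialize an edge on demand: the documents shared by all given terms."""
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()


def lift(index, n_docs, a, b):
    """Toy relatedness score (an assumption, not the paper's measure):
    P(a, b) / (P(a) * P(b)). Values above 1 suggest an association."""
    pa = len(index.get(a, ())) / n_docs
    pb = len(index.get(b, ())) / n_docs
    pab = len(edge(index, a, b)) / n_docs
    return pab / (pa * pb) if pa and pb else 0.0


# Hypothetical four-document corpus.
docs = [
    "java developer hadoop",
    "java developer spring",
    "nurse hospital care",
    "nurse care patient",
]
index = build_inverted_index(docs)

shared = edge(index, "java", "developer")
related = lift(index, len(docs), "java", "developer")
unrelated = lift(index, len(docs), "java", "nurse")
```

Nothing about the edge is stored ahead of time: it is computed by intersecting postings lists when asked for, which is the layer of indirection the abstract describes.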
Search and social patents for 2012 and beyond
Bill Slawski
The document summarizes Bill Slawski's presentation on search and social media patents from 2012 and beyond. It discusses various patents Google has acquired related to search, social media, hardware, fiber optic networks, and more. It also outlines patents for phrase-based indexing, concept-based indexing, ranking pages based on user interactions, building a knowledge graph, and developing a planet-scale distributed search index. Slawski suggests Google may expand into hardware, entertainment, internet service provision, and more based on its patent portfolio.
The document discusses the history and development of the Semantic Web over the past 20 years. It begins with Tim Berners-Lee originally conceiving of the Semantic Web in 1994 with a vision of machines being able to understand web documents and perform tasks like property transfers. Since then, there have been over 200 talks on the Semantic Web, but the focus was initially on technologies like XML, RDF, and OWL. More recently, Linked Data and RDFa have seen the most usage in applications while the ontology story remains unclear. Moving forward, bridging the gaps between linked data and formal ontology views will require addressing challenges like modeling incomplete and decentralized data at web-scale.
Slides for the iDB summer school (Sapporo, Japan) http://db-event.jpn.org/idb2013/
Typically, Web mining approaches have focused on enhancing or learning about user seeking behavior, ranging from query log analysis and click-through usage, through employing the web graph structure for ranking, to detecting spam or web page duplicates. Lately, there has been a trend toward mining web content semantics and dynamics in order to enhance search capabilities by either providing direct answers to users or allowing for advanced interfaces or capabilities. In this tutorial we will look into different ways of mining textual information from Web archives, with a particular focus on how to extract and disambiguate entities, and how to put them to use in various search scenarios. Further, we will discuss how web dynamics affect information access and how to exploit them in a search context.
Semantic seo and the evolution of queries
Bill Slawski
This document summarizes how Google search results are evolving to include more semantic data through direct answers, structured snippets, and rich snippets. It provides examples of direct answers being extracted from authoritative sources using natural language queries and intent templates. It also discusses how including structured data like tables, schemas, and markup can help search engines understand and display page content in a more standardized way. While knowledge-based trust is an interesting concept, current search ranking still primarily relies on link analysis and does not consider factual correctness.
Ranking in Google Since The Advent of The Knowledge Graph
Bill Slawski
A Two Person Panel Discussion/Presentation by Bill Slawski and Barbara Starr On June 23, 2015
The Lotico Semantic Web of San Diego
The SEO San Diego Meetup
The SEM San Diego Meetup
http://www.meetup.com/InternetMarketingSanDiego/events/222788495/
User experience drives search engines, and hence their results. Search Engine Result Presentation/Placements naturally follow that route.
This means that search results are no longer based exclusively on ranking criteria. Other critical factors include understanding the notion of 'ordering vs ranking', the impact of context, and many others.
The document summarizes recent developments in semantic search engines. It discusses the principles of the semantic web and languages like RDF, RDFS, and OWL. It then summarizes the Falcons semantic search engine and how it indexes and searches semantic web objects. It also discusses efforts by Google, Yahoo, and Microsoft to incorporate semantic data through rich snippets, SearchMonkey, and Schema.org. Finally, it introduces the Kngine search engine as a new promising engine that aims to go beyond existing sources by indexing structured information on the web.
The document summarizes research in semantic search and its applications. It discusses the evolution of semantic search from early work on the semantic web to current applications using knowledge graphs. It outlines key challenges in semantic search like query understanding and how mobile search is driving new areas like conversational agents and task completion. The use of semantic representations and knowledge bases is helping to improve search quality and enable new interactive applications.
Semantic mark-up with schema.org: helping search engines understand the Web
Peter Mika
This document discusses semantic markup with schema.org to help search engines understand web pages better. It describes how schema.org was created as a collaborative effort by major search engines to define a shared set of schemas. This allows publishers to mark up their content in a consistent way so it can be understood by different search engines and applications. The document outlines how schema.org has grown significantly in adoption and detail over time. It also discusses how schema.org builds on semantic web standards and can describe actions websites can take to help with task completion.
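As a rough illustration of what such markup can look like, the sketch below builds a schema.org `Event` description as JSON-LD and wraps it in the script tag a page would embed. The helper name `event_markup` and the event values are hypothetical; `Event`, `Place`, `name`, `startDate`, and `location` are existing schema.org terms. The slides themselves may use other syntaxes such as microdata or RDFa.

```python
import json


def event_markup(name, start_date, place):
    """Return an HTML script element carrying a schema.org Event as JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "Event",
        "name": name,
        "startDate": start_date,  # ISO 8601 date
        "location": {"@type": "Place", "name": place},
    }
    body = json.dumps(data, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical event, for illustration only.
snippet = event_markup("Example Search Meetup", "2030-01-15", "Example Hall")
print(snippet)
```

Because every publisher uses the same type and property names, any consumer that knows the vocabulary can read the event's name, date, and place without site-specific parsing.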
Making the Web Searchable - Keynote ICWE 2015
Peter Mika
This document discusses making the web more searchable through semantic technologies. It begins with an overview of how web search currently works and its limitations, and then discusses how the semantic web aims to address these issues by adding explicit meaning and relationships between data on the web. It describes early skepticism of the semantic web from the information retrieval community and how it has become more practical over time. It also outlines research into semantic search done at Yahoo, including developing a knowledge graph and using semantic information to enhance search results. Finally, it discusses how semantic technologies are now being adopted more widely through efforts like schema.org.
(Keynote) Peter Mika - "Making the Web Searchable"
icwe2015
This document discusses making web search more intelligent through semantic search techniques. It begins by describing how current web search works but has limitations due to not understanding context and meaning. The promise of the semantic web to address this through shared identifiers and structured data is then presented. However, challenges have prevented it from being fully realized. The document outlines research at Yahoo on semantic search, including exploiting semantic models and metadata to enhance search results. This involves techniques such as knowledge graphs, which can provide important entity information to better satisfy user search needs.
LinkedIn is the premier professional social network with over 60 million users and a new user joining every second. One of LinkedIn's strategic advantages is their unique data. While most organizations consider data as a service function, LinkedIn considers data a cornerstone of their product portfolio.
To rapidly develop these products LinkedIn leverages a number of technologies including open source, 3rd party solutions, and some we've had to invent along the way.
This LinkedIn talk at the NYC Hadoop Meetup held 3/18 at ContextWeb focused on best practices for quickly uncovering patterns, visualizing trends, and generating actionable insights from large datasets.
This document discusses the evolution of search and the future of Microsoft Search in Bing. It begins with an overview of how search has evolved from classic to modern experiences. It then discusses some of the key challenges with classic intranet search experiences. The document outlines features of Microsoft Search including its use of the Microsoft Graph to provide personalized, intelligent search results across Microsoft 365 apps and an organization's intranet. It positions Microsoft Search in Bing as a familiar entry point for search and discusses how it provides both work and web results with enterprise-grade security. The document concludes with next steps for enabling Microsoft Search.
AI, Search, and the Disruption of Knowledge Management (Trey Grainger)
Trey Grainger discussed how search has evolved from basic keyword search to more advanced capabilities like understanding user intent, providing personalized search, and augmented search using machine learning and AI. He explained the concept of "reflected intelligence" where user interactions with search results are used to continuously improve search quality through techniques like signals boosting, learning to rank, and collaborative filtering. Grainger also outlined how knowledge graphs can help power semantic search by modeling relationships between entities to better understand queries and provide more relevant results.
This document discusses extracting insights from data exhaust, or unused data. It provides 10 lessons for doing so: choose a meaningful problem, find relevant data, raw data is better than processed, guide user input, solve easier problems first, create a quick baseline model, test on sample data, use continuous integration, pick the right tools, and prioritize developer productivity. As a case study, it analyzes skills data from attendees of the Strata conference to understand the audience and identify skills clusters. Visualization tools like Gephi are used to analyze similarities between attendees based on their skill vectors.
Social Networks and the Semantic Web: a retrospective of the past 10 years (Peter Mika)
The document summarizes the past 10 years of social networks and the Semantic Web. It discusses how early visions of a decentralized, interoperable Social-Semantic Web did not fully materialize due to social networks consolidating user data into silos. However, work continues through standards bodies to develop vocabularies and building blocks that could still enable a federated social web. It also notes that while online social science is now widespread, challenges remain around access to social data and the ability to generalize findings over time and platforms.
Employees, Business Partners and Bad Guys: What Web Data Reveals About Person... (Connotate)
This document discusses how web data can reveal information about employees, business partners, and persons of interest. It outlines the business case for using web data to conduct background checks and screenings. It also discusses challenges like collecting good data from various sources and analyzing large amounts of unstructured data. Advanced text analytics solutions that use entity resolution and relationship extraction are presented as helping to understand web data. The document concludes by describing how these techniques were applied in a project with Thorn to detect child sex trafficking online.
Search Strategy for Enterprise SharePoint 2013 - Vancouver SharePoint Summit (Joel Oleson)
The Four Pillars of Search really help you focus your search planning. In this session we dig into the context, content, metadata and UX or user experience that really matter. We also dig into a variety of publicly accessible SharePoint 2013 real world search pages to demonstrate the value.
South Big Data Hub: Text Data Analysis Panel (Trey Grainger)
Slides from Trey's opening presentation for the South Big Data Hub's Text Data Analysis Panel on December 8th, 2016. Trey provided a quick introduction to Apache Solr, described how companies are using Solr to power relevant search in industry, and provided a glimpse on where the industry is heading with regard to implementing more intelligent and relevant semantic search.
This document provides information on advanced Google searching techniques. It discusses how search engines work and user expectations. Various search operators and strategies are described, such as phrase searches, Boolean operators, title searches, URL searches, and site-limited searches. The document recommends beginning with a title field search using Boolean expressions that is limited to a top-level domain or specific website to find the most relevant information.
This document discusses building an effective enterprise search strategy and discusses the 4 pillars of search strategy: context, content, metadata, and user experience (UX). It provides an overview of each pillar, with context focusing on understanding the user's role and task, content discussing the need to only index authoritative sources, metadata outlining classes of search users, and UX examining the importance of continuous optimization through analysis and tuning. The overall message is that enterprise search projects often fail because they do not meet user expectations or provide a remarkable experience.
This document discusses key factors to consider when evaluating a search engine, including:
1) Understanding the type of search engine (e.g. free text, directory, meta search) and its search functionality/operators.
2) Benchmarking a search engine by running sample searches and comparing results to preferred engines.
3) Analyzing how search results are ranked and algorithms are evaluated/updated.
4) Noting difficulties in evaluating search results due to ambiguity in search intents.
The document describes a method for focused crawling to retrieve structured data from web pages. It involves using an online classifier trained on URL features to identify pages containing structured data. A bandit-based selection strategy is used to balance exploration and exploitation. Experiments show the adaptive approach retrieves 26% more relevant pages than static classification, and 66% more when focused on a specific objective. Decaying the bandit randomness over time improved results further. The method was able to retrieve hundreds of millions of structured data pages from billions of web pages.
The document discusses challenges related to managing information and metadata across SharePoint and Office 365 environments. It notes that without effective governance, most technology-focused metadata projects will fail. It highlights issues like inconsistent tagging of content by end users, which compromises search and accessibility of information. The document advocates augmenting Microsoft tools with third-party applications that can automatically generate and apply conceptual metadata to content. This helps improve search, records management, data security, compliance, and other information governance capabilities across hybrid environments.
What is the current status quo of the Semantic Web as first mentioned by Tim Berners Lee in 2001?
Ten blue links are no longer the only way to drive traffic: Google has added many so-called Knowledge cards and panels to answer the specific informational needs of its users. Sounds complicated, but it isn't. If you ask for information, Google will try to answer it within the result pages.
I'll share my research from a theoretical point of view through exploring patents and papers, and actual testing cases in the live indices of Google. Getting your site listed as the source of an Answer Card can result in an increase of CTR as much as 16%. How to get listed? Come join my session and I'll shine some light on the factors that come into play when optimizing for Google's Knowledge graph.
Search Analytics: Conversations with Your Customers (richwig)
1. The document discusses analyzing search logs to understand how users interact with search engines and how to improve search and site organization based on these insights.
2. Key insights that can be gained from search log analysis include popular search terms, queries that return no results, frequently clicked search results, and patterns in search behavior over time and between user groups.
3. Information from search log analysis can be used to improve search features, results presentation, site navigation, metadata, and content.
The document discusses the Semantic Web and linked data. It describes the Semantic Web as a way to publish information that is easier for machines to process by adding meaning through common formats and shared schemas. It outlines key Semantic Web standards like RDF, OWL, and SPARQL. The document also discusses how data is published on the Semantic Web through linked data, metadata in HTML, SPARQL endpoints, and feeds. It provides examples of publishing and consuming RDF data on the Semantic Web.
Investigating the Semantic Gap through Query Log Analysis (Peter Mika)
The document investigates the mismatch between data available on the web and users' information needs through analyzing query logs. It finds that while there is a lot of semantic data available, much of it does not serve to answer the types of questions users ask. It proposes using the contexts and prefixes/postfixes from queries targeting specific types of entities to help identify potential attributes and relationships for those entity types in ontologies.
The document summarizes semantic technologies that can be used to make web search and content more intelligent. It discusses how search and online media are converging, and how semantic markup like RDFa, microformats, and microdata can be used to embed structured data in web pages. This allows search engines and other applications to better understand page content and provide more sophisticated features like entity search, personalized results, and content aggregation.
The document discusses several options for publishing data on the Semantic Web. It describes Linked Data as the preferred approach, which involves using URIs to identify things and including links between related data to improve discovery. It also outlines publishing metadata in HTML documents using standards like RDFa and Microdata, as well as exposing SPARQL endpoints and data feeds.
This document discusses the Semantic Web and Linked Data. It provides an overview of key Semantic Web technologies like RDF, URIs, and SPARQL. It also describes several popular Linked Data datasets including DBpedia, Freebase, Geonames, and government open data. Finally, it discusses the Yahoo BOSS search API and WebScope data for building search applications.
The document discusses different options for publishing metadata on the Semantic Web, including standalone RDF documents, embedding metadata in web pages using techniques like RDFa, providing SPARQL endpoints, publishing feeds, and using automated tools. It provides examples and discusses the advantages of each approach. A brief history of metadata publishing efforts is also presented, from early initiatives like HTML meta tags and SHOE to current standards like RDFa and microformats.
Year of the Monkey: Lessons from the first year of SearchMonkey (Peter Mika)
This document discusses publishing content on the Semantic Web. It introduces basic concepts of RDF and the Semantic Web like resources, literals, and triples. It then describes six main ways to publish RDF data on the web: 1) standalone RDF documents, 2) metadata inside webpages using techniques like RDFa, 3) SPARQL endpoints, 4) feeds, 5) XSLT transformations, and 6) automatic markup tools. Finally, it briefly discusses the history of embedding metadata in HTML and examples of metadata standards.
This document discusses publishing content on the Semantic Web. It introduces basic concepts of RDF and the Semantic Web like resources, literals, and triples. It then describes six main ways to publish RDF data on the web: 1) standalone RDF documents, 2) metadata inside webpages using formats like RDFa, 3) SPARQL endpoints, 4) feeds, 5) XSLT transformations, and 6) automatic markup tools. Finally, it briefly reviews the history of embedding metadata in HTML and examples of formats used.
Semantic Search on the Rise
1. Semantic Search on the Rise
Peter Mika | Yahoo Labs
Tran Duc Thanh | LyfeLine Corporation
2. About the speakers
- Peter Mika
  - Senior Research Scientist
  - Head of Semantic Search group at Yahoo! Labs
  - Expertise: Semantic Web, Information Retrieval, Natural Language Processing
- Tran Duc Thanh
  - CTO of LyfeLine Corporation, Tech Startup, Santa Clara
  - Assistant Professor, San Jose State University (on leave)
  - Served as Assistant Professor for Stanford University and Karlsruhe Institute of Technology
  - Expertise: Semantic Search, Semantic / Linked Data Management
3. Agenda
- What is Semantic Search?
- Semantic Search technology
- Applications
- Beyond Web Search
- Q&A
5. Why Semantic Search? Part I.
- Improvements in IR are harder and harder to come by
  - Basic relevance models are well established
  - Machine learning using hundreds of features
  - Heavy investment in computational power, e.g. real-time indexing and instant search
- Remaining challenges are not computational, but in modeling user cognition
  - Modeling the relationships between:
    - the query
    - the content
    - the world at large
6. Poorly solved information needs remain
- Semantic gap
  - Ambiguity
    - jaguar
    - paris hilton
  - Secondary meaning
    - george bush (and I mean the beer brewer in Arizona)
  - Subjectivity
    - reliable digital camera
    - paris hilton sexy
  - Imprecise or overly precise searches
    - jim hendler
- Complex needs
  - Missing information
    - brad pitt zombie
    - florida man with 115 guns
    - 35 year old computer scientist living in barcelona
  - Category queries
    - countries in africa
    - barcelona nightlife
  - Relational, transactional or computational queries
    - Friends of peter who knows VCs in the Bay Area
    - 120 dollars in euros
    - digital camera under 300 dollars
    - world temperature in 2020
Callout: Are there even true keyword queries? Users may have stopped asking them.
9. What it's like to be a machine?
(Slide body: a block of unintelligible symbols; nothing further is recoverable from the transcript.)
10. Why Semantic Search? Part II.
- The Semantic Web is now a reality
  - Emerging agreements around schemas
    - Facebook's Open Graph Protocol (OGP)
    - Schema.org
  - Large amounts of data published in RDF
    - As Linked Data
    - Inside HTML pages
    - Inside email text messages
  - Private Knowledge Graphs inside corporations
- Semantic data exploited by search engines
  - Better document presentation and ranking
  - Advanced search functionality
11. Metadata in HTML: schema.org
- Agreement on a shared set of schemas for common types of web content
  - Bing, Google, and Yahoo! as initial founders (June, 2011), joined by Yandex later
  - Similar in intent to sitemaps.org
    - Use a single format to communicate the same information to all three search engines
<!-- RDFa Lite: vocab sets the vocabulary, typeof the item type, property an attribute of the item -->
<div vocab="http://schema.org/" typeof="Movie">
  <h1 property="name">Pirates of the Caribbean: On Stranger Tides (2011)</h1>
  <span property="description">Jack Sparrow and Barbossa embark on a quest to
  find the elusive fountain of youth, only to discover that Blackbeard and
  his daughter are after it too.</span>
  Director: <div property="director" typeof="Person">
    <span property="name">Rob Marshall</span>
  </div>
</div>
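To show how a consumer might read such markup, here is a minimal sketch in Python using only the standard library. It is not a full RDFa processor and does not reflect how any search engine actually parses pages: the class `PropertyCollector`, the function `extract_properties`, and the shortened `MARKUP` sample are assumptions made for illustration. It only picks up `property` attributes whose value is plain text and reports them with the nearest enclosing `typeof`.

```python
from html.parser import HTMLParser

# Shortened copy of the slide's example (description omitted for brevity).
MARKUP = """
<div vocab="http://schema.org/" typeof="Movie">
  <h1 property="name">Pirates of the Caribbean: On Stranger Tides (2011)</h1>
  Director: <div property="director" typeof="Person">
    <span property="name">Rob Marshall</span>
  </div>
</div>
"""


class PropertyCollector(HTMLParser):
    """Collects (item type, property, text) for text-valued properties.

    Hypothetical helper: assumes well-balanced tags and no void elements.
    """

    def __init__(self):
        super().__init__()
        self.type_stack = []  # one entry per open element: its typeof or None
        self.pending = None   # property name waiting for its text value
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        self.type_stack.append(attrs.get("typeof"))
        # A property that opens a nested item (director -> Person) has no
        # literal value of its own, so only plain properties are tracked.
        if "property" in attrs and "typeof" not in attrs:
            self.pending = attrs["property"]
        else:
            self.pending = None

    def handle_endtag(self, tag):
        if self.type_stack:
            self.type_stack.pop()
        self.pending = None

    def handle_data(self, data):
        text = data.strip()
        if self.pending and text:
            # Nearest enclosing element that declared a type.
            item_type = next((t for t in reversed(self.type_stack) if t), None)
            self.triples.append((item_type, self.pending, text))
            self.pending = None


def extract_properties(html):
    parser = PropertyCollector()
    parser.feed(html)
    parser.close()
    return parser.triples


for item_type, prop, value in extract_properties(MARKUP):
    print(item_type, prop, value)
```

The nested `director` element shows why the type has to be tracked per element: the second `name` belongs to the Person, not to the Movie.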
12. Substantial adoption of schema.org markup
- Over 15% of all pages now have schema.org markup
- Over 5 million sites, over 25 billion entity references
- In other words: same order of magnitude as the web
  - Source: R.V. Guha: Light at the end of the tunnel, ISWC 2013 keynote
- See also
  - P. Mika, T. Potter. Metadata Statistics for a Large Web Corpus, LDOW 2012
    - Based on Bing US corpus
    - 31% of webpages, 5% of domains contain some metadata (including Facebook's OGP)
  - WebDataCommons
    - Based on CommonCrawl Nov 2013
    - 26% of webpages, 14% of domains contain some metadata (including Facebook's OGP)
14. Semantic Search
- Def. Semantic Search is any retrieval method where
  - User intent and resources are represented in a semantic model
    - A set of concepts or topics that generalize over tokens/phrases
    - Additional structure such as a hierarchy among concepts, relationships among concepts etc.
  - Semantic representations of the query and the user intent are exploited in some part of the retrieval process
- As a research field
  - Workshops
    - ESAIR (2008-2014) at CIKM, Semantic Search (SemSearch) workshop series (2008-2011) at ESWC/WWW, EOS workshop (2010-2011) at SIGIR, JIWES workshop (2012) at SIGIR, Semantic Search Workshop (2011-2014) at VLDB
  - Special Issues of journals
  - Surveys
    - Christos L. Koumenides, Nigel R. Shadbolt: Ranking methods for entity-oriented semantic web search. JASIST 65(6): 1091-1106 (2014)
15. Semantic models: implicit vs. explicit
- Implicit/internal semantics
  - Models of text extracted from a corpus of queries, documents or interaction logs
    - Query reformulation, term dependency models, translation models, topic models, latent space models, learning to match (PLS)
  - See
    - Hang Li and Jun Xu: Semantic Matching in Search. Foundations and Trends in Information Retrieval Vol 7 Issue 5, 2013, pp 343-469
- Explicit/external semantics
  - Explicit linguistic or ontological structures extracted from text and linked to external knowledge
  - Obtained using IE techniques or acquired from Semantic Web markup
16. Entity Linking vs. Entity Retrieval
- Entity Linking
  - Recognizing entities that are explicitly mentioned in queries and linking them to a KB
- Entity Retrieval
  - Ranking entities in a KB, given a query
  - Result may not be explicitly mentioned in the query
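The difference between the two tasks can be sketched with a toy knowledge base. Everything here is hypothetical and meant only to illustrate the distinction: the `KB` entries, the exact-name matching in `link_entities`, and the term-overlap score in `retrieve_entities` stand in for the far richer models used in practice.

```python
# Toy knowledge base: entity id -> surface names and a short description.
# All entries are made up for illustration.
KB = {
    "Brad_Pitt": {"names": ["brad pitt"],
                  "text": "american actor and film producer"},
    "World_War_Z": {"names": ["world war z"],
                    "text": "zombie film starring brad pitt"},
    "Paris": {"names": ["paris"],
              "text": "capital city of france"},
}


def link_entities(query):
    """Entity linking: ids of KB entities explicitly mentioned in the query."""
    padded = f" {query.lower()} "
    return [eid for eid, entry in KB.items()
            if any(f" {name} " in padded for name in entry["names"])]


def retrieve_entities(query):
    """Entity retrieval: rank KB entities by term overlap with the query."""
    terms = set(query.lower().split())
    scored = []
    for eid, entry in KB.items():
        words = set(" ".join(entry["names"]).split()) | set(entry["text"].split())
        score = len(terms & words)
        if score:
            scored.append((score, eid))
    # Highest score first; ties broken alphabetically by id.
    return [eid for score, eid in sorted(scored, key=lambda s: (-s[0], s[1]))]


print(link_entities("brad pitt zombie"))      # only the mentioned entity
print(retrieve_entities("brad pitt zombie"))  # may rank an unmentioned entity first
```

For the query "brad pitt zombie", linking finds only the mentioned person, while retrieval can surface the film, which is never named in the query.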
17. What it is like to be a machine?
(Slide body: a block of unintelligible symbols; nothing further is recoverable from the transcript.)
20. The role of entities in queries
- Entities play an important role
  - ~70% of queries contain a named entity (entity mention queries) and ~50% of queries have an entity focus (entity seeking queries)
    - brad pitt attacked by fans
  - ~10% of queries are looking for a class of entities
    - brad pitt movies
  - See
    - Jeffrey Pound, Peter Mika, Hugo Zaragoza: Ad-hoc object retrieval in the web of data. WWW 2010: 771-780
    - Thomas Lin, Patrick Pantel, Michael Gamon, Anitha Kannan, Ariel Fuxman: Active objects: actions for entity-centric search. WWW 2012: 589-598
21. Entity linking in queries
- Common structure to entity mention queries: query = <entity> + <intent>
  - Intent is typically an additional word or phrase to
    - Disambiguate, e.g. brad pitt actor
    - Specify action or aspect, e.g. brad pitt net worth, brad pitt download
- Entity linking in queries
  - Tutorial: Entity Linking and Retrieval by Edgar Meij, Krisztián Balog and Daan Odijk
  - Microsoft Entity Linking challenge
  - Yahoo WebScope dataset L24 - Yahoo Search Query Log To Entities, version 1.0
- Session-level analysis
  - Recognize entities and intents at the session level
  - Laura Hollink, Peter Mika, Roi Blanco: Web usage mining with semantic analysis. WWW 2013: 561-570
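The query = <entity> + <intent> structure can be illustrated with a small sketch that finds the longest known entity name in a query and treats the remaining words as the intent. The lexicon `ENTITY_NAMES` and the function `split_query` are hypothetical; real systems score candidate segmentations rather than taking the first longest match.

```python
# Hypothetical lexicon: surface form -> entity id.
ENTITY_NAMES = {
    "brad pitt": "Brad_Pitt",
    "paris hilton": "Paris_Hilton",
    "paris": "Paris",
}


def split_query(query):
    """Return (entity_id, intent) for the longest entity name found.

    If no known name occurs, returns (None, normalized query).
    """
    tokens = query.lower().split()
    # Try the longest token spans first so "paris hilton" wins over "paris".
    for length in range(len(tokens), 0, -1):
        for start in range(len(tokens) - length + 1):
            span = " ".join(tokens[start:start + length])
            if span in ENTITY_NAMES:
                rest = tokens[:start] + tokens[start + length:]
                return ENTITY_NAMES[span], " ".join(rest)
    return None, " ".join(tokens)


for q in ["brad pitt net worth", "brad pitt actor", "paris hilton sexy", "jaguar"]:
    print(q, "->", split_query(q))
```

Preferring the longest span is one simple way to handle the "paris hilton" ambiguity mentioned earlier; it is a design shortcut, not a disambiguation model.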
22. Entity Retrieval
- Keyword search over entity graphs
  - See Pound et al., WWW 2010, for a definition
  - No common benchmark until 2010
- SemSearch Challenge 2010/2011
  - 50 entity-mention queries selected from the Search Query Tiny Sample v1.0 dataset (Yahoo! Webscope)
  - Billion Triples Challenge 2009 data set
  - Evaluation using Mechanical Turk
  - See report:
    - Roi Blanco, Harry Halpin, Daniel M. Herzig, Peter Mika, Jeffrey Pound, Henry S. Thompson, Thanh Tran: Repeatable and reliable semantic search evaluation. J. Web Sem. 21: 14-29 (2013)
23. Question Answering
- Question Answering over Linked Data competition
  - 2011-2014
  - Data
    - DBpedia and MusicBrainz in RDF
  - Queries
    - Full natural language questions of different forms, written by the organizers
    - Multi-lingual
    - Give me all actors starring in Batman Begins
  - Results are defined by an equivalent SPARQL query
    - Systems are free to return list of results or a SPARQL query
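To make "results are defined by an equivalent query" concrete, the sketch below answers the sample question over a tiny in-memory set of triples. The data, the predicate names, and the helpers `match` and `actors_starring_in` are assumptions for illustration only; the competition itself uses SPARQL over the full RDF datasets, and the SPARQL in the comment is an approximation, not the official gold query.

```python
# Tiny in-memory graph of (subject, predicate, object) triples.
# Hypothetical sample in the style of DBpedia, not the real dataset.
TRIPLES = [
    ("Batman_Begins", "starring", "Christian_Bale"),
    ("Batman_Begins", "starring", "Michael_Caine"),
    ("Batman_Begins", "director", "Christopher_Nolan"),
    ("The_Prestige", "starring", "Christian_Bale"),
]


def match(s=None, p=None, o=None):
    """Match a single triple pattern; None plays the role of a variable."""
    return [t for t in TRIPLES
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def actors_starring_in(film):
    # Roughly the pattern:  SELECT ?actor WHERE { <film> starring ?actor }
    return sorted(obj for _, _, obj in match(s=film, p="starring"))


print(actors_starring_in("Batman_Begins"))
```

A system can be scored either on the list it returns or on a query that yields the same list, which is why the gold standard is expressed as a query rather than as free text.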
26. Exploiting Semantic Web markup (Yahoo internal prototype, 2007)
Screenshot annotations:
- Personal and private homepage of the same person (clear from the snippet, but it could also be automatically de-duplicated)
- Conferences he plans to attend and his vacations from homepage, plus bio events from LinkedIn
- Geolocation
27. Search snippets using Semantic Web markup
- Summarization of HTML is a hard task
  - Template detection
  - Selecting relevant snippets
  - Composing readable text
  - Efficiency constraints
- Yahoo SearchMonkey (2008)
  - Enhanced results using structured data from the page
    - Key/value pairs
    - Deep links
    - Image or Video
28. Effectiveness of enhanced results (Yahoo)
- Explicit user feedback
  - Side-by-side editorial evaluation (A/B testing)
    - Editors are shown a traditional search result and enhanced result for the same page
    - Users prefer enhanced results in 84% of the cases and traditional results in 3% (N=384)
- Implicit user feedback
  - Click-through rate analysis
    - Long dwell time limit of 100s (Ciemiewicz et al. 2010)
    - 15% increase in "good" clicks
  - User interaction model
    - Enhanced results lead users to relevant documents, even though they are less likely to be clicked than textual results
    - Enhanced results effectively reduce bad clicks!
- See
  - Kevin Haas, Peter Mika, Paul Tarjan, Roi Blanco: Enhanced results for web search. SIGIR 2011: 725-734
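As a rough illustration of the click-through analysis, the sketch below labels a click as "good" when its dwell time reaches the 100-second limit named on the slide and computes the share of good clicks. Reading the limit as a simple threshold is an assumption, and the function name and the sample dwell times are invented; the cited paper describes the actual methodology.

```python
LONG_DWELL_SECONDS = 100  # long dwell time limit named on the slide


def good_click_rate(dwell_times):
    """Fraction of clicks whose dwell time reaches the long-dwell limit.

    Assumption: a click counts as "good" when the user stays at least
    LONG_DWELL_SECONDS on the clicked page.
    """
    if not dwell_times:
        return 0.0
    good = sum(1 for seconds in dwell_times if seconds >= LONG_DWELL_SECONDS)
    return good / len(dwell_times)


# Invented dwell times (seconds) for two result presentations.
traditional = [5, 120, 300, 40]
enhanced = [150, 110, 20, 400]
print(good_click_rate(traditional), good_click_rate(enhanced))
```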
29. Enhanced results at other search providers
- Google announces Rich Snippets - June, 2009
  - Faceted search for recipes - Feb, 2011
- Bing tiles - Feb, 2011
- Facebook's Like button and the Open Graph Protocol (2010)
  - Shows up in profiles and news feed
  - Site owners can later reach users who have liked an object
30. Moving beyond entity markup
▪ We would like to help our users in task completion
► But we have trained our users to talk in nouns
• Retrieval performance decreases by adding verbs to queries
► Markup for actions/intents could potentially help
▪ Modeling actions
► Understand what actions can be taken on a page
► Help users in mapping their query to potential actions
► Applications in web search, email etc.
[Figure labels: THING, THING]
Schema.org v1.2, including the Actions vocabulary, published April 16, 2014
32. Personalized content and native ads (Yahoo)
▪ User profiling based on entities recognized in the content consumed
▪ News and ads personalized to the user
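One simple way to picture entity-based profiling is sketched below. It assumes that each consumed item has already been annotated with entities, builds a profile by counting them, and ranks candidate items by profile overlap. The scoring scheme and all names are illustrative assumptions, not a description of Yahoo's system.

```python
from collections import Counter


def build_profile(consumed_items):
    """Count in how many consumed items each entity was recognized."""
    profile = Counter()
    for entities in consumed_items:
        profile.update(set(entities))
    return profile


def rank_candidates(profile, candidates):
    """Order (title, entities) candidates by summed profile weight, highest first."""
    def score(item):
        _title, entities = item
        return sum(profile[entity] for entity in set(entities))
    return [title for title, _entities in sorted(candidates, key=score, reverse=True)]


# Hypothetical reading history and candidate articles.
history = [["Queen", "Freddie Mercury"], ["Queen", "Brian May"]]
candidates = [
    ("Election news", ["Parliament"]),
    ("Queen reunion tour", ["Queen", "Brian May"]),
]
profile = build_profile(history)
ranking = rank_candidates(profile, candidates)
```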
33. Entity displays in web search (Bing/Google/Yahoo)
▪ Entity retrieval
► Which entity does a keyword query refer to, if any?
▪ Related entities
► Which entity would the user visit next?
• Roi Blanco, B. Barla Cambazoglu, Peter Mika, Nicolas Torzec: Entity Recommendations in Web Search. ISWC 2013
35. Relational Search (Facebook Graph Search)
Query: "my friends, who is member of queen"
Recognized keywords: queen, member, friends
Parse tree nodes (node: matched text, semantics):
• {band}: [id:Queen1], Queen1
• [member-of-v]: is member of, member()
• [member-vp]: is member of [id:1], member(x,Queen1)
• [who]: who, -
• [user-filter]: who is member of [id:1], member(x,Queen1)
• [user-head]: my friends, friends(x,me)
• [start]: my friends, who is member of [id:Queen1]; friends(x,me), member(x,Queen1)
Grammar: set of production rules, capturing all possible connections, i.e. the search space of all parse trees
• [start] → [users]
• [users] → my friends; semantics: friends(x, me)
• […] → is member of [bands]; semantics: member(x, $1)
• [bands] → {band}; semantics: $1
• …
Grammar-based Query Translation: which combination of production rules results in a parse tree that connects the recognized entities and relationships?
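The translation of the example query can be sketched in a few lines. This is a deliberately small illustration that hard-codes the three productions shown on the slide rather than searching the space of all parse trees. The entity index `BANDS` and the function `translate` are hypothetical, and nothing here reflects the actual Graph Search implementation.

```python
import re

# Hypothetical entity index: surface form -> entity id (the {band} terminal).
BANDS = {"queen": "Queen1"}


def translate(query):
    """Translate 'my friends[, who is member of <band>]' into a conjunctive query."""
    text = query.lower().strip()
    predicates = []

    # [users] -> my friends, with semantics friends(x, me)
    if not text.startswith("my friends"):
        return None
    predicates.append("friends(x, me)")
    text = text[len("my friends"):].lstrip(" ,")

    # [user-filter] -> who is member of [bands], with semantics member(x, $1)
    if text:
        match = re.fullmatch(r"who is member of (.+)", text)
        if match is None:
            return None
        # [bands] -> {band}: resolve the mention to an entity id ($1)
        band_id = BANDS.get(match.group(1))
        if band_id is None:
            return None
        predicates.append(f"member(x, {band_id})")

    return ", ".join(predicates)


result = translate("my friends, who is member of queen")
```

Here `result` is the string `friends(x, me), member(x, Queen1)`, matching the semantics attached to the [start] node. A query that no rule covers returns `None`.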
36. Semantic Search (Graphinder)
Semantic Auto-completion
- Entity + relationships
- Multi-source
- Domain-independent
- Low manual effort
Query Translation
[Figure: data graph with the entities Freddie Mercury, Brian May, Queen, Queen Elizabeth 1 and Liar 1971, the value "single", the schema classes Person, Artist and Single, and the relation writer]
37. Relational Query Rewriting (Graphinder)
Query: single from freddy mercury que
[Figure: data index containing Freddie Mercury, Queen, Queen Elizabeth 1 and single; schema index containing Single and writer]
Keyword Interpretation
- Imprecise / fuzzy matching
- Match every keyword
Token rewriting via syntactic distance
1) single from freddie mercury queen
…
Token rewriting via semantic distance
1) single writer freddie mercury queen
…
[Figure: data index containing Freddie Mercury and Queen; schema index containing Single and writer]
Query segmentation
1) single writer "freddie mercury" queen
…
Result Retrieval & Ranking
Keyword / Key Phrase Interpretation:
- Precise matching
- Match keyword and key phrases
Benefits:
- Higher selectivity of query terms (quality)
- Reduced number of query terms (efficiency)
- Better search experience…
Challenges: many rewrite candidates, some are semantically not "valid" in the relational setting
Example: single (marital status) writer "freddie mercury" queen (the queen of UK)
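The three rewriting steps can be sketched on the example query as follows. This is an illustration under stated assumptions, not Graphinder's method: syntactic distance is approximated with the similarity ratio from Python's `difflib` rather than a true edit distance, semantic distance is replaced by a hand-written lookup table, and the vocabulary, table and phrase set are all hypothetical.

```python
import difflib

# Hypothetical index vocabulary (terms from the data and schema indexes).
VOCABULARY = ["single", "writer", "freddie", "mercury", "queen", "elizabeth"]
# Hypothetical stand-in for semantic distance: a hand-written rewrite table.
SEMANTIC_REWRITES = {"from": "writer"}
# Hypothetical key phrases known to the index.
KEY_PHRASES = {"freddie mercury"}


def rewrite_syntactic(tokens, vocabulary=VOCABULARY, cutoff=0.6):
    """Replace each token by its closest vocabulary term, if similar enough."""
    rewritten = []
    for token in tokens:
        matches = difflib.get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
        rewritten.append(matches[0] if matches else token)
    return rewritten


def rewrite_semantic(tokens, table=SEMANTIC_REWRITES):
    """Replace tokens that have a semantically close schema term in the table."""
    return [table.get(token, token) for token in tokens]


def segment(tokens, phrases=KEY_PHRASES, max_len=3):
    """Greedy longest-match segmentation into keywords and key phrases."""
    segments, i = [], 0
    while i < len(tokens):
        for size in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + size])
            # A single token is always accepted as a fallback.
            if size == 1 or candidate in phrases:
                segments.append(candidate)
                i += size
                break
    return segments


query = "single from freddy mercury que".split()
step1 = rewrite_syntactic(query)   # fixes "freddy" and "que"
step2 = rewrite_semantic(step1)    # maps "from" to the schema term "writer"
step3 = segment(step2)             # groups "freddie mercury" into one phrase
```

The sketch returns only the top candidate at each step. As the slide notes, the real difficulty lies in the many alternative rewrites, some of which are not valid in the relational setting.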
41. Beyond Web search: mobile interaction
▪ Interaction
► Question-answering
► Support for interactive retrieval
► Spoken-language access
► Task completion
▪ Contextualization
► Personalization
► Geo
► Context (work/home/travel)
• Try getaviate.com
42. Interactive, conversational voice search
▪ Parlance EU project
► Complex dialogs within a domain
• Requires complete semantic understanding
▪ Complete system (mixed license)
► Automated Speech Recognition (ASR)
► Spoken Language Understanding (SLU)
► Interaction Management
► Knowledge Base
► Natural Language Generation (NLG)
► Text-to-Speech (TTS)
▪ Video
43. Conclusions
▪ Semantic Search
► Explicit understanding for queries and documents through links to external knowledge
• Using methods of Information Extraction or explicit annotations (markup) in webpages
• Semantic Web as a source of external knowledge
▪ Increasing level of understanding
► Early focus on entities and their attributes
• Applications in web search: rich results, entity displays, entity recommendation
► Moving toward modeling intents/actions
► Adding human-like interaction