For the technically oriented reader, this brief paper describes the technical foundation of the Knowledge Correlation Search Engine patented by Make Sence, Inc.
Livio Costantini of Tovek presented on tools for accessing unstructured information, including Tovek Tools, an enterprise search engine and analytical system. The presentation covered basic information retrieval concepts, the Verity Query Language, and Topic Trees, which allow searching for concepts through a predefined hierarchical structure defined by subject experts. Topic Trees address the semantic ambiguity of text by establishing relationships between keywords and providing rules for evaluating documents.
This document provides an overview of text mining and web mining. It defines data mining and describes the common data mining tasks of classification, clustering, association rule mining and sequential pattern mining. It then discusses text mining, defining it as the process of analyzing unstructured text data to extract meaningful information and structure. The document outlines the seven practice areas of text mining as search/information retrieval, document clustering, document classification, web mining, information extraction, natural language processing, and concept extraction. It provides brief descriptions of the problems addressed within each practice area.
Tovek provides software for discovering information hidden in textual data. The company was founded in 1993 in the Czech Republic to help users find, understand, and utilize information through advanced search and analysis tools. Tovek Tools includes desktop applications such as Tovek Agent for querying indexes and viewing results, as well as a server product for automatically indexing and profiling content from various sources in real time. The goal is to help analysts reduce the time spent searching, analyzing, and disseminating information from both structured and unstructured data sources.
This document provides an overview of information retrieval models, including vector space models, TF-IDF, Doc2Vec, and latent semantic analysis. It begins with basic concepts in information retrieval like document indexing and relevance scoring. Then it discusses vector space models and how documents and queries are represented as vectors. TF-IDF weighting is explained as assigning higher weight to rare terms. Doc2Vec is introduced as an extension of word2vec to learn document embeddings. Latent semantic analysis uses singular value decomposition to project documents to a latent semantic space. Implementation details and examples are provided for several models.
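The TF-IDF weighting mentioned in the overview above can be sketched in a few lines. This is an illustrative toy implementation using raw term counts and unsmoothed log IDF, not the implementation of any system summarized here:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    TF is the raw term count within a document; IDF down-weights terms
    that appear in many documents, so rare terms receive higher weight.
    A term occurring in every document gets weight 0 (no smoothing).
    """
    n = len(docs)
    # document frequency: in how many documents does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return weights

docs = [["cat", "sat", "mat"], ["cat", "cat", "dog"], ["dog", "ran"]]
w = tf_idf(docs)
# "sat" occurs in only one of the three documents, so it outweighs
# the more common "cat" in the first document's vector
```

Production systems typically use smoothed or sublinear variants of both factors, but the rare-term-boosting behavior is the same.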
Automatic indexing is the process of analyzing documents to extract information to be included in an index. This can be done through statistical, natural language, concept-based, or hypertext linkage techniques. Statistical techniques are the most common, identifying words and phrases to index documents. Natural language techniques perform additional parsing of text. Concept indexing correlates words to concepts, while hypertext linkages create connections between documents. The goal of automatic indexing is to preprocess documents to allow for relevant search results by representing concepts in the index.
This document provides a full syllabus with questions and answers related to the course "Information Retrieval" including definitions of key concepts, the historical development of the field, comparisons between information retrieval and web search, applications of IR, components of an IR system, and issues in IR systems. It also lists examples of open source search frameworks and performance measures for search engines.
The document discusses information retrieval models. It describes the Boolean retrieval model, which represents documents and queries as sets of terms combined with Boolean operators. Documents are retrieved if they satisfy the Boolean query, but there is no ranking of results. The Boolean model has limitations including difficulty expressing complex queries, controlling result size, and ranking results. It works best for simple, precise queries when users know exactly what they are searching for.
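The set-based behavior of the Boolean model described above, including its lack of ranking, can be shown with a toy collection. A minimal sketch with hypothetical documents:

```python
# Boolean retrieval over a toy collection: each document is reduced to a
# set of terms, and a query is an expression over set operations.
docs = {
    1: "information retrieval ranks documents",
    2: "boolean queries combine terms",
    3: "boolean retrieval returns unranked sets",
}
index = {doc_id: set(text.split()) for doc_id, text in docs.items()}

def matching(term):
    """Documents containing the term."""
    return {d for d, terms in index.items() if term in terms}

def AND(a, b): return a & b
def OR(a, b): return a | b
def NOT(a): return set(index) - a

# "boolean AND retrieval": a document either satisfies the query or it
# does not -- the result is a set, with no ranking among its members.
result = AND(matching("boolean"), matching("retrieval"))
```

The all-or-nothing result set is exactly the limitation the summary notes: there is no way to say that one matching document is better than another.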
Automated indexing involves both human and machine effort, with humans performing the intellectual indexing and machines performing the mechanical indexing. Automatic indexing is performed entirely by machine. While automated indexing may yield superior intellectual indexing, automatic indexing is more time- and cost-efficient because the process is fully mechanized. Automatic indexing also produces more consistent results, since human indexers can be inconsistent.
The document discusses different theories used in information retrieval systems. It describes cognitive or user-centered theories that model human information behavior and structural or system-centered theories like the vector space model. The vector space model represents documents and queries as vectors of term weights and compares similarities between queries and documents. It was first used in the SMART information retrieval system and involves assigning term vectors and weights to documents based on relevance.
The document provides an overview of the key components and objectives of an information retrieval system. It discusses how an IR system aims to minimize the time a user spends locating needed information by facilitating search generation, presenting search results in a relevant order, and processing incoming documents through normalization, indexing, and selective dissemination to users. The major measures of an IR system's effectiveness are precision and recall.
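Precision and recall, the two effectiveness measures named above, are simple set ratios. A minimal sketch over hypothetical document IDs:

```python
def precision_recall(retrieved, relevant):
    """Precision = fraction of retrieved items that are relevant;
    recall = fraction of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# 2 of the 4 retrieved documents are relevant; 2 of the 3 relevant
# documents were found
p, r = precision_recall(retrieved=[1, 2, 3, 4], relevant=[2, 4, 5])
```

The two measures trade off against each other: retrieving everything maximizes recall at the cost of precision, and vice versa.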
Content Analyst - Conceptualizing LSI-Based Text Analytics White Paper (John Felahi)
- The document discusses the evolution of text analytics technologies from early keyword indexing to more advanced mathematical approaches like latent semantic indexing (LSI).
- It explains that early keyword indexing focused only on word frequencies and occurrences, which could lead to false positives and did not capture the conceptual meaning of documents.
- More advanced approaches like LSI use linear algebraic calculations to analyze word co-occurrences across large document sets and derive the conceptual relationships between terms and topics in a way that better mirrors human understanding.
Conceptual foundations of text mining and preprocessing steps (El Habib NFAOUI)
This document provides an overview of conceptual foundations and preprocessing steps for text mining. It discusses the differences between syntax and semantics in text, and presents a general framework for text analytics including preprocessing, representation, and knowledge discovery. For text representation, it describes bag-of-words models and vector space models, including frequency vectors, one-hot encoding, and TF-IDF weighting. It also provides an introduction to n-grams for representing sequential data.
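The bag-of-words and n-gram representations described above can be sketched directly. An illustrative toy example, not tied to any particular framework:

```python
from collections import Counter

def bag_of_words(tokens):
    # frequency-vector representation: term -> count; word order is discarded
    return Counter(tokens)

def ngrams(tokens, n):
    # n-grams retain the local word order that bag-of-words throws away
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the cat sat on the mat".split()
bow = bag_of_words(tokens)    # Counter({'the': 2, 'cat': 1, ...})
bigrams = ngrams(tokens, 2)   # [('the', 'cat'), ('cat', 'sat'), ...]
```

One-hot encoding and TF-IDF are both built on top of vocabularies like the one `bag_of_words` produces; n-grams are the usual bridge to representing sequential structure.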
This document provides an overview of information retrieval models. It begins with definitions of information retrieval and how it differs from data retrieval. It then discusses the retrieval process and logical representations of documents. A taxonomy of IR models is presented including classic, structured, and browsing models. Boolean, vector, and probabilistic models are explained as examples of classic models. The document concludes with descriptions of ad-hoc retrieval and filtering tasks and formal characteristics of IR models.
Comparison of Semantic and Syntactic Information Retrieval System on the basi... (Waqas Tariq)
This paper discusses an information retrieval system for local databases. The approach is to search the web both semantically and syntactically. The proposal handles search queries from a user interested in focused results about a product with specific characteristics. The objective of the work is to find and retrieve accurate information from an information warehouse containing related data with common keywords. The information retrieval system could eventually be used for accessing the internet as well. Achieving both high precision and high recall is difficult, so semantic and syntactic search engines are compared on two parameters: precision and recall.
This document provides an overview of an information retrieval course. The course will cover topics related to information retrieval models, techniques, and systems. Students will complete exams, assignments, and a major project to build a search engine using both text-based and semantic retrieval techniques. The document defines key concepts in information retrieval and discusses different types of information retrieval systems and techniques.
An information retrieval system provides search and browse capabilities to help users locate relevant information. Search capabilities allow Boolean logic, proximity, phrase matching, fuzzy searches, masking, numeric ranges, concept expansion, and natural language queries. Browse capabilities help users evaluate search results and focus on potentially relevant items through ranking, zoning of display fields, and highlighting of search terms.
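Proximity search, one of the capabilities listed above, requires an index that records term positions, not just term presence. A minimal sketch with hypothetical documents:

```python
def positional_index(docs):
    """term -> {doc_id: [positions]}; supports phrase and proximity queries."""
    index = {}
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.split()):
            index.setdefault(term, {}).setdefault(doc_id, []).append(pos)
    return index

def near(index, t1, t2, k):
    """Documents where t1 and t2 occur within k words of each other."""
    hits = set()
    # only documents containing both terms can match
    for d in index.get(t1, {}).keys() & index.get(t2, {}).keys():
        if any(abs(p - q) <= k for p in index[t1][d] for q in index[t2][d]):
            hits.add(d)
    return hits

docs = {1: "search engines rank results", 2: "search many results"}
idx = positional_index(docs)
# only document 2 has "search" within 2 words of "results"
```

Phrase matching is the special case where the positions must be adjacent and in order.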
Recent analysis of litigation outcomes suggests that nearly half of the patents litigated to judgment were held invalid. Commonly available patent search software is predominantly keyword based and takes a “one-size-fits-all” approach, leaving much to be desired from a practitioner’s perspective. We discuss opportunities for using text mining and information retrieval in the domain of patent litigation. We focus on the post-grant inter partes review process, in which a company can challenge the validity of an issued patent in order, for example, to protect its product from being viewed as infringing on the patent in question. We discuss both possibilities and obstacles to assistance with such a challenge using a text analytic solution. A range of issues must be overcome for semantic search and analytic solutions to be of value, ranging from text normalization and support for semantic and faceted search to predictive analytics. In this context, we evaluate our novel and top-performing semantic search solution. For experiments, we use data from the USPTO Final Decisions of the Patent Trial and Appeal Board database. Our experiments and analysis point to limitations of generic semantic search and text analysis tools. We conclude by presenting some research ideas that might help overcome these deficiencies, such as interactive semantic search, support for a multi-stage approach that distinguishes between divergent and convergent modes of operation, and textual entailment.
Information retrieval (IR) is the process of searching for and retrieving relevant documents from a large collection based on a user's query. Key aspects of IR include:
- Representing documents and queries in a way that allows measuring their similarity, such as the vector space model.
- Ranking retrieved documents by relevance to the query using factors like term frequency and inverse document frequency.
- Allowing for similarity-based retrieval where documents similar to a given document are retrieved.
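The similarity measurement in the first point above is most commonly cosine similarity between term-frequency vectors. An illustrative sketch with made-up documents:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = lambda v: math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm(a) * norm(b)) if a and b else 0.0

doc = Counter("information retrieval finds relevant documents".split())
query = Counter("relevant documents".split())
other = Counter("cooking pasta recipes".split())

sim_query = cosine(doc, query)   # shares terms with the document
sim_other = cosine(doc, other)   # shares nothing, so similarity is 0
```

The same function works document-to-document, which is what similarity-based retrieval (the third point) amounts to.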
This document provides an overview of an information retrieval system. It defines an information retrieval system as a system capable of storing, retrieving, and maintaining information such as text, images, audio, and video. The objective of an information retrieval system is to minimize the overhead involved in a user locating needed information. The document discusses functions like search, browse, indexing, and cataloging, and various capabilities that facilitate querying and retrieving relevant information from the system.
IJRET-V1I1P5 - A User Friendly Mobile Search Engine for fast Accessing the Da... (ISAR Publications)
A mobile search engine is a meta search engine that captures the user's preferences in the form of concepts by mining their clickthrough data. Search queries on mobile devices, however, are limited to fewer words than those used when interacting with search engines through computers. Mobile search has become popular because of the huge number of available applications. Smartphones carry large amounts of personal information, such as the user's personal details, contacts, messages, emails, and credit card information. The system supports user-type-specific search and, finally, ontology-based search. Opinion mining is also conducted on the feedback and valuable suggestions given by mobile users. Because content concepts and location concepts have different characteristics, different techniques are used for their concept extraction and ontology formulation. Individual users can use this search engine, which runs on the Android platform, and can give feedback and suggestions about the search results. Based on this feedback, other users can get valuable information about the services available in or near their location.
The document discusses different types of information retrieval systems such as traditional query-based systems, text categorization systems, text routing systems, and text filtering systems. It also describes some common techniques used in information retrieval systems like inverted indexing, stopword removal, stemming, and vector space models. Finally, it discusses opportunities for integrating information retrieval techniques with natural language processing to develop more accurate and effective retrieval systems.
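Three of the techniques named above — inverted indexing, stopword removal, and stemming — compose naturally into one pipeline. A toy sketch; the suffix-stripping `stem` is a crude stand-in for a real stemmer such as Porter's:

```python
STOPWORDS = {"the", "a", "an", "of", "is"}

def stem(term):
    # crude suffix stripping, for illustration only; real systems use
    # a proper algorithm such as the Porter stemmer
    for suffix in ("ing", "ed", "s"):
        if term.endswith(suffix) and len(term) > len(suffix) + 2:
            return term[: -len(suffix)]
    return term

def build_inverted_index(docs):
    """term stem -> set of doc_ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in text.lower().split():
            if term in STOPWORDS:
                continue
            index.setdefault(stem(term), set()).add(doc_id)
    return index

docs = {1: "the indexed documents", 2: "indexing a document"}
idx = build_inverted_index(docs)
# "indexed" and "indexing" collapse to the same stem, so both
# documents are found under a single index entry
```

Stemming trades a little precision (distinct words may collapse together) for recall, which is why the summary pairs it with the other normalization steps.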
1. The document describes a patent application for phrase-based indexing in information retrieval systems. It involves identifying phrases in documents, indexing documents based on these phrases, ranking documents based on phrase matching, and using phrases to generate document descriptions.
2. Phrases are identified based on their ability to predict other related phrases. Documents are indexed with lists of the phrases they contain. Ranking considers how well document phrases match query phrases.
3. The system can identify related phrases and extensions when searching, detect duplicate and spam documents, and generate snippets for search results using highly ranked sentences.
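The phrase-matching ranking in the points above can be approximated with a toy sketch. This is an illustration inspired by the idea of indexing and scoring by phrases, not the patented method itself, and it uses naive contiguous word sequences as phrase candidates rather than the patent's predictiveness criterion:

```python
def extract_phrases(tokens, max_len=2):
    # candidate phrases: contiguous word sequences of up to max_len words
    phrases = set()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            phrases.add(" ".join(tokens[i:i + n]))
    return phrases

def rank_by_phrase_overlap(docs, query):
    """Order doc_ids by how many query phrases each document contains."""
    q = extract_phrases(query.split())
    scored = []
    for doc_id, text in docs.items():
        d = extract_phrases(text.split())
        scored.append((len(q & d), doc_id))
    return [doc_id for score, doc_id in sorted(scored, reverse=True)]

docs = {1: "phrase based indexing of documents",
        2: "indexing documents by single terms"}
# document 1 matches multi-word phrases of the query, not just terms
ranking = rank_by_phrase_overlap(docs, "phrase based indexing")
```

The multi-word matches are what separate the two documents here; on single terms alone they would tie on "indexing".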
This document provides an overview of information retrieval systems, including their definition, objectives, and key functional processes. An information retrieval system aims to minimize the time and effort users spend locating needed information by supporting search generation, presenting relevant results, and allowing iterative refinement of searches. The major functional processes involve normalizing input items, selectively disseminating new items to users, searching archived documents and user-created indexes. Information retrieval systems differ from database management systems in their handling of unstructured text-based information rather than strictly structured data.
Information retrieval is concerned with searching for documents and metadata about documents. Documents contain information to be retrieved. There is overlap between terms like data retrieval, document retrieval, information retrieval, and text retrieval. Automated information retrieval systems are used to reduce information overload. Libraries and universities use IR systems to provide access to materials. Web search engines are a prominent example of IR applications. The idea of using computers for information retrieval was popularized in 1945. Early automated systems emerged in the 1950s and large-scale systems in the 1970s such as the Lockheed Dialog system. Many measures exist for evaluating IR system performance including precision, recall, and precision-recall curves.
Information Retrieval on Text using Concept Similarity (rahulmonikasharma)
This document summarizes a research paper on concept-based information retrieval using semantic analysis and WordNet. It discusses some of the challenges with keyword-based retrieval, such as synonymy and polysemy problems. Concept-based retrieval aims to address these issues by mapping documents and queries to semantic concepts rather than keywords. The paper proposes extracting concepts from text documents using WordNet to identify synonyms, hypernyms and hyponyms. It involves calculating term frequencies to determine a hierarchy of important concepts. The methodology is implemented using Java and WordNet to extract concepts from sample input documents.
Konsep Dasar Information Retrieval - Edi Faizal (EdiFaizal2)
This document discusses key concepts in information retrieval including the differences between information retrieval, recommender systems, and search engines. It also covers different types of information retrieval models such as set theoretic, algebraic, probabilistic, classical, non-classical, and alternative models. The document will next cover topics related to preparing information retrieval systems like crawling, indexing, natural language processing, and text representation.
Stress and publicity - dr. shriniwas kashalikar (Badar Daimi)
This document discusses the stress that can come from seeking fame and popularity. While fame can boost pride temporarily, it often leads to depression when it fades and can be traumatically disillusioning. The wise suggest that true happiness comes from within and not from outside validation. The practice of NAMASMARAN is said to make one buoyant and fulfilled from within, but it requires immense commitment that is difficult even when basic needs are met. NAMASMARAN encompasses the core of the universe, so it is not about petty achievements but about blossoming one's personality such that the soul's fragrance encompasses the universe, preempting dependence on fame. One's enlightenment through NAMASMARAN is universal.
The document discusses three main points about namasmaran or remembering God. The first point is that forgetting yourself in memory of God means becoming free from obsessive thoughts about oneself, which namasmaran can help with. The second point is that giving up relationships to focus only on God means developing a more objective view of relationships over time through namasmaran. The third point is that different philosophical perspectives can be harsh, so it's important to ask questions, seek answers through one's own experiences, and verify answers through sadhana like namasmaran with patience as in a scientific experiment.
The document provides information about registering and participating in a Living Nativity performance at Saints John and Paul church. It outlines the practice dates in November and December, and describes the roles for middle school, elementary, and preschool students. Middle schoolers will have main roles and receive service hours. Elementary students will portray animals and younger students will be angels and sing songs. The director encourages participation and hopes it will be a fun faith-building experience for the children.
Developing for Next Gen Identity Services (ForgeRock)
The document summarizes a presentation given at the 2013 Open Stack Identity Summit in France. It discusses ForgeRock's development of a next generation identity services product suite with a common REST API and user interfaces built using modern front-end frameworks. ForgeRock aims to provide a standardized set of operations for managing users, groups, and other identity resources through RESTful services.
Este documento presenta lineamientos sobre la educación sexual para niños. Aborda temas como la importancia de la educación sexual, las etapas en las que debe enfocarse (primera infancia, segunda infancia, adolescencia) y los temas claves a tratar en cada etapa como la anatomía, la pubertad y las relaciones. Recomienda que los padres sean los principales responsables de brindar esta educación para proteger la salud e integridad de los niños.
An Industry Overview: Enterprise Risk Services and Productss0P5a41b
The document provides an overview of the enterprise risk management industry. It discusses how recent events like the global recession and BP oil spill have brought risk management to the forefront for companies. It describes the four categories of enterprise risk: hazard, operational, financial, and strategic. It explains that enterprise risk management aims to identify, analyze, and monitor risks in order to implement internal controls. Overall, the document outlines the enterprise risk management field and discusses the roles of risk personnel, software providers, and how companies approach risk management.
A trihybrid cross involves three traits and uses a Punnett square with 64 boxes to demonstrate that Mendel's principles of segregation and independent assortment apply to the inheritance of multiple traits. The document discusses using forked-line and branch diagram methods to break down a trihybrid cross into a series of monohybrid crosses in order to calculate genotypic and phenotypic ratios.
Extracting and Reducing the Semantic Information Content of Web Documents to ...ijsrd.com
This document discusses various techniques for semantic document retrieval and summarization. It begins by introducing the challenges of semantic search and techniques like word sense disambiguation that aim to improve search relevance. It then discusses using ontologies and semantic networks to reduce the semantic content of documents in order to support semantic document retrieval. Finally, it proposes using relationship-based data cleaning techniques to disambiguate references between entities by analyzing their features and relationships.
Context Based Indexing in Search Engines Using Ontology: Reviewiosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
This document proposes a context-based indexing structure for search engines using ontology. It discusses how current search engines index documents based on terms, but not context. The proposed approach extracts contexts from documents using a context repository, thesaurus and ontology repository. Documents are then indexed based on these contexts rather than terms. An algorithm is presented where documents are first preprocessed and contexts extracted. Contexts are matched to an ontology to determine the specific document context. The index contains three fields - context, related terms, and document IDs. This allows searching the index based on context provided in the query, improving search quality by returning more relevant documents.
The whitepaper addresses the challenges in the data–driven organizations, medical research and health care. It summarizes how the context-enabled and semantic enrichment can transform the traditional method to search optimum data. 3RDi has advanced content enrichment with Named Entity Recognition, Semantic similarity, Content classification and Content summarization. Get the right data at the right time that helps medical researchers and health care practitioners.
Semantic Knowledge Representation for Information Retrieval Winfried Gödertjibinokkas
Semantic Knowledge Representation for Information Retrieval Winfried Gödert
Semantic Knowledge Representation for Information Retrieval Winfried Gödert
Semantic Knowledge Representation for Information Retrieval Winfried Gödert
The document discusses web-based information retrieval and summarizes some key challenges, including: managing large amounts of hyperlinked web pages, crawling the web to find relevant sites to index, and measuring the quality and authority of information. It also covers techniques for text representation in information retrieval systems, including the inverted file approach and using probability methods.
An Improved Web Explorer using Explicit Semantic Similarity with ontology and...INFOGAIN PUBLICATION
The Improved Web Explorer aims at extraction and selection of the best possible hyperlinks and retrieving more accurate search results for the entered search query. The hyperlinks that are more preferable to the entered search query are evaluated by taking into account weighted values of frequencies of words in search string that are present in anchor texts and plain texts available in title and body tags of various hyperlink pages respectively to retrieve relevant hyperlinks from all available links. Then the concept of ontology is used to gain insights of words in search string by finding their hypernyms, hyponyms and synsets to reach to the source and context of the words in search string. The Explicit Semantic Similarity analysis along with Naïve Bayes method is used to find the semantic similarity between lexically different terms using Wikipedia and Google as explicit semantic analysis tools and calculating the probabilities of occurrence of words in anchor and body texts .Vector Space Model is being used to calculate Term frequency and Inverse document frequency values, and then calculate cosine similarities between the entered Search query and extracted relevant hyperlinks to get the most appropriate relevance wise ranked search results to the entered search string
Classification of News and Research Articles Using Text Pattern MiningIOSR Journals
This document summarizes a research paper that proposes a method for classifying news and research articles using text pattern mining. The method involves preprocessing text to remove stop words and perform stemming. Frequent and closed patterns are then discovered from the preprocessed text. These patterns are structured into a taxonomy and deployed to classify new documents. The method also involves evolving patterns by reshuffling term supports within patterns to reduce the effects of noise from negative documents. Over 80% of documents were successfully classified using this pattern-based approach.
Chapter 1: Introduction to Information Storage and Retrievalcaptainmactavish1996
Course material for 3rd year Information Technology students. Information Storage and Retrieval Course. Chapter 1: Introduction to Information storage and retrieval
Ontology Based Approach for Semantic Information Retrieval SystemIJTET Journal
Abstract—The Information retrieval system is taking an important role in current search engine which performs searching operation based on keywords which results in an enormous amount of data available to the user, from which user cannot figure out the essential and most important information. This limitation may be overcome by a new web architecture known as the semantic web which overcome the limitation of the keyword based search technique called the conceptual or the semantic search technique. Natural language processing technique is mostly implemented in a QA system for asking user’s questions and several steps are also followed for conversion of questions to the query form for retrieving an exact answer. In conceptual search, search engine interprets the meaning of the user’s query and the relation among the concepts that document contains with respect to a particular domain that produces specific answers instead of showing lists of answers. In this paper, we proposed the ontology based semantic information retrieval system and the Jena semantic web framework in which, the user enters an input query which is parsed by Standford Parser then the triplet extraction algorithm is used. For all input queries, the SPARQL query is formed and further, it is fired on the knowledge base (Ontology) which finds appropriate RDF triples in knowledge base and retrieve the relevant information using the Jena framework.
This document describes a proposed concept-based mining model that aims to improve document clustering and information retrieval by extracting concepts and semantic relationships rather than just keywords. The model uses natural language processing techniques like part-of-speech tagging and parsing to extract concepts from text. It represents concepts and their relationships in a semantic network and clusters documents based on conceptual similarity rather than term frequency. The model is evaluated using singular value decomposition to increase the precision of key term and phrase extraction.
Henry stewart dam2010_taxonomicsearch_markohurstWIKOLO
Marko Hurst presented on leveraging taxonomy and metadata for superior search relevancy. He defined taxonomy as hierarchical relationships between categories and subcategories, metadata as data that describes other data, and ontology as associative relationships between concepts. Hurst explained that taxonomy can aid search by restricting it to relevant categories, expanding it to related terms through synonyms and mappings, and providing did-you-mean suggestions. Leveraging both taxonomy and semantic search provides the best results, while taxonomy alone allows searching across metadata and obscure relationships not found through pure text searches.
International Journal of Computational Engineering Research(IJCER) ijceronline
International Journal of Computational Engineering Research(IJCER) is an intentional online Journal in English monthly publishing journal. This Journal publish original research work that contributes significantly to further the scientific knowledge in engineering and Technology.
Semantic Annotation: The Mainstay of Semantic WebEditor IJCATR
Given that semantic Web realization is based on the critical mass of metadata accessibility and the representation of data with formal
knowledge, it needs to generate metadata that is specific, easy to understand and well-defined. However, semantic annotation of the
web documents is the successful way to make the Semantic Web vision a reality. This paper introduces the Semantic Web and its
vision (stack layers) with regard to some concept definitions that helps the understanding of semantic annotation. Additionally, this
paper introduces the semantic annotation categories, tools, domains and models
Searching and Analyzing Qualitative Data on Personal ComputerIOSR Journals
This document presents the design and implementation of a desktop search system using Lucene. It describes the key components of indexing, analyzing text, storing indexes, and searching. For indexing, it discusses how documents are preprocessed, tokenized, and stored in an inverted index. For searching, it explains how queries are analyzed and the index is searched to return results. The system allows users to search for files on their personal computer. It includes a user interface to input queries and view results. Lucene provides an open-source toolkit to add full-text search capabilities to applications.
This document describes a method for enriching search results using ontology. It begins with an abstract discussing how keyword searches often return irrelevant documents due to the large amount of information available online. It then introduces the concept of using ontology to allow for more sophisticated semantic searches. The paper presents an architecture that augments keyword search results with additional documents that are semantically relevant based on ontology mappings. Documents in the search results are then ranked based on both keyword frequency and semantic relevance to improve search accuracy.
Towards From Manual to Automatic Semantic Annotation: Based on Ontology Eleme...IJwest
This document describes a proposed system for automatic semantic annotation of web documents based on ontology elements and relationships. It begins with an introduction to semantic web and annotation. The proposed system architecture matches topics in text to entities in an ontology document. It utilizes WordNet as a lexical ontology and ontology resources to extract knowledge from text and generate annotations. The main components of the system include a text analyzer, ontology parser, and knowledge extractor. The system aims to automatically generate metadata to improve information retrieval for non-technical users.
Semantic Search of E-Learning Documents Using Ontology Based Systemijcnes
The keyword searching mechanism is traditionally used for information retrieval from Web based systems. However, this system fails to meet the requirements in Web searching of the expert knowledge base based on the popular semantic systems. Semantic search of E-learning documents based on ontology is increasingly adopted in information retrieval systems. Ontology based system simplifies the task of finding correct information on the Web by building a search system based on the meaning of keyword instead of the keyword itself. The major function of the ontology based system is the development of specification of conceptualization which enhances the connection between the information present in the Web pages with that of the background knowledge.The semantic gap existing between the keyword found in documents and those in query can be matched suitably using Ontology based system. This paper provides a detailed account of the semantic search of E-learning documents using ontology based system by making comparison between various ontology systems. Based on this comparison, this survey attempts to identify the possible directions for future research.
This document proposes a BOT virtual guide that will extract educational web content based on topics recently taught using web crawling techniques. It will use a domain ontology, DOM parsing, and concept-focused crawling to find relevant documents from the web. The documents will be ranked based on their concept similarity to the topic. The filtered and crawled data will then be provided to students as speech output through a text-to-speech system to serve as an automated virtual guide for supplemental learning materials.
Building a Correlation Technology Platform Applications0P5a41b
Building a software application is a challenging undertaking in any vertical market. This is a step-by-step guide for entrepreneurs and others interested in implementing a software application layer on top of the Correlation Technology Platform to bring their startup visions to reality.
Correlation Technology Business Solutions: Market Researchs0P5a41b
This is a no-nonsense business-to-business document containing an in-depth analysis of the market research industry, its competitive landscape, major players, and complete SWOT analysis. Specific problems currently facing the industry are identified, and the disruptive impact of Correlation Technology when used to provide new dynamic solutions to traditional market research challenges. Update: This document and accompanying SWOT analysis has been updated to reflect changes to the competitive landscape in the industry created by the acquisition of Synovate by IPSOS in 2011.
State-of-the-Art: Industry Challenges in ERMs0P5a41b
1. Make Sence Florida has identified several challenges that organizations face when implementing enterprise risk management practices and software solutions. These challenges include an inability to effectively handle large amounts of risk data, degraded data quality, data being forced to fit predefined models rather than reflecting real risks, ineffective filtering of data leading to missed risks, poor communication between different parts of the organization, and silos working independently without oversight.
2. Current risk management software provides some benefits like data aggregation but does not fully address these challenges. Manual processes used to select and analyze risk data can propagate human biases and errors. Without comprehensive solutions, organizations remain exposed to significant risks going unnoticed.
This 2008 study of the market for Internet Search includes original research by Make Sence, Inc. supporting the finding that in 2008, up to 15% of all queries made to the then leading search engines were in fact N-Dimensional Queries. We also demonstrate that most of those queries were not handled well by existing techniques. In addition, our original research supported the hypothesis that the then current demand and latent demand for Search could be modeled using the same techniques applied to estimation of current and latent demand for transportation, called "induced travel", and projected that an effective means of handling N-Dimensional Queries (such as Correlation Technology) could grow Search traffic by an additional 15% - a market worth millions of dollars.
1) Enterprise risk management (ERM) and governance-risk-compliance (GRC) are approaches that have emerged in the past decade but there is no consensus on how they relate.
2) Currently, GRC is seen as a top-down process that sets risk requirements, while ERM identifies and reports on risks, but the document argues this view is flawed.
3) The document contends that ERM should drive risk assessment and response, informing governance and compliance, rather than the other way around. With ERM in charge of holistic risk management, conflicts can be reduced and risks better addressed.
Make Sence controls the licensing of its correlation technology platform. It identifies problems across different vertical markets that can be solved using this platform. Once opportunities are identified, Make Sence will seek to enter those markets through licensing agreements or forming new ventures. The correlation technology platform uses patented components like discovery, acquisition, correlation, and refinement to analyze data and discover relationships across multidimensional problems. Make Sence develops specialized versions of this platform tailored for different industries and partners with companies through various business models like licensing, revenue sharing, or equity sharing agreements.
This is an under-the-hood look at the Correlation Technology Platform in action. All of Wikipedia's 3.5 million articles have been converted to "Knowledge Fragments." Frame-by-frame, with in-depth notations, Correlation Technology is used in this actual online demonstration to reveal how connections from "population density" to "terrorism" are discovered and presented.
Buckeye Dreamin 2024: Assessing and Resolving Technical DebtLynda Kane
Slide Deck from Buckeye Dreamin' 2024 presentation Assessing and Resolving Technical Debt. Focused on identifying technical debt in Salesforce and working towards resolving it.
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...SOFTTECHHUB
I started my online journey with several hosting services before stumbling upon Ai EngineHost. At first, the idea of paying one fee and getting lifetime access seemed too good to pass up. The platform is built on reliable US-based servers, ensuring your projects run at high speeds and remain safe. Let me take you step by step through its benefits and features as I explain why this hosting solution is a perfect fit for digital entrepreneurs.
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPathCommunity
Join this UiPath Community Berlin meetup to explore the Orchestrator API, Swagger interface, and the Test Manager API. Learn how to leverage these tools to streamline automation, enhance testing, and integrate more efficiently with UiPath. Perfect for developers, testers, and automation enthusiasts!
📕 Agenda
Welcome & Introductions
Orchestrator API Overview
Exploring the Swagger Interface
Test Manager API Highlights
Streamlining Automation & Testing with APIs (Demo)
Q&A and Open Discussion
Perfect for developers, testers, and automation enthusiasts!
👉 Join our UiPath Community Berlin chapter: https://community.uipath.com/berlin/
This session streamed live on April 29, 2025, 18:00 CET.
Check out all our upcoming UiPath Community sessions at https://community.uipath.com/events/.
Automation Dreamin' 2022: Sharing Some Gratitude with Your UsersLynda Kane
Slide Deck from Automation Dreamin'2022 presentation Sharing Some Gratitude with Your Users on creating a Flow to present a random statement of Gratitude to a User in Salesforce.
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc
Most consumers believe they’re making informed decisions about their personal data—adjusting privacy settings, blocking trackers, and opting out where they can. However, our new research reveals that while awareness is high, taking meaningful action is still lacking. On the corporate side, many organizations report strong policies for managing third-party data and consumer consent yet fall short when it comes to consistency, accountability and transparency.
This session will explore the research findings from TrustArc’s Privacy Pulse Survey, examining consumer attitudes toward personal data collection and practical suggestions for corporate practices around purchasing third-party data.
Attendees will learn:
- Consumer awareness around data brokers and what consumers are doing to limit data collection
- How businesses assess third-party vendors and their consent management operations
- Where business preparedness needs improvement
- What these trends mean for the future of privacy governance and public trust
This discussion is essential for privacy, risk, and compliance professionals who want to ground their strategies in current data and prepare for what’s next in the privacy landscape.
Spark is a powerhouse for large datasets, but when it comes to smaller data workloads, its overhead can sometimes slow things down. What if you could achieve high performance and efficiency without the need for Spark?
At S&P Global Commodity Insights, having a complete view of global energy and commodities markets enables customers to make data-driven decisions with confidence and create long-term, sustainable value. 🌍
Explore delta-rs + CDC and how these open-source innovations power lightweight, high-performance data applications beyond Spark! 🚀
Rock, Paper, Scissors: An Apex Map Learning JourneyLynda Kane
Slide Deck from Presentations to WITDevs (April 2021) and Cleveland Developer Group (6/28/2023) on using Rock, Paper, Scissors to learn the Map construct in Salesforce Apex development.
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Impelsys Inc.
Impelsys provided a robust testing solution, leveraging a risk-based and requirement-mapped approach to validate ICU Connect and CritiXpert. A well-defined test suite was developed to assess data communication, clinical data collection, transformation, and visualization across integrated devices.
Procurement Insights Cost To Value Guide.pptxJon Hansen
Procurement Insights integrated Historic Procurement Industry Archives, serves as a powerful complement — not a competitor — to other procurement industry firms. It fills critical gaps in depth, agility, and contextual insight that most traditional analyst and association models overlook.
Learn more about this value- driven proprietary service offering here.
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxshyamraj55
We’re bringing the TDX energy to our community with 2 power-packed sessions:
🛠️ Workshop: MuleSoft for Agentforce
Explore the new version of our hands-on workshop featuring the latest Topic Center and API Catalog updates.
📄 Talk: Power Up Document Processing
Dive into smart automation with MuleSoft IDP, NLP, and Einstein AI for intelligent document workflows.
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfAbi john
Analyze the growth of meme coins from mere online jokes to potential assets in the digital economy. Explore the community, culture, and utility as they elevate themselves to a new era in cryptocurrency.
Mobile App Development Company in Saudi ArabiaSteve Jonas
EmizenTech is a globally recognized software development company, proudly serving businesses since 2013. With over 11+ years of industry experience and a team of 200+ skilled professionals, we have successfully delivered 1200+ projects across various sectors. As a leading Mobile App Development Company In Saudi Arabia we offer end-to-end solutions for iOS, Android, and cross-platform applications. Our apps are known for their user-friendly interfaces, scalability, high performance, and strong security features. We tailor each mobile application to meet the unique needs of different industries, ensuring a seamless user experience. EmizenTech is committed to turning your vision into a powerful digital product that drives growth, innovation, and long-term success in the competitive mobile landscape of Saudi Arabia.
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes
A Knowledge Correlation Search Engine
Technical White Paper
Search engines are widely acknowledged to be part of the Information Retrieval (IR) domain
of knowledge. IR methods are directed to locating resources (typically documents) that are
relevant to a question called a query. That query can take forms ranging from a single search term
to a complex sentence composed in a natural language such as English. The collection of potential
resources that are searched is called a corpus (body), and different techniques have been
developed to search each type of corpus. For example, techniques used to search the set of
articles contained in a digitized encyclopedia differ from the techniques used by a web search
engine. Regardless of the techniques utilized, the core issue in IR is relevance – that is, the
relevance of the documents retrieved to the original query. Formal metrics are applied to
compare the effectiveness of the various IR methods. Common IR effectiveness metrics include
precision, which is the proportion of relevant documents retrieved to all retrieved documents;
recall, which is the proportion of relevant documents retrieved to all relevant documents in the
corpus; and fall-out, which is the proportion of irrelevant documents retrieved to all irrelevant
documents in the corpus. Post retrieval, documents deemed relevant are (in most IR systems)
assigned a relevance rank, again using a variety of techniques, and results are returned. Although
most commonly the query is submitted by, and the results returned to, a human being called a
user, the user can also be another software process.
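The three effectiveness metrics defined above are ratios over sets of documents, and can be sketched directly in code. The document identifiers and relevance judgments below are purely hypothetical, chosen only to exercise the formulas:

```python
def ir_metrics(retrieved, relevant, corpus):
    """Compute precision, recall, and fall-out for one query.

    retrieved: set of document ids returned by the system
    relevant:  set of document ids judged relevant to the query
    corpus:    set of all document ids in the collection
    """
    irrelevant = corpus - relevant
    # precision: relevant retrieved / all retrieved
    precision = len(retrieved & relevant) / len(retrieved)
    # recall: relevant retrieved / all relevant in the corpus
    recall = len(retrieved & relevant) / len(relevant)
    # fall-out: irrelevant retrieved / all irrelevant in the corpus
    fallout = len(retrieved & irrelevant) / len(irrelevant)
    return precision, recall, fallout

corpus = {f"d{i}" for i in range(10)}
relevant = {"d1", "d2", "d3", "d4"}
retrieved = {"d1", "d2", "d5"}

p, r, f = ir_metrics(retrieved, relevant, corpus)
print(p, r, f)
```

Here two of the three retrieved documents are relevant (precision 2/3), two of the four relevant documents were found (recall 1/2), and one of the six irrelevant documents slipped through (fall-out 1/6).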
Text retrieval is a type of IR that is typically concerned with locating relevant documents
which are composed of text, and document retrieval is concerned with locating specific fragments
of text documents, particularly those documents composed of unstructured (or “free”) text.
The related knowledge domain of data retrieval differs from IR in that data retrieval is
concerned with rapid, accurate retrieval of specific data items, such as records from a SQL
database.
Information extraction (IE) is another type of IR whose purpose is the automatic
extraction of information from unstructured (usually text) documents into data structures such as a
template of name/value pairs. From such templates, the information can subsequently be used to
update or populate a relational database.
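The "text in, template of name/value pairs out" shape of IE can be illustrated with a deliberately minimal sketch. Real IE systems use far richer linguistic analysis; the patterns, field names, and sample sentence below are hypothetical:

```python
import re

# Hand-written patterns mapping a field name to a regular expression.
# Each pattern captures its value in a named group "v".
PATTERNS = {
    "company": re.compile(r"(?P<v>[A-Z][A-Za-z]+ (?:Inc|Corp|Ltd)\.?)"),
    "date": re.compile(r"(?P<v>\d{4}-\d{2}-\d{2})"),
    "amount": re.compile(r"\$(?P<v>[\d,]+)"),
}

def extract_template(text):
    """Reduce free text to a name/value template, one value per field."""
    template = {}
    for field, pattern in PATTERNS.items():
        m = pattern.search(text)
        if m:
            template[field] = m.group("v")
    return template

doc = "On 2007-03-15, Acme Inc. was acquired for $250,000 in cash."
print(extract_template(doc))
```

Each row of such a template maps naturally onto a column of a relational table, which is the insertion step the paragraph describes.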
The Knowledge Correlation Search Engine differs from existing search engines because the
Knowledge Correlation process attempts to construct an exhaustive collection of paths describing
all connections - called correlations - between one term, phrase, or concept referred to as X (or
“origin”) and a minimum of a second term, phrase or concept referred to as Y (or “destination”).
If one or more such correlations can in fact be constructed, the Knowledge Correlation Search
Engine identifies as relevant all resources which contributed to constructing the correlation(s).
Unlike existing search engines, relevancy in the Knowledge Correlation Search Engine applies not to
individual terms, phrases or concepts in isolation but instead to the answer space of correlations
that includes not only the X and the Y, but also all the terms, phrases and concepts encountered in
constructing the correlations. Because of these novel characteristics, the Knowledge Correlation
Search Engine is uniquely capable of satisfying user queries which cannot be answered using
the content of a single web page or document.
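The paper does not disclose the patented correlation algorithm here, but the core idea of exhaustively constructing paths from an origin X to a destination Y can be illustrated with a toy graph search. The relationship graph below is entirely hypothetical; in the real system, edges would be contributed by resources found on the web:

```python
# Toy illustration (not the patented method): enumerate all simple paths
# from an origin concept X to a destination concept Y in a small graph of
# hypothetical concept-to-concept relationships.
GRAPH = {
    "population density": {"urbanization", "housing"},
    "urbanization": {"social unrest"},
    "housing": {"poverty"},
    "poverty": {"social unrest"},
    "social unrest": {"terrorism"},
}

def all_paths(graph, origin, destination, path=None):
    """Return every cycle-free path from origin to destination."""
    path = (path or []) + [origin]
    if origin == destination:
        return [path]
    paths = []
    for nxt in graph.get(origin, ()):
        if nxt not in path:  # keep paths simple (no revisits)
            paths.extend(all_paths(graph, nxt, destination, path))
    return paths

for p in sorted(all_paths(GRAPH, "population density", "terrorism")):
    print(" -> ".join(p))
```

In the terms of the paper, every node on every discovered path belongs to the answer space, and every resource that contributed an edge on any path would be identified as relevant.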
Contact: Mark Bobick [email protected] 702.882.5664 Copyright 2007 Make Sence Florida, Inc. All Rights Reserved.
Proprietary and Confidential. Dissemination without permission strictly prohibited
Search engines that have been described in the literature or released as software products
use a number of forms of input, ranging from individual keywords, to phrases, sentences,
paragraphs, concepts and data objects. Although the meanings of keyword, sentence, and
paragraph conform to the common understanding of the terms, the meanings of phrase, concept,
and data object vary by implementation. Sometimes, the word phrase is defined using its
traditional meaning in grammar. In this use, types of phrases include Prepositional Phrases (PP),
Noun Phrases (NP), Verb Phrases (VP), Adjective Phrases, and Adverbial Phrases. For other
implementations, the word phrase may be defined as any proper name (for example “New York
City”). Most definitions require that a phrase contain multiple words, although at least one
definition permits even a single word to be considered a phrase.
Some search engine implementations utilize a lexicon (a pre-canned list) of phrases. The WordNet
Lexical Database is a common source of phrases.
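Matching input text against such a lexicon is a straightforward longest-match scan. The lexicon entries below are hypothetical stand-ins; a real implementation might load them from a resource such as the WordNet lexical database:

```python
# Hypothetical pre-canned phrase lexicon (lowercased multi-word entries).
LEXICON = {"new york city", "search engine", "information retrieval"}
MAX_PHRASE_LEN = max(len(p.split()) for p in LEXICON)

def find_phrases(text):
    """Scan text left to right, preferring the longest lexicon match
    starting at each word position."""
    words = text.lower().split()
    found = []
    for i in range(len(words)):
        for n in range(MAX_PHRASE_LEN, 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in LEXICON:
                found.append(candidate)
                break
    return found

print(find_phrases("A search engine for New York City restaurants"))
```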
When used in conjunction with search engines, the word concept generally refers to one of
two constructs. The first construct is concept as a cluster of related words, similar to a thesaurus,
associated with a keyword. In a number of implementations, this cluster is made available to a
user - via a Graphic User Interface (GUI) for correction and customization. The user can tailor the
cluster of words until the resulting concept is most representative of the user’s understanding and
intent. The second construct is concept as a localized semantic net of related words around a
keyword. Here, a local or public ontology and taxonomy is consulted to create a semantic net
around the keyword. Some implementations of concept include images and other non-text
elements.
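The first construct described above, a concept as a user-tunable cluster of related words, can be sketched as follows. This is an illustration only, not the patented method; the thesaurus and keyword are invented, and a real implementation might seed the cluster from WordNet or an ontology rather than a hard-coded dictionary.

```python
# Toy thesaurus standing in for a real lexical resource (hypothetical data).
THESAURUS = {
    "computer": ["machine", "processor", "workstation", "pc"],
}

def build_concept(keyword, user_additions=(), user_removals=()):
    """Seed a concept cluster from the thesaurus, then apply the user's
    GUI-style customization (additions and removals)."""
    cluster = set(THESAURUS.get(keyword, [])) | {keyword}
    cluster |= set(user_additions)   # user tailors the cluster...
    cluster -= set(user_removals)    # ...until it matches their intent
    return cluster
```

The returned set plays the role of the "concept" that would then be submitted to the search engine in place of a bare keyword.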
Topics in general practice need to be identified or “detected” by applying a specific set
of operations against a body of text. Different methodologies for identification and/or detection
of topics have been described in the literature. Use of a topic as input to a search engine
therefore usually means that a body of text is input, and a required topic identification or topic
detection function is invoked. Depending upon the format and length of the resulting topic, an
appropriate relevancy function can then be invoked by the search engine.
Data objects as input to a search engine can take forms ranging from varying-length sets of free-form
sentences, to full-length text documents, to meta-data documents such as XML documents.
The Object Oriented (OO) paradigm dictates that OO systems accept objects as inputs. Some
software function is almost always required to process the input object so that the subsequent
relevance function of the search engine can proceed.
Input to the Knowledge Correlation Search Engine differs from current uses because all input
modes of the Knowledge Correlation Search Engine must present a minimum of two (2) non-identical
terms, phrases, or concepts. “Non-identical” in this usage means the two inputs must differ
lexically or semantically. The minimum two terms, phrases, or concepts are referred to
as X and Y (or “origin” and “destination”). No input process can result in synonymous, identical, or
idempotent X and Y terms, phrases, or concepts.
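The two-input constraint can be sketched as a validation step. This is a hedged illustration, not the patented implementation; the synonym table is a hypothetical stand-in for whatever lexical resource a real system would consult to reject synonymous X and Y.

```python
# Hypothetical synonym table; a real system might consult WordNet here.
SYNONYMS = {
    "car": {"automobile", "auto"},
    "automobile": {"car", "auto"},
}

def valid_input_pair(x, y):
    """Return True only if x and y are non-identical and not known synonyms,
    as the input rules above require."""
    x, y = x.strip().lower(), y.strip().lower()
    if x == y:                       # identity is disallowed
        return False
    if y in SYNONYMS.get(x, set()) or x in SYNONYMS.get(y, set()):
        return False                 # synonymy is disallowed
    return True
```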
As with existing art, text objects and data objects can be accepted (in the Knowledge
Correlation Search Engine, as either X or Y) and the topics and/or concepts can be extracted prior
to submission to the Knowledge Correlation process. However, unlike most (if not all) existing
search engines, the form of the input (term, phrase, concept, or object) is not constrained in the
Knowledge Correlation Search Engine. This is possible because the relevancy function (Knowledge
Correlation) does not utilize similarity measures to establish relevancy. This characteristic will
allow the Knowledge Correlation Search Engine to be seamlessly integrated with many existing IR
applications.
Regardless of the forms or methods of input, the purpose of Knowledge Correlation in the
Knowledge Correlation Search Engine is to establish document relevancy. Currently, relevancy is
established in IR using three general approaches: set-theoretic models which represent documents
by sets; algebraic models, which represent documents as vectors or matrices; and probabilistic
models, which use probability theory to infer document attributes (such as topic). Each model
provides a means of determining if one or more documents are similar and thereby, relevant, to a
given input. For example, the most basic set-theoretic model uses the standard Boolean approach
to relevancy – does an input word appear in the document? If yes, the document is relevant. If no,
then the document is not relevant. Algebraic models utilize techniques such as vector space
models where documents represented as vectors of terms are compared to the input query
represented as a vector of terms. Similarity of the vectors implies relevancy of the documents.
For probabilistic models, relevancy is determined by the compared probabilities of input and
document.
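As an illustration of the algebraic (vector space) approach described above, and not of the Knowledge Correlation method itself, the following minimal sketch represents the query and a document as term-frequency vectors and uses cosine similarity as the relevance score. Real systems would add TF-IDF weighting, stemming, and stop-word removal.

```python
import math
from collections import Counter

def cosine_relevance(query, document):
    """Cosine similarity between term-frequency vectors of query and document.
    Higher values imply greater relevance under the vector space model."""
    q = Counter(query.lower().split())
    d = Counter(document.lower().split())
    dot = sum(q[t] * d[t] for t in set(q) & set(d))
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in d.values())))
    return dot / norm if norm else 0.0
```

A score of 0.0 corresponds to the Boolean model's "not relevant" case (no shared terms), while the standard Boolean model itself would simply test whether the score is nonzero.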
As described above, the Knowledge Correlation Search Engine establishes relevancy by an
entirely different process, using entirely different criteria from those of any existing search engine.
However, the Knowledge Correlation Search Engine is dependent upon Discovery and Acquisition of
“relevant” sources within the corpus (especially if that corpus is the WWW). For this reason, any
form of the existing art can be utilized without restriction during the Discovery phase to assist in
identifying candidate resources for input to the Knowledge Correlation process.
For all search engines, simply determining relevancy of a given document to a given input is
necessary but not sufficient. After all – using the standard Boolean approach to relevancy as an
example – for any query against the WWW which contained the word “computer”, tens of millions
of documents would qualify as relevant. If the user was actually interested only in documents
describing a specific application of “computer”, such a large result set would prove unusable. As a
practical matter, users require that search engines rank their results from most relevant to least
relevant. Typically, users prefer to have the relevant documents presented in order of decreasing
relevance – with the most relevant result first. Because most relevance functions produce real
number values, a natural way to rank any search engine result set is to rank the members of the
result set by their respective relevance scores.
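The ranking step described above is mechanically simple once each result carries a real-valued relevance score, whatever relevance function produced it. A minimal sketch:

```python
def rank_results(scored):
    """Given (resource_id, relevance_score) pairs from any relevance function,
    return them in order of decreasing relevance, most relevant first."""
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```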
Ranked result sets have been the key to marketplace success for search engines. The
current dominance of the Google search engine (a product of Google, Inc.) is due to the PageRank
system used in Google that lets (essentially) the popularity of a given document dictate result rank.
Popularity in the Google example applies to the number of links and to the preferences of Google
users who input any given search term or phrase. These rankings permit Google to optimize
searches by returning only those documents with ranks above a certain threshold (called k). Other
methods used by web search engines to rank results include “Hubs & Authorities” which counts
links into and out of a given web page or document, Markov chains, and random walks.
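The link-popularity idea behind PageRank can be sketched as a power iteration over a tiny link graph. This is a simplified illustration of the published algorithm, not Google's production system; the damping factor 0.85 is the conventional choice and the graph is invented.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping page -> list of pages it links to.
    Iterates the rank-flow equation until the scores stabilize."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p, outs in links.items():
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:       # each outlink passes along a share of rank
                    new[q] += share
            else:                    # dangling page: spread its rank evenly
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank
```

Pages with many incoming links from highly ranked pages accumulate the highest scores, which is the "popularity dictates result rank" behavior described above.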
The Knowledge Correlation Search Engine utilizes a ranking method that is novel because it
is a function of the degree to which a given document or resource contributed to the correlation
“answer space”. That answer space is constructed from data structures called nodes, which in turn
are created by decomposition of relevant resources. Even the most naïve ranking function of the
Knowledge Correlation Search Engine – which counts the frequency of node occurrence in the
answer space – can identify documents that are uniquely or strongly relevant to the original user query.
More sophisticated ranking mechanisms can dramatically improve that outcome.
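The naive ranking function described above, counting the frequency of node occurrence per contributing resource, can be sketched as follows. The node and URI names are illustrative only.

```python
from collections import Counter

def rank_by_contribution(answer_space):
    """answer_space: iterable of (node, contributing_resource_uri) pairs.
    Ranks resources by how many nodes each contributed to the answer space."""
    counts = Counter(uri for _node, uri in answer_space)
    return counts.most_common()  # [(uri, node_count), ...], most relevant first
```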
The Knowledge Correlation Search Engine is a new and novel form of search engine which
utilizes a computer implemented method to identify at least one resource, referenced by that
resource’s unique URI (Uniform Resource Identifier) or referenced by that resource’s URL (Uniform
Resource Locator), such resource being significant to any given user question, subject, or topic of a
digital information object. For the Knowledge Correlation Search Engine, the user question or
subject or topic acts as input. The input is utilized by a software function which attempts to
construct or discover logical structures within a collection of data objects, each data object being
associated with the resource that contributed the data object, and the constructed or discovered
logical structures being strongly associated with the input. That software function is a knowledge
correlation function and the logical structure is a form of directed acyclic graph termed a quiver of
paths. If such logical structures strongly associated with the input are in fact constructed or
discovered, the data object members of such logical structures become an answer space. Using
the answer space, another software function is then able to determine with a high degree of
confidence which of the resources that contributed to the answer space are the most significant
contributors to the answer space, and thereby identify URLs and URIs most significant to the input
question, subject or topic. Finally, a software function is used to rank in significance to the input
each of the URL and URI referenced resources that contributed data objects to the answer space.
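The overall flow just described can be sketched in miniature: nodes decomposed from resources form a directed graph, the paths from origin X to destination Y form the "quiver of paths", and every resource whose node lies on some path joins the answer space. This is a hedged toy sketch of the idea, not the patented correlation function; the graph, node names, and resource URIs are invented, and a real implementation would guarantee acyclicity rather than guard against cycles at traversal time.

```python
def correlate(graph, source_of, x, y):
    """graph: node -> list of successor nodes.
    source_of: node -> URI of the resource that contributed the node.
    Returns (all paths from x to y, set of contributing resource URIs)."""
    paths, answer_resources = [], set()

    def walk(node, path):
        if node == y:
            paths.append(path)
            # every node on a completed path puts its resource in the answer space
            answer_resources.update(source_of[n] for n in path)
            return
        for nxt in graph.get(node, []):
            if nxt not in path:      # guard against cycles
                walk(nxt, path + [nxt])

    walk(x, [x])
    return paths, answer_resources
```

Ranking the URIs in the returned answer space (for example, by how many path nodes each contributed) then yields the significance ordering described above.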