Application of Ontology in Semantic Information Retrieval
by Prof. Shahrul Azman from FTSM, UKM
Presentation for MyREN Seminar 2014
Berjaya Hotel, Kuala Lumpur
27 November 2014
Ontology Learning from Text
Ontology construction ‘Layer Cake’
Knowledge representation and knowledge management systems
Subtasks in ontology learning
Most Popular Ontology Learning Tools
The document discusses the basics of ontologies, including their origin in philosophy, definitions, types, benefits and application areas. Some key points are:
- An ontology is a formal specification of a conceptualization used to help humans and programs share knowledge. It establishes a shared vocabulary for exchanging information.
- Ontologies describe domain knowledge and provide an agreed-upon understanding of a domain through concepts and relations. They help solve problems of ambiguity and enable knowledge sharing.
- Ontologies benefit applications like information retrieval, digital libraries, knowledge engineering and natural language processing by facilitating semantic search and integration of data.
The document summarizes a seminar on ontology mapping presented by Samhati Soor. The seminar covered the need for ontology mapping due to the proliferation of ontologies, and the purpose of mapping ontologies to achieve interoperability and sharing knowledge. It defined ontologies and ontology mapping and discussed categories of mapping including between global and local ontologies, between local ontologies, and for merging ontologies. Tools for ontology mapping discussed included GLUE and SAM. Evaluation criteria and challenges of ontology mapping were also summarized along with conclusions and references.
Introduction to Ontology Concepts and Terminology, by Steven Miller
The document introduces an ontology tutorial that will cover basic concepts of the Semantic Web, Linked Data, and the Resource Description Framework data model as well as the ontology languages RDFS and OWL. The tutorial is intended for information professionals who want to gain an introductory understanding of ontologies, ontology concepts, and terminology. The tutorial will explain how to model and structure data as RDF triples and create basic RDFS ontologies.
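The RDF triple model and RDFS subclassing the tutorial covers can be sketched in plain Python. This is a minimal illustration only; the class names (`ex:Dog`, `ex:Mammal`, and so on) are hypothetical and not taken from the tutorial.

```python
# Illustrative sketch: RDF-style triples as (subject, predicate, object)
# tuples, with a fixed-point pass that propagates rdf:type along
# rdfs:subClassOf -- the core entailment a basic RDFS ontology provides.

triples = {
    ("ex:Dog", "rdfs:subClassOf", "ex:Mammal"),
    ("ex:Mammal", "rdfs:subClassOf", "ex:Animal"),
    ("ex:Rex", "rdf:type", "ex:Dog"),
}

def infer_types(triples):
    """Propagate rdf:type along rdfs:subClassOf until nothing changes."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for (s, p, o) in list(inferred):
            if p != "rdf:type":
                continue
            for (cls, p2, superclass) in list(inferred):
                if p2 == "rdfs:subClassOf" and cls == o:
                    new = (s, "rdf:type", superclass)
                    if new not in inferred:
                        inferred.add(new)
                        changed = True
    return inferred

all_triples = infer_types(triples)
# Rex is now inferred to be a Mammal and an Animal, not just a Dog.
```

A real system would use an RDF library and a reasoner rather than hand-rolled set operations, but the triple structure is the same.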
The document provides an overview of ontology and its various aspects. It discusses the origin of the term ontology, which derives from Greek words meaning "being" and "science," so ontology is the study of being. It distinguishes between scientific and philosophical ontologies. Social ontology examines social entities. Perspectives on ontology include philosophy, library and information science, artificial intelligence, linguistics, and the semantic web. The goal of ontology is to encode knowledge to make it understandable to both people and machines. It provides motivations for developing ontologies such as enabling information integration and knowledge management. The document also discusses ontology languages, uniqueness of ontologies, purposes of ontologies, and provides references.
The document introduces ontology and describes what it is from both philosophical and computer science perspectives. An ontology in computers consists of a vocabulary to describe a domain, specifications of the meaning of terms, and constraints capturing additional knowledge about the domain. It then provides an example ontology and discusses applications of ontologies such as for the semantic web. It also discusses important considerations for building ontologies such as collaboration, versioning, and ease of use.
This document discusses evaluation methods for information retrieval systems. It begins by outlining different types of evaluation, including retrieval effectiveness, efficiency, and user-based evaluation. It then focuses on retrieval effectiveness, describing commonly used measures like precision, recall, and discounted cumulative gain. It discusses how these measures are calculated and their limitations. The document also introduces other evaluation metrics like R-precision, average precision, and normalized discounted cumulative gain that provide single value assessments of system performance.
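The measures the document describes — precision, recall, and (normalized) discounted cumulative gain — can be sketched as follows. This is a minimal illustration assuming binary relevance for precision/recall and graded relevance scores for DCG; the document's own formulations may differ in detail.

```python
import math

def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant."""
    return len(set(retrieved) & set(relevant)) / len(retrieved)

def recall(retrieved, relevant):
    """Fraction of relevant documents that were retrieved."""
    return len(set(retrieved) & set(relevant)) / len(relevant)

def dcg(gains):
    """Discounted cumulative gain over graded relevance, in rank order."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(gains):
    """DCG normalized by the ideal (descending-sorted) ordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0

retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d1", "d3", "d5"]
print(precision(retrieved, relevant))  # 0.5
print(recall(retrieved, relevant))     # 0.666...
```

Note the limitation the document mentions: precision and recall are single set-based numbers and ignore rank order, which is exactly what DCG-style measures add.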
Yang Yu is proposing research on improving machine learning based ontology mapping by automatically obtaining training samples from the web. The proposed system would parse two input ontologies to generate queries to search engines and collect documents to use as samples for each ontology class. These samples would then be used to train text classifiers, which would produce probabilistic mappings between classes in the two ontologies. The results would be evaluated by comparing to mappings from human experts. Current work involves exploring alternative text classification tools and ways to utilize the probabilistic mapping values generated by the classifiers.
The document discusses the agenda for a presentation on the Semantic Web. The agenda includes an overview of the World Wide Web, an introduction to the Semantic Web, tools and applications for the Semantic Web, Linking Open Data, the Social Semantic Web, and Open Government. Each section provides details on the topic covered.
1) The document discusses information retrieval and search engines. It describes how search engines work by indexing documents, building inverted indexes, and allowing users to search indexed terms.
2) It then focuses on Elasticsearch, describing it as a distributed, open source search and analytics engine that allows for real-time search, analytics, and storage of schema-free JSON documents.
3) The key concepts of Elasticsearch include clusters, nodes, indexes, types, shards, and documents. Clusters hold the data and provide search capabilities across nodes.
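The indexing-and-search pipeline described in point 1 can be sketched in a few lines. This is a toy illustration of an inverted index with AND queries, unrelated to Elasticsearch's actual implementation.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, *terms):
    """AND query: ids of documents containing every query term."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*postings) if postings else set()

docs = {
    1: "ontology based information retrieval",
    2: "semantic web and information retrieval",
    3: "knowledge graphs for the semantic web",
}
index = build_inverted_index(docs)
print(search(index, "information", "retrieval"))  # {1, 2}
```

Production engines add tokenization, stemming, positional postings, and compressed on-disk structures, but the term-to-postings mapping is the same idea.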
This document provides a full syllabus with questions and answers related to the course "Information Retrieval" including definitions of key concepts, the historical development of the field, comparisons between information retrieval and web search, applications of IR, components of an IR system, and issues in IR systems. It also lists examples of open source search frameworks and performance measures for search engines.
This workshop presentation from Enterprise Knowledge team members Joe Hilger, Founder and COO, and Sara Nash, Technical Analyst, was delivered on June 8, 2020 as part of the Data Summit 2020 virtual conference. The 3-hour workshop provided an interdisciplinary group of participants with a definition of what a knowledge graph is, how it is implemented, and how it can be used to increase the value of your organization's data. This slide deck gives an overview of the KM concepts necessary for implementing knowledge graphs as a foundation for Enterprise Artificial Intelligence (AI). Hilger and Nash also outlined four use cases for knowledge graphs, including recommendation engines and natural language query on structured data.
The document discusses probabilistic retrieval models in information retrieval. It introduces three influential probabilistic models: (1) Maron and Kuhns' 1960 model which calculates the probability of relevance based on historical user data; (2) Salton's model which estimates the probability of term occurrence in relevant documents; (3) A model that ranks documents by the probability of relevance and considers retrieval as a decision between costs of retrieving non-relevant vs. not retrieving relevant documents. The document provides background on the development of probabilistic IR models and challenges of estimating probabilities for evaluation.
This document discusses machine learning algorithms for ranking problems. It introduces supervised learning to rank methods including pointwise, pairwise and listwise approaches. Pointwise methods predict relevance scores independently but don't consider order. Pairwise approaches consider relative order but have high computational costs. Listwise methods aim to optimize entire orderings but have complexity issues. Practical challenges include defining objective metrics, generating training labels, and handling new items with limited data. Semi-supervised learning and matrix factorization can help address labeling problems.
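The pointwise/pairwise distinction can be made concrete with a small sketch (illustrative only, not from the document): a pointwise method scores and sorts items independently, while a pairwise method trains on ordered pairs derived from the labels, which is where the quadratic cost comes from.

```python
def pointwise_rank(scores):
    """Pointwise view: sort items by independently predicted scores."""
    return sorted(scores, key=scores.get, reverse=True)

def pairwise_training_pairs(labels):
    """Pairwise view: every (preferred, dispreferred) pair of items
    whose relevance labels differ becomes one training example.
    The number of pairs grows quadratically with the item count."""
    items = list(labels)
    return [(a, b) for a in items for b in items if labels[a] > labels[b]]

labels = {"docA": 2, "docB": 0, "docC": 1}
print(pointwise_rank(labels))                 # ['docA', 'docC', 'docB']
print(len(pairwise_training_pairs(labels)))   # 3
```

A listwise method would instead optimize a loss over the whole ordering at once, which is harder still; this sketch only shows how the training data differs between the first two families.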
The document provides an introduction to Prof. Dr. Sören Auer and his background in knowledge graphs. It discusses his current role as a professor and director focusing on organizing research data using knowledge graphs. It also briefly outlines some of his past roles and major scientific contributions in the areas of technology platforms, funding acquisition, and strategic projects related to knowledge graphs.
INTRODUCTION TO INFORMATION RETRIEVAL
This lecture will introduce the information retrieval problem, introduce the terminology related to IR, and provide a history of IR. In particular, the history of the web and its impact on IR will be discussed. Special attention and emphasis will be given to the concept of relevance in IR and the critical role it has played in the development of the subject. The lecture will end with a conceptual explanation of the IR process, and its relationships with other domains as well as current research developments.
INFORMATION RETRIEVAL MODELS
This lecture will present the models used to rank documents according to their estimated relevance to user-given queries, where the most relevant documents are shown ahead of those less relevant. These models form the basis of many ranking algorithms used in past and present search applications. The lecture will describe IR models such as Boolean retrieval, vector space, probabilistic retrieval, language models, and logical models. Relevance feedback, a technique that implicitly or explicitly modifies user queries in light of the user's interaction with retrieval results, will also be discussed, as it is particularly relevant to web search and personalization.
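Of the models listed above, the vector space model is the easiest to sketch: represent query and document as term-frequency vectors and rank by cosine similarity. This toy version (assumed example documents, no tf-idf weighting) shows the core idea only.

```python
import math
from collections import Counter

def cosine_score(query, document):
    """Vector-space model in miniature: term-frequency vectors
    compared by cosine similarity."""
    q = Counter(query.lower().split())
    d = Counter(document.lower().split())
    dot = sum(q[t] * d[t] for t in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in d.values())))
    return dot / norm if norm else 0.0

docs = [
    "boolean retrieval of documents",
    "vector space retrieval model",
    "a history of libraries",
]
query = "vector space model"
ranked = sorted(docs, key=lambda d: cosine_score(query, d), reverse=True)
print(ranked[0])  # vector space retrieval model
```

Real systems weight terms (e.g. tf-idf or BM25-style scoring) rather than using raw counts, but the ranking-by-similarity structure is the same.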
Basis of Information Retrieval, Part 1: Retrieval Tools, by Saroj Suwal
This document discusses various tools for retrieving literature, including catalogs, indexes, registers, and online databases. It describes the purpose and format of each tool. Catalogs provide access to collections and contain descriptive metadata. Indexes arrange information alphabetically and by subject but do not provide location details. Registers function like catalogs for museum collections. Bibliographic databases contain searchable references to published works. Secondary publications abstract and index primary documents to help users find relevant information.
This document discusses ontology-based data access. It begins by defining ontology as a representation of concepts and relationships that define a domain. It then provides examples of ontology elements like concepts, attributes, and relations. It describes how ontologies can be used to share understanding, enable knowledge reuse, and separate domain from operational knowledge. The document outlines the process for developing ontologies including scope, capture, encoding, integration, and evaluation. It discusses using ontologies to provide a user-oriented view of data and facilitate query access across data sources. The document concludes by discussing ongoing work on semantic query analysis and graphical ontology mapping tools.
The document provides an introduction to knowledge graphs. It discusses how knowledge graphs are being used by large enterprises and intelligent agents to capture concepts, entities, and relationships within domains to drive business, generate insights, and enhance relationships. The presentation will cover an overview of what knowledge graphs are, who uses them, why they are used, and how to use them. It then provides some examples of how knowledge graphs are applied, including in intelligent agents, semantic web, search engines, social networks, biology, enterprise knowledge management, and more.
A broad introduction to information retrieval and web search, used for teaching at the Yahoo Bangalore Summer School 2013. The slides are a mash-up of my own and other people's presentations.
Learning to Rank for Recommender Systems - ACM RecSys 2013 Tutorial, by Alexandros Karatzoglou
The slides from the Learning to Rank for Recommender Systems tutorial given at ACM RecSys 2013 in Hong Kong by Alexandros Karatzoglou, Linas Baltrunas and Yue Shi.
The document summarizes a technical seminar on web-based information retrieval systems. It discusses information retrieval architecture and approaches, including syntactical, statistical, and semantic methods. It also covers web search analysis techniques like web structure analysis, content analysis, and usage analysis. The document outlines the process of web crawling and types of crawlers. It discusses challenges of web structure, crawling and indexing, and searching. Finally, it concludes that as unstructured online information grows, information retrieval techniques must continue to improve to leverage this data.
The document provides an overview of knowledge graphs and introduces metaphactory, a knowledge graph platform. It discusses what knowledge graphs are, examples like Wikidata, and standards like RDF. It also outlines an agenda for a hands-on session on loading sample data into metaphactory and exploring a knowledge graph.
The Semantic Web #9 - Web Ontology Language (OWL), by Myungjin Lee
This is lecture note #9 for my class at the Graduate School of Yonsei University, Korea.
It describes Web Ontology Language (OWL) for authoring ontologies.
Knowledgebases differ from databases in three key ways:
1. Knowledgebases capture human knowledge and place it in a system that can solve complex problems using that knowledge, while databases simply store and organize data.
2. Knowledgebases are dynamic and can learn over time as new knowledge is added, whereas databases do not learn or change based on new information.
3. Knowledgebases can use the stored knowledge to provide answers, recommendations, and expert advice, while databases only retrieve and display stored data in response to queries.
Dewey Decimal Classification vs Library of Congress Classification, by Francheska Vonne Gali
A graphical comparison of DDC and LCC.
The Library of Congress System and the Dewey Decimal System are two popular classification systems in libraries.
Course: LIBSCI 22 - Organization of Information Resources II
Teacher: Sarah Angiela Ragay
The document discusses the Sears List of Subject Headings (SLSH), a controlled vocabulary used for subject cataloging in small to medium sized libraries. It provides an overview of the history and purpose of SLSH, describes some of its key features like new subject headings added in the 21st edition, and outlines its underlying principles of direct, specific, and consistent subject entries based on common usage. The structure of SLSH is also briefly explained as an alphabetical list of standard subject names for the entire range of knowledge.
These are the presentation slides for the joint conference of the 134th SIG conference on Information Fundamentals and Access Technologies (IFAT) and the 112th SIG conference on Document Communication (DC) of the Information Processing Society of Japan (IPSJ), held March 22, 2019, at Toyo University, Hakusan Campus.
Cite: Kei Kurakawa, Yuan Sun, and Satoko Ando, Applying a new subject classification scheme for a database by a data-driven correspondence, IPSJ SIG Technical Report, Vol.2019-IFAT-134/2019-DC-112, No.7, pp.1-10, (2019).
A guide and a process for creating OWL ontologies.
Semantic Web course
e-Lite group (https://elite.polito.it)
Politecnico di Torino, 2017
The document defines ontologies as explicit descriptions of a domain that define concepts, properties, attributes, and constraints. It discusses the history of categorization in philosophy and the development of knowledge models like semantic nets and conceptual graphs. The document outlines different methods for building ontologies and different types of ontologies. It also discusses ontology tools like Protege and TopBraid Composer and how ontologies are used on the semantic web through languages like OWL.
This document outlines a course on Knowledge Representation (KR) on the Web. The course aims to expose students to challenges of applying traditional KR techniques to the scale and heterogeneity of data on the Web. Students will learn about representing Web data through formal knowledge graphs and ontologies, integrating and reasoning over distributed datasets, and how characteristics such as volume, variety and veracity impact KR approaches. The course involves lectures, literature reviews, and milestone projects where students publish papers on building semantic systems, modeling Web data, ontology matching, and reasoning over large knowledge graphs.
Pattern-based Acquisition of Scientific Entities from Scholarly Article Titles, by Jennifer D'Souza
We describe a rule-based approach for the automatic acquisition of salient scientific entities from Computational Linguistics (CL) scholarly article titles. Two observations motivated the approach: (i) noting salient aspects of an article’s contribution in its title; and (ii) pattern regularities capturing the salient terms that could be expressed in a set of rules. Only those lexico-syntactic patterns were selected that were easily recognizable, occurred frequently, and positionally indicated a scientific entity type. The rules were developed on a collection of 50,237 CL titles covering all articles in the ACL Anthology. In total, 19,799 research problems, 18,111 solutions, 20,033 resources, 1,059 languages, 6,878 tools, and 21,687 methods were extracted at an average precision of 75%.
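A cue-phrase rule of the kind the abstract describes can be sketched with regular expressions. The rules and the example title below are hypothetical illustrations in the spirit of lexico-syntactic patterns; the paper's actual rule set is not reproduced here.

```python
import re

# Hypothetical cue-phrase rules: a method mention tends to follow
# "using", and a research problem tends to close the title after "for".
RULES = [
    ("method", re.compile(r"\busing ([A-Za-z -]+?)(?: for\b|$)")),
    ("research_problem", re.compile(r"\bfor ([A-Za-z -]+?)$")),
]

def extract_entities(title):
    """Apply each (type, pattern) rule and collect typed matches."""
    found = []
    for entity_type, pattern in RULES:
        for match in pattern.finditer(title):
            found.append((entity_type, match.group(1).strip()))
    return found

title = "A Neural Model using Conditional Random Fields for Named Entity Recognition"
print(extract_entities(title))
```

The paper's rules were selected for being easily recognizable, frequent, and positionally indicative of an entity type; regex cue phrases capture only the simplest of those properties.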
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly Articles, by Angelo Salatino
Classifying research papers according to their research topics is an important task to improve their retrievability, assist the creation of smart analytics, and support a variety of approaches for analysing and making sense of the research environment. In this paper, we present the CSO Classifier, a new unsupervised approach for automatically classifying research papers according to the Computer Science Ontology (CSO), a comprehensive ontology of research areas in the field of Computer Science. The CSO Classifier takes as input the metadata associated with a research paper (title, abstract, keywords) and returns a selection of research concepts drawn from the ontology. The approach was evaluated on a gold standard of manually annotated articles, yielding a significant improvement over alternative methods.
Probabilistic topic models are used to create scalable representations of documents that aim to: (1) organize, summarize, and search them; (2) explore them through an index of the ideas they contain; and (3) browse them to find documents dealing with specific areas.
Charting the Digital Library Evaluation Domain with a Semantically Enhanced Methodology, by Giannis Tsakonas
This document proposes a methodology for discovering patterns in scientific literature using a case study of digital library evaluation. It involves:
1. Classifying documents to identify relevant papers using naive Bayes classification.
2. Semantically annotating papers with concepts from a Digital Library Evaluation Ontology using the GoNTogle annotation tool. Over 2,600 annotations were generated.
3. Clustering the annotated papers into coherent groups using k-means clustering.
4. Interpreting the clusters with the assistance of the ontology to discover patterns and trends in the literature. Benchmarking tests were performed to evaluate effectiveness of the methodology.
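Step 3's clustering can be sketched with a bare-bones k-means over per-paper feature vectors. This is a toy sketch under assumed data, not the paper's actual setup (which clustered ontology-annotated papers).

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on equal-length numeric vectors (e.g. per-paper
    counts of ontology concepts).  Illustrative only."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assign each point to its nearest centroid (squared distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        # Recompute centroids as cluster means; keep old one if empty.
        centroids = [
            [sum(col) / len(cl) for col in zip(*cl)] if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return clusters

# Two obvious groups of 2-D "annotation profiles".
points = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
clusters = kmeans(points, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```

In practice one would use a library implementation with better initialization (e.g. k-means++) and a principled choice of k; the sketch only shows the assign-then-update loop.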
Semi-automated Exploration and Extraction of Data in Scientific Tables, by Elsevier
Ron Daniel and Corey Harper of Elsevier Labs present at the Columbia University Data Science Institute: https://www.elsevier.com/connect/join-us-as-elsevier-data-scientists-present-at-columbia-university
Amit Sheth with TK Prasad, "Semantic Technologies for Big Science and Astrophysics", Invited Plenary Presentation, at Earthcube Solar-Terrestrial End-User Workshop, NJIT, Newark, NJ, August 13, 2014.
Like many other fields of Big Science, Astrophysics and Solar Physics deal with the challenges of Big Data, including Volume, Variety, Velocity, and Veracity. There is already significant work on handling volume-related challenges, including the use of high-performance computing. In this talk, we will mainly focus on the other challenges, from the perspective of collaborative sharing and reuse of the broad variety of data created by multiple stakeholders, large and small, along with tools that offer semantic variants of search, browsing, integration, and discovery capabilities. We will borrow examples of tools and capabilities from state-of-the-art work supporting physicists (including astrophysicists) [1], the life sciences [2], and the material sciences [3], and describe the role of semantics and semantic technologies that make these capabilities possible or easier to realize. This applied and practice-oriented talk will complement more vision-oriented counterparts [4].
[1] Science Web-based Interactive Semantic Environment: http://sciencewise.info/
[2] NCBO Bioportal: http://bioportal.bioontology.org/ ; Kno.e.sis's work on Semantic Web for Healthcare and Life Sciences: http://knoesis.org/amit/hcls
[3] MaterialWays (a Materials Genome Initiative related project): http://wiki.knoesis.org/index.php/MaterialWays
[4] From Big Data to Smart Data: http://wiki.knoesis.org/index.php/Smart_Data
Survey of natural language processing(midp2)Tariqul islam
Document classification is a part of Natural language processing. We have different methodology and technique for processing the document classification. The purpose of this article is to survey some papers related to document classification. Those survey will help the researcher to understand which will be the best approach to use for natural language processing
study or concern about what kinds of things exist
what entities there are in the universe.
the ontology derives from the Greek onto (being) and logia (written or spoken). It is a branch of metaphysics , the study of first principles or the root of things.
Representation of ontology by Classified Interrelated object modelMihika Shah
1. The document discusses representing ontology using the Classified Interrelated Object Model (CIOM) data modeling technique. CIOM represents ontology components like classes, subclasses, attributes, and relationships between classes.
2. Key components of an ontology like classes, subclasses, attributes, and inter-class relationships are described and examples are given of how each would be represented using CIOM notation.
3. CIOM provides a general purpose methodology for representing ontologies using existing database technologies and overcomes limitations of specialized ontology languages and tools.
Generating Lexical Information for Terminologyin a Bioinformatics OntologyHammad Afzal
This document discusses generating lexical information for terms in a bioinformatics ontology. It proposes a model called LexInfo for associating linguistic information with ontologies. The authors lexicalize a bioinformatics ontology called myGrid by creating a LexInfo-based lexicon that captures morphological, syntactic and semantic properties of terms. They generate lexicons both semi-automatically using domain resources and automatically using LexInfo tools. The automatic lexicon has some errors due to POS tagging and tokenization issues that could be addressed using domain knowledge. The enriched ontology may help with automatic annotation of bioinformatics services.
This document summarizes an OKFN Korea hackathon event focused on open data. It discusses modeling Seoul open government data using ontologies, linking it to external datasets like cultural heritage data, and publishing the enriched data in RDF format. It covers topics like linked data, modeling with RDF/RDFS/OWL, reusing existing vocabularies, ontology development best practices, and triple store storage solutions.
This document provides an overview of the Next Generation Science Standards. It discusses that the standards were developed by Achieve in partnership with other organizations to create science standards focused on big ideas. It describes the Framework for K-12 Science Education that the standards are based on, which outlines three dimensions for each standard. It then explains the organization and structure of the Next Generation Science Standards, comparing them to previous standards.
Presented at DocTrain East 2007 by Joe Gelb, Suite Solutions -- Designing, building and maintaining a coherent information architecture is critical to proper planning, creation, management and delivery of documentation and training content. This is especially true when your content is based on a modular or topic-based model such as DITA and SCORM or if you are migrating to such a model.
But where to start? Terms such as taxonomy, semantics, and ontology can be intimidating, and recognized standards like RDF, OWL, Topic Maps (XTM) and SKOS seem so abstract. This pragmatic workshop will provide an overview of the standards and concepts, and a chance to use them hands-on to turn the abstract into tangible skills. We will demonstrate how a well-designed information architecture facilitates reuse and how the information model is integrally connected to conditional and multi-purpose publishing.
We will introduce an innovative, comprehensive methodology for information modeling and content development called SOTA Solution Oriented Topic Architecture. SOTA does not aim to be yet another new standard, but rather a concrete methodology backed up with open-source and accessible tools for using existing standards. We will demonstrate ֖and practice—hands-on—how this powerful methodology can help you organize and express information, determine which content actually needs to be created or updated, and build documentation and training deliverables from your content based on the rules you define.
This workshop is essential for successfully implementing topic models like DITA and SCORM, multi-purpose conditional publishing, and successfully facilitating content reuse.
This document provides an overview of research methods for narrative analysis. It discusses key concepts in narrative analysis including scripts, stories, patterns, themes, coding, and temporal organization. It also covers approaches like contextual analysis, focus groups, retelling narratives, and assumptions related to subjectivity and usefulness. Narrative analysis is presented as an exploratory qualitative methodology to give respondents a venue to articulate their own viewpoints and standards.
Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azman from FSTM, UKM
1. Application of Ontology in Semantic Information Retrieval
Presentation for MyREN Seminar
Berjaya Hotel, Kuala Lumpur
27 November 2014
2. Brief speaker’s info
Shahrul Azman Mohd. Noah, Ph.D.
Knowledge Technology Research Group
Center for AI Technology (CAIT)
[email protected]
Graduated in BSc (Mathematics) from UKM
Graduated in MSc (IS) from Sheffield U.
Graduated in PhD (IS) from Sheffield U. – knowledge-based systems
From Muar, Johor
4. What is ontology?
• Ontology may be considered as a kind of method to represent knowledge.
• From a philosophical discipline – the science of “what is”; the kinds and structures of objects, properties, events, processes and relations in every area of reality.
• Aristotle’s classification of animals is one of the first ontologies developed.
5. Ontology in Computing
• An ontology is an engineering artifact:
– It is constituted by a specific vocabulary used to describe a certain reality, plus
– A set of explicit assumptions regarding the intended meaning of the vocabulary.
• Thus, an ontology describes a formal specification of a certain domain:
– Shared understanding of a domain of interest
– Formal and machine-manipulable model of a domain of interest
6. Ontology Definition
Formal, explicit specification of a shared conceptualization:
– commonly accepted understanding
– conceptual model of a domain (ontological theory)
– unambiguous terminology definitions
– machine-readability with computational semantics
[Gruber93]
7. An ontology is… (Source: Smith & Welty, 2001)
(Figure: a spectrum of increasing complexity – a catalog, a set of text files, a glossary, a thesaurus, a collection of taxonomies, a collection of frames, a set of general logical constraints.)
8. Various approaches to classify ontologies
Classify ontologies according to the information the ontology needs to express and the richness of its internal structure (Lassila & McGuinness, 2001)
Classify into 2 orthogonal dimensions: the amount and type of structure and the subject (Van Heijst et al., 1997)
Classify ontologies according to their level of dependence on a particular task (Guarino, 1998)
9. Ontology language
• Ontology languages are formal languages used to construct ontologies
– allow the encoding of knowledge about specific domains and often
– include reasoning rules that support the processing of that knowledge
• Various languages have been proposed: CycL, KL-One, Ontolingua, F-Logic,
OCML, LOOM, Telos, RDF(S), OIL, DAML+OIL, XOL, SHOE,
OWL etc.
• Usually based on Description Logic (DL).
• Summarised as (Kalibatiene & Vasilecas, 2011):
11
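The defining feature of these languages is that they pair a vocabulary with reasoning rules over it. As a minimal, purely illustrative sketch (not an implementation of RDF(S) or OWL), the following Python fragment stores triples as plain tuples and applies one RDFS-style rule, the transitivity of subClassOf; all class names are invented:

```python
# Illustrative only: triples as (subject, predicate, object) tuples, with one
# RDFS-style inference rule -- subClassOf is transitive -- applied to them.

def infer_superclasses(triples, cls):
    """Return all direct and inherited superclasses of cls."""
    supers = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "subClassOf" and o not in supers:
                supers.add(o)
                frontier.append(o)   # follow the subclass chain upwards
    return supers

triples = [
    ("Student",    "subClassOf", "Person"),
    ("Person",     "subClassOf", "Agent"),
    ("Supervisor", "subClassOf", "Person"),
]

print(sorted(infer_superclasses(triples, "Student")))  # ['Agent', 'Person']
```

A real ontology language adds far more (properties, restrictions, DL-based consistency checking), but the encode-then-reason pattern is the same.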
10. Example of ontologies
• Top-level ontology – Suggested Upper Merged Ontology (SUMO)
11. Portion of SUMO ontology with USGS Geo-concepts inserted
17. Concepts
• “Information retrieval (IR) is a field concerned with the structure, analysis, organization, storage, searching, and retrieval of information.” (Salton, 1968)
• Applications of IR: recommendations, Q&A, filtering… and of course searching.
18. Issues in IR
• Some issues in IR:
– Relevance
– Evaluation
– Users and information needs
• Context-based search
• Semantic search
• Etc.
21. Ontology and semantic search
• Various ways to support semantic search:
– Query expansion – users’ queries are expanded with related terminological terms
– Disambiguation – resolving terms or concepts when they refer to more than one topic
– Classifying – classify documents such as ads into ontological topics to support semantic search
– Enhanced IR model – embed ontology into an existing IR model, resulting in a modified IR model
22. Query Expansion
• Query expansion (QE) is needed due to the ambiguity of natural language.
• Main aim of QE – to add new meaningful terms to the initial query.
Bhogal, J., Macfarlane, A. & Smith, A. 2007. A review of ontology based query expansion. Information Processing and Management, 43: 866-886.
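The QE idea can be sketched in a few lines. The expansion map below is invented for illustration; in a real system it would be a lookup into a lexical ontology such as WordNet:

```python
# Hypothetical term -> related-terms map; a real system would query an ontology.
EXPANSIONS = {
    "car":    ["automobile", "vehicle"],   # synonym, hypernym
    "murder": ["homicide", "killing"],
}

def expand_query(query):
    """Add related terminological terms after each original query term."""
    terms = []
    for term in query.lower().split():
        terms.append(term)
        terms.extend(EXPANSIONS.get(term, []))  # unknown terms pass through
    return terms

print(expand_query("car murder"))
# ['car', 'automobile', 'vehicle', 'murder', 'homicide', 'killing']
```

The expanded term list is then submitted to the IR engine in place of the original query.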
24. Semantic index
• Textual documents are indexed according to some ontology model.
• Remember the concept of vocabulary in IR?
(Figure: a computer science collection is processed by extraction and indexing into the collection’s vocabulary of index terms: architecture, bus, computer, database, …, xml.)
25. Semantic index
• Textual documents are indexed according to some ontology model.
• Remember the concept of vocabulary in IR?
(Figure: the same extraction and indexing pipeline over the computer science collection, but the term index – architecture, bus, computer, database, …, xml – is replaced with an ontological index.)
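Replacing the term index with an ontological index can be sketched as below. The term-to-concept map is a toy stand-in for a real ontology lookup, and all names and documents are illustrative:

```python
# Hypothetical mapping from raw index terms to an ontological concept.
TERM_TO_CONCEPT = {
    "architecture": "ComputerScience",
    "bus":          "ComputerScience",
    "database":     "ComputerScience",
    "xml":          "ComputerScience",
}

def ontological_index(docs):
    """Invert the collection to concepts rather than raw terms."""
    index = {}
    for doc_id, text in docs.items():
        for term in text.lower().split():
            concept = TERM_TO_CONCEPT.get(term)
            if concept:  # only terms the ontology knows get indexed
                index.setdefault(concept, set()).add(doc_id)
    return index

docs = {"d1": "xml database tuning", "d2": "bus architecture", "d3": "history of art"}
print(ontological_index(docs))  # d1 and d2 fall under ComputerScience; d3 is unindexed
```

A query for the concept then retrieves documents that never mention the concept’s name literally, which is the point of the ontological index.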
26. Examples
• Three research projects that illustrate the applications of ontology-based IR:
– Semantic digital library
– Crime news retrieval
– Multi-modality ontology-based image retrieval
27. Semantic digital library
• Proposed an approach for managing, organizing and populating an ontology for document collections in a digital library.
• The document metadata and content are inserted and populated into a knowledge base which allows sophisticated querying and searching.
• First, proposes an ontology-based information retrieval model, based on the classic vector space model, which includes document annotation, instance-based weighting and concept-based ranking.
29. Semantic digital library
• Involved three ontologies – ACM Topic hierarchies, Geo ontology and Dublin Core metadata
• Portion of the domain ontology focusing on academic theses
32. VSM Index
#create Class Person
#create instance of Class Student
<Student rdf:ID="Student1">
  <rdfs:label>Arifah Alhadi</rdfs:label>
</Student>
<Student rdf:ID="Student2">
  <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Asyraf Arifin</rdfs:label>
</Student>
#Create Instance of Class Supervisor
<Supervisor rdf:ID="Supervisor1">
  <rdfs:label>PM Dr Shahrul Azman</rdfs:label>
  <rdfs:label>Prof. Madya Dr. Shahrul Azman Mohd Noah</rdfs:label>
</Supervisor>
<Supervisor rdf:ID="Supervisor2">
  <rdfs:label>Prof Aziz Deraman</rdfs:label>
</Supervisor>

Concept-to-instance-to-document index:
Concepts | Instance | Document
http://www.ukm.my/thesis/supervisor#, http://www.ukm.my/thesis/person# | Supervisor1 | Doc1
http://ukm.my/thesis/student#, http://ukm.my/thesis/creator#, http://ukm.my/thesis/person# | Student1 | Doc1
http://ukm.my/thesis/student#, http://ukm.my/thesis/creator#, http://ukm.my/thesis/person# | Student2 | Doc1

Term index (VSM):
Id | Term | TFIDF | Frq | Doc
1 | Arifah Alhadi | 0.11 | 2 | Doc1
2 | Asyraf Arifin | 0.123 | 1 | Doc1
3 | PM Dr Shahrul Azman | 0.45 | 1 | Doc1
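As a reminder of how a TFIDF column like the one in this index is computed, here is the classic formulation; the counts below are invented for illustration, not the ones behind the slide’s numbers:

```python
import math

def tfidf(term_freq, doc_len, num_docs, docs_with_term):
    """Classic tf-idf: within-document term frequency multiplied by the
    log inverse document frequency of the term across the collection."""
    tf = term_freq / doc_len
    idf = math.log(num_docs / docs_with_term)
    return tf * idf

# A label occurring twice in a 100-token document, appearing in 1 of 10 documents:
w = tfidf(term_freq=2, doc_len=100, num_docs=10, docs_with_term=1)
print(round(w, 4))  # 0.0461
```

In the instance-based weighting scheme, the “terms” being weighted are ontology instance labels rather than raw tokens.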
33. Ontology-based IR for crime news retrieval
• Each crime news item must be classified into categories: Traffic Violation, Theft, Sex Crime, Murder, Kidnap, Fraud, Drugs, Cybercrime, Arson and Gang (Chen et al. 2004)
• Useful entities need to be identified: Person, Location, Organisation, Date/Time, Weapon, Amount, Vehicle, Drug, Personal properties, and Age.
• Clustering of crime news into topics, e.g. the Nurin Jazlin murder, Canny Ong, Sosilawati, etc.
• Clustering of a specific topic into various chronological events.
• Mapping of named entities into a news ontology to support semantic querying and retrieval.
34. Example
(Figure: crime news items are first classified into categories – Murder, Kidnap, Theft, Gang – and then clustered into topics such as Nurin Jazlin, Sosilawati and Canny Ong. Each topic is further clustered into chronological events; for the Canny Ong topic: investigation including medical report and trial (17), evidence/suspect (6), DNA test (3), family reaction and negligence suit (9), and court sentence/guilty plea (13).)
35. Required methods
• In order to support the aforementioned requirements:
– Conventional text processing – tokenizing, indexing, stopping, stemming, etc.
– Named entity recognition (NER)
– Classification and clustering
– Ontology mapping
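The conventional text-processing step can be sketched as below. The stopword list and the crude suffix stripper are illustrative stand-ins; a real pipeline would use a proper stemmer (e.g. Porter’s) and a NER tool such as GATE:

```python
import re

# Illustrative stopword list, not a real one.
STOPWORDS = {"the", "a", "an", "of", "in", "was", "were", "into"}

def stem(token):
    """Very crude suffix stripping, standing in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Tokenize, drop stopwords, stem -- the 'stopping and stemming' steps."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t not in STOPWORDS]

print(preprocess("The suspects were arrested in Kuala Lumpur"))
# ['suspect', 'arrest', 'kuala', 'lumpur']
```

The resulting token stream is what gets indexed and, separately, fed to the NER and classification stages.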
37. Document representation
• Documents will be represented in meaningful forms:
– BoW – Bag of Words
– Named Entity Recognition – using GATE ANNIE and JAPE rules
– Adopt the Vector Space Model (VSM), enhanced with an ontological model
39. Document organization
• Documents need to be organised into categories, topics and events.
– Classification – AdaBoost algorithm
– Clustering – KNN clustering
– Ontology mapping – we have developed a crime news ontology by extending the existing SNaP ontology. It includes classes/entities important to crime, such as classification of crimes, location and weapon.
43. Ontology-based Image Retrieval
•The rapid growth of visual information (VI) leads to difficulty in finding and accessing VI.
•Inability to capture the semantic content.
•A problem arises: a lack of coincidence between the information extracted from VI and user needs.
•Conventional approaches to image retrieval (IMR), TBIR and CBIR, have reached their limits in attempting to solve this problem.
•As a result, the SBIR approach, being ontology-based, provides explicit, domain-oriented semantics for concepts and relationships.
44. Ontology-based Image Retrieval
•Illustrates how images are described based on their visual, textual and domain semantic features.
•Proposes a multi-modality ontology: visual ontology, textual ontology and domain ontology.
•Illustrates how such an ontology can be integrated with an open-source knowledge base (DBpedia) to support a more comprehensive search.
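Integration with DBpedia typically happens through its public SPARQL endpoint; a sketch of constructing such a request (the query shown is an illustrative example of fetching an English abstract, and no network request is actually sent here):

```python
from urllib.parse import urlencode

DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"

def build_sparql_request(resource):
    """Build the URL of a SPARQL query asking for the English abstract
    of a DBpedia resource (e.g. "Kuala_Lumpur")."""
    query = f"""
    SELECT ?abstract WHERE {{
      <http://dbpedia.org/resource/{resource}>
          <http://dbpedia.org/ontology/abstract> ?abstract .
      FILTER (lang(?abstract) = "en")
    }}"""
    return DBPEDIA_ENDPOINT + "?" + urlencode(
        {"query": query, "format": "application/json"})
```

The retrieved abstract (or other DBpedia properties) can then enrich the domain ontology's description of an image's subject.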
48. Conclusion: practical implementation of ontology-based IR
[Figure: architecture of an ontology-based IR system. Documents feed an extraction step that builds the index and, through population and annotation, fills the ontology (TBox and ABox); query processing draws on both the index and the ontology to turn a query into ranked documents.]
49. Research issues
•Index representation: most approaches are still based on the conventional VSM.
•Ranking: weighting and ranking mechanisms.
•Automatic population: supervised and unsupervised.
•Extraction and annotation.
•Multilingual and cross-language retrieval.
50. References
•Castells, P., Fernández, M., & Vallet, D. 2007. An Adaptation of the Vector-Space Model for Ontology-Based Information Retrieval. IEEE Transactions on Knowledge and Data Engineering, 19(2).
•Shahrul Azman Noah, Nor Afni Raziah Alias, Nurul Aida Osman, Zuraidah Abdullah, Nazlia Omar, Yazrina Yahya, & Maryati Mohd Yusof. 2010. Ontology-Driven Semantic Digital Library. AIRS 2010: 141-150.
•Shahrul Azman Noah & Datul Aida Ali. 2010. The Role of Lexical Ontology in Expanding the Semantic Textual Content of On-Line News Images. AIRS 2010: 193-202.
•Fernández, M., Cantador, I., López, V., Vallet, D., Castells, P., & Motta, E. 2011. Semantically enhanced information retrieval: an ontology-based approach. Web Semantics: Science, Services and Agents on the World Wide Web, 9: 434-452.
•Kara, S., Alan, O., Sabuncu, O., Akpınar, S., Cicekli, N.K., & Alpaslan, F.N. 2012. An ontology-based retrieval system using semantic indexing. Information Systems, 37: 294-305.
•Kohler, J., Philippi, S., Specht, M., & Ruegg, A. 2006. Ontology based text indexing and querying for the semantic web. Knowledge-Based Systems, 19: 744-754.
•Etc.