Unit-1
The Semantic Web is not a separate Web but an extension of the current one, in which
information is given well-defined meaning, better enabling computers and people to
work in cooperation. It is the application of advanced knowledge technologies to the
web and distributed systems.
The vision of the semantic web is that of a world wide distributed architecture where
data and services easily interoperate.
The presence of a huge amount of resources on the Web thus poses a serious problem
of accurate search. This is mainly because today's Web is a human-readable Web
whose information cannot easily be processed by machines.
The highly sophisticated, efficient keyword-based search engines that have evolved
today have not been able to bridge this gap.
A search engine is a document retrieval system designed to help find information
stored in a computer system, such as on the WWW. The search engine allows one to
ask for content meeting specific criteria and retrieves a list of items that match those
criteria.
Regardless of the underlying architecture, users specify keywords that match words in
huge search engine databases, producing a ranked list of URLs and snippets of Web-
pages in which the keywords matched.
Although such technologies are widely used, users are still often faced with the
daunting task of sifting through multiple pages of results, many of which are
irrelevant.
The use of ontologies to overcome the limitations of keyword-based search has been
put forward as one of the motivations of the Semantic Web.
One of the biggest problems we nowadays face in the information society is
information overload, a problem which is boosted by the huge size of the WWW. The
Web has given us access to millions of resources, irrespective of their physical
location and language.
In order to deal with this sheer amount of information, new business models have
emerged on the Web, such as commercial search engines. With the expected
continuous growth of the WWW, search engines will have a hard time maintaining
the quality of retrieval results. Moreover, they only access static content and ignore
the dynamic part of the Web. It is our vision that the technology of the current
generation of search engines has its limits. To be able to deal with the continuous
growth of the WWW (in size, languages and formats), we need to exploit other kinds
of information. This is where the Semantic Web helps us.
The current Web is based on HTML, which specifies how to lay out a web page for
human readers. HTML as such cannot be exploited by information retrieval
techniques to improve results; retrieval thus has to rely on the words that form the
content of the page and is hence restricted to keywords. Search engines are therefore
programmed in such a way that the first page shows a diversity of the most relevant
links related to the keyword.
The current Web has its limitations when it comes to :
1. finding relevant information
2. extracting relevant information
3. combining and reusing information
Finding information on the current Web is based on keyword search. Keyword search
has a limited recall and precision due to :
(a) Synonyms : e.g. Searching information about “Cars” will ignore Web
pages that contain the word “Automobiles” even though the information on these
pages could be relevant.
(b) Homonyms : e.g. Searching information about “Jaguar” will bring up
pages containing information about both “Jaguar” (the car brand) and “Jaguar” (the
animal) even though the user is interested only in one of them.
Knowledge gap
The knowledge gap is due to the lack of some kind of background knowledge that
only humans possess. This background knowledge is often completely missing
from the context of the Web page, so our computers do not even stand a fair chance
when working on the basis of the Web page alone.
The Semantic Web is being developed to overcome the following problems of the
current Web:
1. The web content lacks a proper structure regarding the representation of
information.
2. Ambiguity of information resulting from poor interconnection of
information.
3. Automatic information transfer is lacking.
4. Usability to deal with enormous number of users and content ensuring trust
at all levels.
5. Incapability of machines to understand the provided information due to lack
of a universal format.
The Semantic Web is an extension of the current Web in which information is given
well-defined meaning, better enabling computers and people to work in co-operation.
It is the next generation of the WWW, in which information has machine-processable
and machine-understandable semantics.
It is not a separate Web but an augmentation of the current one. Ontologies are the
backbone of the Semantic Web.
Semantic Web technology provides a basis for information sharing and for performing
some functions with generic software, but it does not solve all application problems;
some areas must still be addressed by each application that uses Semantic Web
technology.
The main obstacle to providing better support to Web users is that, at present, the
meaning of Web content is not machine-accessible.
Although there are tools to retrieve texts, when it comes to interpreting sentences
and extracting useful information for the user, the capabilities of current software are
still very limited.
In the Semantic Web, pages not only store content as a set of unrelated words in a
document, but also encode their meaning and structure.
The Semantic Web is set to become the future because it makes understanding between
humans and machines easy. Semantic Web design methodologies use ontology
languages such as RDF and OWL to represent information internally.
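As an illustration, the following minimal Python sketch uses the rdflib library to encode the meaning of a page as RDF triples; the example.org namespace, the page and class names are made up for illustration. The OWL statement that "Car" and "Automobile" denote the same class shows how the synonym problem discussed earlier could be resolved by a machine.

from rdflib import Graph, Namespace, Literal, RDF, RDFS
from rdflib.namespace import OWL

EX = Namespace("http://example.org/")   # hypothetical namespace for this sketch
g = Graph()
g.bind("ex", EX)

# Encode the meaning of a page as machine-processable triples.
g.add((EX.page42, EX.describes, EX.myVehicle))
g.add((EX.myVehicle, RDF.type, EX.Car))
g.add((EX.myVehicle, RDFS.label, Literal("a used family car")))

# Background knowledge: "Car" and "Automobile" name the same class,
# so a search about automobiles can also find pages about cars.
g.add((EX.Car, OWL.equivalentClass, EX.Automobile))

print(g.serialize(format="turtle"))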
The Semantic Web has been actively promoted by the World Wide Web Consortium
(W3C), the organization that is chiefly responsible for setting technical standards
on the Web.
The core technology of the Semantic Web, logic-based languages for knowledge
representation and reasoning, was developed in the research field of Artificial
Intelligence.
As the potential for connecting information sources on a Web-scale emerged, the
languages that have been used in the past to describe the content of the knowledge
bases of standalone expert systems have been adapted to the open, distributed
environment of the Web.
Since the exchange of knowledge in standard languages is crucial for the
interoperability of tools and services on the Semantic Web, these languages have been
standardized by the W3C as a layered set of languages.
Research to develop languages that, on the one hand, allow human users to describe
the meaning of words and, on the other hand, allow a computer to process these
descriptions began in the late 1980s and led to what we now call ontology languages,
in particular OWL.
Researchers from the School of Computer Science played a pivotal role in this
international effort. Specifically, they played central roles in designing these languages
based on logic, namely so-called 'Description Logics', and demonstrated their
suitability by developing powerful tools to process and engineer ontologies in these
languages.
Tools for creating, storing and reasoning with ontologies have been primarily
developed by university-affiliated technology startups and at research labs of large
corporations.
Over time the focus of Semantic Web research has been significantly extended from
knowledge representation and reasoning to other topics, drawing on related
communities, in particular Databases, Data Mining, Information Retrieval and
Computational Linguistics.
1. Language Standards and Extensions : The development of standardized
knowledge representation languages was the starting point of Semantic Web research.
Languages like DAML, OWL and RDF, but also representation languages for
services such as DAML-S, OWL-S and WSMO were developed and various
extensions were proposed, only some of which actually made it into the official
language standard. Further, researchers discussed the use of other existing languages
like XML as a basis for the Semantic Web.
2. Logic and Reasoning : Most of the language standards proposed for the Semantic
Web are based on some formal logic. Thus, extending existing logics to completely
cover the respective standards, as well as developing scalable and efficient
reasoning methods, have been in the focus of research from the beginning.
3. Ontologies and Modelling : The existence of language standards is necessary for
Semantic Web applications, but it does not enable people to build the right models.
4. Linked Data : As a reaction, linked data has been proposed as a bottom-up
approach, where data is converted into Semantic Web standards with minimal
ontological commitment, published and linked to other data sources.
The Web was a read-only medium for the majority of users. In the 1990s, the Web was
mainly a combination of a telephone book and the yellow pages, and only some users
knew about hyperlinks.
When the term Web 2.0 was coined by Tim O'Reilly, attitudes towards the Web changed.
Web 2.0 tools allow libraries to enter into a genuine conversation with their users.
Libraries are able to seek out and receive patron feedback and respond directly.
In 2003, there was a noticeable shift in how people and businesses were using the Web
and developing web-based applications.
Tim O'Reilly said that "Web 2.0 is the business revolution in the computer industry
caused by the move to the Internet as a platform, and an attempt to understand the
rules for success on that new platform".
Many Web 2.0 companies are built almost entirely on user-generated content and on
harnessing collective intelligence. On Google, MySpace, Flickr, YouTube and Wikipedia,
users create the content, while the sites provide the platforms.
The user is not only contributing content and developing open source software, but
also directing how media is delivered and deciding which news and information outlets
to trust.
At that stage we thought the Web 2.0 stack was fairly empty, but since those days the
extent to which people collaborate and communicate, and the range of tools and
technologies, have changed rapidly.
Editing blogs and wikis no longer required any knowledge of HTML. Blogs and wikis
allowed individuals and groups to claim their personal space on the Web and fill it
with content with relative ease.
The first online social networks entered the field at the same time as blogging and
wikis started to take off. Web 2.0 is the network as platform, spanning all connected
devices.
Web 2.0 applications are those that make the most of the intrinsic advantages of that
platform. It delivers software as a continually-updated service that gets better the
more people use it.
Such applications consume and remix data from multiple sources, including individual
users, while providing their own data and services in a form that allows remixing by others.
Web 3.0:
Web 3.0, or the Semantic Web, is the web era we are currently in, or perhaps the era
we are currently creating. Web 3.0, with its use of semantics and artificial intelligence,
is meant to be a "smarter web", one that knows what content you want to see and how
you want to see it, so that it saves you time and improves your life.
Web 2.0, in contrast, is really the participatory web, which today includes "classics"
such as YouTube, MySpace, eBay, Second Life, Blogger, RapidShare, Facebook and so
forth.
A key characteristic of Web 2.0 is that users are willing to provide content as well as
metadata. This may take the form of articles and facts organized in tables and categories
in Wikipedia, photos organized in sets and according to tags in Flickr, or structured
information embedded into homepages and blog postings using microformats.
A major disadvantage associated with Web 2.0 is that websites become vulnerable
to abuse, since anyone can edit the content of a Web 2.0 site. It is possible for a
person to purposely damage or destroy the content of a website.
Web 2.0 also has to address the issues of privacy. Take the example of YouTube. It
allows any person to upload a video. But what if the video recording was done
without the knowledge of the person who is being shown in the video? Thus, many
experts believe that Web 2.0 might put the privacy of a person at stake.
The basic idea of Web 3.0 is to define structured data and link it in order to enable more
effective discovery, automation, integration, and reuse across various applications. It
is able to improve data management, support the accessibility of the mobile internet,
stimulate creativity and innovation, encourage globalization, enhance customers'
satisfaction and help to organize collaboration in the social web.
Web 3.0 supports a world-wide database and a web-oriented architecture, whereas the
Web in its earlier stage was described as a web of documents.
Social network analysis rests on the following assumptions :
1. Actors and their actions are viewed as interdependent rather than independent, autonomous
units.
2. Relational ties (linkages) between actors are channels for transfer or “flow” of resources
(either material or nonmaterial).
3. Network models focusing on individuals view the network structure environment as
providing opportunities for or constraints on individual action.
4. Network models conceptualize structure (social, economic, political, and so forth) as
lasting patterns of relations among actors.
Graph density (D) is defined as the total number of observed lines in a graph divided
by the total number of possible lines in the same graph : D = 2L / (N(N-1)) for an
undirected graph, where L is the number of observed lines and N is the order
(number of nodes) of the graph. Density ranges from 0 to 1.
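A minimal Python sketch of this density computation, using a toy edge set (the node names are made up):

def graph_density(n_nodes, edges):
    # possible lines in an undirected graph of order n_nodes
    possible = n_nodes * (n_nodes - 1) / 2
    return len(edges) / possible

edges = {("a", "b"), ("b", "c"), ("c", "a")}   # a triangle on 3 nodes
print(graph_density(3, edges))                 # 3 observed / 3 possible = 1.0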
(Figure : a core-periphery structure that would be perfect without the edge between
two of the nodes.)
Affiliation networks contain information about the relationships between two
sets of nodes : a set of subjects and a set of affiliations. An affiliation network can
be formally represented as a bipartite graph, also known as a two-mode network.
Affiliation networks are two-mode networks that allow one to study the dual
perspectives of the actors and the events. They look at collections or subsets of
actors rather than at ties between pairs of actors. Connections among
members of one of the modes are based on linkages established through the second
mode.
An affiliation network is a network in which actors are joined together by
common membership of groups or clubs of some kind.
A distinctive feature of affiliation networks is duality i.e. events can be
described as collections of individuals affiliated with them and actors can be
described as collections of events with which they are affiliated.
Based on two-mode matrix data, affiliation networks consist of sets of relations
connecting actors and events, rather than direct ties between pairs of actors as in
one-mode data. Familiar affiliation networks include persons belonging to
associations, social movement activists participating in protest events, firms
creating strategic alliances, and nations signing treaties.
The representation of two-mode data should facilitate the visualization of three
kinds of patterning : a) the actor-event structure b) the actor-actor structure c) the
event-event structure
There are many ways to represent affiliation networks :
1. Affiliation network matrix
2. Bipartite graph or Sociomatrix
3. Hypergraph
4. Simplicial Complex
Bipartite Graph:
Nodes are partitions into two subsets and all lines are between pairs of nodes belonging to
different subsets. The following figure shows bipartite network. As there are g actors and h
events, there are g + h nodes.
(Figure : Bipartite graph)
The lines on the graph represent the relation "is affiliated with" from the perspective
of the actor and the relation "has as a member" from the perspective of the event.
No two actors are adjacent and no two events are adjacent. If pairs of actors are
reachable, it is only via paths containing one or more events. Similarly, if pairs of
events are reachable, it is only via paths containing one or more actors.
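The following sketch, using Python's networkx library with hypothetical actor and event names, builds such a bipartite affiliation network and derives the one-mode actor-actor and event-event structures by projection:

import networkx as nx
from networkx.algorithms import bipartite

# Toy two-mode affiliation network: actors are affiliated with events.
B = nx.Graph()
actors = ["Ann", "Bob", "Cara"]    # hypothetical actors
events = ["E1", "E2"]              # hypothetical events
B.add_nodes_from(actors, bipartite=0)
B.add_nodes_from(events, bipartite=1)
B.add_edges_from([("Ann", "E1"), ("Bob", "E1"), ("Bob", "E2"), ("Cara", "E2")])

# No two actors and no two events are adjacent; actors are linked only via events.
actor_net = bipartite.projected_graph(B, actors)   # actors tied by shared events
event_net = bipartite.projected_graph(B, events)   # events tied by shared actors
print(list(actor_net.edges()))   # [('Ann', 'Bob'), ('Bob', 'Cara')]
print(list(event_net.edges()))   # [('E1', 'E2')]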
Advantages:
1. They highlight the connectivity in the network, as well as the indirect chains of
connection.
2. Data is not lost and we always know which individuals attended which events.
Disadvantage:
1. They can be unwieldy when used to depict larger affiliation networks.
Precision is the ratio of the number of relevant records retrieved to the total number of
irrelevant and relevant records retrieved. It is usually expressed as a percentage.
As recall increases, the precision decreases, and as recall decreases, the precision
increases.
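As a small worked example (the document identifiers are made up), precision and recall can be computed as follows:

# Precision = relevant retrieved / all retrieved; Recall = relevant retrieved / all relevant.
retrieved = {"d1", "d2", "d3", "d4"}    # documents returned by the engine
relevant = {"d1", "d3", "d7", "d9"}     # documents that are actually relevant

hits = retrieved & relevant
precision = len(hits) / len(retrieved)  # 2 / 4 = 0.50
recall = len(hits) / len(relevant)      # 2 / 4 = 0.50
print(precision, recall)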
The average precision method is more sophisticated in that it takes into account the
order in which the search engine returns documents for a person : it assumes that
names of other persons that occur closer to the top of the list represent more important
contacts than names that occur in pages at the bottom of the list. The method is also
more scalable, as it requires downloading the list of top-ranking pages only once for
each author.
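One way to realize such a rank-sensitive measure is the standard average-precision formula, sketched below in Python with made-up page identifiers; relevant pages found near the top of the ranked list contribute more to the score.

def average_precision(ranked, relevant):
    hits, score = 0, 0.0
    for k, page in enumerate(ranked, start=1):
        if page in relevant:
            hits += 1
            score += hits / k            # precision at rank k
    return score / len(relevant) if relevant else 0.0

ranked = ["p1", "p5", "p2", "p9"]        # pages returned for a person, in rank order
relevant = {"p1", "p2"}                  # pages that really indicate a contact
print(average_precision(ranked, relevant))   # (1/1 + 2/3) / 2 = 0.83 approx.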