
INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY & MANAGEMENT INFORMATION SYSTEM (IJITMIS)

ISSN 0976 – 6405 (Print)
ISSN 0976 – 6413 (Online)
Volume 4, Issue 3, September - December (2013), pp. 96-113
© IAEME: http://www.iaeme.com/IJITMIS.asp
Journal Impact Factor (2013): 5.2372 (Calculated by GISI)
www.jifactor.com

LINK PATTERNS IN THE WORLD WIDE WEB

Tawfiq Khalil¹ and Ching-Seh (Mike) Wu², Ph.D.

¹,² Department of Computer Science and Engineering, Oakland University, Rochester,
Michigan, USA

ABSTRACT

In this paper we classify the different link patterns that have emerged during the evolution
of the World Wide Web and the methods used to support data modeling and navigation. We
identify six patterns based on our observation and analysis and examine the core technology
supporting them (especially the link mechanism) and how data is structured within them. First, we
review the document web paradigm known popularly as Web 1.0, which has primarily
focused on linking documents. We overview the common algorithms used to link people to
documents and service providers. Then we review Web 2.0, which has primarily focused on
linking Web Services (known as mashups) and people (known as the Social Web). Finally,
we review two more recent patterns: one for linking objects to create a global object web and
one for linking data to create a global data space. As part of our review, we identify some of
the challenges and opportunities presented by each pattern.

Keywords: Big data, classic web, global document space, graph databases, global object
space, global data space, Linked Data, NoSQL, semantic web, social web, relational
databases, Web 1.0, Web 2.0, Web 3.0.

I. INTRODUCTION

The World Wide Web (WWW) has enjoyed phenomenal growth and has received
wide global adoption without regard to factors such as the age, ethnicity and location of its
users. The web’s user base has grown continually since its inception and is expected to
comprise nearly 3 billion users by 2015 [1]. Many features have contributed to the Web’s
success, such as its ease of use, the near-ubiquity of its access, and the wealth of valuable
content it contains. The ability to link to content and to navigate from one web page to
another is a core functionality of the original architecture of the classic web. The web’s
evolution since its inception can be characterized according to the nature of what it has
linked. The web has grown from a document repository to an application platform allowing
users to conduct transactions such as buying books, paying bills, taking online classes, and
connecting to each other. As a result, the general function of the web has evolved from
merely linking documents to linking people to service providers, linking services to other
services, and linking people to people. Researchers have made many attempts to build a more
intelligent web, and in this paper we will focus on linking objects and linking data.
The web presents many opportunities and challenges to the research community,
including how to better interlink the classic web and obtain knowledge from the information
available on the web. Web mining techniques can be classified according to three categories:
web structure, web content and web usage mining [2]. These techniques are characterized
based on the data (i.e., structure and linkage) used for the mining process. Web structure
mining focuses on the structure of the links between the documents. Web content mining
extracts the content of the document as an input to the mining process. Web usage mining
uses the user’s interactions on the web as an input to the mining process. Web 2.0 has
contributed to the increase in the size of data available on the web and has necessitated
research into structuring and linking data differently and more intelligently.
This paper is organized into four parts. Section II reviews the data structure and
linkage in the classic web. It identifies two link patterns: among documents and between people
and documents (including service providers). Section III focuses on Web 2.0 and the
explosion of data leading to Big Data. It identifies two link patterns: linking services and
linking people. Section IV reviews the Web of Object architecture and how objects are
interlinked. Finally, Section V reviews Linked Data and its technology stack.

II. CLASSIC WEB (WEB 1.0)

2.1. Fundamental Architecture


Three fundamental standards comprise the infrastructure of the World Wide Web:
Uniform Resource Identifiers (URI): Globally unique identifiers for resources on the
internet [3]
Hypertext Transfer Protocol (HTTP): A standard internet protocol for accessing
resources on the internet [4].
Hypertext Markup Language (HTML): A standard language for rendering information
on the browser (content format) [5].

Figure 1. Web of documents high-level architecture


Fig. 1 depicts an internet user using a browser to request a static page on the internet
by specifying the protocol (HTTP) and the location (URL) of the resource. The request is
routed over the internet until it reaches the server hosting the resource. The server responds to
the user with the resource, and the browser uses the HTML representation of the resource to
render its information.
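The exchange in Fig. 1 can be sketched as raw HTTP messages; the host, path, and body shown here are illustrative stand-ins, not taken from the paper:

```http
GET /index.html HTTP/1.1
Host: www.example.org

HTTP/1.1 200 OK
Content-Type: text/html

<!DOCTYPE html>
<html> ... </html>
```

The browser issues the GET request using the URL's host and path; the server's response carries the HTML representation that the browser then renders.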
Most of the current applications on the internet follow a three-tier architecture to
provide a dynamic and interactive user experience (see Fig. 2):
Presentation Layer: This layer is responsible for presenting information to the user and
invoking the subsequent Business Logic Layer to perform requested tasks. Users interact
with this layer directly.
Business Logic Layer (BLL): This layer contains business rules for the application and
invokes the subsequent Data Access Layer to obtain the requested data.
Data Access Layer (DAL): This layer contains classes responsible for connecting and
executing commands on the database based on the BLL request. It is beyond the scope of
this paper to present an overview of all the various web application architectures.

Figure 2. Three-tier web application architecture

2.2. Data Structure


The standard data structure used to represent data on the internet is HTML. However,
it is only focused on the content and format of a hypertext page and does not have any
metadata that describes the content of the page.

<!DOCTYPE html>
<html>
<head>
<title>
Sample Page for Link Patterns in the World Wide Web
</title>
</head>
<body>

<p><b>This is bold text.</b></p>

<p><u>This is underlined text.</u></p>

</body>
</html>
Figure 3. Sample HTML code


2.3. Data Linkage


2.3.1. Pattern 1: Linking Documents and Anchors Within the Document
The HTML specification provides an explicit linking method (hypertext links, or
hyperlinks) that can connect the document to other documents (see Fig. 4, line 1) or link text
in the document to another section of the current document (see Fig. 4, lines 2–3). The
HTML <a> tag stands for “anchor” and its attribute href stands for “hypertext reference” [6].

1 <p><a href="http://secs.oakland.edu">Link to Oakland University</a></p>
2 <p><a href="#TOC">Click here for TOC!</a></p>
3 <p><a name="TOC">Table of Contents</a></p>
Figure 4. Example of HTML hypertext links

2.3.2. Pattern 2: Linking People to Documents and Services


In this pattern, linkage is not explicit as in Pattern 1. Search engines were created to
fulfil that need. An online user who does not have the URL (Uniform Resource Locator) for
the document or service can use a search engine to find documents related to his or her
keywords. Search engines have rapidly become the gateway to the web and represent the
most visited sites on the internet. Search engines use web mining techniques to identify
matches based on the user’s criteria. Data is needed in order to apply these techniques [7].
The decentralized nature of the web environment means that it must be “crawled” and the
resultant information must be centralized in a search index.
In Fig. 5, a web crawler (also known as a spider or robot) selects a set of URLs from
previous crawls as a starting point, fetches information about those pages, and subsequently
follows the links on those pages to identify additional pages to index. During this process,
new information is inserted into the search index and changes are either updated or deleted.
The query interface is responsible for executing queries against the search index and
returning results to the user based on their criteria.

Figure 5. Search engine high-level architecture
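The crawl loop described above can be sketched as follows. This is a minimal in-memory version under our own assumptions: `fetch`, the seed list, the page limit, and the dictionary index are stand-ins for a real fetcher and search index.

```python
from collections import deque

def crawl(seed_urls, fetch, max_pages=100):
    """Breadth-first crawl: fetch pages, index them, follow their links.
    `fetch(url)` is assumed to return (text, links) for a page."""
    index = {}                   # url -> page text (stand-in for the search index)
    frontier = deque(seed_urls)  # URLs selected from previous crawls
    seen = set(seed_urls)
    while frontier and len(index) < max_pages:
        url = frontier.popleft()
        text, links = fetch(url)
        index[url] = text        # insert or update the index entry
        for link in links:       # follow links to find additional pages
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return index
```

A real crawler would also handle deleted pages, politeness delays, and re-crawl scheduling, which are omitted here.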


Hyperlink Induced Topic Search (HITS) was developed at the IBM Almaden
Research Center and uses the hyperlink structure of web pages to infer notions of authority.
When a page has a hyperlink to another page, it implicitly endorses it. HITS defines two
types of pages: hubs, which provide a collection of links to authorities, and authorities, which
provide the best source of information about a subject. The central concept of this algorithm
is that there is a mutually reinforcing relationship between authorities and hubs (i.e., a good
hub is a page that points to many good authorities and vice versa). However, the web does
not always conform to this model: many pages could point to other pages for reasons such as
paid advertisement and may not necessarily endorse them.
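The mutually reinforcing hub/authority update can be sketched as a simple iteration. This is a minimal illustrative version, not the exact IBM implementation; the graph encoding, iteration count, and normalization are our own choices.

```python
def hits(pages, iterations=20):
    """pages: dict mapping a url to the list of urls it links to."""
    hubs = {p: 1.0 for p in pages}
    auths = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # authority score: sum of hub scores of the pages linking to it
        auths = {p: sum(hubs[q] for q in pages if p in pages[q]) for p in pages}
        norm = sum(auths.values()) or 1.0
        auths = {p: v / norm for p, v in auths.items()}
        # hub score: sum of authority scores of the pages it links to
        hubs = {p: sum(auths[q] for q in pages[p] if q in auths) for p in pages}
        norm = sum(hubs.values()) or 1.0
        hubs = {p: v / norm for p, v in hubs.items()}
    return hubs, auths
```

Pages that many hubs point to accumulate authority, and pages that point to many authorities accumulate hub score, exactly the reinforcement described above.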
PageRank was developed at Stanford University by Larry Page and Sergey Brin, the
co-creators of Google. PageRank is similar to HITS in its method for finding authoritative
pages. The key difference is that not all links (votes) are considered equal in status. Highly
linked pages have more importance than scarcely linked pages. Backlinks (incoming links)
from high-ranked pages count more than links from lower ones. To calculate the PageRank of
a page (or node), it is mandatory to calculate the PageRank of all pages (nodes) pointing to it
[12].
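A minimal sketch of the iterative PageRank computation follows. The damping factor 0.85 is the commonly cited default; the graph encoding and the handling of dangling nodes are illustrative choices on our part.

```python
def pagerank(pages, damping=0.85, iterations=50):
    """pages: dict mapping a url to the list of urls it links out to."""
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for p, links in pages.items():
            if links:
                # a page shares its rank equally among its outgoing links
                share = damping * rank[p] / len(links)
                for q in links:
                    if q in new:
                        new[q] += share
            else:
                # dangling node: spread its rank evenly over all pages
                for q in new:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank
```

Because each page's new rank depends on the ranks of the pages pointing to it, the scores of all nodes must be computed together, as the text notes.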
There is a great interest among the research community in determining how to optimize
search engine algorithms and to improve search results by taking into account the content of a
page (such as its title, headings, and tags) as well as user behavior (such as clicks and co-
visited pages) [8–11]. The search engine must also be capable of identifying and blocking the
efforts of spammers to spuriously increase page ranking using methods such as doorway
pages (pages full of keywords related to the site), pages dedicated to creating links to a
specific target page, and cloaking (pages that deliver different content to web crawlers than
that seen by regular users).

2.4. Evaluation
The goal of the World Wide Web is to provide a decentralized and dynamic
environment for interlinked, heterogeneous documents. Linking documents is, of course, a
built-in feature of HTML. Search engines satisfy the need for linking people to documents.
Web crawlers follow hyperlinks to create search indexes and thereby centralize information
about the decentralized web using web mining algorithms.
Many challenges face search engines. Web documents are written for many different
locales in many different languages, dialects, and styles. The number of documents on the
web is always on the rise. Efforts by spammers to improve their visibility and ranking within
search results are continuously evolving. There is also no guarantee that a site will be indexed
in the crawling process by a certain time or at all (unless there are many other external links
pointing to the page). Crawlers may not be able to follow links that use JavaScript events
such as onclick. Crawlers will also fail to index documents that are accessible only via a
search box (dynamically generated content) or that require authentication. In addition, the
web is a constantly moving target for indexing: it is a dynamic environment whose content is
frequently added, updated, dynamically generated, and deleted.

III. WEB 2.0

3.1. Fundamental Architecture


The following are the key technologies underlying Web 2.0:
XHTML and CSS as presentation standards
The Document Object Model (DOM) for dynamic display


XML and XSLT for data interchange and transformation


XMLHttpRequest for asynchronous data retrieval
Web Services:
• XML-RPC, SOAP, and REST for communication protocols
• WSDL for describing interfaces
• UDDI for discovering web services
AJAX (Asynchronous JavaScript and XML) enables applications to update portions
of a page without reloading the entire page, sparing the user from waiting for
full-page updates.
Smartphones for easing communication and collaboration in the Social Web

AJAX is considered an essential component of Web 2.0 applications [13], [14].

Figure 6. The building blocks of Web 2.0

Web 2.0 was not clearly defined until Tim O'Reilly articulated its concepts and
principles [13].

3.2. Data Structure


XML is a hierarchical representation of data that enables machines and applications to
easily interchange, transform, and parse data. Other formats have also been used, such as
JSON, which uses key-value pairs to represent data.
XML has played a large role in the evolution of Web 2.0. Elements of XML were
combined with HTML to create XHTML in order to enrich web documents and make HTML
more accessible, machine and device independent, and well-formed. The Simple Object
Access Protocol (SOAP, XML over HTTP) has also been used for web services communication.
XML is also part of AJAX, a key component of Web 2.0 that enriches internet applications.
Syndication technologies such as RSS and Atom are used in Web 2.0 to notify
subscribers (users and applications) about the existence of updated content. They are also
built on XML (see Figs. 7 and 8).


Figure 7. An example of RSS code
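The original figure image is not available here; the following RSS 2.0 sketch is reconstructed from the rendering shown in Fig. 8. It is a minimal fragment: a real feed would also include channel and item <link> elements, which are omitted because the original URLs are unknown.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Link Patterns in the World Wide Web</title>
    <description>Paper on Link Patterns in the web</description>
    <language>en-us</language>
    <item>
      <title>Classic Web</title>
      <description>Information on classic web</description>
    </item>
    <item>
      <title>Web 2.0</title>
      <description>Information on web 2.0</description>
    </item>
  </channel>
</rss>
```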

Link Patterns in the World Wide Web


Paper on Link Patterns in the web
RSS Version: 2.0
Language: en-us
Classic Web: Information on classic web
Web 2.0: Information on web 2.0

Figure 8. Rendering of the RSS code presented in Fig. 7

3.3. Data Linkage

3.3.1. Pattern 3: Mashups, Orchestration, and Choreography


In this pattern, applications and web services consume and remix information from
heterogeneous sources. Sharing information between web applications and services can
enrich the user experience and contributes to a more dynamic and powerful web.
Fig. 9 illustrates a web application utilizing Yahoo’s map service to present directions
to the user and using an RSS news feed to present news related to his or her interests.

Figure 9. Illustration of a web application mashup.


Web Services are software components that are accessible on the web and expose their
operations to others for use. Linking web services is the foundation of SOA (Service Oriented
Architecture). There are two models for linking web services, as illustrated in Fig. 10. In the
orchestration model, participating web services are controlled by a central web service that
manages the other services’ engagement. Meanwhile, in the choreography model, each web
service must know when to become active and with which service it needs to interact.

Figure 10. An illustration of two models for SOA: web service orchestration and
choreography
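The orchestration model can be sketched as a central coordinator that invokes each participating service in sequence. The service names, operations, and stub class below are hypothetical; real participants would be remote web services reached over SOAP or REST.

```python
class StubService:
    """Hypothetical in-process stand-in for a remote web service."""
    def __init__(self, name, log):
        self.name, self.log = name, log

    def call(self, operation, payload):
        # record the invocation so we can see the coordinator's control flow
        self.log.append((self.name, operation, payload))
        return {"status": "ok"}

def orchestrate_order(order, inventory, payment, shipping):
    """Orchestration: one central coordinator invokes each participating
    service in turn; the services never talk to each other directly."""
    for service, operation in ((inventory, "reserve"),
                               (payment, "charge"),
                               (shipping, "ship")):
        if service.call(operation, order)["status"] != "ok":
            return "failed"
    return "completed"
```

In the choreography model, by contrast, there would be no `orchestrate_order` function: each service would itself decide when to act and whom to contact next.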

3.3.2. Pattern 4: Linking People


With the advent of Web 2.0, people became an integral part of web applications.
Online users became contributors rather than passive consumers. Users gained the ability to
tag, comment, review, publish, vote on, and express approval for (i.e., “like”) content. They
also gained the ability to link to their friends and create a community of friends or a network
of professionals. Web 2.0 users can also contribute to the news, sending photos and
comments when news channels are not available. For these reasons, Web 2.0 has been
described as having an “Architecture of Participation” [13].
Despite this, data published on the web is still not structured in a semantic way that
represents its relationship to other entities. The architecture of data stored in the web
application must be able to accommodate a large and dynamic volume of information,
including relationships among entities. In addition, it must return information about linked
entities in real time.
Traditionally, many web applications have used relational databases to manage information
[16], [17]. Relational databases rely on tables to organize similar data into rows and columns.
Rows are uniquely identified inside the table via a primary key. It is a good practice to
normalize tables to reduce data redundancy and increase data integrity. Foreign keys are used
to link tables (entities) to each other. Normally, tables are joined in the query when selecting
data that is available in multiple tables by utilizing the foreign key relationship. For
performance reasons, database architects typically denormalize tables to reduce joins and
improve performance, which subsequently causes data redundancy and may negatively affect
data integrity.
Figure 11 depicts a small entity relationship model that has information about users,
their friends, and the orders they made. Creating queries to find information from this model
can quickly become very complex. For example, take a query intended to find friends of
Tawfiq who bought the same item as he did. The more joins we have, the more performance
will suffer. It would be more challenging to find friends of a friend of a friend three or four
levels deep who bought the same product. Relational databases have been widely used and
successfully accommodated to business needs; unfortunately, they lack efficiency in
addressing strongly interlinked datasets.

Figure 11. The entity relationship model

The objective of non-relational databases (also known as NoSQL) is to avoid the
potential processing overhead and complexity of relational databases by allowing redundancy
and by relaxing the constraints that relational databases impose. NoSQL is more suitable for,
and more prevalent in, cloud environments because of its ability to scale horizontally (via
sharding) better than relational databases. Most NoSQL databases can be classified according to their
underlying storage mechanism: key-value (e.g., Amazon Dynamo), document (e.g.,
MongoDB, RavenDB, CouchDB), column family (e.g., Apache HBase, Google BigTable),
and graph store (e.g., neo4j, HyperGraphDB).
The first three NoSQL models listed above store data in a disconnected manner, which
necessitates inserting an identifier (such as a foreign key) into the other document, value, or
column. This is not enforced by the store and could lead to dangling pointers. On the other
hand, graph databases naturally support network topologies such as social networks. Fig. 12
depicts the entity relationship in Fig. 11 using a graph model and it shows how easy it is to
follow the path to answer a query seeking friends of Tawfiq who bought the same item as he
did.

Figure 12. The property graph model


A graph database model depicts real instances; it does not contain a general
entity-relationship schema as relational database models do. The graph model consists of
nodes (i.e., entities), relationships (directed edges or connections that have a label and are not
generic), and properties (in the form of key-value pairs) for both nodes and relationships.
Adding properties to relationships provides great value by representing metadata (such as
the strength of a connection) between entities. Another advantage of the graph model is that
it is schema-free: new nodes and relationships can be added without the migration or
downtime that relational databases require.
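The query from Fig. 12 (friends of Tawfiq who bought the same item) reduces to following labeled edges. The sketch below uses a plain adjacency-list dictionary with illustrative relationship labels as a stand-in for a real graph database; only Tawfiq's name comes from the paper's example.

```python
def friends_who_bought(graph, person, item):
    """graph: dict mapping a node to a list of (relationship, target) edges,
    a minimal stand-in for a property graph store."""
    results = []
    for rel, friend in graph.get(person, []):
        if rel != "FRIEND_OF":
            continue  # follow only friendship edges from the start node
        for rel2, thing in graph.get(friend, []):
            if rel2 == "ORDERED" and thing == item:
                results.append(friend)
    return results
```

The query is answered by two hops along labeled edges; no joins are needed, which is the advantage the text attributes to graph stores.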

3.4. Evaluation
Although Web 1.0 enjoyed many successes and made access to information nearly
seamless, it provided only read access to users and prevented them from contributing
to its knowledge base. Researchers tried to compensate for the lack of human feedback by
developing algorithms that examine links between pages, content, and user behavior.
Web 2.0 embraced human intelligence and changed the role of online users from
passive consumers to contributors. It created an attractive, easy-to-use, and powerful platform
for collaboration. This platform is used not only for people to contribute to pages (with
reviews, tags, comments, and other content) but also to link services and people.
Developers are able to quickly build powerful applications by calling ready-made web
services instead of building them from scratch. Researchers have developed various
languages (such as WSFL, WSCI, as BPEL) for web service composition to build a business
processes. Manual composition of web services is time consuming and not scalable. This has
led researchers to automate the composition process based on functionality and QoS
attributes (such as availability, reliability, security, and cost) for the service selection process
[20]–[24]. As a testament to the importance of this topic, a Google Scholar search for
“dynamic web service composition” yields more than 500,000 articles and research papers.
OWL-S is a service ontology that enables dynamic web service discovery,
composition, and monitoring. It has three parts: the service profile, the process model, and the
grounding [25]. A key observation is that the service profile (which includes QoS
information) is provided by the service publisher and consumers cannot contribute to it. This
is reminiscent of the read-only restriction that is characteristic of Web 1.0. Researchers have
tried to address this issue by proposing new frameworks that include the application of social
networks to web service composition [26]–[30].
Web 2.0 has also created a convenient and intelligent environment for people to
connect to each other. People are increasingly using social networks to connect with friends
and others from various backgrounds. Web applications are charged with managing
information in an optimized way and applying analytical and inferential algorithms in order
to make intelligent recommendations based on the network topology. Traditional relational
databases fall short due to their rigid constraints and their inability to scale horizontally.
However, social networks are a natural fit for the graph model. The ability to model social
network data in a graph provides a great advantage due to the several hundred years of
mathematical and scientific study devoted to graphs. Breadth-first (one layer at a time) and
depth-first (one path at a time) search algorithms can be used for traversing the graph.
Dijkstra’s algorithm (using a breadth-first search) can be used to find the shortest path
between two nodes in the graph. There are also many graph theory techniques that can be
applied to the graph for analysis and inference. Triadic closure (if node A is connected to
node B and also connected to node C, there is a possibility that B has some relation to C) and
structural balance principles can be applied to induce relationships and make
recommendations [31].
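The breadth-first traversal mentioned above can be sketched as follows. For an unweighted social graph, BFS already yields the shortest path, matching Dijkstra's result when every edge has equal weight; the graph encoding is illustrative.

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Breadth-first search over an unweighted graph, exploring one layer
    at a time; returns the list of nodes on a shortest path, or None."""
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        for neighbor in graph.get(path[-1], []):
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append(path + [neighbor])
    return None
```

A depth-first variant would pop from the same end it appends to (a stack), following one path at a time instead of one layer at a time.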
Social networks have created many opportunities for researchers. A key area of
developing research is sentiment analysis and opinion mining. Analyzing text (such as
reviews, comments, messages, etc.) to determine whether it reflects a positive or negative
attitude is a critical step in the sentiment analysis process [32], [33]. Another prominent
research area is community detection based on the analysis of linked people interests, “likes,”
and induced opinion [34]–[36].

IV. FOREST: WEB OF INTERACTING OBJECTS

4.1. Fundamental Architecture


Functional Observer REST (FOREST) is a resource-oriented architecture proposed by
Duncan Cragg [37]. In this architecture, domains or applications can share information via
interlinked objects, whether locally or across the network. An object is identified by a URL
that includes the object’s globally unique ID. An object’s state is determined by its own state
and the state of other objects that it observes without the need for calling the other objects’
methods (Functional Observer Pattern). REST over HTTP (using GET for poll and POST for
push) is used for object state transfer (see Fig. 13). FOREST objects can be written in
traditional languages such as Java, C#, or Python, or naturally in declarative languages (such as
Prolog, Clojure, or Erlang). In the initial POST, the HTTP headers ETag (a digest of the object
content) and max-age (how long to cache the object) are used to control the client-side cache
for future GET requests.

Figure 13. Example of FOREST interlinked objects
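A minimal sketch of the Functional Observer idea under our own assumptions: the class, the in-memory store, and the hash-based tag below stand in for real HTTP GET requests and ETag digests, which FOREST uses over the network.

```python
class FObject:
    """Minimal sketch of a Functional Observer object: its state is a
    function of its own content plus cached states of observed objects."""
    def __init__(self, uid, content, observes=()):
        self.uid = uid              # globally unique id (part of the object's URL)
        self.content = content
        self.observes = list(observes)
        self.cache = {}             # uid -> (etag, state) of observed objects

    def pull(self, store):
        """Poll (GET) each observed object, refetching its state only when
        its ETag-like digest has changed since the last poll."""
        for uid in self.observes:
            other = store[uid]
            etag = hash(str(other.content))  # stand-in for an HTTP ETag digest
            cached = self.cache.get(uid)
            if cached is None or cached[0] != etag:
                self.cache[uid] = (etag, dict(other.content))
        return self.cache
```

Note the asymmetry the text describes: the observer knows whom it watches, but an observed object cannot tell who is observing it.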

4.2. Data Structure


FOREST objects can be serialized or represented in XML, XHTML, or JSON.

4.3. Data Linkage

4.3.1. Pattern 5: Linking Objects


Hyperdata is data represented in objects that are linked into a global object web, using
the URL and the object's unique ID as the linkage method.


4.4. Evaluation
Functional Observer REST (FOREST) is a resource-oriented framework for
implementing domain and application logic by creating functional dependencies among linked
objects. An object is identified via a unique identifier and its state is evaluated according to
its current state along with the states of the objects that it observes. However, a given
object cannot tell by whom it is observed. The interactions and the state dependencies are
based on the application logic and are not globally realized. That is, objects are not
semantically described but work together according to the application constraints. Overall, the
framework provides interoperability (objects can be serialized to XML, JSON, or XHTML),
scalability (objects can be distributed and linked), and evolvability (observed objects’ state can
be pushed and pulled).

V. LINKED DATA – SEMANTIC WEB (WEB 3.0)

Many people use the terms Linked Data, Semantic Web, and Web 3.0
interchangeably. It is critical for our discussion to clarify the distinction among them. The
next version of the World Wide Web (Web 3.0) focuses on supplementing raw data on the
web with metadata and links to make it understandable by machines for automation,
discovery, and integration. The Semantic Web employs a top-down technology stack (RDF,
OWL, SPARQL, and RIF) to support this goal. Linked Data is a bottom-up approach that
uses Semantic Web technologies to enact Web 3.0.
There are other approaches for enacting Web 3.0 without the use of Semantic Web
technologies. Microformats and microdata are examples. Microformats provide standard
class definitions to represent commonly used objects on the web in HTML (objects such as
people, blog posts, products, and reviews). This allows web crawlers and APIs to easily
understand the content of the site. Microformats also include a rel attribute that provides a
relationship meaning for the hyperlink. For example, rel="home" in <a href="URL"
rel="home">Home</a> indicates that the link points to the site's homepage [40].
Similarly, schema.org provides schemas (i.e., itemtypes, which have properties similar to
microformats' classes and properties). The microdata format (e.g., itemscope, which includes
the itemtype and itemprop attributes) is used to provide metadata for HTML content [42].
Linked Data is similar to Microdata and Microformats in its method for enacting Web
3.0. However, it uses the Semantic Web technology stack: OWL for the vocabulary and
RDF for the data model. In addition, vocabulary in Linked Data (analogous to classes in
microformats and itemtypes in microdata) is not limited to a certain organization for updates.
Finally, another
key difference is that the described data item in microformats and microdata does not have a
unique identifier as it does in Linked Data (URI). For these reasons, our focus in this section
will be on Linked Data.

5.1. Fundamental Architecture


Uniform Resource Identifier (URI): A globally unique identifier used as a name for the
described entity.
Hypertext Transfer Protocol (HTTP): A standard internet protocol for accessing
resources on the internet.
Resource Description Framework (RDF): A graph-based data model for describing
things (entities) in the form of triples: subjects (nodes), predicates (edges representing
relations), and objects (nodes that can be a literal value or a reference to the subject of
another triple). For
example, in Fig. 12 one of the triples in the graph has Tawfiq as the subject, ordered as a
predicate, and order id:12 as an object. In Fig. 14, the RDF model describes two persons
(Tawfiq and Mike) and establishes a connection (“knows”) between them using the FOAF
(Friend-of-a-Friend) vocabulary.

Figure 14. Example of RDF Model
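Since the original figure is not reproducible here, the following sketch models RDF triples as plain (subject, predicate, object) tuples. The example.org URIs are illustrative inventions; only the FOAF namespace URI is the standard one.

```python
# FOAF is the standard Friend-of-a-Friend vocabulary namespace.
FOAF = "http://xmlns.com/foaf/0.1/"

# Illustrative subject URIs (not real, dereferenceable addresses).
tawfiq = "http://example.org/people/tawfiq"
mike = "http://example.org/people/mike"

triples = [
    (tawfiq, FOAF + "name", "Tawfiq Khalil"),        # object is a literal
    (mike, FOAF + "name", "Ching-Seh (Mike) Wu"),    # object is a literal
    (tawfiq, FOAF + "knows", mike),                  # object refers to another subject
]

def objects_of(triples, subject, predicate):
    """Return every object of the triples matching a subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]
```

The "knows" triple is the graph edge connecting the two persons; querying it amounts to matching on subject and predicate.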

5.2. Data Structure


RDF can be serialized in RDF/XML, RDFa (RDF embedded in HTML), Turtle, N-
Triples, and RDF/JSON.

5.3. Data Linkage


5.3.1. Pattern 6: Linking Data
There are three types of RDF links. Vocabulary links point to the definition of the
vocabulary used to represent the data. Relationship links point to related entities in other data
sources. Identity links are pointers to other descriptions about the same entity from different
data sources. For example, if we are describing a person, the vocabulary link could use the
FOAF (i.e. OWL DL) vocabulary to describe the person and his social connections (Fig. 14,
line 4), the relationship link would be a link to his school or a friend (Fig. 14, line 11), and
the identity link can provide links to different representations of the person using owl:sameAs to indicate that the URIs represent (have different views of) the same entity from different sources (Fig. 14, line 16).
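The three link types can be illustrated together in a short Turtle sketch (the example.org URIs are hypothetical placeholders, not taken from the figure):

```turtle
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .

<http://example.org/people/tawfiq>
    # vocabulary link: the type points into the FOAF vocabulary definition
    rdf:type foaf:Person ;
    foaf:name "Tawfiq" ;
    # relationship link: points to a related entity in another data source
    foaf:knows <http://other.example.org/people/mike> ;
    # identity link: another source's description of the same entity
    owl:sameAs <http://third.example.org/id/tawfiq> .
```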

5.4. Evaluation
The intent of Web 3.0 is to make data understandable by machines in order to improve reuse and integration. Linked Data is based on two major Semantic Web technologies: RDF as a graph-based data model, and OWL (Web Ontology Language) as a vocabulary language for describing classes and properties, including relationships among classes, equality, cardinality, and enumerated classes.
Linked Data has received worldwide adoption across many domains. The Linking Open Data Cloud (LOD Cloud) group catalogs datasets published on the Web as Linked Data, and this catalog has become an immense repository for publishers. As of September 2011, it listed more than 290 datasets comprising over 30 billion triples published across various domains [43]. Search engines such as Falcons and SWSE enable users to search for Linked Data using keywords. In addition, Sindice, Swoogle, Watson, and CKAN provide APIs for applications to look up Linked Data.

While access to Linked Data provides great opportunities, publishers and consumers also face many challenges. Human interaction with Linked Data is not as friendly and intuitive as in Web 1.0: HTML presents data in a form designed for people to view, whereas Linked Data lacks such presentation formatting because it is intended for machines to understand. In addition, applications face many challenges in searching, integrating, and presenting the data in a unified view. Thousands of ontologies (third-party and user-generated) are used to describe the data. Data fusion requires integrating and aggregating data from different sources written in different languages, which in turn requires data cleansing (including deduplication and removal of irrelevant or untrustworthy data) and mapping of the schemas used to describe the data. Many researchers have tried to address the issues of data quality and trustworthiness by using cross-checking, voting, and machine learning techniques [44]. Inference techniques can also improve quality by discovering new relationships and by automatically analyzing the content of the data to uncover inconsistencies during the integration process.
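The voting technique mentioned above can be sketched with a minimal, self-contained example (the data, URIs, and function name are illustrative, not taken from any cited system): when several sources assert conflicting values for the same property, keep the value asserted by the most sources.

```python
from collections import Counter

def resolve_by_voting(assertions):
    """Pick, for each (entity, property) pair, the value asserted by the
    most sources. `assertions` is a list of (entity, property, value, source)
    tuples; returns {(entity, property): winning_value}."""
    votes = {}
    for entity, prop, value, _source in assertions:
        votes.setdefault((entity, prop), Counter())[value] += 1
    return {key: counter.most_common(1)[0][0] for key, counter in votes.items()}

# Three sources disagree on a person's birth year; the majority value wins.
assertions = [
    ("ex:tawfiq", "birthYear", "1980", "sourceA"),
    ("ex:tawfiq", "birthYear", "1980", "sourceB"),
    ("ex:tawfiq", "birthYear", "1908", "sourceC"),  # likely a typo in sourceC
]
print(resolve_by_voting(assertions))  # {('ex:tawfiq', 'birthYear'): '1980'}
```

Real systems weight votes by source trustworthiness or combine voting with learned models; plain majority voting is only the simplest instance of the idea.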
Link maintenance presents another challenge, since RDF contains links to other data sources that could be deleted at any time, leaving the links dangling. Frameworks that validate links on a regular basis, or that use syndication technology to communicate changes, have been proposed to address this issue. Another interesting research area is the automatic interlinking of data based on its similarity.
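A periodic link-validation pass of the kind described can be sketched as follows (a simplified illustration with hypothetical URIs; the resolvability check is injected so that a real deployment could plug in, e.g., an HTTP HEAD request):

```python
def find_dangling_links(triples, is_resolvable):
    """Return the object URIs in `triples` (subject, predicate, object)
    that no longer resolve, according to the injected `is_resolvable`
    predicate (in practice an HTTP request against each URI)."""
    objects = {o for _s, _p, o in triples if o.startswith("http")}
    return sorted(o for o in objects if not is_resolvable(o))

triples = [
    ("http://example.org/tawfiq", "foaf:knows", "http://example.org/mike"),
    ("http://example.org/tawfiq", "foaf:name", "Tawfiq"),
    ("http://example.org/tawfiq", "owl:sameAs", "http://gone.example.org/t"),
]
live = {"http://example.org/mike"}  # stub: pretend only this URI still resolves
print(find_dangling_links(triples, live.__contains__))
# ['http://gone.example.org/t']
```

The dangling URIs reported by such a pass would then be repaired, removed, or re-resolved via a syndication feed of changes.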
Another challenge in Linked Data relates to the core technology it uses. OWL has significant expressivity limitations. OWL 2 was introduced to resolve some of the shortcomings of OWL 1, such as expressing qualified cardinality restrictions, using keys to uniquely identify data, and declaring the property characteristics needed for partOf relations (asymmetric, reflexive, and disjoint properties). In addition, OWL is written in RDF (triples), which makes relations such as “class X is the union of classes Y and Z” complex to express. Essentially, OWL is a description language based on first-order logic, and it is unable to describe integrity constraints or perform closed-world querying [46]. The Rule Interchange Format (RIF) was introduced on top of OWL to add rules using Logic Programming. However, a true rule specification in logic programming is fundamentally incompatible with OWL [47].

6. CONCLUSION

The World Wide Web has been adopted by billions of users globally for its wealth of
information and its ease of use. Information on the web has been linked in many ways since
its inception to optimize automation, discovery, and reuse. The more relationships and links that are applied to the data, the richer the knowledge that can be derived from it. It is important to
recognize the different link patterns in the web to identify some of the opportunities and
challenges in each pattern and to be able to recommend new patterns to better serve online
users and service providers. It is evident that each iteration of the web’s evolution—from
linking documents to documents, to linking people to documents, to linking services to
services, to linking people to people, to linking objects to objects, and finally to linking data
to data—has increased the value of the network and has made it an increasingly rich and
valuable platform. These efforts and their results are inspiring researchers to work on the remaining challenges, to recommend new patterns for the web, or to apply these patterns to real-life physical objects, as we see in the Internet of Things (IoT) initiative. This paper also serves as a foundation for our research into a new web pattern that aims to optimize the use of data on the web and to overcome some of the shortcomings of current methods by focusing on better ways of publishing data and improving linkage.


TABLE I. LINK PATTERNS EVALUATION RESULTS

Linking Documents (Web 1.0)
• Link Type: Explicit
• User Access: Read Only
• Link Mechanism: Hyperlinks
• Impact: Global Document Space
• Shortcomings/Challenges: no typed links; no collaboration capabilities; not understandable by machines.

Linking People to Documents (Web 1.0)
• Link Type: Implicit
• User Access: Read Only
• Link Mechanism: Search engines
• Impact: Ease of access to information and service providers
• Shortcomings/Challenges: crawlers are used to centralize and index documents instead of real-time or near-real-time lookup; search algorithms rely mainly on links, which may be used for other purposes; web usage and content mining are used to optimize results; web content mining must accommodate different languages and locales; spurious efforts to increase visibility (spam) must be blocked; users are not able to provide feedback on results.

Linking Services (Web 2.0)
• Link Type: Implicit
• User Access: Read/Write and Application Level
• Link Mechanism: UDDI search and auto discovery
• Impact: Service Mashups
• Shortcomings/Challenges: optimize dynamic service composition using semantic and social web techniques; Quality of Service verification based on the network topology.

Linking People (Web 2.0)
• Link Type: Implicit
• User Access: Read/Write Only
• Link Mechanism: Request/accept a connection
• Impact: Social Network
• Shortcomings/Challenges: published data is not represented semantically; sentiment analysis, opinion mining, and community detection; optimize data management for intelligent recommendations.

Linking Objects (Web 2.0)
• Link Type: Explicit
• User Access: N/A (Application Level)
• Link Mechanism: Object GUID
• Impact: Global Object Space
• Shortcomings/Challenges: no typed links between objects; object interactions and state dependencies are constrained by domain and application logic; no metadata is used to describe the object.

Linking Data (Web 3.0)
• Link Type: Explicit
• User Access: Read and Application Level
• Link Mechanism: URI
• Impact: Global Data Space
• Shortcomings/Challenges: limited expressivity in OWL and RDF; link maintenance; data integration and aggregation; lack of support for integrity constraints and closed-world querying; automatically interlinking similar data.

REFERENCES

[1] R. Kalakota, “Big Data Infographic and Gartner 2012 Top 10 Strategic Tech Trends,”
http://practicalanalytics.wordpress.com/2011/11/11/big-data-infographic-and-gartner-
2012-top-10-strategic-tech-trends/, Nov. 2011. Web. Nov. 2013
[2] A. Hotho and G. Stumme, “Mining the World Wide Web,” Künstliche Intelligenz,
vol. 3, pp. 5–8, 2007.
[3] T. Berners-Lee, R. Fielding, and L. Masinter, “RFC 2396 - Uniform Resource
Identifiers (URI): Generic Syntax,” http://www.isi.edu/in-notes/rfc2396.txt, Aug.
1998. Web. Nov. 2013
[4] R. Fielding, “Hypertext Transfer Protocol – http/1.1. Request for Comments: 2616,”
http://www.w3.org/Protocols/rfc2616/rfc2616.html, 1999. Web. Nov. 2013
[5] D. Raggett, A. Le Hors, and I. Jacobs, “HTML 4.01 Specification - W3C
Recommendation,” http://www.w3.org/TR/html401, 1999. Web. Nov. 2013
[6] “HTML Examples,” W3Schools,
http://www.w3schools.com/html/html_examples.asp. Web. Nov. 2013
[7] S. Brin and L. Page. “The Anatomy of a Large-Scale Hypertextual Web Search
Engine,” Computer Networks and ISDN Syst., vol. 30, pp. 107–117, 1998.
[8] G-R. Xue, H-J Zeng, Z. Chen, Y. Yu, W-Y Ma, W. Xi, and W. Fan. “Optimizing Web
Search Using Web Click-Through Data.” Proc. 13th ACM Int'l Conf. Inform. and
Knowledge Manage., pp. 118–126, 2004.
[9] S. Ding, and T. Adviser-Suel, “Index Compression and Efficient Query Processing in
Large Web Search Engines,” PhD dissertation, Polytechnic Inst. of New York Univ.,
Mar. 2013.
[10] C.N. Pushpa, S. Girish, S.K. Nitin, J. Thriveni, K.R. Venugopal, and L.M. Patnaik,
“Computing Semantic Similarity Measure Between Words Using Web Search
Engine,” Computer Sci. and Inform. Technology, pp. 135–142, 2013.
[11] L.G. Giri, P.L. Srikanth, S.H. Manjula, K.R. Venugopal, and L.M. Patnaik,
“Mathematical Model of Semantic Look: An Efficient Context Driven Search
Engine,” Int'l J. Inform. Process., vol. 7, no. 2, pp. 20–31, 2013.
[12] L. Page. “The PageRank Citation Ranking: Bringing Order to the Web.” Tech. report,
Stanford Univ., Jan. 1998.
[13] T. O'Reilly. “What is Web 2.0: Design Patterns and Business Models for the Next
Generation of Software,” Int'l J. Digital Econ., no. 65, pp. 17–37, Mar. 2007.
[14] L.D. Paulson. “Building Rich Web Applications with Ajax,” Computer, vol. 38, no.
10, pp. 14–17, 2005.
[15] “XML Essentials,” W3C, http://www.w3.org/standards/xml/core. Web. Nov. 2013
[16] P.P-S. Chen. “The Entity-Relationship Model—Toward a Unified View of Data,”
ACM Trans. Database Syst., vol. 1, no. 1, pp. 9–36, 1976.
[17] E.F. Codd. “A Relational Model of Data for Large Shared Data Banks,” Commun.
ACM, vol. 26, no. 1, pp. 64–69, 1983.
[18] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S.
Sivasubramanian, P. Vosshall, and W. Vogels. “Dynamo: Amazon's Highly Available
Key-Value Store,” 21st ACM Symp. Operating Syst. Principles, pp. 205–220, 2007.
[19] F. Chang, J. Dean, S. Ghemawat, W.C. Hsieh, D.A. Wallach, M. Burrows, T.
Chandra, A. Fikes, and R.E. Gruber. “Bigtable: A Distributed Storage System for
Structured Data,” ACM Trans. Computer Syst., vol. 26, no. 2, 2008.
[20] M. Moghaddam. “An Auction-Based Approach for Composite Web Service
Selection,” Proc. 8th Int'l Workshop Eng. Service-Oriented Applicat., pp. 400–405,
2013.
[21] Z. Zheng and M.R. Lyu. “QoS-Aware Fault Tolerance for Web Services,” QoS
Management of Web Services, pp. 97–118, Berlin: Springer, 2013.
[22] W.L. Kong, Q.T. Liu, Z.K. Yang, and S.Y. Han. “Composition of Web Services
Based on Dynamic Qos,” Computer Science, vol. 39, no. 2, pp. 268–272, 2012.
[23] C-S. Wu, and I. Khoury. “Web Service Composition: From UML to Optimization.”
IEEE 5th Int'l Conf. Service Sci. and Innovation (ICSSI), pp. 139–146, 2013.
[24] M. Moghaddam and J.G. Davis, “Service Selection in Web Service Composition: A
Comparative Review of Existing Approaches,” in Web Services Foundations, A.
Bouguettaya, Q.Z. Sheng, and F. Daniel, eds., New York: Springer, to be published,
pp. 321–346, 2014.
[25] “OWL-S: Semantic Markup for Web Services,” W3C,
http://www.w3.org/Submission/OWL-S/, 2004. Web. Nov. 2013
[26] X. Xie, B. Du, and Z. Zhang, “Semantic Service Composition Based on Social
Network,” Proc. 17th Int'l World Wide Web Conf., 2008.
[27] Q. Wu, A. Iyengar, R. Subramanian, I. Rouvellou, I. Silva-Lepe, and T. Mikalsen.
“Combining Quality of Service and Social Information for Ranking Services,” Proc.
7th Int'l Joint Conf. ICSOC-ServiceWave, 2009.
[28] M.N. Ko, G.P. Cheek, M. Shehab, R. Sandhu, “Social-Networks Connect Services,”
Computer, vol. 43, no. 8, pp. 37–43, Aug. 2010.
[29] W. Tan, J. Zhang, I. Foster, “Network Analysis of Scientific Workflows: A Gateway
to Reuse,” Computer, vol. 43, no. 9, pp. 54–61, Sep. 2010.
[30] A. Maaradji, H. Hacid, J. Daigremont, and N. Crespi. “Towards a Social Network
Based Approach for Services Composition,” Proc. 2010 IEEE Int'l Conf. on
Commun., 2010.
[31] S.A. Golder and S. Yardi. “Structural Predictors of Tie Formation in Twitter:
Transitivity and Mutuality,” Proc. IEEE 2nd Int'l Conf. Social Computing, 2010.
[32] S. Arora, E. Mayfield, C. Penstein-Rosé, and E. Nyberg. “Sentiment Classification
Using Automatically Extracted Subgraph Features,” Proc. Assoc. for Computational
Linguistics Workshop on Computational Approaches to Analysis and Generation of
Emotion in Text, pp. 131–139, 2010.
[33] I. Becker and V. Aharonson. “Last but Definitely Not Least: On the Role of the Last
Sentence in Automatic Polarity Classification,” Proc. Assoc. for Computational
Linguistics 2010 Conf. Short Papers, pp. 331–335, 2010.
[34] Y. Bhawsar and G.S. Thakur. “Community Detection in Social Networking,”
J. Inform. Eng. and Applicat., vol. 3, no. 6, pp. 51–52, 2013.
[35] S. Fortunato. “Community Detection in Graphs,” Physics Reports, vol. 486, no. 3, pp.
75–174, 2010.
[36] C. Pizzuti. “GA-Net: A Genetic Algorithm for Community Detection in Social
Networks,” in Parallel Problem Solving from Nature–PPSN X, Berlin: Springer, pp.
1081–1090, 2008.
[37] D. Cragg. “FOREST: An Interacting Object Web,” in REST: From Research to
Practice, New York: Springer, pp. 161–195, 2011.
[38] C. Bizer, T. Heath, and T. Berners-Lee. “Linked Data - The Story So Far,” Int'l J.
Semantic Web and Inform. Syst., vol. 5, no. 3, pp. 1–22, 2009.
[39] T. Heath and C. Bizer. “Linked Data: Evolving the Web Into a Global Data Space,”
Synthesis Lectures on the Semantic Web: Theory And Technology, vol. 1, no. 1,
pp. 1–136, 2011.
[40] “What Are Microformats?,” Microformats Wiki RSS,
http://microformats.org/wiki/what-are-microformats. Web. Nov. 2013
[41] “Semantic Web,” W3C, http://www.w3.org/standards/semanticweb/. Web. Nov. 2013
[42] I. Hicks. “HTML Microdata,” W3C, http://dev.w3.org/html5/md-LC/, May 2011.
Web. Nov. 2013
[43] C. Bizer, and A. Jentzsch, “State of the LOD Cloud,”
http://lod-cloud.net/state/, Sept. 2011. Web. Nov. 2013
[44] J. Madhavan, S.R. Jeffery, S. Cohen, X.(L.) Dong, D. Ko, C. Yu, and A. Halevy,
“Web-Scale Data Integration: You Can Only Afford to Pay as You Go,” Proc. 3rd
Biennial Conf. on Innovative Data Systems Research, pp. 342–350, 2007.
[45] C. Bizer, T. Heath, and T. Berners-Lee. “Linked Data: Principles and State of the
Art,” Proc. 17th Int'l World Wide Web Conf., 2008.
[46] B. Motik, I. Horrocks, R. Rosati, and U. Sattler, “Can OWL and Logic Programming
Live Together Happily Ever After?,” Proc. 5th Int'l Semantic Web Conf., pp. 501–
514, 2006.
[47] M. Kifer, J. de Bruijn, H. Boley, and D. Fensel, “A Realistic Architecture for the
Semantic Web,” Proc. 1st Int'l Conf. on Rules and Rule Markup Languages for the
Semantic Web, pp. 17–29, 2005.
[48] Shaymaa Mohammed Jawad Kadhim and Dr. Shashank Joshi, “Agent Based Web
Service Communicating Different IS’s and Platforms”, International Journal of
Computer Engineering & Technology (IJCET), Volume 4, Issue 5, 2013, pp. 9 - 14,
ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.
[49] Jaydev Mishra and Sharmistha Ghosh, “Normalization in a Fuzzy Relational Database
Model”, International Journal of Computer Engineering & Technology (IJCET),
Volume 3, Issue 2, 2012, pp. 506 - 517, ISSN Print: 0976 – 6367, ISSN Online:
0976 – 6375.
[50] Sanjeev Kumar Jha, Pankaj Kumar and Dr. A.K.D.Dwivedi, “An Experimental
Analysis of MYSQL Database Server Reliability”, International Journal of Computer
Engineering & Technology (IJCET), Volume 3, Issue 2, 2012, pp. 354 - 371,
ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.
[51] Houda El Bouhissi, Mimoun Malki and Djamila Berramdane, “Applying Semantic
Web Services”, International Journal of Computer Engineering & Technology
(IJCET), Volume 4, Issue 2, 2013, pp. 108 - 113, ISSN Print: 0976 – 6367,
ISSN Online: 0976 – 6375.
[52] A. Suganthy, G.S.Sumithra, J.Hindusha, A.Gayathri and S.Girija, “Semantic Web
Services and its Challenges”, International Journal of Computer Engineering &
Technology (IJCET), Volume 1, Issue 2, 2010, pp. 26 - 37, ISSN Print: 0976 – 6367,
ISSN Online: 0976 – 6375.
