Advanced Database Query Systems Techniques Applications and Technologies 1st Edition Li Yan pdf download

The document is a comprehensive overview of the book 'Advanced Database Query Systems: Techniques, Applications and Technologies,' edited by Li Yan and Zongmin Ma. It discusses various methodologies and technologies related to database queries, including XML and metadata queries, and aims to provide a consolidated account of advanced database query systems. The book includes contributions from multiple authors and covers a range of topics from automatic categorization of query results to fuzzy querying capabilities in relational databases.

Advanced Database Query
Systems:
Techniques, Applications and
Technologies
Li Yan
Northeastern University, China

Zongmin Ma
Northeastern University, China
Senior Editorial Director: Kristin Klinger
Director of Book Publications: Julia Mosemann
Editorial Director: Lindsay Johnston
Acquisitions Editor: Erika Carter
Development Editor: Michael Killian
Production Coordinator: Jamie Snavely
Typesetters: Michael Brehm, and Milan Vracarich Jr.
Cover Design: Nick Newcomer

Published in the United States of America by

Information Science Reference (an imprint of IGI Global)
701 E. Chocolate Avenue
Hershey PA 17033
Tel: 717-533-8845
Fax: 717-533-8661
E-mail: [email protected]
Web site: http://www.igi-global.com/reference

Copyright © 2011 by IGI Global. All rights reserved. No part of this publication may be reproduced, stored or distributed in
any form or by any means, electronic or mechanical, including photocopying, without written permission from the publisher.
Product or company names used in this set are for identification purposes only. Inclusion of the names of the products or
companies does not indicate a claim of ownership by IGI Global of the trademark or registered trademark.

Library of Congress Cataloging-in-Publication Data

Advanced database query systems : techniques, applications and technologies /

Li Yan and Zongmin Ma, editors.
p. cm.
Includes bibliographical references and index.
Summary: “This book focuses on technologies and methodologies of database
queries, XML and metadata queries, and applications of database query systems,
aiming at providing a single account of technologies and practices in advanced
database query systems”--Provided by publisher.
ISBN 978-1-60960-475-2 (hardcover) -- ISBN 978-1-60960-476-9 (ebook) 1.
Databases. 2. Query languages (Computer science) 3. Querying (Computer
science) I. Yan, Li, 1964- II. Ma, Zongmin, 1965-
QA76.9.D32A39 2011
005.74--dc22
2010054423

British Cataloguing in Publication Data

A Cataloguing in Publication record for this book is available from the British Library.

All work contributed to this book is new, previously-unpublished material. The views expressed in this book are those of the
authors, but not necessarily of the publisher.
Editorial Advisory Board
Reda Alhajj, University of Calgary, Canada
Gloria Bordogna, CNR-IDPA, Dalmine (Bg), Italy
Alfredo Cuzzocrea, University of Calabria, Italy
Guy De Tré, Ghent University, Belgium
Janusz Kacprzyk, Polish Academy of Sciences, Poland
Xiangfu Meng, Liaoning Technical University, China
Tadeusz Pankowski, Poznan University of Technology, Poland
Slobodan Ribarić, University of Zagreb, Croatia
Leonid Tineo, Universidad Simón Bolívar, Venezuela
Sławomir Zadrożny, Polish Academy of Sciences, Poland
Table of Contents

Preface . ................................................................................................................................................ xii

Acknowledgment................................................................................................................................ xvii

Section 1

Chapter 1
Automatic Categorization of Web Database Query Results.................................................................... 1
Xiangfu Meng, Liaoning Technical University, China
Li Yan, Northeastern University, China
Z. M. Ma, Northeastern University, China

Chapter 2
Practical Approaches to the Many-Answer Problem............................................................................. 28
Mounir Bechchi, LINA-University of Nantes, France
Guillaume Raschia, LINA-University of Nantes, France
Noureddine Mouaddib, LINA-University of Nantes, Morocco

Chapter 3
Concept-Oriented Query Language for Data Modeling and Analysis................................................... 85
Alexandr Savinov, SAP Research Center Dresden, Germany

Chapter 4
Evaluating Top-k Skyline Queries Efficiently..................................................................................... 102
Marlene Goncalves, Universidad Simón Bolívar, Venezuela
María Esther Vidal, Universidad Simón Bolívar, Venezuela

Chapter 5
Remarks on a Fuzzy Approach to Flexible Database Querying, its Extension and Relation
to Data Mining and Summarization..................................................................................................... 118
Janusz Kacprzyk, Polish Academy of Sciences, Poland
Guy De Tré, Ghent University, Belgium
Sławomir Zadrożny, Polish Academy of Sciences, Poland
Chapter 6
Flexible Querying of Imperfect Temporal Metadata in Spatial Data Infrastructures.......................... 140
Gloria Bordogna, CNR-IDPA, Italy
Francesco Bucci, CNR-IREA, Italy
Paola Carrara, CNR-IREA, Italy
Monica Pepe, CNR-IREA, Italy
Anna Rampini, CNR-IREA, Italy

Chapter 7
Fuzzy Querying Capability at Core of a RDBMS............................................................................... 160
Ana Aguilera, Universidad de Carabobo, Venezuela
José Tomás Cadenas, Universidad Simón Bolívar, Venezuela
Leonid Tineo, Universidad Simón Bolívar, Venezuela

Chapter 8
An Extended Relational Model & SQL for Fuzzy Multidatabases..................................................... 185
Awadhesh Kumar Sharma, M.M.M. Engg College, India
A. Goswami, IIT Kharagpur, India
D. K. Gupta, IIT Kharagpur, India

Section 2

Chapter 9
Pattern-Based Schema Mapping and Query Answering in Peer-to-Peer XML
Data Integration System....................................................................................................................... 221
Tadeusz Pankowski, Poznan University of Technology, Poland

Chapter 10
Deciding Query Entailment in Fuzzy OWL Lite Ontologies.............................................................. 247
Jingwei Cheng, Northeastern University, China
Z. M. Ma, Northeastern University, China
Li Yan, Northeastern University, China

Chapter 11
Relational Techniques for Storing and Querying RDF Data: An Overview........................................ 269
Sherif Sakr, University of New South Wales, Australia
Ghazi Al-Naymat, University of New South Wales, Australia
Section 3

Chapter 12
Making Query Coding in SQL Easier by Implementing the SQL Divide Keyword:
An Experimental Query Rewriter in Java............................................................................................ 287
Eric Draken, University of Calgary, Canada
Shang Gao, University of Calgary, Canada
Reda Alhajj, University of Calgary, Canada & Global University, Lebanon

Chapter 13
Querying Graph Databases: An Overview........................................................................................... 304
Sherif Sakr, University of New South Wales, Australia
Ghazi Al-Naymat, University of New South Wales, Australia

Chapter 14
Querying Multimedia Data by Similarity in Relational DBMS.......................................................... 323
Maria Camila Nardini Barioni, Federal University of ABC, Brazil
Daniel dos Santos Kaster, University of Londrina, Brazil
Humberto Luiz Razente, Federal University of ABC, Brazil
Agma Juci Machado Traina, University of São Paulo at São Carlos, Brazil
Caetano Traina Júnior, University of São Paulo at São Carlos, Brazil

Compilation of References ............................................................................................................... 360

About the Contributors .................................................................................................................... 378

Index.................................................................................................................................................... 386
Detailed Table of Contents

Preface . ................................................................................................................................................ xii

Acknowledgment................................................................................................................................ xvii

Section 1
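
Chapter 1
Automatic Categorization of Web Database Query Results.................................................................... 1
Xiangfu Meng, Liaoning Technical University, China
Li Yan, Northeastern University, China
Z. M. Ma, Northeastern University, China
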
This chapter proposes a novel categorization approach which consists of two steps. The first step ana-
lyzes query history of all users in the system offline and generates a set of clusters over the tuples,
where each cluster represents one type of user preference. When a user issues a query, the second step
presents to the user a category tree over the clusters generated in the first step such that the user can
easily select the subset of query results matching his needs. The chapter develops heuristic algorithms
to compute the min-cost categorization. The efficiency and effectiveness of the proposed approach are
demonstrated by experimental results.
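
Chapter 2
Practical Approaches to the Many-Answer Problem............................................................................. 28
Mounir Bechchi, LINA-University of Nantes, France
Guillaume Raschia, LINA-University of Nantes, France
Noureddine Mouaddib, LINA-University of Nantes, Morocco
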
This chapter reviews and discusses several research efforts that have attempted to provide users with
effective and efficient ways to access databases. The focus is on a simple but useful strategy for retriev-
ing relevant answers accurately and quickly without being distracted by irrelevant ones. The chapter
presents a very recent but promising approach to quickly provide users with structured and approximate
representations of users’ query results, a must-have for decision support systems. The underlying
algorithm operates on pre-computed knowledge-based summaries of the queried data, instead of raw data
themselves.
Chapter 3
Concept-Oriented Query Language for Data Modeling and Analysis................................................... 85
Alexandr Savinov, SAP Research Center Dresden, Germany

This chapter describes a novel query language, called the concept-oriented query language, and dem-
onstrates how it can be used for data modeling and analysis. The query language is based on a novel
construct, called concept, and two relations between concepts, inclusion and partial order. Concepts
generalize conventional classes and are used for describing domain-specific identities. The inclusion
relation generalizes inheritance and is used for describing hierarchical address spaces. Partial order among
concepts is used to define two main operations: projection and de-projection. The chapter demonstrates
how these constructs are used to solve typical tasks in data modeling and analysis such as logical navi-
gation, multidimensional analysis and inference.
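
Chapter 4
Evaluating Top-k Skyline Queries Efficiently..................................................................................... 102
Marlene Goncalves, Universidad Simón Bolívar, Venezuela
María Esther Vidal, Universidad Simón Bolívar, Venezuela
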
This chapter describes existing solutions and proposes to use the TKSI algorithm for the Top-k Skyline
problem. TKSI reduces the search space by computing only a subset of the Skyline that is required to
produce the top-k objects. In addition, the Skyline Frequency Metric is implemented to discriminate
among the Skyline objects those that best meet the multidimensional criteria. The chapter empirically
studies the quality of TKSI, and the experimental results show that TKSI may speed up the computation
of the Top-k Skyline by at least 50% with regard to state-of-the-art solutions.
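
Chapter 5
Remarks on a Fuzzy Approach to Flexible Database Querying, its Extension and Relation
to Data Mining and Summarization..................................................................................................... 118
Janusz Kacprzyk, Polish Academy of Sciences, Poland
Guy De Tré, Ghent University, Belgium
Sławomir Zadrożny, Polish Academy of Sciences, Poland
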
This chapter aims to revive the line of research on flexible query languages based on fuzzy logic. Details
of a basic technique of flexible fuzzy querying are recalled and some of the newest developments in this
area are discussed. Moreover, it is shown how other relevant tasks may be implemented in the framework
of such a query interface. In particular, the chapter considers fuzzy queries with linguistic quantifiers
and shows their intrinsic relation with linguistic data summarization. In addition, so-called bipolar
queries are mentioned and advocated as the next relevant breakthrough in flexible querying based on
fuzzy logic and possibility theory.
Chapter 6
Flexible Querying of Imperfect Temporal Metadata in Spatial Data Infrastructures.......................... 140
Gloria Bordogna, CNR-IDPA, Italy
Francesco Bucci, CNR-IREA, Italy
Paola Carrara, CNR-IREA, Italy
Monica Pepe, CNR-IREA, Italy
Anna Rampini, CNR-IREA, Italy

This chapter discusses the limitations of current temporal metadata in discovery services of spatial data
infrastructures (SDIs) and proposes some solutions. A formal and operational method is presented
to represent imperfect temporal metadata values and allow users to express flexible search
conditions, i.e. tolerant to under-satisfaction. In doing so, discovery services can apply partial matching
mechanisms between the “desired” metadata, expressed by the user, and the archived metadata: this
would allow retrieving geodata in decreasing order of relevance to the user needs, as it usually occurs
on the Web when using search engines. Finally, the chapter illustrates the proposal with an example.
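
Chapter 7
Fuzzy Querying Capability at Core of a RDBMS............................................................................... 160
Ana Aguilera, Universidad de Carabobo, Venezuela
José Tomás Cadenas, Universidad Simón Bolívar, Venezuela
Leonid Tineo, Universidad Simón Bolívar, Venezuela
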
This chapter concentrates on incorporating fuzzy capabilities into an open-source relational database
management system (RDBMS). The fuzzy capabilities include connectors, modifiers, comparators,
quantifiers and queries. The extensions provide more flexible DDL and DML languages. The aim is to
show the design and implementation details in the RDBMS PostgreSQL. For this, a fuzzy query processor
and a fuzzy access mechanism are designed and implemented. The physical fuzzy relational operators are
also defined and implemented. The flow of a fuzzy query through the different modules (parser, planner,
optimizer and executor) is shown. The chapter includes some experimental results to demonstrate the
performance of the proposed solution. These results show that the extensions do not decrease the
performance of the RDBMS.
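
Chapter 8
An Extended Relational Model & SQL for Fuzzy Multidatabases..................................................... 185
Awadhesh Kumar Sharma, M.M.M. Engg College, India
A. Goswami, IIT Kharagpur, India
D. K. Gupta, IIT Kharagpur, India
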
This chapter investigates the problems in the integration of fuzzy relational databases and extends the
relational data model to support fuzzy multidatabases of type-2 that contain integrated fuzzy relational
databases. The extended model, named the fuzzy tuple source (FTS) relational data model, is provided
with a set of FTS relational operations to manipulate the global relations, called FTS relations, from such
fuzzy multidatabases. The chapter proposes and implements a full set of FTS relational algebraic operations
capable of manipulating an extensive set of fuzzy relational multidatabases of type-2 that include fuzzy
data values in their instances. To facilitate the formulation of global fuzzy queries over FTS relations in
such fuzzy multidatabases, SQL is extended appropriately to obtain the fuzzy tuple source structured
query language (FTS-SQL).

Section 2
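
Chapter 9
Pattern-Based Schema Mapping and Query Answering in Peer-to-Peer XML
Data Integration System....................................................................................................................... 221
Tadeusz Pankowski, Poznan University of Technology, Poland
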
This chapter discusses a method for schema mapping and query reformulation in a P2P XML data
integration system. The discussed formal approach enables us to specify schemas, schema constraints,
schema mappings, and queries in a uniform and precise way. Based on this approach, the chapter defines
some basic operations used for query reformulation and data merging, and proposes algorithms for the
automatic generation of XQuery programs that perform these operations in practice. Some issues
concerning query propagation strategies and merging modes are discussed for cases where missing data
is to be discovered in the P2P integration processes. The approach is implemented in the 6P2P system.
Its general architecture is presented and the way queries and answers are sent across the P2P
environment is sketched.
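
Chapter 10
Deciding Query Entailment in Fuzzy OWL Lite Ontologies.............................................................. 247
Jingwei Cheng, Northeastern University, China
Z. M. Ma, Northeastern University, China
Li Yan, Northeastern University, China
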
This chapter focuses on fuzzy (threshold) conjunctive queries over knowledge bases encoded in the fuzzy
DL SHIF(D), the logical counterpart of the fuzzy OWL Lite language. The decidability of fuzzy query
entailment in this setting is shown by providing a corresponding tableau-based algorithm. It is also shown
that the data complexity of answering fuzzy conjunctive queries in fuzzy SHIF(D) is in coNP, as long as
only simple roles occur in the query. Regarding combined complexity, the chapter proves a co3NExpTime
upper bound in the size of the knowledge base and the query.
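
Chapter 11
Relational Techniques for Storing and Querying RDF Data: An Overview........................................ 269
Sherif Sakr, University of New South Wales, Australia
Ghazi Al-Naymat, University of New South Wales, Australia
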
This chapter concentrates on using relational query processors to store and query RDF data. An over-
view of the different approaches is given and these approaches are classified according to the storage
and query evaluation strategies.
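One of the simplest relational layouts for RDF, a single three-column triple table queried through self-joins, can be sketched as follows. This is an illustrative assumption rather than a specific system from the chapter: the `triples` table, the `ex:` resource names and the helper `names_of_employees_in` are hypothetical, and SQLite (through Python's standard `sqlite3` module) stands in for a full relational engine. Each triple pattern of the query becomes one alias of the table, and shared variables become join conditions.

```python
import sqlite3

# Hypothetical RDF statements stored as (subject, predicate, object) rows.
TRIPLES = [
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:bob", "ex:worksFor", "ex:acme"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "ex:name", "Bob"),
    ("ex:acme", "ex:locatedIn", "ex:berlin"),
]

def build_store(triples):
    """Create an in-memory triple table with an index on (predicate, object)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
    conn.execute("CREATE INDEX idx_po ON triples (p, o)")
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    return conn

def names_of_employees_in(conn, city):
    """Evaluate the pattern
         ?person ex:worksFor ?org . ?org ex:locatedIn <city> . ?person ex:name ?n
       with one self-join per triple pattern."""
    sql = """
        SELECT t3.o
        FROM triples AS t1
        JOIN triples AS t2 ON t2.s = t1.o      -- shared variable ?org
        JOIN triples AS t3 ON t3.s = t1.s      -- shared variable ?person
        WHERE t1.p = 'ex:worksFor'
          AND t2.p = 'ex:locatedIn' AND t2.o = ?
          AND t3.p = 'ex:name'
        ORDER BY t3.o
    """
    return [row[0] for row in conn.execute(sql, (city,))]

conn = build_store(TRIPLES)
print(names_of_employees_in(conn, "ex:berlin"))  # ['Alice', 'Bob']
```

Other well-known layouts, such as property tables or one table per predicate (vertical partitioning), trade this uniformity for fewer self-joins.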
Section 3
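
Chapter 12
Making Query Coding in SQL Easier by Implementing the SQL Divide Keyword:
An Experimental Query Rewriter in Java............................................................................................ 287
Eric Draken, University of Calgary, Canada
Shang Gao, University of Calgary, Canada
Reda Alhajj, University of Calgary, Canada & Global University, Lebanon
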
This chapter provides an SQL expression equivalent to explicit relational algebra division (with a
static divisor). The goal is to implement an SQL query rewriter in Java that takes as input a query
written in a divide grammar and rewrites it into an efficient query using current SQL keywords.
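To make the target of such a rewrite concrete, two standard SQL formulations of relational division are shown below, run through Python's `sqlite3` module rather than Java. Which form the chapter's rewriter actually emits is not stated here; the `enrolled` and `required` tables and their rows are hypothetical. The query asks for the students enrolled in every required course.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE enrolled (student TEXT, course TEXT);
    CREATE TABLE required (course TEXT);
    INSERT INTO enrolled VALUES
        ('ann', 'db'), ('ann', 'os'), ('ann', 'ai'),
        ('ben', 'db'), ('ben', 'ai'),
        ('cat', 'db'), ('cat', 'os');
    INSERT INTO required VALUES ('db'), ('os');
""")

# enrolled DIVIDE required, as a double negation:
# "there is no required course that the student is not enrolled in".
DIVIDE_AS_NOT_EXISTS = """
    SELECT DISTINCT e.student
    FROM enrolled AS e
    WHERE NOT EXISTS (
        SELECT 1 FROM required AS r
        WHERE NOT EXISTS (
            SELECT 1 FROM enrolled AS e2
            WHERE e2.student = e.student AND e2.course = r.course))
    ORDER BY e.student
"""

# The same division by counting matched divisor rows per student.
DIVIDE_AS_COUNT = """
    SELECT e.student
    FROM enrolled AS e JOIN required AS r ON e.course = r.course
    GROUP BY e.student
    HAVING COUNT(DISTINCT e.course) = (SELECT COUNT(DISTINCT course) FROM required)
    ORDER BY e.student
"""

def run(sql):
    """Return the first column of every result row."""
    return [row[0] for row in conn.execute(sql)]

print(run(DIVIDE_AS_NOT_EXISTS))  # ['ann', 'cat']
print(run(DIVIDE_AS_COUNT))       # ['ann', 'cat']
```

The two forms agree whenever the divisor is non-empty; with an empty divisor, the `NOT EXISTS` form returns every student while the counting form returns none, so a rewriter has to choose deliberately.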
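
Chapter 13
Querying Graph Databases: An Overview........................................................................................... 304
Sherif Sakr, University of New South Wales, Australia
Ghazi Al-Naymat, University of New South Wales, Australia
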
This chapter provides an overview of different techniques for indexing and querying graph databases.
An overview of several proposals for graph query languages is also given, and a set of guidelines for
future research directions is provided.
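As a small illustration of one common kind of graph query, not necessarily one treated in the chapter, a label-constrained reachability query ("which nodes can be reached from a start node through one or more edges with a given label?") can be answered by breadth-first search. The graph, its labels and the function `reachable_via` are hypothetical.

```python
from collections import deque

# Hypothetical labelled, directed graph: node -> list of (edge label, target node).
GRAPH = {
    "alice": [("knows", "bob"), ("worksAt", "acme")],
    "bob": [("knows", "carol")],
    "carol": [("knows", "dave")],
    "dave": [],
    "acme": [],
}

def reachable_via(graph, start, label):
    """Nodes reachable from `start` via one or more edges carrying `label`
    (the regular path query  start label+ ?x), found by breadth-first search."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for edge_label, target in graph.get(node, []):
            if edge_label == label and target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

print(sorted(reachable_via(GRAPH, "alice", "knows")))  # ['bob', 'carol', 'dave']
```

This naive traversal uses no index and may visit every edge reachable from the start node.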
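
Chapter 14
Querying Multimedia Data by Similarity in Relational DBMS.......................................................... 323
Maria Camila Nardini Barioni, Federal University of ABC, Brazil
Daniel dos Santos Kaster, University of Londrina, Brazil
Humberto Luiz Razente, Federal University of ABC, Brazil
Agma Juci Machado Traina, University of São Paulo at São Carlos, Brazil
Caetano Traina Júnior, University of São Paulo at São Carlos, Brazil
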
This chapter presents an already validated strategy that adds similarity queries to SQL, supporting a
powerful set of similarity operators. The chapter also describes techniques to store and retrieve multimedia
objects efficiently and shows existing DBMS alternatives for executing similarity queries over multimedia
data.
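Two widely used similarity operators, the range query and the k-nearest-neighbour (k-NN) query, can be sketched over feature vectors as follows. The feature vectors, the choice of Euclidean distance and the function names are assumptions for illustration; the chapter's SQL extension and its access methods are not reproduced, and this version simply scans every object.

```python
import heapq
import math

# Hypothetical feature vectors (e.g. normalized colour histograms) per image id.
FEATURES = {
    "img1": (0.10, 0.20, 0.70),
    "img2": (0.15, 0.25, 0.60),
    "img3": (0.80, 0.10, 0.10),
    "img4": (0.40, 0.40, 0.20),
}

def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def range_query(features, query, radius):
    """Range query: ids of all objects within `radius` of `query`."""
    return sorted(oid for oid, vec in features.items()
                  if euclidean(vec, query) <= radius)

def knn_query(features, query, k):
    """k-NN query: ids of the k objects closest to `query`, nearest first."""
    scored = ((euclidean(vec, query), oid) for oid, vec in features.items())
    return [oid for _, oid in heapq.nsmallest(k, scored)]

query = (0.10, 0.20, 0.70)
print(range_query(FEATURES, query, 0.2))  # ['img1', 'img2']
print(knn_query(FEATURES, query, 3))      # ['img1', 'img2', 'img4']
```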

Compilation of References ............................................................................................................... 360

About the Contributors .................................................................................................................... 378

Index.................................................................................................................................................... 386
Preface

Databases are designed to support the data storage, processing, and retrieval activities related to data
management. The wide usage of databases in various applications has resulted in an enormous wealth
of data, which populate various types of databases around the world. One can find many types of
database systems, for example, relational databases, object-oriented databases, object-relational data-
bases, deductive databases, parallel databases, distributed databases, multidatabase systems, Web data-
bases, XML databases, multimedia databases, temporal/spatial databases, spatiotemporal databases, and
uncertain databases. As a result, databases have become the repositories of large volumes of data.
Database querying is closely related to data management. Database query processing is the procedure
by which database management systems (DBMSs) obtain the information users need from the databases
according to their requirements, organize this information, and then provide it to the users. Dealing
with this enormity of data and retrieving the worthwhile information are critical for effective problem
solving and decision making. This is especially true when a variety of database types, data types, and
user requirements, as well as large volumes of data, are involved. The techniques of database queries
are challenging today’s database systems and driving their evolution. There is no doubt that database
query systems play an important role in data management, and data management requires database
query support.
The research and development of information queries over a variety of databases are receiving increasing
attention. By means of query technology, large volumes of information in databases can be retrieved,
and information systems can thereby be built on databases to support various kinds of problem solving
and decision making. Database queries are therefore a field that must be investigated by academic
researchers together with developers and users from both the database community and industry.
This book focuses on the following issues of advanced database query systems: the technologies
and methodologies of database queries, XML and metadata queries, and applications of database query
systems, aiming at providing a single account of technologies and practices in advanced database query
systems. The objective of the book is to provide state-of-the-art information to academics, researchers
and industry practitioners who are involved or interested in the study, use, design, and development of
advanced and emerging database queries, with the ultimate aim of empowering individuals and organizations
in building competencies for exploiting the opportunities of the data and knowledge society. This book
presents the latest research and application results in advanced database query systems. The different
chapters in the book have been contributed by different authors and provide possible solutions for the
different types of technological problems concerning database queries.
This book, which consists of fourteen chapters, is organized into three major sections. The first sec-
tion discusses the technologies and methodologies of database queries, over the first eight chapters. The
next three chapters covering XML and metadata queries comprise the second section. The third section,
containing the final three chapters, focuses on the design and applications of database query systems.
First of all, we take a look at the issues of the technologies and methodologies of database queries.
Web database queries are often exploratory. The users often find that their queries return too many
answers and many of them may be irrelevant. Based on different kinds of user preferences, Xiangfu
Meng, Li Yan and Z. M. Ma propose a novel categorization approach which consists of two steps. The
first step analyzes query history of all users in the system offline and generates a set of clusters over
the tuples, where each cluster represents one type of user preference. When a user issues a query, the
second step presents to the user a category tree over the clusters generated in the first step such that
the user can easily select the subset of query results matching his needs. The problem of constructing
a category tree is a cost optimization problem and the authors develop heuristic algorithms to compute
the min-cost categorization. The efficiency and effectiveness of their approach are demonstrated by
experimental results.
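A toy version of the offline step can be sketched under two simplifying assumptions: that each past query can be replayed as a predicate over tuples, and that tuples returned by exactly the same set of past queries share one type of user preference. The grouping rule, the `TUPLES` and `HISTORY` data and the function `cluster_by_history` are hypothetical, not the authors' clustering method, and the cost-based construction of the category tree is not shown.

```python
from collections import defaultdict

# Hypothetical result tuples of a query over a used-car Web database.
TUPLES = [
    {"id": 1, "price": 9000, "year": 2008},
    {"id": 2, "price": 15000, "year": 2012},
    {"id": 3, "price": 8500, "year": 2007},
    {"id": 4, "price": 9500, "year": 2013},
]

# Hypothetical query history: each past query replayed as a predicate.
HISTORY = {
    "cheap": lambda t: t["price"] < 10000,
    "recent": lambda t: t["year"] >= 2012,
}

def cluster_by_history(tuples, history):
    """Group tuple ids by the set of past queries that would have returned them."""
    clusters = defaultdict(list)
    for t in tuples:
        signature = frozenset(name for name, pred in history.items() if pred(t))
        clusters[signature].append(t["id"])
    return dict(clusters)

for signature, ids in cluster_by_history(TUPLES, HISTORY).items():
    print(sorted(signature), ids)
```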
Database systems are increasingly used for interactive and exploratory data retrieval. In such retrievals,
user queries often result in too many answers, so users waste significant time and effort sifting and
sorting through these answers to find the relevant ones. Mounir Bechchi, Guillaume Raschia and
Noureddine Mouaddib first review and discuss several research efforts that have attempted to provide
users with effective and efficient ways to access databases. Then, they focus on a simple but useful
strategy for retrieving relevant answers accurately and quickly without being distracted by irrelevant
ones. They present a very recent but promising approach to quickly provide users with structured and
approximate representations of users’ query results, a must-have for decision support systems. The
underlying algorithm operates on pre-computed knowledge-based summaries of the queried data, instead
of the raw data themselves. This first-class data structure is therefore also presented.
Alexandr Savinov describes a novel query language, called the concept-oriented query language
(COQL), and demonstrates how it can be used for data modeling and analysis. The query language is
based on a novel construct, called concept, and two relations between concepts, inclusion and partial
order. Concepts generalize conventional classes and are used for describing domain-specific identities.
The inclusion relation generalizes inheritance and is used for describing hierarchical address spaces.
Partial order among concepts is used to define two main operations: projection and de-projection. Sa-
vinov demonstrates how these constructs are used to solve typical tasks in data modeling and analysis
such as logical navigation, multidimensional analysis, and inference.
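The intuition behind projection and de-projection can be approximated in plain Python, under the assumption that a reference from an order to a customer places the order below the customer in the partial order, so that projection moves up along references and de-projection moves down. This is an analogy, not COQL syntax; the `ORDERS` and `CUSTOMERS` data and the helpers `project` and `deproject` are hypothetical.

```python
# Hypothetical data: every order references one customer, so (by assumption)
# orders lie below customers in the partial order.
ORDERS = {
    "o1": {"customer": "c1", "amount": 120},
    "o2": {"customer": "c1", "amount": 80},
    "o3": {"customer": "c2", "amount": 50},
}
CUSTOMERS = {"c1": {"city": "Dresden"}, "c2": {"city": "Berlin"}, "c3": {"city": "Bonn"}}

def project(order_ids, attribute="customer"):
    """Projection: from a set of lesser elements up to the greater elements they reference."""
    return {ORDERS[o][attribute] for o in order_ids}

def deproject(customer_ids, attribute="customer"):
    """De-projection: from greater elements down to all lesser elements referencing them."""
    return {o for o, row in ORDERS.items() if row[attribute] in customer_ids}

# Logical navigation: which customers placed orders above 60?
big_orders = {o for o, row in ORDERS.items() if row["amount"] > 60}
print(sorted(project(big_orders)))  # ['c1']

# A simple multidimensional aggregate: total order amount per customer.
totals = {c: sum(ORDERS[o]["amount"] for o in deproject({c})) for c in CUSTOMERS}
print(totals)  # {'c1': 200, 'c2': 50, 'c3': 0}
```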
Criteria that induce a Skyline naturally represent users’ preference conditions, which are useful for
discarding irrelevant data in large datasets. However, in high-dimensional Skyline spaces, the size of
the Skyline can still be very large. To identify the best k points among the Skyline, the Top-k Skyline
approach has been proposed. Marlene Goncalves and María-Esther Vidal describe existing solutions
and propose to use the TKSI algorithm for the Top-k Skyline problem. TKSI reduces the search space
by computing only the subset of the Skyline that is required to produce the top-k objects. In addition,
the Skyline Frequency Metric is implemented to single out, among the Skyline objects, those that best
meet the multidimensional criteria. They empirically study the quality of TKSI, and their experimental
results show that TKSI may speed up the computation of the Top-k Skyline by at least 50% with regard to
state-of-the-art solutions.
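The two notions involved can be illustrated with a deliberately naive baseline: a nested-loop Skyline, assuming smaller values are better in every dimension, and a Skyline Frequency Metric taken here to be the number of non-empty subspaces in whose Skyline a point appears. This is not TKSI, which avoids computing the whole Skyline; the `HOTELS` data and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical hotels described by (price, distance to the beach); smaller is better.
HOTELS = {
    "h1": (50, 8),
    "h2": (80, 2),
    "h3": (60, 5),
    "h4": (90, 9),
    "h5": (55, 7),
}

def dominates(a, b, dims):
    """True if a is no worse than b on every dimension in dims and better on one."""
    return (all(a[i] <= b[i] for i in dims)
            and any(a[i] < b[i] for i in dims))

def skyline(points, dims):
    """Naive nested-loop Skyline of `points` restricted to the subspace `dims`."""
    return {name for name, p in points.items()
            if not any(dominates(q, p, dims)
                       for other, q in points.items() if other != name)}

def skyline_frequency(points):
    """Number of non-empty subspaces whose Skyline contains each point."""
    d = len(next(iter(points.values())))
    freq = dict.fromkeys(points, 0)
    for size in range(1, d + 1):
        for dims in combinations(range(d), size):
            for name in skyline(points, dims):
                freq[name] += 1
    return freq

def top_k_skyline(points, k):
    """Full-space Skyline points ranked by Skyline frequency (ties broken by name)."""
    d = len(next(iter(points.values())))
    freq = skyline_frequency(points)
    full = skyline(points, tuple(range(d)))
    return sorted(full, key=lambda name: (-freq[name], name))[:k]

print(top_k_skyline(HOTELS, 2))  # ['h1', 'h2']
```

Because every one of the 2^d - 1 subspaces is evaluated, this baseline becomes expensive quickly as the number of dimensions grows, which is the kind of cost a dedicated Top-k Skyline algorithm tries to avoid.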
Janusz Kacprzyk, Guy De Tré, and Sławomir Zadrożny briefly present the concept of, a rationale for
and various approaches to the use of fuzzy logic in flexible querying. They discuss first some historical
developments, and then the main issues related to fuzzy querying. Next, they concentrate on fuzzy queries
with linguistic quantifiers, and discuss in more detail their FQUERY for Access fuzzy querying system.
They indicate not only the straightforward power of that fuzzy querying system but also its great potential
as a tool to implement linguistic data summaries that may provide an ultimately human-consistent way
of data mining and data summarization. They also briefly mention the concept of bipolar queries, which
may reflect positive and negative preferences of the user and may be a breakthrough in fuzzy querying.
In the context of fuzzy querying and linguistic summarization, they point out the considerable potential
of their recent proposals to explicitly use in linguistic data summarization some elements of natural
language generation (NLG), and some NLG-related elements of Halliday’s systemic functional
linguistics (SFL). They argue that this may be a promising direction for future research.
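Queries with linguistic quantifiers are commonly evaluated with Zadeh's calculus of linguistically quantified propositions: the truth of "Q records are S" is the membership degree, in the quantifier Q, of the average membership of the records in S. A minimal sketch follows. The piecewise-linear "most" is a definition frequently used in this line of work, while the "young" fuzzy set and the ages are hypothetical; none of this code is taken from FQUERY itself.

```python
def most(r):
    """Relative quantifier 'most': 0 up to 0.3, 1 from 0.8, linear in between."""
    if r >= 0.8:
        return 1.0
    if r <= 0.3:
        return 0.0
    return 2 * r - 0.6

def young(age):
    """Hypothetical fuzzy set 'young': 1 up to 25, 0 from 40, linear in between."""
    if age <= 25:
        return 1.0
    if age >= 40:
        return 0.0
    return (40 - age) / 15

def truth_q_are_s(quantifier, summarizer, records):
    """Truth of 'Q records are S' = Q(average membership of the records in S)."""
    proportion = sum(summarizer(x) for x in records) / len(records)
    return quantifier(proportion)

# "Most employees are young": memberships 1, 1, 2/3, 1/3, 0 average to 0.6.
AGES = [22, 24, 30, 35, 45]
print(round(truth_q_are_s(most, young, AGES), 2))  # 0.6
```

The same degree, computed for candidate summarizers, is what allows such a querying interface to double as a tool for scoring linguistic data summaries.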
Gloria Bordogna et al. discuss the limitations of current temporal metadata in discovery services of
Spatial Data Infrastructures (SDIs) and propose some solutions. They present their proposal of a formal
and operational method to represent imperfect temporal metadata values and allow users to express
flexible search conditions, i.e. tolerant to under-satisfaction. In doing so, discovery services can apply
partial matching mechanisms between the “desired” metadata, expressed by the user, and the archived
metadata: this would allow retrieving geodata in decreasing order of relevance to the user needs, as it
usually occurs on the Web when using search engines. The proposal is finally illustrated with an example.
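Partial matching of this kind can be sketched by modelling the user's desired period as a trapezoidal fuzzy interval and archived temporal metadata as year ranges known only imprecisely. The degree of each dataset is taken here to be the possibility that its unknown year satisfies the request, i.e. the highest membership reached over its range. The trapezoid, the dataset names and the years are hypothetical and are not the representation defined in the chapter.

```python
def trapezoid(a, b, c, d):
    """Trapezoidal membership function: 1 on [b, c], 0 outside (a, d), linear sides."""
    def mu(x):
        if b <= x <= c:
            return 1.0
        if x <= a or x >= d:
            return 0.0
        if x < b:
            return (x - a) / (b - a)
        return (d - x) / (d - c)
    return mu

# Hypothetical request: "roughly 2005-2008", tolerating up to three years either side.
desired = trapezoid(2002, 2005, 2008, 2011)

# Hypothetical archived metadata: each dataset's year is only known to lie in a range.
ARCHIVE = {
    "landsat_a": (2006, 2006),
    "ortho_b": (2001, 2003),
    "dem_c": (2009, 2012),
    "lidar_d": (1995, 1997),
}

def possibility(mu, lo, hi):
    """Highest degree reached by any year in [lo, hi]."""
    return max(mu(year) for year in range(lo, hi + 1))

def rank(archive, mu):
    """Datasets with a non-zero degree, in decreasing order of relevance."""
    scored = [(possibility(mu, lo, hi), name) for name, (lo, hi) in archive.items()]
    return [(name, round(deg, 2)) for deg, name in sorted(scored, reverse=True) if deg > 0]

print(rank(ARCHIVE, desired))
# [('landsat_a', 1.0), ('dem_c', 0.67), ('ortho_b', 0.33)]
```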
Ana Aguilera, José Tomás Cadenas and Leonid Tineo concentrate on incorporating fuzzy capabilities
into an open-source relational database management system (RDBMS). The fuzzy capabilities include
connectors, modifiers, comparators, quantifiers, and queries. The extensions provide more flexible DDL
and DML languages. The aim is to show the design and implementation details in the RDBMS PostgreSQL.
For this, they design and implement a fuzzy query processor and a fuzzy access mechanism. They also
define and implement the physical fuzzy relational operators, and they show the flow of a fuzzy query
through the different modules (parser, planner, optimizer, and executor). They include some experimental
results to demonstrate the performance of the proposed solution. These results show that the extensions
do not decrease the performance of the RDBMS.
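The user-visible behaviour of a fuzzy query (membership degrees, min as fuzzy AND, a threshold, answers ordered by degree) can be emulated at the SQL level with user-defined functions. The sketch below uses SQLite through Python's `sqlite3.Connection.create_function` instead of PostgreSQL, and plain SQL instead of a fuzzy SQL dialect; the predicates `young` and `well_paid`, the `emp` table and the 0.5 threshold are hypothetical.

```python
import sqlite3

def young(age):
    """Hypothetical fuzzy predicate 'young': 1 up to 25, 0 from 40, linear in between."""
    if age is None or age >= 40:
        return 0.0
    if age <= 25:
        return 1.0
    return (40 - age) / 15

def well_paid(salary):
    """Hypothetical fuzzy predicate 'well paid': 0 up to 2000, 1 from 4000."""
    if salary is None or salary <= 2000:
        return 0.0
    if salary >= 4000:
        return 1.0
    return (salary - 2000) / 2000

conn = sqlite3.connect(":memory:")
# Register the membership functions so SQL can call them like built-ins.
conn.create_function("young", 1, young)
conn.create_function("well_paid", 1, well_paid)
conn.executescript("""
    CREATE TABLE emp (name TEXT, age INTEGER, salary INTEGER);
    INSERT INTO emp VALUES ('ana', 24, 3500), ('jose', 31, 4200),
                           ('leo', 38, 5000), ('eva', 45, 6000);
""")

# Fuzzy AND evaluated as the minimum of the degrees (SQLite's two-argument
# scalar MIN), a satisfaction threshold of 0.5, best answers first.
FUZZY_QUERY = """
    SELECT name, MIN(young(age), well_paid(salary)) AS mu
    FROM emp
    WHERE MIN(young(age), well_paid(salary)) >= 0.5
    ORDER BY mu DESC
"""

for name, mu in conn.execute(FUZZY_QUERY):
    print(name, round(mu, 2))  # ana 0.75, then jose 0.6
```

Such an emulation evaluates every predicate row by row through opaque functions, so the engine cannot exploit them for access or optimization; the chapter instead builds fuzzy processing and access mechanisms into the RDBMS itself.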
Awadhesh Kumar Sharma, A. Goswami, and D.K. Gupta investigate the problems in integration
of fuzzy relational databases and extend the relational data model to support fuzzy multidatabases of
type-2 that contain integrated fuzzy relational databases. The extended model is given the name fuzzy
tuple source (FTS) relational data model which is provided with a set of FTS relational operations to
manipulate the global relations called FTS relations from such fuzzy multidatabases. They propose and
implement a full set of FTS relational algebraic operations capable of manipulating an extensive set of
fuzzy relational multidatabases of type-2 that include fuzzy data values in their instances. To facilitate
the formulation of global fuzzy queries over FTS relations in such fuzzy multidatabases, SQL can be
appropriately extended to obtain the fuzzy tuple source structured query language (FTS-SQL).
The second section deals with the issues of XML and metadata queries.
Tadeusz Pankowski addresses the problem of data integration in a P2P environment, where each peer
stores schema of its local data, mappings between the schemas, and some schema constraints. The goal
of the integration is to answer queries formulated against a chosen peer. The answer must consist of
data stored in the queried peer as well as data of its direct and indirect partners. Pankowski focuses on
defining and using mappings, schema constraints, query propagation across the P2P system, and query
answering in such a scenario. Schemas, mappings, constraints (functional dependencies) and queries are
all expressed using a unified approach based on tree-pattern formulas. He discusses how functional de-
pendencies can be exploited to increase information content of answers (by discovering missing values)
and to control merging operations and propagation strategies. He proposes algorithms for translating
high-level specifications of mappings and queries into XQuery programs, and shows how the discussed
method has been implemented in SixP2P (or 6P2P) system.
Significant research efforts in the Semantic Web community have recently been directed toward the
representation of, and reasoning with, fuzzy ontologies. Description logics (DLs) are the logical foundations
of standard Web ontology languages. Conjunctive queries are considered an expressive reasoning service
for DLs. Jingwei Cheng, Z. M. Ma, and Li Yan focus on fuzzy (threshold) conjunctive queries over
knowledge bases encoded in the fuzzy DL SHIF(D), the logical counterpart of the fuzzy OWL Lite language. They
show the decidability of fuzzy query entailment in this setting by providing a corresponding tableau-based
algorithm. They also show that the data complexity of answering fuzzy conjunctive queries in fuzzy SHIF(D)
is in coNP, as long as only simple roles occur in the query. Regarding combined complexity, they prove
a co3NExpTime upper bound in the size of the knowledge base and the query.
The Resource Description Framework (RDF) is a flexible model for representing information about
resources in the Web. With the increasing amount of RDF data which is becoming available, efficient
and scalable management of RDF data has become a fundamental challenge to achieve the Semantic
Web vision. The RDF model has attracted attention in the database community, and many researchers
have proposed different solutions to store and query RDF data efficiently. Sherif Sakr and Ghazi Al-
Naymat concentrate on using relational query processors to store and query RDF data. They give an
overview of the different approaches and classify these approaches according to the storage and query
evaluation strategies.
The third section deals with the design and application aspects of database query systems.
Relational Algebra (RA) and the Structured Query Language (SQL) are supposed to have a bijective
relationship by having the same expressive power. That is, each operation in SQL can be mapped to one
RA equivalent and vice versa. RA has an explicit relational division symbol (÷), whereas SQL does not
have a corresponding explicit division keyword. Division is implemented using a combination of four
core operations, namely cross product, difference, selection, and projection. The work described by
Eric Draken, Shang Gao, and Reda Alhajj is intended to provide an SQL expression equivalent to explicit
relational algebra division (with a static divisor). The goal is to implement an SQL query rewriter in Java
which takes as input a query written in a divide grammar and rewrites it into an efficient query using current
SQL keywords. The developed approach could be adapted as a front end or as a wrapper to existing SQL query systems.
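As a minimal sketch of the division operation discussed above (our own illustration, not the authors' Java rewriter), the Python below computes R ÷ S with the textbook identity R ÷ S = π_X(R) − π_X((π_X(R) × S) − R), which needs only projection, cross product and difference. It then answers the same question in SQL with the common double NOT EXISTS idiom on an in-memory SQLite database. The table names enrolled and required, and all function names, are hypothetical.

```python
import sqlite3
from itertools import product


def project(rel, attrs, schema):
    """Projection pi_attrs(rel) under set semantics."""
    idx = [schema.index(a) for a in attrs]
    return {tuple(t[i] for i in idx) for t in rel}


def divide(r, r_schema, s, s_schema):
    """R / S = pi_X(R) - pi_X((pi_X(R) x S) - R), with X = attributes of R not in S.

    Tuples of s must list their values in s_schema order.
    """
    x_attrs = [a for a in r_schema if a not in s_schema]
    candidates = project(r, x_attrs, r_schema)              # pi_X(R)
    r_xy = project(r, x_attrs + list(s_schema), r_schema)   # R with columns ordered as X, then S
    crossed = {c + t for c, t in product(candidates, s)}    # pi_X(R) x S
    missing = crossed - r_xy                                # required combinations absent from R
    disqualified = {t[:len(x_attrs)] for t in missing}      # pi_X(missing)
    return candidates - disqualified


# SQL has no DIVIDE keyword; this idiom reads "no required course is missing for the student".
DIVISION_SQL = """
SELECT DISTINCT e.student FROM enrolled AS e
WHERE NOT EXISTS (
    SELECT 1 FROM required AS q
    WHERE NOT EXISTS (
        SELECT 1 FROM enrolled AS e2
        WHERE e2.student = e.student AND e2.course = q.course))
"""


def divide_sql(enrolled, required):
    """Run the double NOT EXISTS formulation of enrolled / required in SQLite."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE enrolled (student TEXT, course TEXT)")
    con.execute("CREATE TABLE required (course TEXT)")
    con.executemany("INSERT INTO enrolled VALUES (?, ?)", enrolled)
    con.executemany("INSERT INTO required VALUES (?)", required)
    rows = set(con.execute(DIVISION_SQL))
    con.close()
    return rows
```

For example, with enrolled = {("ann", "db"), ("ann", "os"), ("bob", "db")} and required = {("db",), ("os",)}, both functions return {("ann",)}, since only ann takes every required course. The SQL form is one common hand-written equivalent and is not necessarily the rewriting the chapter's tool produces.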
Recently, there has been a lot of interest in the application of graphs in different domains. Graphs
have been widely used for data modeling in different application domains such as chemical compounds,
protein networks, social networks, and Semantic Web. Given a query graph, the task of retrieving related
graphs as a result of the query from a large graph database is a key issue in any graph-based application.
This has raised a crucial need for efficient graph indexing and querying techniques. Sherif Sakr and
Ghazi Al-Naymat provide an overview of different techniques for indexing and querying graph databases.
They also give an overview of several proposals of graph query language. Finally, they provide a set of
guidelines for future research directions.
Multimedia objects, such as images, audio, and video, do not have a total ordering relationship,
so the relational comparison operators are not suitable for comparing them. Therefore, similarity queries are
the most useful, and often the only, type of query adequate for searching multimedia objects stored in a database.
Unfortunately, the ubiquitous query language SQL, the most widely employed language in Database
Management Systems (DBMS), does not provide effective support for similarity queries. Maria Camila
Nardini Barioni et al. present an already validated strategy that adds similarity queries to SQL, supporting
a powerful set of similarity operators. They also describe techniques to store and retrieve multimedia
objects in an efficient way and show existing DBMS alternatives to executing similarity queries over
multimedia data.
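To make the notion of a similarity query concrete, here is a minimal sketch of the two most common kinds, a range query and a k-nearest-neighbour query, written as a plain linear scan in Python. It assumes each multimedia object has already been reduced to a numeric feature vector compared with Euclidean distance (math.dist, Python 3.8+); the function names are hypothetical, and the sketch does not show the SQL extension or the storage techniques the chapter describes.

```python
import heapq
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def range_query(objects: Dict[str, Vector], center: Vector, radius: float) -> List[str]:
    """Similarity range query: ids of all objects within `radius` of `center`."""
    return sorted(oid for oid, vec in objects.items() if math.dist(vec, center) <= radius)


def knn_query(objects: Dict[str, Vector], center: Vector, k: int) -> List[str]:
    """k-nearest-neighbour query: ids of the k objects closest to `center`, nearest first."""
    scored = ((math.dist(vec, center), oid) for oid, vec in objects.items())
    return [oid for _, oid in heapq.nsmallest(k, scored)]
```

A real system would replace the linear scan with an index suited to metric spaces; the scan only pins down what each query type returns.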

Li Yan
Northeastern University, China

Zongmin Ma
Northeastern University, China

Acknowledgment

The editors wish to thank all of the authors for their insights and excellent contributions to this book and
would like to acknowledge the help of all involved in the collation and review process of the book,
without whose support, the project could not have been satisfactorily completed. Most of the authors of
chapters included in this book also served as referees for chapters written by other authors. Thanks go
to all those who provided constructive and comprehensive reviews.
A further special note of thanks goes to all the staff at IGI Global, whose contributions throughout
the whole process from inception of the initial idea to final publication have been invaluable. Special
thanks also go to the publishing team at IGI Global. This book would not have been possible without
the ongoing professional support from IGI Global.
The idea of editing this volume stems from the initial research work that the editors did over the past
several years. The research work of the editors was supported by the National Natural Science Foundation
of China (60873010 and 61073139), the Fundamental Research Funds for the Central Universities
(N090504005, N100604017 and N090604012), and the Program for New Century Excellent Talents in
University (NCET-05-0288).

Li Yan
Northeastern University, China

Zongmin Ma
Northeastern University, China

June 2010
Section 1

Chapter 1
Automatic Categorization of
Web Database Query Results
Xiangfu Meng
Liaoning Technical University, China

Li Yan
Northeastern University, China

Z. M. Ma
Northeastern University, China

ABSTRACT
Web database queries are often exploratory. Users often find that their queries return too many
answers, many of which may be irrelevant. Based on different kinds of user preferences, this chapter
proposes a novel categorization approach which consists of two steps. The first step analyzes query his-
tory of all users in the system offline and generates a set of clusters over the tuples, where each cluster
represents one type of user preference. When a user issues a query, the second step presents to the user
a category tree over the clusters generated in the first step such that the user can easily select the subset
of query results matching his needs. The problem of constructing a category tree is a cost optimization
problem and heuristic algorithms were developed to compute the min-cost categorization. The efficiency
and effectiveness of our approach are demonstrated by experimental results.

DOI: 10.4018/978-1-60960-475-2.ch001

Copyright © 2011, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.

INTRODUCTION

Figure 1. Tree generated by the Greedy method

As the Internet becomes ubiquitous, many people search Web databases for their favorite cars, houses,
stocks, etc. However, Web database queries are often exploratory. Users often find that their queries
return too many answers, a problem commonly referred to as “information overload”. For example, when
a user submits a query to the MSN House&Home Web site to search for a house located in Seattle with a
price between $200,000 and $300,000, 1,256 tuples are returned. Information overload makes it hard for
the user to separate the interesting items from the uninteresting ones, and thereby leads to a huge waste
of the user’s time and effort. In such a situation, the user would pose a broad query at the beginning to
avoid excluding potentially interesting results, and then iteratively refine the query until a few answers
matching her preferences are returned. However, this iterative procedure is time-consuming and many
users will give up before they reach the final stage.
In order to resolve the problem of “information overload”, two types of solutions have been proposed.
The first type categorizes the query results into a category tree (Chakrabarti, Chaudhuri & Hwang, 2004;
Chen & Li, 2007), and the second type ranks the results (Agrawal, Chaudhuri, Das & Gionis, 2003; Agrawal,
Rantzau & Terzi, 2006; Bruno, Gravano & Marian, 2002; Chaudhuri, Das, Hristidis & Weikum, 2004;
Das, Hristidis, Kapoor & Sudarshan, 2006). The success of both approaches depends on the utilization
of user preferences. However, these approaches always assume that all users have the same preferences,
whereas in real life different users often have different preferences. Let us look at the following example.

Example 1. Consider a real estate search Web site. Figure 1 and Figure 2 respectively show fragments
of the category trees generated by the Greedy method (Chakrabarti, Chaudhuri & Hwang, 2004)
and the C4.5-Categorization method (Chen & Li, 2007) over 214 houses returned by a query with the condition
“Price between 250000 and 350000 ∧ City = Seattle”. Each tree node specifies a range or equality
condition on an attribute, and the number in parentheses is the number of tuples satisfying all conditions
from the root to the current node. Users can use this tree to select the houses they are interested in.

Figure 2. Tree generated by the C4.5-Categorization method

Consider three users U1, U2, and U3. Assume that U1 prefers houses with a large floor area, U2 prefers
houses with water views, and U3 prefers houses with water views in the Burien living area. The Greedy
method assumes that all users have the same preferences. As a result, the attributes “Livingarea” and
“Schooldistrict” are placed at the first two levels of the tree, because more users are concerned with
“Livingarea” and “Schooldistrict” than with other attributes. However, some users (such as U2 and U3)
may want to visit the large and water-view houses first. They then have to visit many nodes if they follow
the tree built in Figure 1. Considering the diversity of user preferences and the cost of visiting both
intermediate nodes and leaf nodes, the C4.5-Categorization method takes advantage of the C4.5 algorithm
to create the navigational tree. However, the resulting category tree (Figure 2) has two drawbacks: (i) the
tuples under the intermediate nodes cannot be explored by the users, i.e., users can only access the tuples
under the leaf nodes and cannot examine the tuples in the intermediate nodes; (ii) the cost of visiting the
tuples of an intermediate node is not considered if the user chooses to explore them.
User preferences are often difficult to obtain because users do not want to spend extra effort specifying
their preferences. Thus there are two major challenges in addressing the diversity of user preferences:
(i) how to summarize different kinds of user preferences from the behavior of all users already in the
system, and (ii) how to categorize or rank the query results according to the specific user preferences.
Query history has been widely applied to infer the preferences of all users in the system (Agrawal,
Chaudhuri, Das & Gionis, 2003; Chaudhuri, Das, Hristidis & Weikum, 2004; Chakrabarti, Chaudhuri
& Hwang, 2004; Das, Hristidis, Kapoor & Sudarshan, 2006).
In this chapter, we present techniques to automatically categorize the results of user queries on Web
databases in order to reduce information overload. We propose a two-step approach to address both
challenges for the categorization case. The first step analyzes query history of all users already in the
system offline and then generates a set of clusters over the data. Each cluster corresponds to one type of
user preferences and is associated with a probability that users may be interested in the cluster. Assume
that an individual user’s preference can be represented as a subset of these clusters. When a specific user
submits a query, the second step first computes the similarity between the query and the representative
queries in the query clusters, so that the data clusters the user may be interested in can be inferred from
the query. Next, the set of data clusters generated in the first step is intersected with the query answers,
and a labeled hierarchical category structure is generated automatically based on the contents of the
tuples in the answer set. In this way, a category tree is constructed on the fly over these intersected
clusters. This tree is finally presented to the user.
This chapter presents a domain-independent approach to addressing the information overload problem.
The contributions are summarized as follows:


• We propose a clustering approach to cluster queries and summarize preferences of all users in the
system using the query history. This approach uses query pruning, pre-computation and query
clustering to deal with large query histories and large data sets.
• We propose a cost-based algorithm to construct a category tree over these clusters pre-formulated
in the offline processing phase. Unlike the existing categorization and decision tree construction
approaches, our approach shows tuples for intermediate nodes and considers the cost for users to
visit both intermediate nodes and leaves.

The rest of this chapter is organized as follows. Section 2 reviews some related work. Section 3
formally defines some notions. Section 4 describes the queries and tuples clustering method. Section 5
proposes the algorithm for the category tree construction. Section 6 shows the experimental results. The
chapter is concluded in Section 7.

RELATED WORK

Two kinds of automatic categorization approaches have been proposed by Chakrabarti et al. (Chakrabarti,
Chaudhuri & Hwang, 2004) and Chen and Li (Chen & Li, 2007), respectively. Chakrabarti et al.
proposed a greedy algorithm to construct a category tree. This algorithm uses the query history of all users
in the system to infer an overall user preference, expressed as the probabilities that users are interested in
each attribute. Taking advantage of the C4.5 decision tree construction algorithm, Chen and Li (Chen & Li, 2007)
proposed a two-step solution which first clusters the user query history and then constructs the navigational
tree for resolving the user’s personalized query. We make use of some of these ideas, but enhance the category
tree with the feature of showing tuples in the intermediate nodes, and focus on how the clusters of the query
history and the cost of visiting both intermediate nodes and leaves affect the categorization.
To provide query personalization, several approaches have been proposed that define a user profile
for each user and use the profile to decide his preferences (Koutrika & Ioannidis, 2004; Kießling, 2002).
However, as Chen and Li (Chen & Li, 2007) pointed out, in real life user profiles may not be available
because users do not want to or cannot specify their preferences (if they could, they could form the
appropriate query and there would be no need for either ranking or categorizing). The profile may be derived
from the query history of a certain user, but this method does not work if the user is new to the system,
which is exactly when the user needs help.
There has been a rich body of work on categorizing text documents (Dhillon, Mallela & Kumar,
2002; Joachims, 1998; Koller & Sahami, 1997) and Web search results (Liu, Yu & Meng, 2002; Zeng,
He, Chen, Ma & Ma, 2004). But categorizing relational data presents unique challenges and opportunities.
First, relational data contains numerical values, while text categorization methods treat documents
as bags of words. Second, this chapter tries to minimize the overhead for users to navigate the generated
tree (defined in Section 3), which is not considered in the existing text categorization methods.
Also there has been a rich body of work on information visualization techniques (Card, MacKinlay
& Shneiderman, 1999). Two popular techniques are dynamic query slider (Ahlberg & Shneiderman,
1994) and brushing histogram (Tweedie, Spence, Williams & Bhogal, 1994). The former allows users
to visualize dynamic query results by using sliders to represent range search conditions, and the latter
employs interactive histograms to represent each attribute and helps users explore correlations between
attributes. Note that they do not take query history into account. Furthermore, information visualization
techniques require users to specify what information to visualize (e.g., by setting the slider or selecting
histogram buckets). Since our approach generates the information to visualize, i.e., the category tree,
our approach is complementary to visualization techniques. If a leaf of the tree still contains many
tuples, for example, a query slider or brushing histogram can be used to further narrow down the scope.
Concerning ranked retrieval from databases, user relevance feedback (Rui, Huang & Mehrotra,
1997; Wu, Faloutsos, Sycara & Payne, 2000) is employed to learn the similarity between a result tuple
and the query, which is used to rank the query results in relational multimedia databases. The SQL query
language has been extended to allow users to specify the ranking function according to their preferences
for the attributes (Kießling, 2002; Roussos, Stavrakas & Pavlaki, 2005). Also, the importance scores of
result tuples (Agrawal, Chaudhuri, Das & Gionis, 2003; Chaudhuri, Das, Hristidis & Weikum, 2004;
Geerts, Mannila & Terzi, 2004) are extracted automatically by analyzing past workloads, which
can reveal what users are looking for and what they consider important. The tuples can then be ranked
according to these scores. Ranking is complementary to categorization. We can use ranking in addition to
our techniques (e.g., to rank the tuples stored in the intermediate nodes and leaves). However, most existing
work does not consider the diversity of user preferences. In contrast, we focus on addressing
the diversity of user preferences for the categorization approach.
There has also been a lot of work on information retrieval (Card, MacKinlay & Shneiderman, 1999;
Shen, Tan & Zhai, 2005; Finkelstein & Gabrilovich, 2001; Joachims, 2006; Sugiyama & Hatano, 2004)
using query history or other implicit feedback. However, these works focus on searching text documents,
while this chapter focuses on searching relational data. In addition, these studies typically rank
query results, while this chapter categorizes the results. Of course, one could use existing hierarchical
clustering techniques (Mitchell, 1997) to create the category tree, but the generated trees are not
easy for users to navigate. For example, how do we describe the tuples contained in a node? We could
use a representative tuple, but such a tuple may contain many attributes, which makes it difficult for
users to read. In contrast, the category tree used in this chapter is easy to understand because each node
uses just one attribute.

BASICS OF CATEGORIZATION

This section first introduces the query history, and then defines the category tree and the category cost.
The categorical space and the exploration model are described last.

Query History

Consider a database relation D with n tuples D = {t1,..., tn} with schema R(A1,...,Am). Let Dom(Ai)
represent the active domain of attribute Ai.
Let H be a query history {(Q1, U1, F1),..., (Qk, Uk, Fk)} in chronological order, where Qi is a query,
Ui is a session ID (a session starts when a user connects to the database and ends when the user discon-
nects), and Fi is the importance weight of the query, which is evaluated by the frequency of the query in
H. Assume that the queries in the same session are asked by the same user, which will be used later to
prune queries. The query history can be collected using the query log of commercial database systems.


We assume that all queries only contain point or range conditions, and that a query is of the form Q =
∧1≤i≤m (Ai θ ai), where ai ∈ Dom(Ai) and θ ∈ {>, <, =, ≥, ≤, between, in}. Note that if θ is the operator
between, Ai θ ai has the format “Ai between ai1 and ai2” or “ai1 ≤ Ai ≤ ai2”, where ai1, ai2 ∈ Dom(Ai).
D can be partitioned into a set of disjoint preference-based clusters C = {C1,..., Cq}, where each
cluster Cj corresponds to one type of user preferences. Each Cj is associated with a probability Pj that
users are interested in Cj. This set of clusters over D is inferred from the query history. We assume that
the dataset D is fixed. In practice, however, D may be modified from time to time. For the purpose of this
chapter, we assume that the clusters are regenerated periodically (e.g., once a month) as the set of
queries evolves and the database is updated.

Category Tree

Definition 1 (Category tree). A category tree T(V, E, L) consists of a node set V, an edge set E, and a
label set L. Each node v ∈ V has a label lab(v) ∈ L which specifies a condition on an attribute such that
the following are satisfied: (i) such conditions are point or range conditions, and the bounds in
the range conditions are called partition points; (ii) v contains a set of tuples N(v) that satisfy all the
conditions on its ancestors, including v itself; in other words, N(v) is the subset of tuples in D that satisfies
the conjunction of the category labels of all nodes on the path from the root to v; (iii) the conditions associated
with the subcategories of an intermediate node v are on the same attribute (called the partition attribute), and
define a partition of the tuples in v.

The label of a category (or a node) therefore solely and unambiguously describes to the user which
tuples, among those in the tuple set of the parent of v, appear under v. Hence, the user can determine whether
v contains any item that is relevant to her just by looking at the label, and decide whether
to explore or ignore v. The label lab(v) has the following structure:
If the categorizing attribute A is a categorical attribute: lab(v) is of the form ‘A ∈ S’ where S ⊆
Dom(A) (Dom(A) denotes the domain of values of attribute A in D). A tuple t satisfies the predicate
lab(v) if t.A ∈ S; otherwise it does not (t.A denotes the value of tuple t on attribute A).
If the categorizing attribute A is a numeric attribute: lab(v) is of the form ‘a1 ≤ A < a2’ where a1,
a2 ∈ Dom(A). A tuple t satisfies the predicate lab(v) if a1 ≤ t.A < a2; otherwise it does not.
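The two label forms can be sketched as predicates in Python. This is an illustration under our own assumptions: tuples are dictionaries keyed by attribute name, and the class and function names are hypothetical. The helper tuples_under computes N(v) as the conjunction of the labels on the root-to-v path, as in Definition 1.

```python
from dataclasses import dataclass
from typing import Any, FrozenSet, List, Mapping, Sequence, Union


@dataclass(frozen=True)
class CategoricalLabel:
    """lab(v) = 'A in S' for a categorical attribute A, with S a subset of Dom(A)."""
    attribute: str
    values: FrozenSet[Any]

    def satisfied_by(self, t: Mapping[str, Any]) -> bool:
        return t[self.attribute] in self.values


@dataclass(frozen=True)
class NumericLabel:
    """lab(v) = 'a1 <= A < a2' for a numeric attribute A (half-open interval)."""
    attribute: str
    low: float
    high: float

    def satisfied_by(self, t: Mapping[str, Any]) -> bool:
        return self.low <= t[self.attribute] < self.high


Label = Union[CategoricalLabel, NumericLabel]


def tuples_under(path: Sequence[Label], tuples: Sequence[Mapping[str, Any]]) -> List[Mapping[str, Any]]:
    """N(v): the tuples satisfying every label on the path from the root to v."""
    return [t for t in tuples if all(label.satisfied_by(t) for label in path)]
```

Because numeric labels use half-open intervals, adjacent sibling ranges such as [250000, 300000) and [300000, 350000) never share a tuple, which is what allows the labels of a node's subcategories to define a partition.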

Exploration Model

Given a category tree T over the query results, the user starts the exploration by exploring the root node.
Suppose that she has decided to explore the node v. If v is an intermediate node, she non-deterministically
(i.e., in a way not known in advance) chooses one of two options:
Option ‘ShowTuples’: Browse through the tuples in N(v). Note that the user needs to examine all
tuples in N(v) to make sure that she finds every tuple relevant to her.
Option ‘ShowCat’: Examine the labels of all the n subcategories of v, exploring the ones relevant
to her and ignoring the rest. More specifically, she examines the label of each subcategory vi of v, starting
from the first subcategory, and non-deterministically chooses to either explore it or ignore it. If she
chooses to ignore vi, she simply proceeds and examines the next label (of vi+1). If she chooses to explore
vi, she does so recursively based on the same exploration model, i.e., by choosing either ‘ShowTuples’
or ‘ShowCat’ if it is an intermediate node or by choosing ‘ShowTuples’ if it is a leaf node. After she
finishes the exploration of vi, she goes ahead and examines the label of the next subcategory of v (of
vi+1). When the user reaches the end of the subcategory list, she is done. Note that we assume that the
user examines the subcategories in the order they appear under v; it can be from top to bottom or from left
to right depending on how the tree is rendered by the user interface.

Category Cost

We assume that a user visits T in a top-down fashion, and stops at a node (intermediate node or leaf
node) that contains the tuples that she is interested in.
Let v be a node (intermediate node or leaf node) of T with N(v) tuples and Cj be a cluster in C. Cj ∩
v ≠ ∅ denotes that v contains tuples in Cj. Anc(v) denotes the set of ancestors of v including v itself, but
excluding the root. Sib(v) denotes the set of nodes at the same level as the node v, including v itself. Let
K1 and K2 represent the weights of visiting a tuple in a node and visiting an intermediate tree node,
respectively. Let Pj be the probability that users will be interested in cluster Cj, and let Pst be the
probability that the user chooses the option ‘ShowTuples’ for an intermediate node v given that she explores v.
The category cost is defined as follows.

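Definition 2 (category cost). The category cost is

$$\mathrm{Cost}(T, C) = \sum_{v \in \mathrm{Node}(T)} \; \sum_{C_j \cap v \neq \emptyset} P_j \Big( K_1 |N(v)| + K_2 \sum_{v_i \in \mathrm{Anc}(v)} \Big( |\mathrm{Sib}(v_i)| + \sum_{j=1}^{|\mathrm{Sib}(v_i)|} P_{st}\, |N(v_j)| \Big) \Big) \qquad (1)$$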

The category cost of a leaf node v consists of three terms: the cost of visiting the tuples in leaf node v,
the cost of visiting intermediate nodes, and the cost of visiting tuples in intermediate nodes if the user
chooses to explore them. Users need to examine the labels of all sibling nodes to select a node on the path
from the root to v, so they have to visit $\sum_{v_i \in \mathrm{Anc}(v)} |\mathrm{Sib}(v_i)|$ intermediate tree nodes. Users may also
like to examine the tuples of some sibling nodes on the path from the root to v, so they have to visit
$\sum_{v_i \in \mathrm{Anc}(v)} \sum_{j=1}^{|\mathrm{Sib}(v_i)|} P_{st}\, |N(v_j)|$ tuples of intermediate tree nodes. When users reach the node v that
they would like to explore, they have to look at the |N(v)| tuples in v. Pst is the probability that the user explores
v using ‘ShowTuples’: Pst = N(Av)/N, where N(Av) denotes the number of queries in the query history
that contain a selection condition on the attribute A of node v and N is the total number of queries in the
query history. Definition 2 computes the expected cost over all clusters and nodes.
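As a sketch of how Equation (1) can be evaluated, the Python below walks the tree once and accumulates the expected cost. Several readings here are our assumptions rather than statements from the text: Node(T) includes the root (whose Anc set is empty), Sib(vi) is taken to be the children of vi's parent, Pst is estimated per partition attribute as N(Av)/N from the attribute sets of the history queries, and K1 = K2 = 1 by default. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional, Sequence, Set


@dataclass
class CategoryNode:
    tuples: FrozenSet[int]                   # N(v): ids of the tuples under this node
    attribute: Optional[str] = None          # attribute of lab(v); None for the root
    children: List["CategoryNode"] = field(default_factory=list)


def p_show_tuples(attribute: Optional[str], query_attrs: Sequence[Set[str]]) -> float:
    """Pst = N(A_v) / N: fraction of history queries with a condition on the node's attribute."""
    if attribute is None or not query_attrs:
        return 0.0
    return sum(attribute in attrs for attrs in query_attrs) / len(query_attrs)


def category_cost(root: CategoryNode, clusters: Dict[str, FrozenSet[int]], probs: Dict[str, float],
                  query_attrs: Sequence[Set[str]], k1: float = 1.0, k2: float = 1.0) -> float:
    """Equation (1): sum over every node v and every cluster Cj that shares tuples with v."""
    total = 0.0

    def visit(node: CategoryNode, path_cost: float) -> None:
        nonlocal total
        for name, members in clusters.items():
            if members & node.tuples:                      # Cj intersects N(v)
                total += probs[name] * (k1 * len(node.tuples) + k2 * path_cost)
        # All children share Sib = node.children: read every label, plus expected ShowTuples cost.
        step = len(node.children) + sum(
            p_show_tuples(c.attribute, query_attrs) * len(c.tuples) for c in node.children)
        for child in node.children:
            visit(child, path_cost + step)

    visit(root, 0.0)                                        # Anc(v) excludes the root, so it adds 0
    return total
```

For a root over tuples {1, 2, 3, 4} split on price into children {1, 2} and {3, 4}, clusters C1 = {1, 2} (P = 0.6) and C2 = {3, 4} (P = 0.4), and a history in which two of four queries constrain price (Pst = 0.5), the function returns 10.0 up to floating-point rounding: 4.0 for the root, plus 3.6 and 2.4 for the two children, each of which pays 4 for the sibling level it sits on.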

Query Results Categorization

To resolve the problem of query results categorization, we propose a solution which consists of two
steps: the offline data clustering step and the online category tree construction step. In this section, we
first describe data clustering and then present the category tree construction approach.


Data Clustering

We generate preference-based clusters as follows. We first define a binary relationship R over tuples such
that (ri, rj) ∈ R if and only if the two tuples ri and rj appear in the results of exactly the same set of queries
in H. If (ri, rj) ∈ R, then ri and rj are indistinguishable according to the query history, because each user that
requests ri also requests rj and vice versa. Clearly, R is reflexive, symmetric, and transitive. Thus R is an
equivalence relation, and it partitions D into equivalence classes {C1,..., Cq}, where tuples equivalent to
each other are put into the same class. The tuples not selected by any query also form a cluster, which is
associated with zero probability (since no users are interested in them). Thus, we can define the data
clustering problem as follows.

Problem 1. Given database D and query history H, find a set of disjoint clusters C = {C1,…, Cq} such that for
any tuples ri and rj ∈ Cl, 1 ≤ l ≤ q, (ri, rj) ∈ R, and for any tuples ri and rj not in the same cluster, (ri, rj) ∉ R.
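Because R is an equivalence relation, the clusters of Problem 1 can be computed by grouping tuples on the set of queries that select them. The Python sketch below does exactly that; queries are modeled as hypothetical Boolean predicates over dictionary tuples, and the sketch does not assign the probabilities Pj.

```python
from collections import defaultdict
from typing import Callable, Dict, FrozenSet, Hashable, Mapping, Sequence, Set

Predicate = Callable[[Mapping[str, object]], bool]


def cluster_tuples(tuples: Mapping[Hashable, Mapping[str, object]],
                   queries: Sequence[Predicate]) -> Dict[FrozenSet[int], Set[Hashable]]:
    """Partition D into the equivalence classes of R.

    Two tuples are equivalent iff exactly the same queries of H select them, so the set of
    indices of the selecting queries (the tuple's signature) identifies its class.
    """
    classes: Dict[FrozenSet[int], Set[Hashable]] = defaultdict(set)
    for tid, t in tuples.items():
        signature = frozenset(i for i, q in enumerate(queries) if q(t))
        classes[signature].add(tid)
    return dict(classes)
```

Tuples selected by no query all share the empty signature and so land in one cluster, matching the zero-probability cluster described above. The scan evaluates every query on every tuple, which is why a large query history is worth pruning and clustering first.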

Since the query history H may contain many queries, we first cluster the queries in H and
then cluster the tuples based on the clusters of the query history. We propose the algorithm for
query history and data clustering in the next section.

Category Tree Construction

We next define the problem of category tree construction.

Problem 2. Given D, C, Q, find a tree T(V, E, L) such that (i) it contains all tuples in the results of Q,
and (ii) there does not exist another tree T’ satisfying (i) and with Cost (T’, C) < Cost (T, C).

The above problem can be proved to be NP-hard in a way similar to proving that the problem of
finding an optimal decision tree with a certain average length is NP-hard. Section 5 will present an ap-
proximate solution. The category tree construction algorithm is shown in Algorithm 1.

CLUSTERING FOR QUERY HISTORY AND DATA TUPLES

This section describes the algorithm to cluster tuples using the query history. We propose preprocessing
steps to refine the query history, which include pruning unimportant queries and clustering the queries.
Based on different kinds of user preferences, we then propose a method for generating tuple clusters.

Algorithm 1. The category tree construction algorithm

Input: query history H, database D, query Q

Output: a category tree T
1. (Offline step) Cluster tuples in query results D using H. The results are a set
of clusters C1,..., Cq, and each cluster Cj, 1 ≤ j ≤ q, is assigned a probability Pj.
2. (Online step) For the query Q, create a category tree T with minimal
Cost(T, C) over tuples in results of Q, using C1,..., Cq as class labels.


Query Refinement and Clustering

Query Prune

The query pruning algorithm is based on the following heuristics: (i) queries with empty answers are
not useful; (ii) in the same session, a user often starts with a query with general conditions that returns
many answers, and then continuously refines the previous query until the query returns a few interesting
answers. Therefore, only the last query in such a refinement sequence is important. The query pruning
algorithm is shown in Algorithm 2.
We identify the relationship “⊂” between queries by using the following method. Let Qi’s condition
on attribute Ai (i = 1,…,m) be ai1 ≤ Ai ≤ ai2, and let Qj’s condition on attribute Ai be a’i1 ≤ Ai ≤ a’i2. Then
Qi ⊂ Qj if, for every condition in Qi, Qj either does not contain any condition on Ai or has a condition
a’i1 ≤ Ai ≤ a’i2 such that a’i1 ≤ ai1 ≤ Ai ≤ ai2 ≤ a’i2. For simplicity, we use H to denote the pruned query
history H’ in the following sections.

Query Clustering

Since the query history contains many queries, similar queries should be grouped into the same
cluster, and a representative query should be found for each cluster.

The Problem of Queries Clustering

In order to quantify the similarity between queries Q1 and Q2, we adopt a typical definition of similarity,
the cosine similarity. For this to be defined, we first need to form the vector representations of queries
Q1 and Q2. Consider the set Δ of all distinct <attribute, attribute-value> pairs appearing in D, that is,
Δ = {<Ai, ai> | ∀i ∈ {1,…, d} and ∀ai ∈ Dom(Ai)}. Since Dom(Ai) is the active domain of attribute Ai, the
cardinality of this set is finite. Let it be N = |Δ|, and let OD be an arbitrary but fixed order on the pairs
appearing in Δ. We refer to the i-th element of Δ based on the ordering OD by Δ[i]. A vector representation
of query Q1 = ∧1≤j≤m (Aj θ aj) is a binary vector VQ1 of size N. The i-th element of the vector corresponds to
pair Δ[i]. If Δ[i] is contained in the conjuncts of Q1, then VQ1[i] = 1; otherwise it is 0. Analogously,
the vector representation of a query Q2 is a binary vector VQ2 of size N. The i-th element of the vector
corresponds to pair Δ[i]. If Δ[i] is contained in the conjuncts of Q2, then VQ2[i] = 1; otherwise it is 0.

Algorithm 2. The query pruning algorithm

Input: query history H, database D

Output: pruned query history H'
1. Eliminate queries with empty answers by executing each query over D.
2. Order the remaining queries by session ID in chronological order.
3. For each sequence (Qi, Ui, Fi), …, (Qj, Uj, Fj) in H such that Ui = Ui+1 = … = Uj
and Qi ⊆ Qi+1 ⊆ … ⊆ Qj and (Qj ⊈ Qj+1 or Uj ≠ Uj+1):
4. Eliminate all queries Qi,…, Qj-1.


Now, we can define the similarity between Q1 and Q2 using their vector representations VQ1 and VQ2
as follows:

$$\mathrm{Sim}(Q_1,Q_2)=\cos(V_{Q_1},V_{Q_2})=\frac{V_{Q_1}\cdot V_{Q_2}}{|V_{Q_1}|\,|V_{Q_2}|} \tag{2}$$

In order to quantify how well a query Q1 is represented by another query Q2, we need to define a
distance measure between two queries. Based on the similarity mentioned above, the distance between
Q1 and Q2 can be defined as

$$d(Q_1,Q_2)=1-\mathrm{Sim}(Q_1,Q_2) \tag{3}$$
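A short sketch of the vector representation and Formulas (2) and (3) follows. For brevity it assumes that the fixed order OD is the sorted order of the pairs and builds Δ only from pairs that occur in some query; pairs that occur in no query contribute zeros to every vector and so change neither the dot product nor the norms. The function names are hypothetical.

```python
import math
from typing import Dict, FrozenSet, List, Tuple

Pair = Tuple[str, str]            # an <attribute, value> pair
Query = FrozenSet[Pair]           # the pairs appearing in a query's conjuncts


def build_order(queries: List[Query]) -> Dict[Pair, int]:
    """Fix an order O_D on Δ; here (assumption) the sorted order of the pairs."""
    delta = sorted({p for q in queries for p in q})
    return {p: i for i, p in enumerate(delta)}


def to_vector(q: Query, order: Dict[Pair, int]) -> List[int]:
    v = [0] * len(order)
    for p in q:
        v[order[p]] = 1           # V_Q[i] = 1 iff Δ[i] is a conjunct of Q
    return v


def sim(v1: List[int], v2: List[int]) -> float:
    """Formula (2): cosine similarity of two binary vectors."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(v1)) * math.sqrt(sum(v2))   # |V| = sqrt(#ones) for 0/1 vectors
    return dot / norm if norm else 0.0


def dist(v1: List[int], v2: List[int]) -> float:
    """Formula (3)."""
    return 1.0 - sim(v1, v2)
```

For example, two queries that share one of their two conjuncts have similarity 1/2 and distance 1/2, while queries with no common pair are at distance 1.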

Based on the definitions above, the query clustering problem can be defined. Let H be the set of m queries in the query history: H = {Q1,…, Qm}. We need to find a set of k queries Hk = {Q̄1,…, Q̄k} (k < m) such that

$$\mathrm{cost}(H_k)=\sum_{Q\in H} d(Q,H_k) \tag{4}$$

is minimized, where the distance of a query Qi from a set of queries H is defined as

$$d(Q_i,H)=\min_{Q_j\in H} d(Q_i,Q_j) \tag{5}$$

We call the queries in Hk representative queries and associate with each representative query Q̄j a set of queries QCj = {Qi | Q̄j = arg min_{j'} d(Qi, Q̄j')}.
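To make Formulas (4) and (5) concrete, the sketch below evaluates cost(Hk) for a candidate set of representatives, derives the clusters QCj by nearest representative, and, for very small inputs only, finds the optimum by exhaustive search over all k-subsets. The exhaustive search is exponential in general, which is why the discussion that follows turns to approximation. Queries are identified by index, and the names are illustrative.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

Dist = Callable[[int, int], float]


def cost(queries: Sequence[int], reps: Sequence[int], d: Dist) -> float:
    """Formula (4), using d(Q, H_k) = min over the representatives (Formula 5)."""
    return sum(min(d(q, r) for r in reps) for q in queries)


def assign(queries: Sequence[int], reps: Sequence[int], d: Dist) -> Dict[int, List[int]]:
    """QC_j: every query joins its closest representative (ties go to the first listed)."""
    clusters: Dict[int, List[int]] = {r: [] for r in reps}
    for q in queries:
        clusters[min(reps, key=lambda r: d(q, r))].append(q)
    return clusters


def exact_k_median(queries: Sequence[int], k: int, d: Dist) -> Tuple[int, ...]:
    """Exhaustive search over all k-subsets: exponential, tiny inputs only."""
    return min(combinations(queries, k), key=lambda reps: cost(queries, reps, d))
```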

Complexity of the Queries Clustering Problem

The query clustering problem is the same as the k-median problem, which is well known to be NP-hard. An instance of the metric k-median problem consists of a metric space χ = (X, c), where X is a set of points and c is a distance function (also called the cost) that specifies the distance cxy ≥ 0 between any pair of nodes x, y ∈ X. The distance function is reflexive, symmetric, and satisfies the triangle inequality. Given a set of points F ⊆ X, the cost of F is defined by $\mathrm{cost}(F)=\sum_{x\in X} c_{xF}$, where $c_{xF}=\min_{f\in F} c_{xf}$ for x ∈ X. The objective is to find a k-element set F ⊆ X that minimizes cost(F) (Chrobak, Keynon & Young, 2005). The query clustering problem can be treated as a k-median problem with X = H and c = d, and it is also NP-hard. We therefore turn to approximation algorithms to solve it.


Algorithm 3. The Greedy-Refine algorithm of queries clustering

Input: A pruned query history H with m queries: H = {Q1,…, Qm}, a set of all
Stars: U = {⟨Qi, QCi⟩ | Qi ∈ H, QCi ⊆ H}, k
Output: A set of k query clusters Hk = {⟨Q̄1, QC1⟩, …, ⟨Q̄k, QCk⟩}
1. Let B = {} be a buffer that can hold m ⟨Qi, QCi⟩
2. While H ≠ ∅ and k > 0 Do
3. B ← ∅
4. For each Qi ∈ H Do
5. Pick si = ⟨Qi, QCi⟩ with minimum rsi from Ui = {⟨Qi, QCi⟩ | QCi ⊆ H, |QCi| ∈ [2, |H| − k + 1]}
6. B ← B + {si}
7. End For
8. Pick s = ⟨Qi, QCi⟩ with minimum rs from B
9. H ← H − QCi − {Qi}, Hk ← Hk + s, k ← k − 1
10. End While
11. Return Hk

Algorithmic Solution

For clustering queries, we propose a novel approach that can discover a near-globally optimal solution with low time complexity. The approach is described as follows. Observing the solution of the query clustering problem, we find that every representative query is connected with some other queries of H and that these connections form star structures; we call each such connection a Star. We can then re-define the query clustering problem as follows. Let U be the set of all Stars, i.e., U = {⟨Qi, QCi⟩ | Qi ∈ H, QCi ⊆ H}. The cost of each Star s = ⟨Qi, QCi⟩ ∈ U is $c_s=\sum_{Q_j\in QC_i} d(Q_i,Q_j)$, and $r_s=c_s/|QC_i|$ is its performance-price ratio. Our objective is to find a set of Stars S ⊆ U that minimizes the cost, such that S contains k representative queries and every original query Qj ∈ H appears in at least one Star s ∈ S.
To solve this problem, we propose an approach that consists of two parts: a pre-processing part and a processing part. In the pre-processing part, we build a sequential permutation ki = {Qi1, Qi2,..., Qim} over H for each query Qi ∈ H, where {Qi1, Qi2,..., Qim} ⊆ H and the queries in ki are arranged in non-decreasing order of their distance to Qi, that is, d(Qi, Qi1) ≤ d(Qi, Qi2) ≤ ... ≤ d(Qi, Qim). Such permutations allow us to consider only the first l queries in ki, rather than all queries in ki, when we build the Stars for Qi. Note that the number l should be chosen appropriately. The complexity of the pre-processing part is O(|H|² log|H|), where |H| denotes the number of queries in H.
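The pre-processing part can be sketched as one sort per query. This is an illustrative sketch, assuming the pairwise distances d(Qi, Qj) are given as a matrix indexed by query position; excluding Qi from its own list is a choice of this sketch (the text's ki has m entries and so may include Qi itself at distance 0), and `top_l` is a hypothetical parameter standing for l.

```python
from typing import List, Optional, Sequence


def build_permutations(dist: Sequence[Sequence[float]],
                       top_l: Optional[int] = None) -> List[List[int]]:
    """For each Q_i, the other queries sorted by non-decreasing d(Q_i, .).

    One O(|H| log |H|) sort per query gives O(|H|^2 log |H|) overall;
    `top_l` optionally keeps only the first l entries used to build Stars.
    """
    n = len(dist)
    perms = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        perms.append(others if top_l is None else others[:top_l])
    return perms
```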
The task of the processing part is to cluster queries using the Greedy-Refine algorithm (Algorithm 3), based on the Stars formed in the pre-processing part; its input is the set of all these Stars. For each Qi ∈ H, the algorithm picks the Star si with the minimal rs in Ui (the set of all Stars in U corresponding to Qi, Ui ⊆ U) and puts it in the set B. From B, the algorithm chooses the Star s with the minimal rs and adds it to the objective set Hk, and then removes Qi and QCi from H. The algorithm stops when Hk has k elements. The output is a set of k pairs of the form ⟨Q̄i, QCi⟩, where Q̄i is a representative query (i.e., the center of cluster QCi) and QCi is its query cluster. The time complexity of the processing part is O(|H|k), so the algorithm runs in polynomial time (Meng & Ma, 2008).
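The Greedy-Refine loop can be sketched as below. It is a sketch under several assumptions that the text leaves open, not the authors' implementation: QCi is taken to include Qi itself (so the bounds |QCi| ∈ [2, |H| − k + 1] mean "center plus at least one other query" and "leave at least k − 1 queries for the remaining clusters"); the candidate Stars of Qi are Qi plus a prefix of its nearest still-unclustered neighbours, capped at `top_l` (a hypothetical stand-in for l, which must be at least 1); the lower bound is relaxed to 1 when only singletons remain possible; and queries still unclustered after k Stars are attached to their nearest representative, following the definition of QCj.

```python
from typing import Dict, List, Sequence


def greedy_refine(dist: Sequence[Sequence[float]], k: int, top_l: int) -> Dict[int, List[int]]:
    """Sketch of Algorithm 3. dist[i][j] = d(Q_i, Q_j); top_l >= 1 plays the role of l.

    Returns {representative index: members of QC_i}, members including the representative.
    """
    n = len(dist)
    k = min(k, n)
    remaining = set(range(n))
    # Pre-processing: the other queries in non-decreasing distance to Q_i.
    order = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
             for i in range(n)]
    hk: Dict[int, List[int]] = {}
    while remaining and k > 0:
        max_size = len(remaining) - k + 1   # |QC_i| <= |H| - k + 1
        min_size = min(2, max_size)         # assumption: singletons only when forced
        buffer = []                         # B: the best Star s_i of every Q_i
        for i in sorted(remaining):
            nbrs = [j for j in order[i] if j in remaining][:top_l]
            best, cost = None, 0.0
            for size in range(1, min(max_size, len(nbrs) + 1) + 1):
                if size > 1:
                    cost += dist[i][nbrs[size - 2]]   # c_s grows by the next neighbour
                if size >= min_size:
                    ratio = cost / size               # r_s = c_s / |QC_i|
                    if best is None or ratio < best[0]:
                        best = (ratio, i, [i] + nbrs[:size - 1])
            if best is not None:
                buffer.append(best)
        _, center, members = min(buffer, key=lambda s: s[0])   # Star with minimal r_s
        hk[center] = members
        remaining -= set(members)           # H <- H - QC_i
        k -= 1
    for q in sorted(remaining):   # assumption: leftovers join their nearest representative
        hk[min(hk, key=lambda c: dist[q][c])].append(q)
    return hk
```

On two well-separated groups of three queries with k = 2, the sketch picks one Star per group and attaches the third member of each group afterwards.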

Generation of Data Clusters

After query pruning and clustering, we obtain a set of query clusters QC1,…, QCk. For each tuple ti, we generate a set Si consisting of the query clusters in which at least one query returns ti, that is, Si = {QCp | ∃ Qj ∈ QCp such that ti is returned by Qj}. We then group tuples according to their Si, and each group forms a cluster. Each cluster is assigned a class label. The probability of users being interested in cluster Ci is computed as the sum of the probabilities that a user asks a query in Si, which equals the sum of the frequencies of the queries in Si divided by the sum of the frequencies of all queries in the pruned query history H.

Example 2. Suppose that there are four queries Q1, Q2, Q3, and Q4 and 15 tuples r1, r2, …, r15. Q1 returns the first 10 tuples r1, r2, …, r10, Q2 returns the first 9 tuples r1, r2, …, r9, Q3 returns r11, r12, and r14, and Q4 returns r15. The first 9 tuples r1, r2, …, r9 are equivalent to each other since each of them is returned by both Q1 and Q2. The data can be divided into five clusters: {r1, r2, …, r9} (returned by Q1 and Q2), {r10} (returned by Q1 only), {r11, r12, r14} (returned by Q3), {r15} (returned by Q4), and {r13} (not returned by any query).

In Example 2, query clustering yields three query clusters {Q1, Q2}, {Q3}, and {Q4}, from which four data clusters C1, C2, C3, and C4 are generated. The cluster C1 corresponds to Q1 and Q2 and contains the first 10 tuples, with probability P1 = 2/4 = 0.5. The cluster C2 corresponds to Q3 and contains r11, r12, and r14, with probability P2 = 1/4 = 0.25. The cluster C3 corresponds to Q4 and contains r15, with probability P3 = 1/4 = 0.25. The cluster C4 contains r13, with probability 0, because r13 is not returned by any query. The data cluster generation algorithm is shown in Algorithm 4.

Algorithm 4. The algorithm for generating the preference-based data clusters

Input: pruned query history H, query results R

Output: data clusters of query results C1,…, Cq
1. QueryCluster (H, U) → {QC1, …, QCk}.
2. For each tuple ri ∈ R, identify Si = {QCp | ∃ Qj ∈ QCp such that ri is returned by Qj}.
Group tuples in R by Si.
3. Output each group as a cluster Cj, assign a class label to each cluster,
and compute probability $P_j=\dfrac{\sum_{Q_i\in S_j}F_i}{\sum_{Q_p\in H}F_p}$.
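Steps 2 and 3 of Algorithm 4 can be sketched as follows, assuming the query clusters from step 1 are already available. The data layout (a map from each query name to the ids of the tuples it returns) and the function name are hypothetical; probabilities are kept as exact fractions. The usage below reproduces the grouping of Example 2, taking Q2 to return r1, …, r9 and all query frequencies to be 1.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple


def preference_clusters(
    results: Dict[str, Set[int]],        # query -> ids of the tuples it returns
    query_clusters: List[Set[str]],      # QC_1..QC_k from query clustering
    freq: Dict[str, int],                # F_i of each query in the pruned history H
    tuples: Iterable[int],               # ids of the tuples in R
) -> List[Tuple[List[int], FrozenSet[int], Fraction]]:
    total = sum(freq.values())
    groups: Dict[FrozenSet[int], List[int]] = defaultdict(list)
    for t in tuples:
        # S_i: indices p of the query clusters QC_p with some query returning t
        s = frozenset(p for p, qc in enumerate(query_clusters)
                      if any(t in results[q] for q in qc))
        groups[s].append(t)
    out = []
    for s, members in groups.items():
        f = sum(freq[q] for p in s for q in query_clusters[p])
        out.append((members, s, Fraction(f, total)))   # P_j as an exact fraction
    return out
```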


CATEGORY TREE CONSTRUCTION

This section proposes the category tree construction algorithm. Section 5.1 gives an overview of our algorithm. Section 5.2 presents a novel partitioning criterion that considers the cost of visiting both intermediate nodes and leaf nodes.

Algorithm Overview

A category tree is very similar to a decision tree. There are many well-known decision tree construction algorithms, such as ID3 (Quinlan, 1986), C4.5 (Quinlan, 1993), and CART (Breiman, Friedman & Stone, 1984). However, existing decision tree construction algorithms aim at minimizing the impurity of data (Quinlan, 1993), as measured by information gain or similar criteria. Our goal is to minimize the category cost, which includes both the cost of visiting intermediate tree nodes (plus the cost of visiting the tuples in an intermediate node if the user explores it) and the cost of visiting tuples stored in leaf nodes.
To build the category tree, we reuse some ideas from the solution presented by Chen and Li (Chen & Li, 2007) and propose improved algorithms. Our algorithm has to resolve two problems: (i) eliminating a subset of relatively unattractive attributes without considering any of their partitions, and (ii) for every attribute retained, obtaining a good partition efficiently instead of enumerating all possible partitions. Finally, we construct the category tree by choosing the attribute and partition that have the least cost.

Reducing the Choices of Categorizing Attribute

Since the presence of a selection condition on an attribute in the query history reflects the user's interest in that attribute, attributes that occur infrequently in the query history can be omitted when constructing the category tree. Let N(A) be the number of queries in the query history that contain a selection condition on attribute A, and let N be the total number of queries in the query history. We eliminate uninteresting attributes as follows: if an attribute A occurs in less than a fraction x of the queries in the query history, i.e., N(A)/N < x, we eliminate A. The threshold x needs to be specified by the system domain expert. For example, for the real estate search application, if we use x = 0.4, only 7 attributes, namely Price, SqFt, Livingarea, View, Neighborhood, Schooldistrict, and Bedrooms, are retained from among the 25 attributes in the MSN House&Home dataset. The algorithm for eliminating uninteresting attributes is shown in Algorithm 5.

Algorithm 5. The algorithm for eliminating uninteresting attributes

Input: pruned query history H, threshold x

Output: AR = {A1, …, Ak}, the set of attributes retained after eliminating
the uninteresting attributes in H
1. For each attribute Ai in the attribute set AH appearing in H, compute N(Ai)/N and
eliminate Ai if N(Ai)/N < x.
2. Output AR = {A1, …, Ak}.
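Algorithm 5 amounts to a frequency filter. A minimal sketch, assuming each query in the pruned history is represented only by the set of attribute names it constrains (a hypothetical representation chosen for brevity):

```python
from collections import Counter
from typing import Iterable, List, Set


def retain_attributes(history: Iterable[Set[str]], x: float) -> List[str]:
    """Keep attribute A iff N(A)/N >= x, where N(A) counts the queries constraining A."""
    queries = list(history)
    n = len(queries)
    counts = Counter(a for q in queries for a in set(q))
    return sorted(a for a, c in counts.items() if c / n >= x)
```

With x = 0.4 and five queries, an attribute survives only if at least two queries constrain it.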


Partitioning for Categorical Attribute

For a categorical attribute, a new subcategory is created with one branch for each value of the attribute, and the information gain (computing the information gain is discussed in Section 5.2) is computed over that subcategory. If a categorical attribute has too many values and thus generates too many branches, we can add intermediate levels for that attribute. A categorical attribute generates only one possible partition, and it is removed from AR once it is selected as the partition attribute.

Partitioning for Numeric Attribute

For a numeric attribute Ai, we use a binary partition, i.e., Ai ≤ v or Ai > v. For a numeric attribute, the algorithm generates one subcategory for every possible partition point and computes the information gain for that partition point. The best partition point is selected, and the gain of the best partition is the gain of the attribute. If the maximal gain among the attributes exceeds a predefined threshold λ, the tree is expanded by adding the selected subcategory to the current root.

Algorithm 6. The category tree building algorithm

Function: BuildTree (AR, DQ, C, λ)

Input: AR is the set of attributes retained after elimination, DQ is the query
results, C is the class labels assigned in the clustering step for each tuple
in DQ, and λ is a user-defined stopping threshold.
Output: a category tree T.
1. Create a root r.
2. If all tuples in DQ have the same class label, stop.
3. For each attribute Ai ∈ AR
4. If Ai is a categorical attribute then
5. For each value v of Ai, create a branch under the current root. Add
those tuples with "Ai = v" to that branch.
6. Compute the attribute Ai's gain g(Ai, Ti), where Ti is the subcategory
created.
7. Else
8. For each value v of Ai, create a tree Ti^v with r as the root and two
branches, one for those tuples with Ai ≤ v and one for those tuples with Ai > v.
9. Compute the gain g(Ai, Ti^v) for each partition point v and choose the
maximal one as g(Ai, Ti).
10. End If
11. End For
12. Choose the attribute Aj with the maximal g(Aj, Tj); remove Aj from AR if Aj
is categorical.
13. If g(Aj, Tj) > λ then
14. Replace r with the subcategory Tj; for each leaf nk in Tj with tuples in
DQk, call BuildTree (AR, DQk, C, λ).
15. End If

Algorithm Solution

Based on the solutions mentioned above, we can now describe how we construct a category tree. Since
the problem of finding a tree with minimal category cost is NP-hard, we propose an approximate algo-
rithm (see Algorithm 6).
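The recursion of Algorithm 6 can be sketched as below. As a simplifying assumption, the sketch scores partitions with plain information gain (Formula 8, base-2 logarithms) rather than the cost-based criterion of Formula 12, so it shows the control flow of BuildTree and not the chapter's partitioning criterion. Tuples are dictionaries, the class label is read from a hypothetical `cluster` key, and the tree is returned as nested dictionaries.

```python
import math
from collections import Counter
from typing import Any, Dict, List, Set

Row = Dict[str, Any]


def entropy(labels: List[str]) -> float:
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def info_gain(labels: List[str], parts: List[List[str]]) -> float:
    n = len(labels)
    return entropy(labels) - sum(len(p) / n * entropy(p) for p in parts if p)


def build_tree(attrs: List[str], rows: List[Row], categorical: Set[str],
               lam: float, label: str = "cluster") -> Dict[str, Any]:
    """Recursive sketch of BuildTree(A_R, D_Q, C, λ) using plain information gain."""
    node: Dict[str, Any] = {"rows": rows, "children": []}
    labels = [r[label] for r in rows]
    if len(set(labels)) <= 1:                      # step 2: all tuples share a label
        return node
    best = None                                    # (gain, attribute, [(condition, rows)])
    for a in attrs:                                # steps 3-11
        if a in categorical:                       # one branch per value
            candidates = [[(f"{a} = {v}", [r for r in rows if r[a] == v])
                           for v in sorted({r[a] for r in rows})]]
        else:                                      # one binary split per partition point
            candidates = [[(f"{a} <= {v}", [r for r in rows if r[a] <= v]),
                           (f"{a} > {v}", [r for r in rows if r[a] > v])]
                          for v in sorted({r[a] for r in rows})[:-1]]
        for parts in candidates:
            g = info_gain(labels, [[r[label] for r in p] for _, p in parts])
            if best is None or g > best[0]:
                best = (g, a, parts)
    if best is None or best[0] <= lam:             # step 13: stop unless gain > λ
        return node
    _, a, parts = best
    rest = [x for x in attrs if not (x == a and x in categorical)]   # step 12
    for cond, subset in parts:                     # step 14: recurse on each new leaf
        child = build_tree(rest, subset, categorical, lam, label)
        child["condition"] = cond
        node["children"].append(child)
    return node
```

The largest value of a numeric attribute is skipped as a partition point because it would leave the right branch empty; this also guarantees that every recursive call works on a strictly smaller set of tuples.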
After the category tree is built, the user can follow its branches to find interesting answers. As mentioned above, the user can explore the category tree in two modes, i.e., showing tuples (option 'ShowTuples') and showing categories (option 'ShowCat'). When the user chooses the option 'ShowTuples' on a node (an intermediate node or a leaf node), the system lists the items satisfying all conditions from the root to the current node. The category tree accessing algorithm is shown in Algorithm 7.
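The two exploration modes can be sketched with a node type that records the condition on the edge leading into it. The `Node` class, its `test` predicate, and the option strings passed to `explore` are hypothetical illustrations of Algorithm 7, not an interface defined in the chapter.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


@dataclass
class Node:
    label: str                                   # e.g. "View = Water"
    test: Callable[[Row], bool]                  # condition on the edge into this node
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: List["Node"] = field(default_factory=list, repr=False, compare=False)

    def add(self, label: str, test: Callable[[Row], bool]) -> "Node":
        child = Node(label, test, self)
        self.children.append(child)
        return child


def show_tuples(node: Node, rows: List[Row]) -> List[Row]:
    """'ShowTuples': rows satisfying every condition from the root to `node`."""
    tests = []
    cur: Optional[Node] = node
    while cur is not None:
        tests.append(cur.test)
        cur = cur.parent
    return [r for r in rows if all(t(r) for t in tests)]


def explore(node: Node, rows: List[Row], option: str) -> List[Any]:
    if not node.children or option == "ShowTuples":   # leaves only support ShowTuples
        return show_tuples(node, rows)
    return [c.label for c in node.children]            # 'ShowCat': list subcategories
```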

Partition Criteria and Cost Estimation

We first describe how to compute the information gain which acts as the partition criteria, and then we
give the cost estimation of visiting the intermediate nodes and the leaves.

Partition Criteria

Existing decision tree construction algorithms such as C4.5 compute an information gain to measure how well an attribute classifies the data. Given a decision tree T with N tuples and n classes, where each class Ci in T has Ni tuples, the entropy is defined as

$$E(T)=-\sum_{i=1}^{n}\frac{N_i}{N}\log\frac{N_i}{N} \tag{6}$$

Algorithm 7. The algorithm of accessing category tree

1. For each intermediate node
2. If option = 'ShowTuples'
3. List the tuples satisfying all conditions from the root to the current node
4. Else
5. List the subcategories of the current node
6. End For
7. For each leaf node
8. The only exploration option is 'ShowTuples'
9. List the tuples satisfying all conditions from the root to the current node
10. End For


In real applications, the domain of an attribute A may contain several distinct values. For each value v of A, let Tv be the subset of tuples whose value of A is v, and let NTv be the number of such tuples. The conditional entropy of A is then defined as

$$E_A=\sum_{v}\frac{N_{T_v}}{N}\,E(T_v) \tag{7}$$

and the information gain of attribute A is computed by

$$g(A)=E(T)-E_A \tag{8}$$

For example, consider a fraction of the results (shown in Table 1) returned by the MSN House&Home Web database for a query with the condition "Price between 250000 and 350000 and City = Seattle". We use it to show how the best partition attribute is obtained with the formulas defined above.
Here, we assume the decision attributes are View, Schooldistrict, Livingarea, and SqFt. We first compute the entropy of tree T (logarithms in this example are base 10):

$$E(T)=E(C_1,C_2,C_3)=-\left(\frac{5}{15}\log\frac{5}{15}+\frac{6}{15}\log\frac{6}{15}+\frac{4}{15}\log\frac{4}{15}\right)=0.471293.$$

Next, we compute the entropy of each decision attribute. The attribute "View" has four distinct values, 'Water', 'Mountain', 'GreenBelt', and 'Street', whose entropies are

Table 1. The fraction of query results

ID Price Bedrooms Livingarea Schooldistrict View SqFt Cluster

01 329000 2 Burien Highline Water 712 C1
02 335000 2 Burien Tukwila Water 712 C1
03 325000 1 Richmond Shoreline Water 530 C1
04 325000 3 Richmond Shoreline Water 620 C1
05 328000 3 Richmond Shoreline Water 987 C1
06 264950 1 Burien Seattle Mountain 530 C2
07 264950 1 C-seattle Seattle Mountain 530 C2
08 328000 3 Burien Seattle GreenBelt 987 C2
09 349000 2 Burien Seattle Water 955 C2
10 339950 2 C-seattle Seattle GreenBelt 665 C2
11 339950 3 Burien Seattle Street 852 C2
12 264950 4 Richmond Highline Street 1394 C3
13 264950 5 C-seattle Seattle Mountain 1400 C3
14 338000 5 Burien Tukwila Street 1254 C3
15 340000 3 Burien Tukwila GreenBelt 1014 C3


5 5 1 1
E View (Water) = −  log − log  = 0.195676,
6 6 6 6
0 0 2 2 1 1
E View (Mountain)  log − log − log  = 0.276434,
3 3 3 3 3 3
0 0 2 2 1 1
E View (GreenBelt) = −  log − log − log  = 0.276434,,
3 3 3 3 3 3
0 0 2 2 1 1
E View (Street)  log − log − log  = 0.276434.
3 3 3 3 3 3

Next,

E(View) = 6/15 * 0.195676 + 3/15 * 0.276434+3/15 * 0.276434 + 3/15 * 0.276434 = 0.2441308.

Thus, the gain

g(View) = E(T) – E(View) = 0.2271622.

Analogously,

g(Schooldistrict) = 0.2927512, g(Livingarea) = 0.1100572, and g(SqFt) = 0.251855,

where the value 987 is chosen as the partition point of SqFt (i.e., SqFt ≤ 987 versus SqFt > 987).

Finally, the attribute “Schooldistrict” will be selected as the first level partition attribute of decision
tree T by using the C4.5 algorithm.
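The worked example can be reproduced with a few lines of Python. The sketch below encodes Table 1 restricted to the four decision attributes, uses base-10 logarithms as the example does, and evaluates Formulas (6) to (8) for the categorical attributes and every binary partition point of SqFt. The function names are illustrative.

```python
import math
from collections import Counter, defaultdict
from typing import Sequence, Tuple

# Table 1, reduced to the decision attributes: (Livingarea, Schooldistrict, View, SqFt, Cluster)
ROWS = [
    ("Burien", "Highline", "Water", 712, "C1"), ("Burien", "Tukwila", "Water", 712, "C1"),
    ("Richmond", "Shoreline", "Water", 530, "C1"), ("Richmond", "Shoreline", "Water", 620, "C1"),
    ("Richmond", "Shoreline", "Water", 987, "C1"), ("Burien", "Seattle", "Mountain", 530, "C2"),
    ("C-seattle", "Seattle", "Mountain", 530, "C2"), ("Burien", "Seattle", "GreenBelt", 987, "C2"),
    ("Burien", "Seattle", "Water", 955, "C2"), ("C-seattle", "Seattle", "GreenBelt", 665, "C2"),
    ("Burien", "Seattle", "Street", 852, "C2"), ("Richmond", "Highline", "Street", 1394, "C3"),
    ("C-seattle", "Seattle", "Mountain", 1400, "C3"), ("Burien", "Tukwila", "Street", 1254, "C3"),
    ("Burien", "Tukwila", "GreenBelt", 1014, "C3"),
]
LIVING, SCHOOL, VIEW, SQFT = range(4)


def entropy(labels: Sequence[str]) -> float:
    """Formula (6) with base-10 logarithms, as in the worked example."""
    n = len(labels)
    return -sum(c / n * math.log10(c / n) for c in Counter(labels).values())


def gain(rows, groups) -> float:
    """Formula (8): E(T) minus the size-weighted entropy of the groups (Formula 7)."""
    n = len(rows)
    e_a = sum(len(g) / n * entropy([r[-1] for r in g]) for g in groups if g)
    return entropy([r[-1] for r in rows]) - e_a


def categorical_gain(rows, col: int) -> float:
    by_value = defaultdict(list)
    for r in rows:
        by_value[r[col]].append(r)
    return gain(rows, list(by_value.values()))


def best_numeric_split(rows, col: int) -> Tuple[float, int]:
    best = (-1.0, 0)
    for v in sorted({r[col] for r in rows})[:-1]:   # the maximum leaves the right branch empty
        g = gain(rows, [[r for r in rows if r[col] <= v], [r for r in rows if r[col] > v]])
        if g > best[0]:
            best = (g, v)
    return best
```

The computed gains agree with the values above to about five decimal places; the small residual differences presumably come from rounding of intermediate values in the text. Schooldistrict has the largest gain, and 987 is the best partition point for SqFt.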
However, the main difference between our algorithm and existing decision tree construction algorithms is how the gain of a partition is computed. Our approach aims to reduce the category cost of visiting intermediate nodes (including the tuples in them if the user chooses to explore them) and the cost of visiting tuples in the leaves. The following analysis shows that information gain ignores the cost of visiting tuples, that the category tree construction algorithm proposed by Chakrabarti et al. (Chakrabarti, Chaudhuri & Hwang, 2004) ignores the cost of visiting intermediate nodes generated by future partitions, and that the category tree construction algorithm proposed by Chen and Li (Chen & Li, 2007) ignores the cost of visiting tuples in the intermediate nodes.

Cost Estimation

Cost of Visiting Leaves

Let t be the node to be partitioned and N(t) be the number of tuples in t. Let t1 and t2 be the children generated by a partition, and let Pi be the probability that users are interested in cluster Ci. The gain equals the reduction of category cost when t is partitioned into t1 and t2. Thus, based on the category cost defined in Definition 2, the reduction of the cost of visiting tuples due to partitioning t into t1 and t2 equals


$$N(t)\sum_{C_l\cap t\neq\emptyset}P_l-\sum_{j=1,2}N(t_j)\Big(\sum_{C_i\cap t_j\neq\emptyset}P_i\Big) \tag{9}$$

Decision tree construction algorithms do not consider the cost of visiting leaf tuples. For example, consider a partition that generates two nodes containing tuples with labels (C1, C2, C1) and (C2), and a partition that generates two nodes containing tuples with labels (C2, C1, C2) and (C1). According to the discussion in Section 5.4.1, these two partitions have the same information gain. However, if P1 = 0.5 and P2 = 0, the cost of visiting leaf tuples is smaller for the first partition: it is 3 × (0.5 + 0) + 1 × 0 = 1.5, whereas for the second partition it is 3 × (0.5 + 0) + 1 × 0.5 = 2.
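The leaf-tuple term of Formula (9) can be checked on this example with a short sketch. Each node is represented by the cluster labels of its tuples, and P maps each cluster to its probability; the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def visit_cost(node: Sequence[str], P: Dict[str, float]) -> float:
    """N(t) times the sum of P_i over the clusters C_i that have a tuple in t."""
    return len(node) * sum(P[c] for c in set(node))


def leaf_cost_reduction(parent: Sequence[str], children: List[Sequence[str]],
                        P: Dict[str, float]) -> float:
    """Formula (9): cost before the partition minus the cost after it."""
    return visit_cost(parent, P) - sum(visit_cost(ch, P) for ch in children)
```

For the parent node (C1, C2, C1, C2), the first partition reduces the cost from 2 to 1.5, while the second leaves it at 2, even though both have the same information gain.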

Cost of Visiting Intermediate Nodes

To estimate the cost of visiting intermediate nodes, we adopt the method proposed by Chen and Li (Chen & Li, 2007). According to the definition in (Chen & Li, 2007), a perfect tree is a tree whose leaves contain tuples of only one class and cannot be partitioned further. In fact, a perfect tree is a decision tree. Given a perfect tree T with N tuples and k classes, where each class Ci in T has Ni tuples, the entropy $E(T)=-\sum_{i=1}^{k}\frac{N_i}{N}\log\frac{N_i}{N}$ approximates the average length of root-to-leaf paths for all tuples in T.
Since T is a perfect tree, its leaves contain only one class per node. Each leaf Li that contains the Ni tuples of class Ci can be further expanded into a smaller subtree Ti rooted at Li, each of whose leaves contains exactly one record of Ci. Each such small subtree Ti contains Ni leaves. All these subtrees and T compose a big tree Tb that contains $\sum_{1\le i\le k}N_i=N$ leaves. We further assume that each Ti and the big tree Tb are balanced, so the height of Ti is log Ni and the height of Tb is log N. Note that for the i-th leaf Li in T, the length of the path from the root to Li equals the height of the big tree Tb minus the height of the small tree Ti. There are Ni tuples in Li, all with the same path from the root. Thus the average length of root-to-leaf paths for tuples is

$$\sum_{1\le i\le k}\frac{N_i}{N}\left(\log N-\log N_i\right)=-\sum_{i=1}^{k}\frac{N_i}{N}\log\frac{N_i}{N} \tag{10}$$
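As a small worked instance of Formula (10), consider a hypothetical perfect tree with N = 8 tuples in three classes of sizes N1 = 4, N2 = 2, and N3 = 2, using base-2 logarithms so that the balanced heights are whole numbers. The left side averages the path lengths 3 − 2 = 1 and 3 − 1 = 2, and the right side is the entropy; both give 1.5.

$$\sum_{1\le i\le 3}\frac{N_i}{N}\left(\log_2 N-\log_2 N_i\right)=\frac{4}{8}(3-2)+\frac{2}{8}(3-1)+\frac{2}{8}(3-1)=0.5+0.5+0.5=1.5$$

$$-\sum_{i=1}^{3}\frac{N_i}{N}\log_2\frac{N_i}{N}=-\left(\tfrac{1}{2}\log_2\tfrac{1}{2}+\tfrac{1}{4}\log_2\tfrac{1}{4}+\tfrac{1}{4}\log_2\tfrac{1}{4}\right)=0.5+0.5+0.5=1.5$$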

This is exactly the entropy E(T). Note that most existing decision tree algorithms choose the partition that maximizes information gain. Information gain is the reduction of entropy due to a partition and is given by the following formula, where N1 and N2 are the numbers of tuples in t1 and t2 and N = N(t):

$$\mathrm{IGain}(t,t_1,t_2)=E(t)-\frac{N_1}{N}E(t_1)-\frac{N_2}{N}E(t_2) \tag{11}$$

Thus a partition with a high information gain generates a tree with low entropy, and this tree will also have short root-to-leaf paths. Since the cost of visiting intermediate nodes equals the product of path lengths and fan-out in Definition 2, if we assume the average fan-out is about the same for all trees, then the cost of visiting intermediate nodes is proportional to the length of root-to-leaf paths. Therefore, the reduction in the cost of visiting intermediate nodes can be estimated by the information gain.


Cost of Visiting Tuples in Intermediate Nodes

Since user may choose to examine the tuples in an intermediate node, thus we need to consider the cost
of visiting these tuples. Let Pst be the probability that user goes for option ‘ShowTuples’ for an inter-
mediate node v given that she explores v, and N(t) be the number of tuples in the intermediate node t.
Next, the cost of visiting tuples in an intermediate node equals Pst·N(t).

Combining Costs
The remaining problem is how to combine the three types of costs. Here we take a normalization ap-
proach, which uses the following formula to estimate the gain of partitioning t into t1 and t2,

$$\frac{\mathrm{IGain}(t,t_1,t_2)/E(t)}{\left(\dfrac{\sum_{j=1,2}N(t_j)\sum_{C_i\cap t_j\neq\emptyset}P_i}{N(t)\sum_{C_l\cap t\neq\emptyset}P_l}\right)\cdot P_{st}\,N(t)} \tag{12}$$

The denominator is the cost of visiting leaf tuples after the partition, normalized by that cost before the partition, multiplied by the cost Pst·N(t) of visiting the tuples in t. A partition never increases the cost of visiting leaf tuples (the proof is straightforward), so the normalized leaf-cost factor lies in (0, 1]. The numerator is the information gain normalized by the entropy of t. We compute the ratio of these two terms rather than the sum of the numerator and (1 − denominator) because in practice the numerator (information gain) is often quite small; the ratio is thus more sensitive to the numerator when the denominators are similar.
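Formula (12) can be sketched for a binary partition as follows, combining Formula (11) in the numerator with the leaf-cost terms of Formula (9) in the denominator. Base-2 logarithms are used, nodes are lists of cluster labels, and the value of Pst in the usage is hypothetical. The sketch assumes E(t) > 0 (t is not pure) and that the clusters present in t have non-zero total probability, so neither division is by zero.

```python
import math
from collections import Counter
from typing import Dict, List


def entropy(labels: List[str]) -> float:
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def leaf_cost(labels: List[str], P: Dict[str, float]) -> float:
    return len(labels) * sum(P[c] for c in set(labels))


def combined_gain(t: List[str], t1: List[str], t2: List[str],
                  P: Dict[str, float], p_st: float) -> float:
    """Formula (12) for partitioning node t into t1 and t2."""
    n = len(t)
    igain = entropy(t) - len(t1) / n * entropy(t1) - len(t2) / n * entropy(t2)  # Formula (11)
    numerator = igain / entropy(t)
    leaf_ratio = (leaf_cost(t1, P) + leaf_cost(t2, P)) / leaf_cost(t, P)
    denominator = leaf_ratio * p_st * n            # ... multiplied by P_st * N(t)
    return numerator / denominator
```

With P1 = P2 = 0.5 and Pst = 0.5, splitting (C1, C2, C1, C2) into two pure nodes yields a combined gain of 1.0, whereas a split into two mixed nodes yields 0, since neither its information gain nor its leaf cost improves.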

Complexity Analysis
Let n be the number of tuples in the query results, m be the number of attributes, and k be the number of classes. The gain in Formula 12 can be computed in O(k) time. C4.5 also uses several optimizations, such as computing the gains for all partition points of an attribute in one pass, sorting all tuples on the different attribute values beforehand, and reusing the sort order. The cost of sorting tuples on the different attribute values is O(mn log n), and the cost of computing gains for all possible partitions at one node is O(mnk), because there are at most m partition attributes and n possible partition points, and each gain can be computed in O(k) time. If we assume the generated tree has O(log n) levels, the total time is O(mnk log n).
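The one-pass optimization mentioned above can be sketched as follows: after a single sort, a sweep keeps running class counts for the left branch, so each candidate partition point is scored in O(k) time and one attribute costs O(nk) after sorting. For simplicity the sketch scores splits with information gain (base-2 logarithms); the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def entropy_counts(counts: Counter, n: int) -> float:
    return -sum(c / n * math.log2(c / n) for c in counts.values() if c)


def split_gains(values: Sequence[float], labels: Sequence[str]) -> List[Tuple[float, float]]:
    """Information gain of every binary split A <= v / A > v in one sweep.

    After one O(n log n) sort, each step updates the class counts in O(1)
    and evaluates the gain in O(k) time, so the sweep costs O(n k).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    total = Counter(labels)
    base = entropy_counts(total, n)
    left: Counter = Counter()
    gains = []
    for pos, i in enumerate(order[:-1]):
        left[labels[i]] += 1
        if values[order[pos + 1]] == values[i]:
            continue                      # only split between distinct values
        n_left = pos + 1
        right = total - left              # Counter subtraction: O(k)
        g = (base - n_left / n * entropy_counts(left, n_left)
             - (n - n_left) / n * entropy_counts(right, n - n_left))
        gains.append((values[i], g))
    return gains
```

Applied to the SqFt column of Table 1, the sweep evaluates the ten partition points between the eleven distinct values, and the best one is again 987.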

EXPERIMENTAL EVALUATION

In this section, we describe our experiments, report the experimental results and compare our approach
with several existing approaches.

Experimental Setup

We used Microsoft SQL Server 2005 RDBMS on a P4 3.2-GHz PC with 1 GB of RAM for our experi-
ments.
Dataset: For our evaluation, we set up a real estate database HouseDB (Price, SqFt, Bedrooms, Bathrooms, Livingarea, Schooldistrict, View, Neighborhood, Boat, Garage, Buildyear…) containing 1,700,000 tuples extracted from the MSN House&Home Web site. There are 27 attributes, 10 numerical and 17 categorical. The total data size is 20 MB.


Table 2. Test queries used in user study

Queries Result size

Q1: Price between 250000 and 300000 and SqFt > 1000 323
Q2: Schooldistrict = Tukwila and View = GreenBelt 894
Q3: Price < 250000 and Neighborhood in { Winslow, Shoreline} 452
Q4: Schooldistrict = Seattle and View in {Water, GreenBelt} 558
Q5: SqFt between 600 and 1000 16213

Query history: In our experiments, we asked 40 subjects to act as different kinds of house buyers, such as rich people, clerks, workers, women, and young couples, and to post queries against the database. We collected 2000 queries for the database, and these queries are used as the query history. Each subject was asked to submit 15 queries for HouseDB; each query had 2 to 6 conditions and 4.2 specified attributes on average. We assume each query has equal weight. We did observe that users started with a general query that returned many answers and then gradually refined the query until it returned a small number of answers.
Algorithm: We implemented all algorithms in C# and connected to the RDBMS through ADO. The clusters are stored by adding a column to the data table that holds the class label of each tuple. The stopping threshold λ in the tree-building algorithm (Algorithm 6) is set to 0.002. We have developed an interface that allows users to classify query results using the generated trees.
Comparison: We compare our tree construction algorithm (henceforth referred to as the Cost-based algorithm) with the algorithm proposed by Chakrabarti et al. (Chakrabarti, Chaudhuri & Hwang, 2004) (henceforth referred to as the Greedy algorithm). It differs from our algorithm in two aspects: (i) it does not consider different user preferences, and (ii) it does not consider the cost of intermediate nodes generated by future partitions. We also compare with the algorithm proposed by Chen and Li (Chen & Li, 2007) (henceforth referred to as the C4.5-Categorization algorithm), which first uses a query-merging step to generate data clusters and corresponding labels and then uses a modified C4.5 to create the navigational tree. It differs from our algorithm in two aspects: (i) it needs to execute queries on the dataset to evaluate query similarity and then merge similar queries, and (ii) it cannot expand intermediate nodes to show tuples, and thus it does not consider the cost of visiting tuples of intermediate nodes.
Setup of user study: We conducted an empirical study by asking 5 subjects (with no overlap with the 40 users who submitted the query history) to use this interface. The subjects were randomly selected colleagues, students, etc. Each subject was given a tutorial on how to use this interface. Next, each subject was given the results of the 5 queries listed in Table 2, which do not appear in the query history. For each such query, the subject was asked to navigate the trees generated by the three algorithms mentioned above and to select 5-10 houses that he or she would like to buy.

Categorization Cost Experiment

The experiment aims at comparing the cost of the three categorization algorithms and showing the efficiency of our categorization approach. The actual category cost is defined as follows:

$$\mathrm{ActCost}=\sum_{\forall\,\text{leaf } v \text{ visited by a subject}}\left(K_1N(v)+K_2\sum_{v_i\in\mathrm{Anc}(v)}\left(|\mathrm{Sib}(v_i)|+\sum_{j=1}^{|\mathrm{Sib}(v_i)|}P_{st_j}\,N(v_j)\right)\right) \tag{13}$$

Figure 3. Total actual cost

Figure 4. Average number of selected houses per subject

Unlike the category cost in Definition 2, this cost is the actual number of intermediate nodes (including siblings) and tuples visited by a subject. We assume the weights for visiting intermediate nodes and visiting tuples are equal, i.e., K1 = K2 = 1. In general, the lower the total category cost, the better the categorization method. Figure 3 shows the total actual cost, averaged over all subjects, for the Cost-based, C4.5-Categorization, and Greedy algorithms. Figure 4 reports the average number of houses selected by each subject. Figure 5 reports the average category cost per selected house for these algorithms.
The results show that the category trees generated by the Cost-based algorithm have the lowest actual cost and the lowest average cost per selected house (the number of query clusters k was set to 30). Users also found more houses worth considering to buy using our algorithm than using the other two algorithms, suggesting that our method makes it easier for users to find interesting houses. The tree generated by the Greedy algorithm has the worst results. This is expected, because the Greedy algorithm ignores different user preferences and does not consider future partitions when generating category trees. The C4.5-Categorization algorithm also has a higher cost than our method. The reason is that our algorithm uses a partitioning criterion that considers the cost of visiting the tuples in intermediate nodes, while the C4.5-Categorization algorithm does not. Moreover, our algorithm can represent a large number of tuples with a few clusters without losing accuracy (this is tested in the next experiment).

Figure 5. Average category cost per selected house

Table 3. Results of survey

Categorization algorithm #subjects that called it best

Cost-based 16
C4.5-categorization 4
Greedy 2
The results show that using our approach, on average a subject only needs to visit no more than 8
tuples or intermediate nodes for queries Q1, Q2, Q3, and Q4 to find the first relevant tuple, and needs to
visit about 18 tuples or intermediate nodes for Q5. The total navigational cost for our algorithm is less
than 45 for the former four queries, and is less than 80 for Q5. At the end of the study, we asked subjects
which categorization algorithm worked the best for them among all the queries they tried. The result of
that survey is reported in Table 3 and shows that a majority of subjects considered our algorithm the best.

Queries Clustering Experiment

This experiment tests the quality of the query clustering algorithm, whose accuracy strongly affects the
accuracy of the tuple clusters. We first translated each query in the query history into its
corresponding vector representation, and then adopted the following strategy to generate synthetic
datasets. Every dataset is characterized by four parameters: n, m, l, and noise. Here n is the number of
vector elements in the <attribute, value> pairs set of Δ (note that each query in the query history is
translated into a vector representation), m is the number of input queries, and l is the number of true
underlying clusters. We set '0' and '1' at random on the n elements, and then generate l random queries
by sampling at random the space of all possible permutations of the n elements. These initial queries
form the centers around which we build each of the clusters. The task of the algorithms is to rediscover
the clustering model used for data generation. Given a cluster center, each query in the same cluster
is generated by adding to the center a specified amount of noise of a specific type. We consider two
types of noise: swaps and shifts. For a swap, a '0' element and a '1' element are picked from the
initial order and their positions are exchanged. For a shift, we pick a random element and move it to a
new position, either earlier or later in the order; all elements between the old and the new positions
of the element are shifted one position down (or up). The amount of noise is the number of swaps or
shifts we make.

Figure 6. Algorithms' performance for query clustering
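The generation procedure described above can be sketched as follows. This is a minimal sketch under several assumptions. Queries are plain 0/1 lists. The m queries are split evenly over the l clusters. The helper names `make_centers`, `add_swaps`, `add_shifts` and `make_dataset` are all hypothetical, because the chapter does not give its generator's code.

```python
import random


def make_centers(n: int, l: int, rng: random.Random) -> list[list[int]]:
    """Cluster centers: random permutations of one random 0/1 vector of length n."""
    base = [rng.randint(0, 1) for _ in range(n)]
    return [rng.sample(base, n) for _ in range(l)]


def add_swaps(vec: list[int], k: int, rng: random.Random) -> list[int]:
    """k swaps: pick a '0' element and a '1' element and exchange their positions."""
    v = list(vec)
    for _ in range(k):
        zeros = [i for i, x in enumerate(v) if x == 0]
        ones = [i for i, x in enumerate(v) if x == 1]
        if not zeros or not ones:
            break  # an all-0 or all-1 vector cannot be swapped
        i, j = rng.choice(zeros), rng.choice(ones)
        v[i], v[j] = v[j], v[i]
    return v


def add_shifts(vec: list[int], k: int, rng: random.Random) -> list[int]:
    """k shifts: move one element to a new position; elements in between slide by one."""
    v = list(vec)
    for _ in range(k):
        src, dst = rng.randrange(len(v)), rng.randrange(len(v))
        v.insert(dst, v.pop(src))
    return v


def make_dataset(n: int, m: int, l: int, noise: int, kind: str = "swap", seed: int = 0):
    """Return (centers, queries, labels); labels[i] is the true cluster of queries[i]."""
    rng = random.Random(seed)
    centers = make_centers(n, l, rng)
    perturb = add_swaps if kind == "swap" else add_shifts
    labels = [i % l for i in range(m)]  # assumed: clusters of (nearly) equal size
    queries = [perturb(centers[c], noise, rng) for c in labels]
    return centers, queries, labels


centers, queries, labels = make_dataset(n=300, m=600, l=4, noise=8)
print(len(centers), len(queries))  # 4 600
```

Both noise types preserve the number of '1' elements in a query and change only their positions, which matches the permutation-based description.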
We experiment with datasets generated for the following parameters: n = 300, m = 600, l = {4, 8, 16,
32}, and noise = {2, 4, 8,…, 128} for swaps. Figure 6 shows the performance of the algorithms as a
function of the amount of noise. The y axis is the ratio F(A)/F(INP), for A = {Greedy, Greedy-Refine},
where the Greedy-Refine algorithm is the one proposed in this chapter (Algorithm 3) and the Greedy
algorithm is the one proposed in [2]. We compare them here because both aim at solving the same
clustering problem. F(A) is the total cost of the solution produced by algorithm A when the distance
shown in Equation (3) is used as the distance measure between queries. F(INP) is the cost of the
clustering structure (Equation 4) used in the data generation process.
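The ratio F(A)/F(INP) can be illustrated with a small sketch. Equations (3) and (4) are not reproduced in this section, so Hamming distance between 0/1 query vectors is used here as an assumed stand-in for Equation (3). The cost of a clustering is taken, also as an assumption, to be the sum of distances from each query to its center. All function names are hypothetical.

```python
def hamming(a: list[int], b: list[int]) -> int:
    """Assumed stand-in for the chapter's Equation (3): count of differing positions."""
    return sum(x != y for x, y in zip(a, b))


def clustering_cost(queries, centers, assign=None) -> int:
    """Sum of distances from each query to its center.

    With `assign` given, query i is charged to centers[assign[i]] (the generating
    structure); otherwise each query is charged to its nearest center.
    """
    total = 0
    for i, q in enumerate(queries):
        if assign is None:
            total += min(hamming(q, c) for c in centers)
        else:
            total += hamming(q, centers[assign[i]])
    return total


def cost_ratio(queries, found_centers, true_centers, true_labels) -> float:
    """F(A) / F(INP): a value near (or below) 1 means A found a clustering about as cheap as the true one."""
    f_a = clustering_cost(queries, found_centers)
    f_inp = clustering_cost(queries, true_centers, true_labels)
    if f_inp == 0:
        return 1.0 if f_a == 0 else float("inf")
    return f_a / f_inp


queries = [[0, 1, 1], [0, 1, 0], [1, 0, 0]]
true_centers, true_labels = [[0, 1, 1], [1, 0, 0]], [0, 0, 1]
print(cost_ratio(queries, true_centers, true_centers, true_labels))  # 1.0
print(cost_ratio(queries, [[1, 1, 1]], true_centers, true_labels))   # 5.0
```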
Figure 6 shows that the Greedy-Refine algorithm performs considerably better than the Greedy algorithm.
The reason is that Greedy-Refine operates on queries that were sorted by cost in the pre-processing
phase and performs greedy selection twice in the processing phase, which allows it to obtain a
near-globally optimal solution.

Figure 7. Execution time for Q1 to Q4
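Algorithm 3 and the Greedy algorithm of [2] are defined elsewhere and are not reproduced here. To make "greedy selection" concrete, the sketch below shows a generic forward-greedy heuristic for the k-median objective, the problem studied by Chrobak et al. (2005). It is not the chapter's Greedy-Refine. Hamming distance is again an assumed stand-in for Equation (3), and the function names are hypothetical.

```python
def hamming(a: list[int], b: list[int]) -> int:
    """Assumed distance between query vectors (stand-in for Equation (3))."""
    return sum(x != y for x, y in zip(a, b))


def greedy_k_median(points: list[list[int]], k: int, dist=hamming):
    """Forward greedy k-median: repeatedly add the candidate center that most lowers total cost.

    Centers are chosen among the input points themselves. Returns (centers, cost).
    """
    chosen: list[int] = []
    # nearest[i] = distance from point i to its closest chosen center so far
    nearest = [float("inf")] * len(points)
    for _ in range(min(k, len(points))):
        best_idx, best_cost = None, float("inf")
        for c, cand in enumerate(points):
            if c in chosen:
                continue
            cost = sum(min(d, dist(p, cand)) for d, p in zip(nearest, points))
            if cost < best_cost:
                best_idx, best_cost = c, cost
        chosen.append(best_idx)
        nearest = [min(d, dist(p, points[best_idx])) for d, p in zip(nearest, points)]
    return [points[i] for i in chosen], sum(nearest)


pts = [[0, 0], [0, 1], [1, 1], [1, 1]]
print(greedy_k_median(pts, 1))  # ([[0, 1]], 3)
print(greedy_k_median(pts, 2))  # ([[0, 1], [1, 1]], 1)
```

A single greedy pass like this never revisits an early choice. Refinement steps, such as the second selection pass the chapter describes, aim to correct exactly that weakness.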

Performance Report

Figure 7 reports the tree construction time of our algorithm for test queries Q1 to Q4 (the execution
time of Q5 is much longer than that of the first four queries, so its histogram is not shown in the
figure). Our algorithm took no more than 2.4 seconds for the first four queries, which returned several
hundred results, and about 4 seconds for the fifth query, which returned 16,213 tuples. Thus our
algorithm can be used in an interactive environment.

CONCLUSION

This chapter proposed a categorization approach that addresses diverse user preferences and helps users
navigate large numbers of query results. The approach first summarizes the preferences of all users in
the system by clustering the query history, and then divides tuples into clusters according to the
different kinds of user preferences. When a specific user issues a query, our approach creates a
category tree over the clusters appearing in the results of the query to help the user navigate these
results. Our approach differs from several existing approaches in two respects: (i) it does not require
a user profile or a meaningful query when deciding the preferences of a specific user, and (ii) the
category tree construction algorithm proposed in this chapter considers both the cost of visiting
intermediate nodes (including the cost of visiting the tuples in intermediate nodes) and the cost of
visiting the tuples in leaf nodes. In the future, we will investigate how to accommodate the dynamic
nature of user preferences and how to integrate ranking into our approach.


REFERENCES

Agrawal, R., Rantzau, R., & Terzi, E. (2006). Context-sensitive ranking. Proceedings of the ACM SIGMOD International Conference on Management of Data, (pp. 383–394).
Agrawal, S., Chaudhuri, S., Das, G., & Gionis, A. (2003). Automated ranking of database query results. ACM Transactions on Database Systems, 28(2), 140–174.
Ahlberg, C., & Shneiderman, B. (1994). Visual information seeking: Tight coupling of dynamic query filters with starfield displays. Proceedings on Human Factors in Computing Systems, (pp. 313–317).
Breiman, L., Friedman, J., Stone, C. J., & Olshen, R. (1984). Classification and regression trees. Boca Raton, FL: CRC Press.
Bruno, N., Gravano, L., & Marian, A. (2002). Evaluating top-k queries over Web-accessible databases. Proceedings of the 18th International Conference on Data Engineering, (pp. 369–380).
Card, S., MacKinlay, J., & Shneiderman, B. (1999). Readings in information visualization: Using vision to think. Morgan Kaufmann.
Chakrabarti, K., Chaudhuri, S., & Hwang, S. (2004). Automatic categorization of query results. Proceedings of the ACM SIGMOD International Conference on Management of Data, (pp. 755–766).
Chaudhuri, S., Das, G., Hristidis, V., & Weikum, G. (2004). Probabilistic ranking of database query results. Proceedings of the 30th International Conference on Very Large Data Bases, (pp. 888–899).
Chen, Z. Y., & Li, T. (2007). Addressing diverse user preferences in SQL-query-result navigation. Proceedings of the ACM SIGMOD International Conference on Management of Data, (pp. 641–652).
Chrobak, M., Kenyon, C., & Young, N. (2005). The reverse greedy algorithm for the metric k-median problem. Information Processing Letters, 97, 68–72. doi:10.1016/j.ipl.2005.09.009
Das, G., Hristidis, V., Kapoor, N., & Sudarshan, S. (2006). Ordering the attributes of query results. Proceedings of the ACM SIGMOD International Conference on Management of Data, (pp. 395–406).
Dhillon, I. S., Mallela, S., & Kumar, R. (2002). Enhanced word clustering for hierarchical text classification. Proceedings of the 8th ACM SIGKDD International Conference, (pp. 191–200).
Finkelstein, L., Gabrilovich, E., Matias, Y., Rivlin, E., Solan, Z., Wolfman, G., et al. (2001). Placing search in context: The concept revisited. Proceedings of the 9th International World Wide Web Conference, (pp. 406–414).
Geerts, F., Mannila, H., & Terzi, E. (2004). Relational link-based ranking. Proceedings of the 30th International Conference on Very Large Data Bases, (pp. 552–563).
Joachims, T. (1998). Text categorization with support vector machines: Learning with many relevant features. Proceedings of the European Conference on Machine Learning, (pp. 137–142).
Joachims, T. (2002). Optimizing search engines using clickthrough data. Proceedings of the ACM Conference on Knowledge Discovery and Data Mining, (pp. 133–142).
Kießling, W. (2002). Foundations of preferences in database systems. Proceedings of the 28th International Conference on Very Large Data Bases, (pp. 311–322).
Koller, D., & Sahami, M. (1997). Hierarchically classifying documents using very few words. Proceedings of the 14th International Conference on Machine Learning, (pp. 170–178).
Koutrika, G., & Ioannidis, Y. (2004). Personalization of queries in database systems. Proceedings of the 20th International Conference on Data Engineering, (pp. 597–608).
Liu, F., Yu, C., & Meng, W. (2002). Personalized Web search by mapping user queries to categories. Proceedings of the ACM International Conference on Information and Knowledge Management, (pp. 558–565).
Meng, X. F., & Ma, Z. M. (2008). A context-sensitive approach for Web database query results ranking. Proceedings of IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, (pp. 836–839).
Mitchell, T. (1997). Machine learning. McGraw Hill.
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1), 81–106. doi:10.1007/BF00116251
Quinlan, J. R. (1993). C4.5: Programs for machine learning. San Francisco: Morgan Kaufmann Publishers Inc.
Roussos, Y., Stavrakas, Y., & Pavlaki, V. (2005). Towards a context-aware relational model. Proceedings of the International Workshop on Context Representation and Reasoning, Paris, (pp. 101–106).
Rui, Y., Huang, T. S., & Mehrotra, S. (1997). Content-based image retrieval with relevance feedback in MARS. Proceedings of the IEEE International Conference on Image Processing, (pp. 815–818).
Shen, X., Tan, B., & Zhai, C. (2005). Context-sensitive information retrieval using implicit feedback. Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, (pp. 43–50).
Sugiyama, K., Hatano, K., & Yoshikawa, M. (2004). Adaptive Web search based on user profile constructed without any effort from users. Proceedings of the 13th International World Wide Web Conference, (pp. 975–990).
Tweedie, L., Spence, R., Williams, D., & Bhogal, R. S. (1994). The attribute explorer. Proceedings of the International Conference on Human Factors in Computing Systems, (pp. 435–436).
Wu, L., Faloutsos, C., Sycara, K., & Payne, T. (2000). FALCON: Feedback adaptive loop for content-based retrieval. Proceedings of the 26th International Conference on Very Large Data Bases, (pp. 297–306).
Zeng, H. J., He, Q. C., Chen, Z., Ma, W. Y., & Ma, J. (2004). Learning to cluster Web search results. Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, (pp. 210–217).

Guatemala—Medina Defeated and Overthrown
—Céleo Arias Succeeds Him—His Liberal Policy
—He is Beset by the Conservatives—His Former
Supporters Depose Him—Ponciano Leiva
Becomes President—His Course Displeases
Barrios, Who Sets Medina against Him—He is
Forced to Resign—Marco Aurelio Soto Made
President by Barrios—Attempted Revolt of Ex-
president Medina—His Trial and Execution—
Soto's Administration—He Goes Abroad—His
Quarrel with Barrios, and Resignation—
President Bogran—Filibustering Schemes

CHAPTER XXIII.
POLITICAL AFFAIRS IN NICARAGUA.
1867-1885.
President Fernando Guzman—Insurrection—
Misconduct of Priests—Defeats of the
Insurgents—Foreign Mediation—Generosity of
the Government—President Vicente Quadra—
Inception of the Jesuits—Aims of Parties—
Internal and Foreign Complications—Costa
Rica's Hostility and Tinoco's Invasion—
Presidents Chamorro and Zavala—More Political
Troubles—Jesuits the Promoters—Their
Expulsion—Peace Restored—Progress of the
Country—President Adan Cárdenas—Resistance
to President Barrios' Plan of Forced
Reconstruction

CHAPTER XXIV.
INDEPENDENCE OF THE ISTHMUS.
1801-1822.
Administration under Spain—Influence of Events
in Europe and Spanish America on the Isthmus
—Hostilities in Nueva Granada—Constitutional
Government—General Hore's Measures to Hold
the Isthmus for Spain—MacGregor's Insurgent
Expedition at Portobello—Reëstablishment of
the Constitution—Captain-general Murgeon's
Rule—The Isthmus is Declared Independent—
Its Incorporation with Colombia—José Fábrega
in Temporary Command—José María Carreño
Appointed Intendente and Comandante General
—Abolition of African Slavery

CHAPTER XXV.
DIVERS PHASES OF SELF-GOVERNMENT.
1819-1863.
Panamá Congress—Provincial Organizations—
Alzuru's Rebellion and Execution—Secession
from Colombia and Reincorporation—
Differences with Foreign Governments—Crime
Rampant—Summary Treatment of Criminals—
Riots and Massacre of Foreign Passengers—
Attempts to Rob Treasure Trains—Neutrality
Treaties—Establishment of Federal System—
Panamá as a State—Revolutionary Era Begins—
A Succession of Governors—Seditious Character
of the Negro Population—Revolution against
Governor Guardia and his Death—Another
Political Organization—Estado Soberano de
Panamá—Liberal Party in Full Control—Stringent
Measures

CHAPTER XXVI.
FURTHER WARS AND REVOLUTIONS.
1863-1885.
Presidents Goitia, Santa Coloma, and Calancha—
Undue Interference of Federal Officials—
Colunje's Administration—President Olarte's
Energy—Enmity of the Arrabal's Negroes—Short
and Disturbed Rules of Diaz and Ponce—
President Correoso—Negro Element in the
Ascendent—Conservatives Rebel, and are
Discomfited—Armed Peace for a Time—Feverish
Rules of Neira, Miró, Aizpuru, Correoso, and
Casorla—Cervera's Long Tenure—Temporary
Rule of Vives Leon—President Santodomingo
Vila—Obtains Leave of Absence—Is Succeeded
by Pablo Arosemena—Aizpuru's Revolution—
Arosemena Flees and Resigns—Outrages at
Colon—American Forces Protect Panamá—
Collapse of the Revolution—Aizpuru and
Correoso Imprisoned—Chief Causes of
Disturbances on the Isthmus

CHAPTER XXVII.
CENTRAL AMERICAN INSTITUTIONS.
1886.
Extent of the Country—Climate—Mountains and
Volcanoes—Earthquakes—Rivers and Lakes—
Costa Rica's Area, Possessions, and Political
Division and Government—Her Chief Cities—
Nicaragua, her Territory, Towns, and Municipal
Administration—Honduras' Extent, Islands,
Cities, and Local Government—Salvador, her
Position, Area, Towns, and Civil Rule—
Guatemala's Extent and Possessions—Her Cities
and Towns—Internal Administration—Isthmus
of Panamá—Area, Bays, Rivers, and Islands—
Department and District Rule—The Capital and
Other Towns—Population—Character and
Customs—Education—Epidemics and Other
Calamities

CHAPTER XXVIII.
THE PEOPLE OF COSTA RICA, NICARAGUA, AND SALVADOR.
1800-1887.
Central American Population—Its Divisions—
General Characteristics and Occupations—Land
Grants—Efforts at Colonization—Failure of
Foreign Schemes—Rejection of American
Negroes—Character of the Costa Rican People
—Dwellings—Dress—Food—Amusements—
Nicaraguan Men and Women—Their Domestic
Life—How They Amuse Themselves—People of
Salvador—Their Character and Mode of Living

CHAPTER XXIX.
THE PEOPLE OF HONDURAS AND GUATEMALA.
1800-1887.
Amalgamation in Honduras—Possible War of
Races—Xicaques and Payas—Zambos or
Mosquitos—Pure and Black Caribs—
Distinguishing Traits—Ladinos—Their Mode of
Life—Guatemala and her People—Different
Classes—Their Vocations—Improved Condition
of the Lower Classes—Mestizos—Pure Indians—
Lacandones—White and Upper Class—Manners
and Customs—Prevailing Diseases—Epidemics—
Provision for the Indigent

CHAPTER XXX.
INTELLECTUAL ADVANCEMENT.
1800-1887.
Public Education—Early Efforts at Development—
Costa Rica's Measures—Small Success—
Education in Nicaragua—Schools and Colleges—
Nicaraguan Writers—Progress in Salvador and
Honduras—Brilliant Results in Guatemala—
Polytechnic School—Schools of Science, Arts,
and Trades—Institute for the Deaf, Dumb, and
Blind—University—Public Writers—Absence of
Public Libraries—Church History in Central
America and Panamá—Creation of Dioceses of
Salvador and Costa Rica—Immorality of Priests
—Their Struggles for Supremacy—Efforts to
Break their Power—Banishments of Prelates—
Expulsion of Jesuits—Suppression of Monastic
Orders—Separation of Church and State—
Religious Freedom

CHAPTER XXXI.
JUDICIAL AND MILITARY.
1887.
Judicial System of Guatemala—Jury Trials in the
Several States—Courts of Honduras—Absence
of Codes in the Republic—Dilatory Justice—
Impunity of Crime in Honduras and Nicaragua—
Salvador's Judiciary—Dilatory Procedure—
Codification of Laws in Nicaragua—Costa Rican
Administration—Improved Codes—Panamá
Courts—Good Codes—Punishments for Crime in
the Six States—Jails and Penitentiaries—Military
Service—Available Force of Each State—How
Organized—Naval—Expenditures—Military
Schools—Improvements

CHAPTER XXXII.
INDUSTRIAL PROGRESS.
1800-1887.
Early Agriculture—Protection of the Industry—
Great Progress Attained—Communal Lands—
Agricultural Wealth—Decay of Cochineal—
Development of Other Staples—Indigo, Coffee,
Sugar, Cacao, and Tobacco—Food and Other
Products—Precious Woods and Medicinal Plants
—Live-stock—Value of Annual Production in
Each State—Natural Products of Panamá—
Neglect of Agriculture—Mineral Wealth—Yield of
Precious Metals—Mining in Honduras, Salvador,
and Nicaragua—Deposits of Guatemala and
Costa Rica—Mints—Former Yield of Panamá—
Mining Neglected on the Isthmus—Incipiency of
Manufactures—Products for Domestic Use

CHAPTER XXXIII.
COMMERCE AND FINANCE.
1801-1887.
Early State of Trade—Continued Stagnation after
Independence—Steam on the Coasts—Its
Beneficial Effects—Variety of Staples—Ports of
Entry and Tariffs—Imports and Exports—Fairs—
Accessory Transit Company—Internal
Navigation—Highways—Money—Banking—
Postal Service—Panamá Railway Traffic—Local
Trade of the Isthmus—Pearl Fishery—Colonial
Revenue in Finances of the Federation—Sources
of Revenue of Each State—Their Receipts and
Expenditures—Foreign and Internal Debts

CHAPTER XXXIV.
INTEROCEANIC COMMUNICATION.
1801-1887.
Ancient Ideas on the North-west Passage—From
Peru to La Plata—Cape Horn Discovered—Arctic
Regions—McClure's Successful Voyage—
Crozier's Discovery—Franklin's Attempts—
Finding by Nordenskiöld of the North-east
Passage—Projects to Unite the Atlantic and
Pacific Oceans across the Isthmuses—Plans
about Tehuantepec—Explorations for a Ship-
canal Route in Nicaragua, Panamá, and Darien
—The Nicaragua Accessory Transit Company—
Construction of the Panamá Railway, and its
Great Benefits—Further Efforts for a Canal—
Organization of a French Company—A Ship-
canal under Construction across the Isthmus of
Panamá—Difficulties and Expectations—Central
American Railroads and Telegraphs—Submarine
Cables

HISTORY
OF
CENTRAL AMERICA.

CHAPTER I.
LAST DAYS OF SPANISH RULE.

1801-1818.

Popular Feeling in Central America—Effect of Events in Spain—Recognition of American
Equality—Representation in the Spanish Córtes—Delusive Reforms—End of Saravia's
Rule—President José Bustamante—His Despotic Course—Demands in the Córtes—
Constitutional Guarantees—Official Hostility—Campaign in Oajaca—Revolutionary
Movements in Salvador—War in Nicaragua—Conspiracy in Guatemala—Treatment of
the Insurgents—Disrespect to the Diputacion—The Constitution Revoked—Royal
Decrees.

The opening century was pregnant with important events both in
Europe and America. By 1808 affairs in Spain culminated in the
French emperor's detention of the king and other members of the
royal family at Bayonne, where he forced them finally to resign in his
favor their rights to the Spanish crown. The circle surrounding the
captain-general, audiencia, and archbishop of Guatemala was made
up, not only of European Spaniards, but of Guatemalans belonging
to the so-called noble families. Popular displeasure was manifested
both against the Spaniards and against the provincial aristocracy.[I-1]
The oligarchy was hated throughout the province of Guatemala
proper, and still more in the other provinces of the presidency.
However, when the news of Napoleon's usurpation reached
America, it caused a strong revulsion of feeling in Central America,
as well as elsewhere in the Spanish dominions, even among the
large class which had hitherto secretly fostered a warm desire for
independent national existence. Creoles of pure Spanish descent,
though yearning to be free from the old thraldom, could not bring
themselves to discard the country which gave them blood, religion,
and civilization. As to the educated Indians, who were also among
the wishers for independence, like all of their race, they looked up to
the ruling power with reverence and fear. Thus arose a struggle
between the old veneration and the love of freedom; a struggle
which was to last in Central America a few years longer, though the
people were becoming more and more impatient, while leaning to
the side of independent nationality. Circumstances seemed to
demand that the old connection should not be ruptured till 1821,
when decisive results in New Spain brought on the final crisis here.
When the news of Napoleon's acts of violence and usurpations
reached Guatemala, popular loyalty was aroused, and showed itself
in various ways. Manifestations by the authorities, expressive of
fealty to the mother country and the royal family, met with an
apparently hearty response from the people.

Advices came on the 30th of June, 1808, of the occurrences at
Aranjuez of March 19th.[I-2] July passed amid much anxiety about
affairs in Spain, and the public mind became depressed by
unfavorable news received on the 13th of August. Next day, at a
meeting of the authorities,[I-3] the state of affairs was anxiously
discussed. The mariscal de campo, Antonio Gonzalez Mollinedo y
Saravia, had succeeded Dolmas on the 28th of July, 1801, in the
offices of governor, captain-general, and president of the audiencia.
He had seen forty years of service in the royal armies,[I-4] and had
with him his wife, Micaela Colarte, and offspring.[I-5]

SARAVIA AND FERNANDO VII.

President Saravia read to the meeting a despatch from the viceroy of Mexico,
and a copy of the Gaceta giving an account of the abdication
of Fernando VII., and of the surrender by other members of the
royal family of their rights to the Spanish crown. After due
consideration, the meeting declared these acts to have resulted from
violence, being therefore illegal and unjust, and not entitled to
recognition. It was further resolved that the authorities and people
should renew their allegiance to the legitimate sovereign, continue
upholding the laws hitherto in force, and maintain unity of action, for
the sake of religion, peace, and good order. Instructions were
received[I-6] to raise the standard of Fernando VII., and swear
allegiance to him, which were duly carried out.[I-7]

The opportunity has now arrived for a radical change in the
political status of Spanish America. The colonies have hitherto had
no government, save that of rulers set over them by a monarch
whose will was absolute, whose edicts constituted their code of
laws; the subject being allowed no voice in public affairs, save
occasionally as a timid petitioner. But troubles beset Spain at this
time. Her king is powerless; the friends of constitutional government
have now the control, and proceed to establish the desired liberal
régime. In order to be consistent, and to some extent satisfy the
aspirations of their fellow-subjects in America, the provisional
government decrees, and the córtes upon assembling confirm, all
the rights claimed for Spaniards dwelling in Spain, together with
representation in the córtes and other national councils.

The Junta Suprema Central Gubernativa in the king's name
declares on the 22d of January, 1809, the Spanish possessions in
America to be, in fact, integral parts of the monarchy,[I-8] and,
approving the report of the council of the Indies of November 21,
1808, in favor of granting to the American dominions representation
near the sovereign, and the privilege of forming by deputies a part
of the aforesaid junta, issues to the president of Guatemala an order
to invite the people of the provinces to choose their deputy to reside
at court as a member of the governing junta.[I-9] On the 3d of
March, 1810, the electors assembled in Guatemala and chose for
deputy the colonel of militia, Manuel José Pavon y Muñoz.[I-10] The
powers given him by his constituents were general, but enjoined
allegiance to the king and permanent connection with the mother
country.[I-11]

DIPUTACION AMERICANA.

The supreme government, early in 1810, in its anxiety to be surrounded by the
representatives of the people, hastened the convocation of córtes
extraordinary. Fearing, however, that there might not be a sufficient
number chosen for their timely attendance at the opening of the
session, it apprised the provincial authorities, reiterating the decree
a little later,[I-12] that deficiencies would be temporarily supplied
until regularly elected deputies presented themselves to occupy their
seats in the chamber. Guatemala, in common with the rest of
America, was unable to send her deputies in time, and had to be
represented at the inauguration by suplentes, or proxies. These[I-13]
were Andrés del Llano, a post-captain, and Colonel Manuel del Llano.
One of the first acts of the córtes[I-14] was to confirm the principle
that all the Spanish dominions possessed the same rights, promising
to enact at an early day laws conducive to the welfare of the
American portion, and to fix the number and form of national
representation in both continents.

At the suggestion of the diputacion americana, as the body of
American members was called, a general amnesty for political
offences was decreed, with the expectation of its yielding the best
results in favor of peace and conciliation. Promises of reform, and of
better days for Central America, were held out, but the provincial
government paid little attention to them. Meanwhile a jealous and
restless police constantly watched the movements of suspected
persons. Informers and spies lurked everywhere, seeking for some
one against whom to bring charges.

The promised blessings proved delusive. Instead of reforms, the
people witnessed the installation of a tribunal de fidelidad, with large
powers, for the trial and punishment of suspected persons.[I-15] This
court was short lived, however, being suppressed about the middle
of the following year, under the order of the supreme government,
dated February 20, 1811. And thus Guatemala was kept quiet and
apparently loyal, when the greater part of Spanish America was in
open revolt.

Saravia's rule came to an end on the 14th of March, 1811. He
was promoted to the rank of lieutenant-general, and appointed by
the government at Cádiz to the command in chief of the forces in
Mexico. On his arrival in Oajaca, the viceroy, who was chagrined at
his powers having been thus curtailed, detained him at that place. In
November 1812, the city being captured by the independents,
Saravia was taken prisoner and shot.[I-16]

BUSTAMANTE Y GUERRA.

The successor of Saravia was Lieutenant-general José Bustamante y Guerra,
appointed by the supreme council of regency, and soon after
confirmed by the córtes generales extraordinarias. He was a naval
officer, and had made several important cruises in the cause of
science,[I-17] and latterly had been civil and military governor of
Montevideo, a position that he filled efficiently. His zeal against the
independents in that country pointed him out as the one best fitted
to retard the independence of Central America. On his return to
Spain from South America he refused to recognize Joseph
Bonaparte.

Bustamante is represented to have been an inflexible, vigilant,
and reticent ruler. He lost no time in adopting stringent measures to
check insurrections, and displayed much tact in choosing his agents
and spies. No intelligent native of the country was free from
mistrust, slight suspicion too often bringing upon the subject search
of domicile, imprisonment, or exile. He never hesitated to set aside
any lenient measures emanating from the home government in favor
of the suspected, and spared no means that would enable him, at
the expiration of his term, to surrender the country entire and at
peace to his superiors. He was successful, notwithstanding there
were several attempts at secession.

Meanwhile the American representatives had been permitted to
lift their voice in the national councils. They had called attention to
the grievances of their people. In a long memorial of August 1,
1811, to the córtes, they had refuted the oft-repeated charge that
the friends of independence in America were or had been under
Napoleonic influence. They set forth the causes of discontent,[I-18]
which they declared was of long standing, and called for a remedy.
Reference was made to Macanar's memorial to Felipe V.,[I-19]
wherein he stated that the Americans were displeased, not so much
because they were under subjection to Spain, as because they were
debased and enslaved by the men sent out by the crown to fill the
judicial and other offices.[I-20]

The organic code was finally adopted on the 18th of March,
1812.[I-21] The instrument consisted of ten titles, divided into
chapters, in their turn subdivided into sections, and might be
considered in two parts: 1st, general form of government for the
whole nation, namely, a constitutional monarchy; 2d, special plan for
the administration of the Indies.[I-22]

NEW ORGANIC CODE.

In lieu of the old ayuntamientos, which were made up of hereditary regidores,
whose offices might be transferred or sold, others were created,
their members to be chosen by electors who had been in their turn
chosen by popular vote. The ayuntamientos were to control the
internal police of their towns, their funds, public instruction within
their respective localities, benevolent establishments, and local
improvements. They were to be under the inspection of a diputacion
provincial, formed of seven members, elected by the above-
mentioned electors, in each province, under the presidency of the
chief civil officer appointed by the king; the chief and the diputacion
were jointly to have the direction of the economical affairs of the
province. No act of either corporation was final till approved by the
national córtes. In America and Asia, however, owing to great
distances, moneys lawfully appropriated might be used with the
assent of the chief civil authority; but a timely report was to be
made to the supreme government for the consideration of the
córtes. Such were the chief wheels in the machinery of provincial
and municipal administration. Now, as to popular rights, equality of
representation in the provinces of the Spanish peninsula, Asia, and
America was fully recognized. The descendants of Africans were
alone deprived of the rights of citizenship. This exclusion was
combated with forcible arguments by many of the American deputies
setting forth the faithful, efficient services colored men had
repeatedly rendered and were still rendering to the nation, and their
fitness for almost every position. Many of them, they said, had
received sacred orders, or had been engaged in other honorable
callings, in which they had made good records; besides which, they
comprised a considerable portion of the useful mining and
agricultural population. Unfortunately for the negro race, the
American deputies were not all of one mind. Larrazábal, from
Guatemala, probably acting both on his own judgment and on the
opinion expressed in 1810 by the real consulado, asserted the black
man's incapacity, advocating that persons of African blood should be
conceded only the privilege of voting at elections. This motion was
supported by a Peruvian deputy. The peninsular members favored
the admission to full rights of colored priests, and all colored men
serving in the royalist armies. The measure was lost, however; but
the article as passed authorized the admission to full political rights,
by special acts of the córtes, of colored men proving themselves
worthy by a remarkably virtuous life, good service to the country,
talents, or industriousness, provided they were born in wedlock, of
fathers who had been born free, married to free-born wives, and
were residents of Spanish possessions, practising some useful
profession and owning property.

Pursuant to the constitution, the córtes ordered, May 23, 1812,
elections for members to the ordinary córtes of 1813.[I-23]

The constitution was received at Guatemala on the 10th of
September, 1812, proclaimed on the 24th, and its support solemnly
sworn to by the authorities and people on the 3d of November, with
great satisfaction and evidences of loyalty. Gold and silver medals
were struck off to commemorate the event.[I-24]

The installation of the córtes took place, with the apparent
approval of Guatemala. The president, members of the audiencia,
and other dignitaries who had thriven under absolutism, looking on
Americans as 'our colonists,' became at once liberals and
constitutionalists, pretending to recognize the wisdom of the national
congress in declaring that the Americans were no longer colonists,
but citizens of one common country. Their manifestation of
September 15, 1812, was followed three days after by one from the
ayuntamiento of Guatemala to Deputy Larrazábal, in the same
strain, suggesting the creation of a board advisory to the córtes, on
the reino de Guatemala legislation.

EXPEDITION TO OAJACA.

After the fall of Oajaca during the Mexican war
of independence, the patriot chief Morelos
regarded the rear of his military operations as secure. Sympathizing
messages had reached him from men of weight in Guatemala, which
lulled him into the belief that attack need not be apprehended from
this quarter. To Ignacio Rayon he wrote: "Good news from
Guatemala; they have asked for the plan of government, and I'll
send them the requisite information." It was all a mistake. His cause
had friends in Central America, and enemies likewise. Among the
most prominent of the latter were Captain-general Bustamante and
Archbishop Casaus. The ecclesiastic, with a number of Spanish
merchants from Oajaca who had sought refuge in Guatemala,
prompted the general, then anxious to avenge the execution of his
predecessor, to fit out an expedition, invade Oajaca, and harass the
insurgents even at the gates of the city.

About 700 men, mostly raw recruits, were accordingly put in the
field, early in 1813, under the command of Lieutenant-colonel
Dambrini, a man of little ability and unsavory record, and crossed
the line into Tehuantepec. Dambrini could not abandon his money-
making propensities; and having been led to believe he would
encounter but little or no resistance, took along a large quantity of
merchandise for trading. On the 25th of February a small insurgent
force was captured in Niltepec, and Dambrini had its commander,
together with a Dominican priest and twenty-eight others, shot the
next day. This was the usual treatment of prisoners by both
belligerents. But on April 20th the Guatemalans were flanked and
routed at Tonalá by the enemy under Matamoros. Dambrini fled, and
his men dispersed, leaving in the victors' possession their arms,
ammunition, and Dambrini's trading goods. The fugitives were
pursued some distance into Guatemalan territory.[I-25]

Germs of independence, as I have said, were fostered in secret
by the more intelligent, and slowly began to develop, the movement
being hastened by a few enthusiasts who were blind to the
foolhardiness of their attempt. The government tried all means to
keep the people in ignorance of the state of affairs in Mexico and
South America, and when unsuccessful, would represent the royalist
army as victorious. Other more questionable devices were also
resorted to.[I-26]

Undue restraint and ill treatment, as practised under the
stringent policy of Bustamante, soon began to produce effects.
Restiveness and despair seized a portion of the people; the hopes
for a government more consonant with the spirit of the age, which
had been held out from Spain, evaporated. Men were unwilling to
live longer under the heel of despotism; and the more high-spirited
in Salvador and Nicaragua resolved to stake their fortunes upon a
bold stroke for freedom. It was, indeed, a rash step, undertaken
without concert, and almost without resources. It could but end as it
did at every place where a revolutionary movement was initiated.

Matías Delgado and Nicolás Aguilar, curates of San Salvador,
Manuel and Vicente Aguilar, Juan Manuel Rodriguez, and Manuel
José Arce were the first to strike the blow for Central American
independence. Their plan was carried into execution on the 5th of
November, 1811, by the capture of 3,000 new muskets, and upwards
of $200,000 from the royal treasury at San Salvador. They were
supported by a large portion of the people of the city, and in
Metapan, Zacatecoluca, Usulutan, and Chalatenango. But other
places in the province of Salvador, namely, San Miguel, Santa Ana,
San Vicente, and Sonsonate, renewed their pledges of fealty to the
government, declaring the movement for freedom a sacrilege.[I-27]

The promoters of the revolt, which had been started in the
king's name, became disheartened and gave up further effort, and
with the dismissal of the intendente, Antonio Gutierrez Ulloa, and
other officials, peace was soon restored. San Salvador had been
quiet without other government than that of alcaldes during the
disturbance.

AYCINENA IN SALVADOR.

Upon the receipt of the news of these occurrences, Bustamante
despatched Colonel José
de Aycinena with ample powers to take charge of
the intendencia, and restore quiet. He had been getting troops ready
to send down, but by the mediation of the ayuntamiento of
Guatemala he had suspended preparations, and had adopted the
former course. A member of that body, José María Peinado, was
associated with Aycinena.[I-28] They reached San Salvador on the 3d
of December, amid the acclamations of the fickle populace; their
presence and the exhortations of the missionaries checked all
revolutionary symptoms. The authors of the revolt were leniently
treated under a general amnesty.[I-29] Peinado was a short time
after appointed Aycinena's successor as acting intendente.[I-30]

Another and a still more serious attempt at revolution, which
may be called a sequel to that of Salvador, had its beginning in the
town of Leon, Nicaragua, on the 13th of December, 1811, when the
people deposed the intendente, José Salvador. This action was
seconded on the 22d at Granada, where the inhabitants, at a
meeting in the municipal hall, demanded the retirement of all the
Spanish officials. The insurgents, on the 8th of January, 1812, by a
coup-de-main captured Fort San Cárlos. The officials fled to Masaya.
Villa de Nicaragua—the city of Rivas in later times—and other towns
at once adopted the same course.

Early in 1812, after the first excitement had become somewhat
allayed, a board of government was organized in Leon, the members
of which were Francisco Quiñones, Domingo Galarza, Cármen
Salazar, and Basilio Carrillo. Bishop Fray Nicolás García Jerez was
recognized as gobernador intendente by all the towns, and his
authority was only limited in one point, namely, he was in no way to
favor the deposed officials. The people of Granada resolved to send
two deputies to the board.[I-31]

REVOLUTION IN NICARAGUA.

The royal officials at Masaya having called for
assistance from Guatemala, Bustamante had
1,000 or more troops placed there under
command of Sargento Mayor Pedro Gutierrez. The people of Leon
had ere this accepted an amnesty from Bishop Jerez, and thereafter
took no part in movements against the crown. Granada, more firm of
purpose, resolved upon defence; caused intrenchments to be built to
guard all avenues leading to the plaza, and mounted thereon twelve
heavy cannon. A royalist force, under José M. Palomar, on the 21st
of April approached Granada to reconnoitre, and reached the
plazuela de Jalteva.[I-32] Early in the morning he opened a brisk fire
on the town, and kept it up for several hours. After a parley, next
day the citizens agreed to surrender, on Gutierrez solemnly pledging
the names of the king and Bustamante, as well as his own, that they
should in no wise be molested. But after the royal troops were
allowed to enter the city on the 28th, Bustamante, ignoring the
solemn guarantees pledged by his subordinate, ordered the arrest
and prosecution of the leaders. The governor accordingly named
Alejandro Carrascosa fiscal to prosecute the conspirators of Granada.
The proceedings occupied two years, at the end of which the fiscal
called for, and the court granted, the confiscation of the estates, in
addition to the penalties awarded to those found guilty. Sixteen of
the prisoners, as heads of the rebellion, were sentenced to be shot,
nine were doomed to the chain-gang for life, and 133 to various
terms of hard labor.[I-33] The sentence of death was not carried out,
however. The condemned were taken to Guatemala, and thence
transported to Spain, where the majority died as exiles. Four others
were removed as convicts to Omoa and Trujillo. The survivors were
finally released by a royal order of June 25, 1817.[I-34]

The conduct of the Leonese in leaving Granada to bear alone the
consequences of the revolution had, as I remarked, a bad effect
upon the country.[I-35] From that time dates a bitter feeling between
Leon and Granada, and between Managua and Masaya on the one
part and Granada on the other.[I-36]

Notwithstanding the existing grievances and the generally
depressed condition of business, the people did not fail to respond
to the calls from the home government upon all parts of the Spanish
dominions for pecuniary aid to meet the enormous expenses of the
war against Napoleon's forces, and other pressing demands. In 1812
there were collected and remitted as donations $43,538. The citizens
of San Salvador also agreed to give $12,000 for 1812, and an equal
sum in 1813, if they could obtain a certain reform for the benefit of
indigo-planters.[I-37]

FANATICISM.

We have seen how the first steps toward
independence failed. Nor could any other result have been expected
from the degraded condition, socially and intellectually, of the
masses. The people were controlled by fanaticism, in abject
submission to king and clergy. Absurd doctrines and miracles were
implicitly believed in; and every effort made to draw the ignorant
people out of that slough was in their judgment treason and
sacrilege, a violation of the laws of God, an attempt to rob the king
of his rights; certain to bring on a disruption of social ties, and the
wrath of heaven. The lower orders had been taught that freedom
signified the reign of immorality and crime, while fealty to the
sovereign was held a high virtue. Hence the daily exhibitions of
humble faithfulness, the kneeling before the images of the monarch
and before their bishops, and the more substantial proof of money
gifts to both church and crown.[I-38]

The first efforts on behalf of emancipation were not wholly lost,
as they led to definitive results in the near future. The next attempts
also met with failure, and brought upon their authors the heavy
hand of Bustamante. The first one, in 1813, was known as the
Betlen conspiracy, which derived its name from the convent where
the conspirators usually assembled. Much importance was given to
this affair by the government and the loyalists. The meetings were
presided over by the sub-prior Fray Ramon de la Concepcion, and
were sometimes held in his cell, and at others in the house of
Cayetano Bedoya, under the direction of Tomás Ruiz, an Indian.[I-39]
All were sworn to secrecy, and yet the government suspected the
plot, and arrested some persons who had the weakness to divulge
the plan and the names of their associates.[I-40]

The conspirators, all of whom were men of character and good
standing, soon found themselves in prison, excepting José Francisco
Barrundia, who remained concealed six years, and afterward was
one of the most prominent statesmen of Central America. Major
Antonio del Villar was commissioned fiscal to prosecute the
prisoners. He spared no one in his charges, and managed to bring
into the meshes of the prosecution several persons who were
innocent.[I-41] On the 18th of September, 1814, he asked the
military court for the penalty of death, by garrote, against Ruiz,
Víctor Castrillo, José Francisco Barrundia pro contumacia, and
Joaquin Yúdice, who were hidalgos; and the same penalty, by
hanging, against the sub-prior and ten others who were plebeians.[I-42]
Ten years of hard labor in the chain-gang of the African
possessions, and a life exile from America, were pronounced upon
others against whom no guilt was proved. The prisoners were all set
free, however, in 1819, under a royal order of the 28th of July, 1817.

THE PLOT OF BETLEN.

Among the men regarded as the most dangerous, and strongly
suspected of being the
real managers of the Betlen plot, was Mateo
Antonio Marure, who had been confined two years in a dungeon for
the part he took in the disturbances of 1811.[I-43] Bustamante
dreaded his presence in Guatemala, and in 1814 despatched him as
a prisoner to the supreme council of regency in Spain, with his
reasons for this measure. After recounting the Betlen affair, and
naming Marure as the real instigator and manager of it, he adds that
the conspirators counted on him as a fearless man to carry it out,
and that his bold language and writings rendered his sojourn in
America a constant menace to Spanish interests.

Another and a worse planned attempt at revolution than the one
of 1811 occurred in Salvador in 1814. The government quelled it,
and the promoters were arrested, Manuel José Arce suffering an
imprisonment of several years.[I-44]

The reader's attention is now called to matters concerning the
capitanía general of Guatemala, which occupied the government
both here and in Europe immediately before King Fernando's coup-d'état.

Bustamante, evidently hostile to constitutional government, and
loath to suffer readily any curtailment of his quasi-autocratic powers,
7 pages
CIM Standards Overview CIM U Windsor Part 1
No ratings yet
CIM Standards Overview CIM U Windsor Part 1
56 pages
Data Warehousing And Knowledge Discovery Third International Conference Dawak 2001 Munich Germany September 57 2001 Proceedings 1st Edition Larry Kerschberg Auth pdf download
100% (3)
Data Warehousing And Knowledge Discovery Third International Conference Dawak 2001 Munich Germany September 57 2001 Proceedings 1st Edition Larry Kerschberg Auth pdf download
80 pages
IV CSM SW II- mid objective paper MAY- 2024
No ratings yet
IV CSM SW II- mid objective paper MAY- 2024
5 pages
CP5074 - SNA Unit III Notes
No ratings yet
CP5074 - SNA Unit III Notes
27 pages