Kun Ma
Ajith Abraham
Bo Yang
Runyuan Sun
Volume 643
Series editor
Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
e-mail: [email protected]
About this Series
The series “Studies in Computational Intelligence” (SCI) publishes new develop-
ments and advances in the various areas of computational intelligence—quickly and
with a high quality. The intent is to cover the theory, applications, and design
methods of computational intelligence, as embedded in the fields of engineering,
computer science, physics and life sciences, as well as the methodologies behind
them. The series contains monographs, lecture notes and edited volumes in
computational intelligence spanning the areas of neural networks, connectionist
systems, genetic algorithms, evolutionary computation, artificial intelligence,
cellular automata, self-organizing systems, soft computing, fuzzy systems, and
hybrid intelligent systems. Of particular value to both the contributors and the
readership are the short publication timeframe and the worldwide distribution,
which enable both wide and rapid dissemination of research output.
Kun Ma
Shandong Provincial Key Laboratory of Network Based Intelligent Computing, School of Information Science and Engineering, University of Jinan, Jinan, Shandong, China

Bo Yang
Shandong Provincial Key Laboratory of Network Based Intelligent Computing, School of Information Science and Engineering, University of Jinan, Jinan, Shandong, China
The goal of this book is to present methods of intelligent Web data management, including novel software architectures and emerging technologies, and to validate these architectures using experimental data and real-world applications. Furthermore, the extensibility mechanisms are discussed. The book is organized around the authors' research findings of the past few years.
The contents of this book are focused on four popular thematic categories of
intelligent Web data management: cloud computing, social networking, monitoring
and literature management. There are a number of applications in these areas, but
there is a lack of mature software architectures. Having participated in more than 20
software projects in the past 10 years, we have some interesting experience to share
with readers. Therefore, this book attempts to introduce some new intelligent Web
data management methods, including software architectures and emerging tech-
nologies. The book is organized into four parts as detailed below.
Part I introduces intelligent Web data management in the area of cloud computing.
This part emphasizes some software architectures of cloud computing.
Chapter 1 deals with intelligent Web data management of multi-tenant data
middleware. This chapter introduces intelligent Web data management of a trans-
parent data middleware to support multi-tenancy. This approach is transparent to
the developers of cloud applications.
Chapter 2 presents intelligent Web data management of a NoSQL data warehouse. This chapter introduces intelligent Web data management of a NoSQL data warehouse that addresses the problem of building a non-redundant data warehouse with a small amount of storage space by composing documents with the MapReduce framework. The experiments illustrate that the NoSQL data warehouse can be built with less data redundancy than document-with-timestamp and lifecycle-tag solutions.
Part II of this book introduces intelligent Web data management in the area of social
networking. This part emphasizes some software architectures for social
networking.
Chapter 3 presents intelligent Web data management of social question
answering. This chapter introduces intelligent Web data management of a question
answering system, which aims at improving the success ratio of the question
answering process with a multi-tenant architecture.
Chapter 4 deals with intelligent Web data management of content syndication
and recommendation. This chapter introduces intelligent Web data management of
a content syndication and recommendation system. The experimental results show that the developed architecture speeds up the search and synchronization process and provides a friendly user experience.
Part III of this book introduces intelligent Web data management in the area of
monitoring. This part emphasizes some software architectures for intelligent
monitoring.
Chapter 5 presents intelligent Web data management of infrastructure and software monitoring. This chapter introduces intelligent Web data management of a lightweight module-centralized and aspect-oriented monitoring system. This framework performs end-to-end measurements at the infrastructure and software levels in the cloud. It
monitors the quality of service (QoS) parameters of the Infrastructure as a Service
(IaaS) and Software as a Service (SaaS) layers in the form of plug-in bundles. The
experiments provide insight into the modules of cloud monitoring. All the modules
constitute the entire proposed framework to improve the performance in hybrid
clouds.
Chapter 6 deals with intelligent Web data management of WebSocket-based
real-time monitoring. This chapter introduces intelligent Web data management of a
WebSocket-based real-time monitoring system for remote intelligent buildings. The
monitoring experimental results show that the average latency time of the devel-
oped WebSocket monitoring is generally lower than polling, FlashSocket and
Socket solution, and the storage experimental results show that our storage model
has low redundancy rate, storage space and latency.
Part IV of this book introduces intelligent Web data management in the area of
literature management. This part emphasizes some software architectures of liter-
ature management.
Chapter 7 illustrates intelligent Web data management for literature validation.
This chapter introduces intelligent Web data management of a literature validation
system, which aims at validating the literature by the author name from the
third-party integrated system and the metadata from the DOI content negotiation
proxy. The analysis of the application's effect shows its ability to verify the authenticity of the literature by the author name from the system and the metadata from our DOI content negotiation proxy.
Chapter 8 presents intelligent Web data management for literature sharing. This
chapter introduces intelligent Web data management of a bookmarklet-triggered
unified literature sharing system. This architecture makes literature sharing and academic exchange easy; these activities are frequent and often necessary in scientific work such as research, writing chapters and dissertations, and preparing reports.
This book is written primarily for academic researchers who are interested in
intelligent Web data management of some emerging software systems, or software
architects who are interested in developing intelligent software architecture in the
aspect of Web data management. However, it was also written keeping in mind the
postgraduates who are studying Web data management. We assume basic famil-
iarity with the concepts of Web data management, but also provide pointers to
sources of information to fill in the background.
Many people have collaborated to shape the technical contents of this book. Our
thanks to our colleagues for the wonderful feedback, which helped us to enhance
the quality of the manuscript. We also thank the Springer Studies in Computational Intelligence editorial team, Prof. Dr. Janusz Kacprzyk, Dr. Thomas Ditzinger, and Mr. Holger Schaepe, for the wonderful support in publishing this book quickly.
We hope the readers will enjoy the contents, and we await further feedback to improve the work.
Kun Ma
Ajith Abraham
Bo Yang
Runyuan Sun
1.1 Introduction
1.1.1 Background
Current multi-tenant data techniques face several challenges. First, making the data middleware transparent to the developers is challenging; that is, a legacy application should be able to migrate to a multi-tenant one with minimal modification of its source code. Second, minimizing the cost and the impact on database performance is also challenging.
To address these challenges, we introduce the architecture of a transparent data
middleware to support multi-tenancy. The architecture of this data middleware is
discussed in detail. The contributions of this data middleware are twofold. First, the data middleware is transparent to the developers: it is easy to make a legacy application support multi-tenancy without re-architecting the entire system from the ground up. Second, some auxiliary optimization measures are added to make the data middleware more extensible and scalable.
This section introduces the related work and techniques on multi-tenant data middleware.
Broadly speaking, SaaS application maturity can be expressed using a model with
four distinct levels [5]. Each level is distinguished from the previous one by the
addition of scalability, multi-tenancy, and configuration. Figure 1.1 shows the SaaS
maturity model.
Level 1: Ad Hoc/Custom
The first level of maturity is similar to the traditional application service provider
(ASP) model of software delivery, dating back to the 1990s. At this level, each
tenant has its own customized version of the hosted application, and runs its own
instance of the application on the host’s servers. Architecturally, software at this
maturity level is very similar to traditionally-sold line-of-business software.
1.2 Related Work and Emerging Techniques
The distinction between shared data and isolated data is not binary. Instead, it is
more of a continuum, with many variations that are possible between the two
extremes. Therefore, there are mainly three SaaS data models from the balance
between isolation and sharing [6–8]. Figure 1.2 shows the current SaaS data
models.
Model A: Separate application and separate database
Separate application and separate database uses a separate application instance and a separate database for each tenant; it is the simplest data model.
Unfortunately, this approach tends to lead to higher costs for maintaining equipment
and backing up tenant data. The number of tenants that can be housed on a given
database server is limited by the number of databases that the server can support.
Model B: Shared application and separate database
In this model, computing resources and application code are generally shared
between all the tenants on a server, but each tenant has its own set of data that
remains logically isolated from data that belongs to all other tenants. Metadata
associates each database with the correct tenant, and database security prevents any
tenant from accidentally or maliciously accessing other tenants’ data.
Giving each tenant its own database makes it easy to extend the application’s
data model to meet tenants’ individual needs, and restoring a tenant’s data from
backups in the event of a failure is a relatively simple procedure. Unfortunately, this
approach tends to lead to higher costs for maintaining equipment and backing up
tenant data. Hardware costs are also higher than they are under alternative
approaches, as the number of tenants that can be housed on a given database server
is limited by the number of databases that the server can support.
Model C: Shared application and shared database
From the aspect of fine-grained partition of shared data model, there are two shared
SaaS data models: separate schema and shared schema.
Model C 1: Shared database and separate schema
This data model involves housing multiple tenants in the same database, with each
tenant having its own set of tables that are grouped into a schema created
specifically for the tenant. Figure 1.3 shows this data model. The provisioning
database creates a discrete set of tables for the tenant and associates it with the
tenant’s own schema. Although the tenants’ data are in the same database, each tenant has a discrete set of tables, views, stored procedures, and triggers. Like the isolated
approach, the separate schema approach is relatively easy to implement. This
approach offers a moderate degree of logical data isolation for security-conscious
tenants, though not as much as a completely isolated system would. It can support a
larger number of tenants per database server. A significant drawback of the separate
schema approach is that tenant data is harder to restore in the event of a failure. If
each tenant has its own database, restoring a single tenant’s data means simply
restoring the database from the most recent backup. With a separate schema
application, restoring the entire database would mean overwriting the data of every
tenant on the same database with backup data, regardless of whether each one has
experienced any loss or not. Therefore, to restore a single customer’s data, the
database administrator may have to restore the database to a temporary server, and
then import the customer’s tables into the production server.
Model C 2: Shared database and shared schema
A second approach involves using the same database and the same schema that is
composed of a set of tables to host multiple tenants’ data. Figure 1.4 shows this data
model. A given table can include records from multiple tenants stored in any order.
Therefore, a tenant ID column is added to associate every record with the
appropriate tenant.
Of the two approaches explained here, the shared schema approach has the
lowest hardware and backup costs, because it allows you to serve the largest
number of tenants per database server. However, because multiple tenants share the
same database tables, this approach may incur additional development effort in the
area of security to ensure that tenants can never access other tenants’ data, even in
the event of unexpected bugs or attacks. In this context, a multi-tenant data mid-
dleware is well designed to optimize and minimize the development work to the
utmost. That is the motivation of the proposed multi-tenant data middleware.
1.3 Requirements
1.4 Architecture
The SQL interceptor intercepts the SQL statements that are transmitted to the SQL parser. A simple implementation of the SQL interceptor is a JDBC proxy, which captures all the SQL statements in the database driver layer.
SQL parser is used to parse the fine-grained predicates of SQL statements, such as
select predicates, aggregation predicates, where predicates, order predicates, and
group predicates.
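The interceptor-and-parser pair can be sketched in a few lines. The chapter's own implementation is a JDBC proxy in Java; the Python class and function names below are illustrative, and the regex-based parser covers only the predicate kinds listed above, not full SQL:

```python
import re

class SQLInterceptor:
    """Proxy that captures every SQL statement in the driver layer
    before delegating to the real cursor (the JDBC-proxy idea)."""

    def __init__(self, inner_cursor, on_sql):
        self._inner = inner_cursor
        self._on_sql = on_sql          # callback fed to the SQL parser

    def execute(self, sql, params=()):
        self._on_sql(sql)              # intercept, then pass through
        return self._inner.execute(sql, params)

def parse_predicates(sql):
    """Extract the fine-grained predicates of a SELECT statement."""
    parts = {}
    m = re.search(r"select\s+(.*?)\s+from", sql, re.I | re.S)
    if m:
        parts["select"] = m.group(1).strip()
    m = re.search(r"where\s+(.*?)(?:\s+order\s+by|\s+group\s+by|$)", sql, re.I | re.S)
    if m:
        parts["where"] = m.group(1).strip()
    m = re.search(r"order\s+by\s+(.*?)(?:\s+group\s+by|$)", sql, re.I | re.S)
    if m:
        parts["order"] = m.group(1).strip()
    m = re.search(r"group\s+by\s+(.*)$", sql, re.I | re.S)
    if m:
        parts["group"] = m.group(1).strip()
    return parts
```

Because the interceptor only wraps the cursor, the application code remains unchanged, which is exactly the transparency property the middleware aims for.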
(Figure: architecture of the data middleware — a data request passes through the SQL interceptor, SQL parser, SQL restorer, and SQL router; a read that hits the cache is served from the cache, while a miss is served by the master data node or its replicated slaves.)
The SQL restorer restores the new SQL against the physically shared data. The new SQL is reorganized from the original SQL predicates and the tenantID discriminator. The restoring process is denoted as a mapping: sql(u) → pre(TenantID) ∪ T(sql(u), TenantID) ∪ post(TenantID), where pre(TenantID) is the pre-personalized
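The core of the restoring step — injecting the tenantID discriminator — can be illustrated as follows. The column name and quoting style are assumptions, and a production restorer would rewrite each parsed predicate (and use bind parameters) rather than edit the statement text:

```python
import re

def restore_sql(sql, tenant_id):
    """Reorganize a tenant-agnostic statement for the shared tables by
    injecting the tenantID discriminator into the WHERE clause."""
    guard = f"tenantID = '{tenant_id}'"
    if re.search(r"\bwhere\b", sql, flags=re.I):
        # prepend the discriminator to the existing predicates
        return re.sub(r"\bwhere\b", f"WHERE {guard} AND", sql, count=1, flags=re.I)
    # no WHERE clause: append one
    return f"{sql} WHERE {guard}"
```

Every statement the application issues is thus scoped to one tenant's rows without the application ever mentioning the tenant.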
SQL router sends the reorganized SQL requests to the data node or the cache. The
cache is deployed to accelerate the read process. If the data of one column of the
query hit the cache, they are obtained from the cache. If the data of one column of
the query miss the cache, they are obtained from the master/slave data nodes.
Read/write splitting techniques are applied to improve the scalability and perfor-
mance of the database. The basic concept is that a master data node handles the
transactional operations, and the slaves handle the non-transactional queries. Whether a statement is transactional is identified by parsing the SQL.
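The splitting rule can be sketched as a small router. Classifying by the leading SQL verb is a simplification of the full parse described above, and the round-robin slave choice is our own assumption:

```python
import itertools

class SQLRouter:
    """Read/write splitting: transactional statements go to the master,
    non-transactional queries rotate round-robin over the slaves."""

    WRITE_VERBS = {"insert", "update", "delete", "create", "alter", "drop"}

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves)

    def route(self, sql):
        verb = sql.lstrip().split(None, 1)[0].lower()
        return self.master if verb in self.WRITE_VERBS else next(self._slaves)
```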
The master/slave nodes are applied in the architecture. Replication enables data
from the master to be replicated to one or more slaves. Replication is based on the
master server keeping track of all changes to its databases in its binary log. The
binary log serves as a written record of all events that modify database structure or
data from the moment the server was started. The before and after images are both
recorded in the binary log with low impact on the performance of the database.
Each slave that connects to the master requests a copy of the binary log. That is, it
pulls the data from the master, rather than the master pushing the data to the slave.
The slave also executes the events from the binary log that it receives. This has the
effect of repeating the original changes just as they were made on the master. Tables
are created or their structure modified, and data is inserted, deleted, and updated
according to the changes that were originally made on the master.
1.4.6 Cache
The architecture of the cache is shown in Fig. 1.7, which is a part of the data
middleware. In our solution, we adopt the cache to optimize the architecture of the
data middleware.
Step 1: Log-based replication from the data node to the cache
The changes of the data node are parsed from the binary log store. This process is
called log-based replication in the presence of updates. For a new type of the data
node, the only component that needs to change in this architecture is the concrete
parser of the binary log. For an insert operation on the data node, the middleware inserts the data of the affected column into the cache if that column already exists in the cache. For a delete operation, it evicts the corresponding cache entry if the column exists in the cache. For an update operation, it rectifies the data of that column if it exists in the cache.
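The three cases of Step 1 can be condensed into one event handler. The event shape and the column/key naming are our own simplifications of whatever the concrete binary-log parser emits:

```python
def apply_binlog_event(cache, event):
    """Apply one parsed binary-log event to the column cache.
    `cache` maps column -> {key: value}; `event` is a dict such as
    {"op": "insert", "column": "users.name", "key": "k1", "value": "Bob"}."""
    column = event["column"]
    if column not in cache:
        return                                   # only cached columns are maintained
    if event["op"] == "delete":
        cache[column].pop(event["key"], None)    # evict the stale entry
    else:                                        # "insert" or "update"
        cache[column][event["key"]] = event["value"]
```

Note that a new data-node type only requires a new binary-log parser that produces these events; the handler itself is unchanged, matching the extensibility claim above.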
Step 2: Cache listener
The access counter is designed to observe the usage of each column in the cache. If the current access frequency rate of a column is smaller than the average column access frequency rate, the column's access frequency count needs to be decreased. If the difference between the current column access frequency rate and the average one falls below a negative threshold, the data of that column should be removed from the cache.
Step 3: Cache replacement strategies
If the data of the queried column hit the cache, the results are returned from the column-oriented NoSQL cache. On the contrary, if the data of the queried column miss the cache, the results are returned from the original data node. If the difference between the current column access frequency rate and the average one exceeds a threshold, the data of that column in the data node need to be dynamically loaded into the cache. On every hit or miss of the cache, the column access frequency count is updated.
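Steps 2 and 3 can be summarized in one counter-driven policy. The single threshold and the raw-count bookkeeping below are our own simplifications of the frequency-rate rules described above:

```python
class ColumnCache:
    """Access counters drive promotion and eviction: a column far enough
    above the average access count is promoted into the cache, and one
    far enough below it is evicted."""

    def __init__(self, threshold=2.0):
        self.counts = {}        # column -> access frequency count
        self.cached = set()     # columns currently held in the cache
        self.threshold = threshold

    def access(self, column):
        self.counts[column] = self.counts.get(column, 0) + 1
        self._rebalance()
        return column in self.cached        # hit (True) or miss (False)

    def _rebalance(self):
        avg = sum(self.counts.values()) / len(self.counts)
        for col, n in self.counts.items():
            if n - avg >= self.threshold:
                self.cached.add(col)        # promote a hot column
            elif n - avg <= -self.threshold:
                self.cached.discard(col)    # evict a cold column
```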
1.5 Evaluation
We evaluate the proposed multi-tenant data middleware using cost analysis [10].
The goal is to find the cost-minimal solution for the considered multi-tenant
application. Different reengineering measures of varying complexity are necessary
for fulfilling this requirement.
The cost of the different multi-tenant data models is mainly composed of two major aspects: the initial reengineering cost and the monthly ongoing cost. The breakeven point
of the data model is calculated as:
TimeToBreakEven = InitRECosts / Σ MonthlyOngoingCosts
If the service is alive forever, the data model with the lowest incremental
monthly ongoing cost is always the best. However, we rather assume that a service
gets replaced sooner or later. Thus, the question is whether the time period is
sufficiently long to justify huge investments (i.e. service usage time > time to break
even).
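The breakeven rule can be computed directly; the cost figures used in the example are made-up illustrations, not data from the chapter's evaluation:

```python
def time_to_break_even(init_re_costs, monthly_ongoing_costs):
    """TimeToBreakEven = InitRECosts / sum(MonthlyOngoingCosts):
    months until the initial reengineering investment is amortized."""
    return init_re_costs / sum(monthly_ongoing_costs)

def worth_migrating(init_re_costs, monthly_ongoing_costs, service_months):
    """The investment is justified only if the service lives longer than
    the breakeven point (service usage time > time to break even)."""
    return service_months > time_to_break_even(init_re_costs, monthly_ongoing_costs)
```

For example, a 1200-unit reengineering cost against 300 units of monthly ongoing cost breaks even after four months, so a service expected to run six months justifies the migration while a three-month one does not.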
Figure 1.8 shows the empirical cost of different multi-tenant models. It is
indicated that the higher initial reconstruction cost is reasonable for the sake of the
low monthly ongoing costs. It is noted that these cost functions in reality might not
be linear. Due to the relative complexity of developing a shared architecture,
applications that are optimized for a shared approach tend to require a larger
development effort than applications that are designed using an isolated approach.
The monthly ongoing costs of shared approaches tend to be lower, since they can
support more tenants per server. It is shown that our approach using the multi-tenant data middleware is a transparent and loosely coupled solution for SaaS applications. The cost of our data middleware lies between the isolated and shared data models at the beginning, but approaches the shared model in the long term.
1.6 Discussion
1.6.1 Extensibility
The requirements of tenants vary from one to another, although the tenants share a similar database structure. The data middleware is designed to support the
personalization of the tenants. There are three approaches to support the person-
alization [9].
An intuitive approach is using a single wide table, which is shown in Fig. 1.9.
Single wide table, as its name suggests, stores all the tenant data in the same table
with the maximum number of fields. The data model is simply extended to create a
preset number of custom fields in every table you wish. This approach often wastes a column if the tenant does not customize it; this issue is generally called the schema null issue. In this solution, each tenant that uses at least one or more custom fields gets a row in the combined table, with null fields representing available custom fields that the tenant has not used.
Another improved version is the single wide table with vertical scalability. This model extracts the personalized data from the wide table and describes it using extended vertical metadata. Each row in the extended vertical metadata is a key/value pair that stores a tenant's personalization, fulfilling the requirements of different tenants. The single wide table with vertical scalability is shown in Fig. 1.10. If the personalization of the tenants is identical, the extended vertical metadata can be omitted. The advantage of this approach is that it reduces the waste of data resources efficiently.
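A tenant's logical record is then the core horizontal row merged with its key/value rows. A minimal sketch, in which the table layout and all column names (tenantID, recordKey, attr, value) are illustrative assumptions:

```python
def read_tenant_record(wide_row, vertical_rows, tenant_id, record_key):
    """Merge a tenant's core horizontal row with its extended vertical
    metadata (key/value pairs) into one logical record."""
    # Null fields are unclaimed custom columns, so drop them.
    record = {col: val for col, val in wide_row.items() if val is not None}
    for row in vertical_rows:
        if row["tenantID"] == tenant_id and row["recordKey"] == record_key:
            record[row["attr"]] = row["value"]   # personalized attribute
    return record
```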
The last approach is multiple wide tables with vertical scalability, which is
shown in Fig. 1.11. In the context of multiple wide tables, tenants’ data are spread
over different single wide tables, each with its own core horizontal part of tenantID, key, and field columns (A1…An, B1…Bm, C1…Ck in Fig. 1.11).